# Full text of "Constraint databases : first international symposium, CDB 2004, Paris, France, June 12-13, 2004 : proceedings"

## See other formats

LNCS 3074 I Bart Kuijpers Peter Revesz (Eds.) Constraint Databases First International Symposium, CDB 2004 Paris, France, June 2004 Proceedings Springer 3074 Lecture Notes in Computer Science Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen Editorial Board Takeo Kanade Carnegie Mellon University, Pittsburgh, PA, USA Josef Kittler University of Surrey, Guildford, UK Jon M. Kleinberg Cornell University, Ithaca, NY, USA Friedemann Mattern ETH Zurich, Switzerland John C. Mitchell Stanford University, CA, USA Moni Naor Weizmann Institute of Science, Rehovot, Israel Oscar Nierstrasz University of Bern, Switzerland C. Pandu Rangan Indian Institute of Technology, Madras, India Bernhard Steffen University of Dortmund, Germany Madhu Sudan Massachusetts Institute of Technology, MA, USA Demetri Terzopoulos New York University, NY, USA Doug Tygar University of California, Berkeley, CA, USA Moshe Y. Vardi Rice University, Houston, IX, USA Gerhard Weikum Max-Planck Institute of Computer Science, Saarbruecken, Germany Bart Kuijpers Peter Revesz (Eds.) Constraint Databases First International Symposium, CDB 2004 Paris, France, June 12-13, 2004 Proceedings 4^ Springer Volume Editors Bart Kuijpers University of Limburg (LUC) Research Group on Theoretical Computer Science WNI Universitaire Campus, 3590 Diepenbeek, Belgium E-mail: bart.kuijpers@luc.ac.be Peter Revesz University of Nebraska-Lincoln Computer Science and Engineering 214 Ferguson Hall, Lincoln, NE 68588-0115, USA E-mail: revesz@cse.unl.edu Library of Congress Control Number: 2004106087 CR Subject Classification (1998): H.2, H.3 ISSN 0302-9743 ISBN 3-540-22126-3 Springer- Verlag Berlin Heidelberg New York This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer- Verlag. Violations are liable to prosecution under the German Copyright Law. Springer- Verlag is a part of Springer Science+Business Media springeronline .com © Springer-Verlag Berlin Heidelberg 2004 Printed in Germany Typesetting: Camera-ready by author, data conversion by Olgun Computergrafik Printed on acid-free paper SPIN: 1 1009566 06/3 142 5 4 3 2 1 0 Preface The first International Symposium on the Applications of Constraint Databases (CDB2004) took place in Paris, France, on June 12-13, 2004, just before the ACM SIGMOD and PODS conferences. Since the publication of the paper “Constraint Query Languages” by Kanel- lakis, Kuper and Revesz in 1990, the last decade has seen a growing interest in constraint database theory, query evaluation, and applications, reflected in a variety of conferences, journals, and books. Constraint databases have proven to be extremely flexible and adoptable in environments that relational database systems cannot serve well, such as geographic information systems and bioinfor- matics. This symposium brought together people from several diverse areas all con- tributing to the practice and the application of constraint databases. It was a continuation and extension of previous workshops held in Friedrichshafen, Ger- many (1995), Cambridge, USA (1996), Delphi, Greece (1997), and Seattle, USA (1998) as well as of the work in the comprehensive volume “Constraint Databa- ses” edited by G. Kuper, L. Libkin and J. Paredaens (2000) and the textbook “Introduction to Constraint Databases” by P. Revesz (2002). The aim of the symposium was to open new and future directions in con- straint database research; to address constraints over domains other than the reals; to contribute to a better implementation of constraint database systems, in particular of query evaluation; to address efficient quantifier elimination; and to describe applications of constraint databases. The technical program of the symposium consisted of 10 technical papers and an invited paper as well as additional invited talks by Leonid Libkin and Andreas Podelski. The papers collected in these proceedings were selected by the program committee from a total of 29 submissions, and they were presented in five sessions, as described below. Efficient query evaluation. Joos Heintz (invited speaker) and Bart Kuijpers address the difficulty of the effective evaluation of first-order queries, usually involving some form of quantifier elimination, and discuss various aspects that influence the efficiency of the evaluation of queries expressible in first-order logic over the reals. The importance of data structures and their effect on the complex- ity of quantifier-elimination is emphasized and a novel data model that supports data exploration and visualization as well as efficient query evaluation is pro- posed. Finally, they show that a particular kind of sample point query cannot be evaluated in polynomial time. Spatial and spatio-temporal data. Spatial databases is a common appli- cation area of constraint databases. In recent years spatio-temporal data have often been modeled using constraints. We have three technical papers on this topic. VI Preface — Lixin Li, Youming Li and Reinhard Piltner propose a new spatio-temporal interpolation method for 3-D space and 1-D time geographic data, based on shape functions. Instead of only manipulating the time dimension as in the earlier ST product and tetrahedral methods, their new method takes the original approach of combining 2-D shape functions in the (x, y) domain with the (z,t) domain shape functions. — Floris Geerts deals with the representation of moving objects in databases. Moving objects are usually represented, when possible, through explicit de- scriptions of their trajectories. The author proposes instead a new data model based on encoding their equations of motion, more specifically by differential equations. He also discusses a query language for this data model. — Sofie Haesevoets describes a triangle-based logic in which queries that are invariant under affinities of the ambient space can be formulated. She charac- terizes the expressive power of this logic and shows it to be equivalent to the affine-generic fragment of first-order logic over the reals. She also presents algorithms for computing an affine-invariant triangulation and covering. Applications. Looking at specific applications is important for two reasons. First, they reveal the possibilities of constraint database applications, often ap- plications that could not be done in relational database systems. Second, they test the limits of the current constraint data model and query language proposals and thereby stimulate their further extensions. The following specific applica- tions raise important issues and provide big challenges to researchers for the future. — Maria Teresa Gomez Lopez, Rafale Ceballos Guerrero, Rafael Martinez Gasca and Carmelo del Valle Sevilla apply constraint databases in the determina- tion of potential minimal conflicts, which can be further used for polynomial model-based diagnosis. — Viswanathan Ramanathan and Peter Revesz apply constraint databases to the genome map assembly problem. The genome map assembly problem is the problem of reconstructing the entire genome sequence of an organism based on overlapping fragments of the genome. They look at several algo- rithms for this problem. Using extensive computer experiments, they show that their constraint automaton, which can be solved using a constraint database system, solves the genome map assembly problem computation- ally more efficiently than the common alternative solution based on overlap multigraphs. Even more surprisingly, the average case running time of their solution increases only linearly while the running time of the other solution increases exponentially with the size of real genome data input. — Carson Kai-Sang Leung proposes a new dynamic FP-Tree mining algorithm to mine frequent itemsets satisfying succinct constraints. The proposed al- gorithm is dynamic, such that the constraints can be changed during the mining process. Based on a classification of constraints this paper describes the cases of relaxing and tightening constraints and evaluation results show the effectiveness of this approach. Preface VII Query optimization. Query optimization is the concern of making the evalua- tion of queries computationally efficient in space and time. These techniques are essential elements for the implementation of constraint database systems. We had two papers in this area. — Jan Chomicki discusses the problem of semantic query optimization for pref- erence queries and treats this problem as a constraint reasoning problem. His techniques make use of integrity constraints, and make it possible to remove redundant occurrences of the winnow operator resulting in a more efficient algorithm for the computation of winnow. The paper also investigates the problem of propagating integrity constraints. — Anagh Lai and Berthe Y. Choueiry consider the important problem of effi- cient join computation during query evaluation. They model the join com- putation in relational databases as a constraint satisfaction problem, which they solve using their technique called dynamic bundling. With dynamic bundling the join computation can be performed with major savings in space and time. The future of constraint databases. Implementation of constraint databa- ses is, of course, a major practical concern. While there are several prototype systems developed at universities and research laboratories, such as the C 3 , the DEDALE and the MLPQ systems, there are still no commercial implementa- tions of constraint databases. However, this situation may change in the future, as explained in the following two papers. — Dina Goldin describes how constraints can be eliminated from constraint databases, in the sense of reducing them to as simple a representation as used in relational database systems and geographic information systems. She proposes a 3-tier architecture for constraint databases, with an abstract layer for the infinite relational extent of the data and a concrete layer that admits both constraint-based and geometry-based representations of spatio- temporal data. — Mengchui Cai, from the DB2 group at the IBM Silicon Valley Laboratory, presents a way of integrating constraint databases into relational database systems. His main insight is that existing relational database systems can be extended by special functions that call a constraint relational engine at the appropriate places within an extended SQL query, while the constraint data itself can be represented within specialized relational tables. This proposal may lead to a practical and seemless way of integrating constraint data with relational data. This symposium would have been impossible without the help and effort of many people. The editors would like to thank the program committee for the se- lection of the papers and the local organizers, in particular Irene Guessarian, for the arrangements in Paris. We especially would like to thank Sofie Haesevoets for managing the conference Web site and many other practical arrangements, and Floris Geerts for advertising the symposium and composing these proceedings. VIII Preface The organizers are extremely grateful for the financial support given by Gen- eral Eleftherios and Argyroula Kanellakis, the University of Limburg (LUC) and the University of Nebraska-Lincoln. We would explicitly like to thank the Universite Pierre et Marie Curie, Paris 6, for hosting the symposium. We are pleased to bring to the reader these symposium proceedings, which reflect major recent advances in the field of constraint databases. We were also glad to see the symposium bring together many researchers in the field of con- straint databases for a fruitful exchange of ideas. We also remembered those who due to their untimely death could not attend the symposium, including Paris Kanellakis and his family. Finally, we look forward to a continued growth in the field and to future symposium events. June 2004 Bart Kuijpers and Peter Revesz Conference Organization Chairs Bart Kuijpers Peter Revesz Limburgs Universitair Centrum, Belgium University of Nebraska-Lincoln, USA Program Committee Saugata Basu Alex Brodsky Jan Chomicki Berthe Choueiry Giorgio Delzanno Floris Geerts Marc Giusti Dina Goldin Stephane Grumbach Joxan Jaffar Manolis Koubarakis Stephan Kreutzer Bart Kuijpers Gabriel Kuper Zoe Lacroix Lixin Li Jan Paredaens Peter Revesz Philippe Rigaux Kai-Uwe Sattler Jianwen Su David Toman Jan Van den Bussche Dirk Van Gucht Nicolai Vorobjov Mark Wallace Georgia Tech, USA George Mason University, USA SUNY at Buffalo, USA University of Nebraska-Lincoln, USA Universita di Genova, Italy University of Helsinki, Finland CNRS, Ecole Poly technique, Paris, France University of Connecticut, USA INRIA, Paris, France National University of Singapore, Singapore Technical University of Crete, Greece Humboldt Universitat, Berlin, Germany Limburgs Universitair Centrum, Belgium Universita di Trento, Italy Arizona State University, USA Georgia Southern University, USA Universiteit Antwerpen, Belgium University of Nebraska-Lincoln, USA Universite Paris Sud, France Technische Universitat Ilmenau, Germany University of California at Santa Barbara, USA University of Waterloo, Canada Limburgs Universitair Centrum, Belgium Indiana University, USA University of Bath, UK Imperial College, London, UK External Reviewers Xiang Fu, Cagdas Gerede, Sofie Haesevoets, Anagh Lai, Hoda Mokhtar Publicity and Proceedings Chair Floris Geerts University of Helsinki, Finland Table of Contents Efficient Query Evaluation Constraint Databases, Data Structures and Efficient Query Evaluation .... 1 Joos Heintz and Bart Kuijpers Spatial and Spatio-Temporal Data A New Shape Function Based Spatiotemporal Interpolation Method 25 Lixin Li, Youming Li, and Reinhard Piltner Moving Objects and Their Equations of Motion 40 Floris Geerts A Triangle-Based Logic for Affine-Invariant Querying of Two-Dimensional Spatial Data 52 Sofie Haesevoets Applications Applying Constraint Databases in the Determination of Potential Minimal Conflicts to Polynomial Model-Based Diagnosis 74 Maria Teresa Gomez Lopez, Rafael Ceballos Guerrero, Rafael Martinez Gasca, and Carmelo del Valle Sevilla Constraint Database Solutions to the Genome Map Assembly Problem ... 88 Viswanathan Ramanathan and Peter Revesz Dynamic FP-Tree Based Mining of Frequent Patterns Satisfying Succinct Constraints 112 Carson Kai-Sang Leung Query Optimization Semantic Optimization of Preference Queries 128 Jan Chomicki Constraint Processing Techniques for Improving Join Computation: A Proof of Concept 143 Anagh Lai and Berthe Y. Choueiry XII Table of Contents The Future of Constraint Databases Taking Constraints out of Constraint Databases 161 Dina Q. Goldin Integrating Constraint and Relational Database Systems 173 Mengchu Cai Author Index 181 Constraint Databases, Data Structures and Efficient Query Evaluation* Joos Heintz 1,2,3 and Bart Kuijpers 4 1 Universitad de Buenos Aires Dcpartamento de Computation Ciudad Universitaria, Pabellon I, 1428 Buenos Aires, Argentina joos@dc.uba. ar 2 Consejo National de Investigaciones Cientificas y Tecnologicas (CONICET), Argentina 3 Universidad de Cantabria Facultad de Ciencias Departamento de Matematicas, Estadistica y Comptacion Avda. de los Castros s/n, E-39005 Santander, Spain 4 Limburgs Universitair Centrum Department WNI Universitaire Campus, 3590 Diepenbeek, Belgium bart . kui jpers@luc .ac.be Abstract. Constraint databases that can be described by boolean com- binations of polynomial inequalities over the reals have received ample research attention. In particular, the expressive power of first-order logic over the reals, as a constraint database query language, has been studied extensively. The difficulty of the effective evaluation of first-order queries, usually involving some form of quantifier elimination, has been largely neglected. The contribution of this paper is a discussion of various aspects that in- fluence the efficiency of the evaluation of queries expressible in first-order logic over the reals. We emphasize the importance of data structures and their effect on the complexity of quantifier-elimination. We also propose a novel data model that supports data exploration and visualization as well as efficient query evaluation. In this context, we introduce the con- cept of sample point query. Finally, we show that a particular kind of sample point query cannot be evaluated in polynomial sequential time by means of branching-parsimonious procedures. 1 Introduction and Summary The framework of constraint databases was introduced in 1990 by Kanellakis, Kuper and Revesz [26] as an extension of the relational model that allows the * Research partially supported by the following Argentinian, Belgian, German and Spanish grants: UBACyT X198, PIP CONICET 2461, FW/PA/02-EIII/007, ALA 01-E3/02 and DGCyT BFM 2000-0349. B. Kuijpers and P. Revesz (Eds.): CDB 2004, LNCS 3074, pp. 1—24, 2004. (c) Springer- Verlag Berlin Heidelberg 2004 2 Joos Heintz and Bart Kuijpers use of possibly infinite, but first-order definable relations rather than just fi- nite relations. It provides an elegant and powerful model for applications that deal with infinite sets of points in some real affine space IR™. In the setting of the constraint model, infinite relations are finitely represented as boolean com- binations of polynomial equalities and inequalities, which we interpret, in this paper, over the real and exceptionally over the complex numbers. The case of the interpretation over the real numbers has applications in spatial databases [31]. Various aspects of the constraint model are well-studied by now (for an overview of research results we refer to [28] and the textbook [33]). The relational calculus augmented with polynomial constraints, or equivalently, first-order logic over the reals augmented with relation predicates to address the database re- lations Ri, . . . ,R S , F0(+, x, <, 0, 1, i?i, . . . , R s ) for short, is the standard first- order query language for constraint databases. The expressive power of first- order logic over the reals, as a constraint database query language, has been studied extensively. However, the difficulty of the efficient evaluation of first- order queries, usually involving some form of quantifier elimination, has been largely neglected. The existing constraint database systems are based on gen- eral purpose quantifier-elimination algorithms and are, in most cases, restricted to work with linear data, i.e. , they use first-order logic over the reals without multiplication [28, Part IV]. The intrinsic inefficiency of quantifier elimination represents a bottle-neck for real-world implementations of constraint database systems. General purpose elimination algorithms (such as, e.g., [6, 12, 18, 23, 32]) and standard data struc- tures prevent query evaluation to become efficient. The fact that the knapsack problem can be formulated in this setting indicates that geometric elimination may be intrinsically hard. Another example for this complexity phenomenon is given by the permanent of a generic n x n matrix, which appears as the output of a suitable first-order query (see [22] for details on both examples). In the literature of constraint databases, the data model proposed to de- scribe geometric figures in 1R™ is based on quantifier-free first-order formulas over the reals [28, 33]. The data structures needed to implement this data model are left largely unspecified. It is widely understood that these formulas should be represented by explicitly giving disjunctive normal forms using dense or sparse encoding of polynomials. However, disjunctive normal forms may be unnecessar- ily large and the sparse representation of elimination polynomials may be very inefficient. For example, the sparse representation of the determinant of a generic n x n matrix contains n\ terms. On the other hand, the determinant can be rep- resented by an 0(n 3 ) arithmetic boolean circuit (with divisions) which describes the Gaussian elimination algorithm. This suggests the use of alternative data structures for the representation of the classical data model of constraint data- bases. Indeed, the use of arithmetic boolean circuits as alternative data structure allows the design of a new generation of elimination algorithms which produce an exponential time complexity gain compared to the most efficient algorithms using traditional data structures (see [36] for a definition of the notion of arith- metic boolean circuit and [22] for this kind of complexity results). Nevertheless, Constraint Databases, Data Structures and Efficient Query Evaluation 3 in terms of the syntactical size of the input, the worst-case sequential time com- plexity of the elimination of a single existential quantifier block by means of the new algorithms remains still exponential. However, when we measure the input in a semantic (i.e., geometric) way, as is achieved, e.g., by the system degree , elimination of a single existential quantifier block becomes polynomial in this quantity (see e.g. [3,4,13,15]). Unfortunately, this does not suffice for the design of algorithms able to evaluate purely existential queries in sequential time which is polynomial in the number of bounded variables. In fact, an non- polynomial lower bound for the underlying elimination problem can be deduced from the P]r NP® conjecture in the Blum-Shub-Smale complexity model over the real numbers [8] . Another shortcoming of the classical data model for constraint databases is that it does not support data exploration and local visualization. Indeed, a quantifier-free formula in disjunctive normal form, describing the output of a query, allows the answering of, for instance, the membership question, but it is does not allow an easy exhibition of the output, by, e.g., the production of sample points , or, for low dimensions, a visualization of the output. To increase the tangibility of the output, we suggest considering a new type of query that produces sample points. Furthermore, it could be desirable to support an ex- ploration of the neighborhood of such a sample point. This could be achieved by representing the output by a cell decomposition consisting of cells which are non-empty open subsets of smooth real varieties. In this way, starting from any sample point, its neighborhood within its cell may be explored. In this sense, we propose a novel data model for constraint databases, consisting of smooth cells accompanied by sample points. The known most efficient elimination pro- cedures produce naturally such output representations, a typical example being CAD [12]. Therefore, when constraint database theory invokes quantifier elimina- tion in query evaluation, it should also incorporate these features of the existing elimination algorithms. In this context, we extend the concept of sample point query to queries that give rationally parameterized families of polynomial functions as output. Such queries will be called extended sample point queries. The main outcome of the paper is a proof that extended sample point queries, associated to first-order for- mulas containing a fixed number of quantifier alternations, cannot be evaluated in polynomial sequential time by so-called “branching-parsimonious algorithms” . This lower bound result suggest that further research on the complexity of query evaluation in constraint database theory should be directed towards the identi- fication of database and query classes that have a strongly improved complexity behavior. As a pattern for the development of such a theory, we may consider a new type of elimination algorithms which are based on the notion of system degree and use non-conventional data structures (see [2-4,13,15,17,20,21,24, 30,34]). This paper introduces a number of new concepts for constraint database the- ory that sometimes require certain notions from algebraic complexity theory, algebraic geometry and commutative algebra. These notions can be found in