Skip to main content

Full text of "Constraint databases : first international symposium, CDB 2004, Paris, France, June 12-13, 2004 : proceedings"

See other formats


LNCS 3074 



I Bart Kuijpers 

Peter Revesz (Eds.) 



Constraint 

Databases 

First International Symposium, CDB 2004 
Paris, France, June 2004 
Proceedings 



Springer 





3074 



Lecture Notes in Computer Science 

Commenced Publication in 1973 
Founding and Former Series Editors: 

Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen 

Editorial Board 

Takeo Kanade 

Carnegie Mellon University, Pittsburgh, PA, USA 
Josef Kittler 

University of Surrey, Guildford, UK 
Jon M. Kleinberg 

Cornell University, Ithaca, NY, USA 
Friedemann Mattern 

ETH Zurich, Switzerland 
John C. Mitchell 

Stanford University, CA, USA 
Moni Naor 

Weizmann Institute of Science, Rehovot, Israel 
Oscar Nierstrasz 

University of Bern, Switzerland 
C. Pandu Rangan 

Indian Institute of Technology, Madras, India 
Bernhard Steffen 

University of Dortmund, Germany 
Madhu Sudan 

Massachusetts Institute of Technology, MA, USA 
Demetri Terzopoulos 

New York University, NY, USA 
Doug Tygar 

University of California, Berkeley, CA, USA 
Moshe Y. Vardi 

Rice University, Houston, IX, USA 
Gerhard Weikum 

Max-Planck Institute of Computer Science, Saarbruecken, Germany 



Bart Kuijpers Peter Revesz (Eds.) 



Constraint 

Databases 



First International Symposium, CDB 2004 
Paris, France, June 12-13, 2004 
Proceedings 



4^ Springer 



Volume Editors 



Bart Kuijpers 

University of Limburg (LUC) 

Research Group on Theoretical Computer Science WNI 
Universitaire Campus, 3590 Diepenbeek, Belgium 
E-mail: bart.kuijpers@luc.ac.be 

Peter Revesz 

University of Nebraska-Lincoln 
Computer Science and Engineering 
214 Ferguson Hall, Lincoln, NE 68588-0115, USA 
E-mail: revesz@cse.unl.edu 



Library of Congress Control Number: 2004106087 



CR Subject Classification (1998): H.2, H.3 
ISSN 0302-9743 

ISBN 3-540-22126-3 Springer- Verlag Berlin Heidelberg New York 



This work is subject to copyright. All rights are reserved, whether the whole or part of the material is 
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, 
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication 
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, 
in its current version, and permission for use must always be obtained from Springer- Verlag. Violations are 
liable to prosecution under the German Copyright Law. 

Springer- Verlag is a part of Springer Science+Business Media 

springeronline .com 

© Springer-Verlag Berlin Heidelberg 2004 
Printed in Germany 

Typesetting: Camera-ready by author, data conversion by Olgun Computergrafik 
Printed on acid-free paper SPIN: 1 1009566 06/3 142 5 4 3 2 1 0 



Preface 



The first International Symposium on the Applications of Constraint Databases 
(CDB2004) took place in Paris, France, on June 12-13, 2004, just before the 
ACM SIGMOD and PODS conferences. 

Since the publication of the paper “Constraint Query Languages” by Kanel- 
lakis, Kuper and Revesz in 1990, the last decade has seen a growing interest 
in constraint database theory, query evaluation, and applications, reflected in a 
variety of conferences, journals, and books. Constraint databases have proven 
to be extremely flexible and adoptable in environments that relational database 
systems cannot serve well, such as geographic information systems and bioinfor- 
matics. 

This symposium brought together people from several diverse areas all con- 
tributing to the practice and the application of constraint databases. It was a 
continuation and extension of previous workshops held in Friedrichshafen, Ger- 
many (1995), Cambridge, USA (1996), Delphi, Greece (1997), and Seattle, USA 
(1998) as well as of the work in the comprehensive volume “Constraint Databa- 
ses” edited by G. Kuper, L. Libkin and J. Paredaens (2000) and the textbook 
“Introduction to Constraint Databases” by P. Revesz (2002). 

The aim of the symposium was to open new and future directions in con- 
straint database research; to address constraints over domains other than the 
reals; to contribute to a better implementation of constraint database systems, 
in particular of query evaluation; to address efficient quantifier elimination; and 
to describe applications of constraint databases. 

The technical program of the symposium consisted of 10 technical papers 
and an invited paper as well as additional invited talks by Leonid Libkin and 
Andreas Podelski. The papers collected in these proceedings were selected by 
the program committee from a total of 29 submissions, and they were presented 
in five sessions, as described below. 

Efficient query evaluation. Joos Heintz (invited speaker) and Bart Kuijpers 
address the difficulty of the effective evaluation of first-order queries, usually 
involving some form of quantifier elimination, and discuss various aspects that 
influence the efficiency of the evaluation of queries expressible in first-order logic 
over the reals. The importance of data structures and their effect on the complex- 
ity of quantifier-elimination is emphasized and a novel data model that supports 
data exploration and visualization as well as efficient query evaluation is pro- 
posed. Finally, they show that a particular kind of sample point query cannot 
be evaluated in polynomial time. 

Spatial and spatio-temporal data. Spatial databases is a common appli- 
cation area of constraint databases. In recent years spatio-temporal data have 
often been modeled using constraints. We have three technical papers on this 
topic. 



VI 



Preface 



— Lixin Li, Youming Li and Reinhard Piltner propose a new spatio-temporal 
interpolation method for 3-D space and 1-D time geographic data, based 
on shape functions. Instead of only manipulating the time dimension as in 
the earlier ST product and tetrahedral methods, their new method takes 
the original approach of combining 2-D shape functions in the (x, y) domain 
with the (z,t) domain shape functions. 

— Floris Geerts deals with the representation of moving objects in databases. 
Moving objects are usually represented, when possible, through explicit de- 
scriptions of their trajectories. The author proposes instead a new data model 
based on encoding their equations of motion, more specifically by differential 
equations. He also discusses a query language for this data model. 

— Sofie Haesevoets describes a triangle-based logic in which queries that are 
invariant under affinities of the ambient space can be formulated. She charac- 
terizes the expressive power of this logic and shows it to be equivalent to the 
affine-generic fragment of first-order logic over the reals. She also presents 
algorithms for computing an affine-invariant triangulation and covering. 

Applications. Looking at specific applications is important for two reasons. 
First, they reveal the possibilities of constraint database applications, often ap- 
plications that could not be done in relational database systems. Second, they 
test the limits of the current constraint data model and query language proposals 
and thereby stimulate their further extensions. The following specific applica- 
tions raise important issues and provide big challenges to researchers for the 
future. 

— Maria Teresa Gomez Lopez, Rafale Ceballos Guerrero, Rafael Martinez Gasca 
and Carmelo del Valle Sevilla apply constraint databases in the determina- 
tion of potential minimal conflicts, which can be further used for polynomial 
model-based diagnosis. 

— Viswanathan Ramanathan and Peter Revesz apply constraint databases to 
the genome map assembly problem. The genome map assembly problem is 
the problem of reconstructing the entire genome sequence of an organism 
based on overlapping fragments of the genome. They look at several algo- 
rithms for this problem. Using extensive computer experiments, they show 
that their constraint automaton, which can be solved using a constraint 
database system, solves the genome map assembly problem computation- 
ally more efficiently than the common alternative solution based on overlap 
multigraphs. Even more surprisingly, the average case running time of their 
solution increases only linearly while the running time of the other solution 
increases exponentially with the size of real genome data input. 

— Carson Kai-Sang Leung proposes a new dynamic FP-Tree mining algorithm 
to mine frequent itemsets satisfying succinct constraints. The proposed al- 
gorithm is dynamic, such that the constraints can be changed during the 
mining process. Based on a classification of constraints this paper describes 
the cases of relaxing and tightening constraints and evaluation results show 
the effectiveness of this approach. 



Preface 



VII 



Query optimization. Query optimization is the concern of making the evalua- 
tion of queries computationally efficient in space and time. These techniques are 
essential elements for the implementation of constraint database systems. We 
had two papers in this area. 

— Jan Chomicki discusses the problem of semantic query optimization for pref- 
erence queries and treats this problem as a constraint reasoning problem. His 
techniques make use of integrity constraints, and make it possible to remove 
redundant occurrences of the winnow operator resulting in a more efficient 
algorithm for the computation of winnow. The paper also investigates the 
problem of propagating integrity constraints. 

— Anagh Lai and Berthe Y. Choueiry consider the important problem of effi- 
cient join computation during query evaluation. They model the join com- 
putation in relational databases as a constraint satisfaction problem, which 
they solve using their technique called dynamic bundling. With dynamic 
bundling the join computation can be performed with major savings in space 
and time. 

The future of constraint databases. Implementation of constraint databa- 
ses is, of course, a major practical concern. While there are several prototype 
systems developed at universities and research laboratories, such as the C 3 , the 
DEDALE and the MLPQ systems, there are still no commercial implementa- 
tions of constraint databases. However, this situation may change in the future, 
as explained in the following two papers. 

— Dina Goldin describes how constraints can be eliminated from constraint 
databases, in the sense of reducing them to as simple a representation as 
used in relational database systems and geographic information systems. 
She proposes a 3-tier architecture for constraint databases, with an abstract 
layer for the infinite relational extent of the data and a concrete layer that 
admits both constraint-based and geometry-based representations of spatio- 
temporal data. 

— Mengchui Cai, from the DB2 group at the IBM Silicon Valley Laboratory, 
presents a way of integrating constraint databases into relational database 
systems. His main insight is that existing relational database systems can be 
extended by special functions that call a constraint relational engine at the 
appropriate places within an extended SQL query, while the constraint data 
itself can be represented within specialized relational tables. This proposal 
may lead to a practical and seemless way of integrating constraint data with 
relational data. 

This symposium would have been impossible without the help and effort of 
many people. The editors would like to thank the program committee for the se- 
lection of the papers and the local organizers, in particular Irene Guessarian, for 
the arrangements in Paris. We especially would like to thank Sofie Haesevoets for 
managing the conference Web site and many other practical arrangements, and 
Floris Geerts for advertising the symposium and composing these proceedings. 



VIII Preface 



The organizers are extremely grateful for the financial support given by Gen- 
eral Eleftherios and Argyroula Kanellakis, the University of Limburg (LUC) and 
the University of Nebraska-Lincoln. 

We would explicitly like to thank the Universite Pierre et Marie Curie, Paris 
6, for hosting the symposium. 

We are pleased to bring to the reader these symposium proceedings, which 
reflect major recent advances in the field of constraint databases. We were also 
glad to see the symposium bring together many researchers in the field of con- 
straint databases for a fruitful exchange of ideas. We also remembered those who 
due to their untimely death could not attend the symposium, including Paris 
Kanellakis and his family. Finally, we look forward to a continued growth in the 
field and to future symposium events. 

June 2004 Bart Kuijpers and Peter Revesz 



Conference Organization 



Chairs 




Bart Kuijpers 
Peter Revesz 


Limburgs Universitair Centrum, Belgium 
University of Nebraska-Lincoln, USA 


Program Committee 

Saugata Basu 
Alex Brodsky 
Jan Chomicki 
Berthe Choueiry 
Giorgio Delzanno 
Floris Geerts 
Marc Giusti 
Dina Goldin 
Stephane Grumbach 
Joxan Jaffar 
Manolis Koubarakis 
Stephan Kreutzer 
Bart Kuijpers 
Gabriel Kuper 
Zoe Lacroix 
Lixin Li 
Jan Paredaens 
Peter Revesz 
Philippe Rigaux 
Kai-Uwe Sattler 
Jianwen Su 
David Toman 
Jan Van den Bussche 
Dirk Van Gucht 
Nicolai Vorobjov 
Mark Wallace 


Georgia Tech, USA 

George Mason University, USA 

SUNY at Buffalo, USA 

University of Nebraska-Lincoln, USA 

Universita di Genova, Italy 

University of Helsinki, Finland 

CNRS, Ecole Poly technique, Paris, France 

University of Connecticut, USA 

INRIA, Paris, France 

National University of Singapore, Singapore 
Technical University of Crete, Greece 
Humboldt Universitat, Berlin, Germany 
Limburgs Universitair Centrum, Belgium 
Universita di Trento, Italy 
Arizona State University, USA 
Georgia Southern University, USA 
Universiteit Antwerpen, Belgium 
University of Nebraska-Lincoln, USA 
Universite Paris Sud, France 
Technische Universitat Ilmenau, Germany 
University of California at Santa Barbara, USA 
University of Waterloo, Canada 
Limburgs Universitair Centrum, Belgium 
Indiana University, USA 
University of Bath, UK 
Imperial College, London, UK 


External Reviewers 





Xiang Fu, Cagdas Gerede, Sofie Haesevoets, Anagh Lai, Hoda Mokhtar 



Publicity and Proceedings Chair 

Floris Geerts University of Helsinki, Finland 



Table of Contents 



Efficient Query Evaluation 

Constraint Databases, Data Structures and Efficient Query Evaluation .... 1 

Joos Heintz and Bart Kuijpers 

Spatial and Spatio-Temporal Data 

A New Shape Function Based Spatiotemporal Interpolation Method 25 

Lixin Li, Youming Li, and Reinhard Piltner 

Moving Objects and Their Equations of Motion 40 

Floris Geerts 

A Triangle-Based Logic for Affine-Invariant Querying 

of Two-Dimensional Spatial Data 52 

Sofie Haesevoets 

Applications 

Applying Constraint Databases in the Determination 

of Potential Minimal Conflicts to Polynomial Model-Based Diagnosis 74 

Maria Teresa Gomez Lopez, Rafael Ceballos Guerrero, 

Rafael Martinez Gasca, and Carmelo del Valle Sevilla 

Constraint Database Solutions to the Genome Map Assembly Problem ... 88 
Viswanathan Ramanathan and Peter Revesz 

Dynamic FP-Tree Based Mining 

of Frequent Patterns Satisfying Succinct Constraints 112 

Carson Kai-Sang Leung 

Query Optimization 

Semantic Optimization of Preference Queries 128 

Jan Chomicki 

Constraint Processing Techniques for Improving Join Computation: 

A Proof of Concept 143 

Anagh Lai and Berthe Y. Choueiry 



XII 



Table of Contents 



The Future of Constraint Databases 



Taking Constraints out of Constraint Databases 161 

Dina Q. Goldin 

Integrating Constraint and Relational Database Systems 173 

Mengchu Cai 

Author Index 181 



Constraint Databases, Data Structures 
and Efficient Query Evaluation* 

Joos Heintz 1,2,3 and Bart Kuijpers 4 

1 Universitad de Buenos Aires 
Dcpartamento de Computation 

Ciudad Universitaria, Pabellon I, 1428 Buenos Aires, Argentina 
joos@dc.uba. ar 

2 Consejo National de Investigaciones 
Cientificas y Tecnologicas (CONICET), Argentina 
3 Universidad de Cantabria 
Facultad de Ciencias 

Departamento de Matematicas, Estadistica y Comptacion 
Avda. de los Castros s/n, E-39005 Santander, Spain 
4 Limburgs Universitair Centrum 
Department WNI 

Universitaire Campus, 3590 Diepenbeek, Belgium 
bart . kui jpers@luc .ac.be 



Abstract. Constraint databases that can be described by boolean com- 
binations of polynomial inequalities over the reals have received ample 
research attention. In particular, the expressive power of first-order logic 
over the reals, as a constraint database query language, has been studied 
extensively. The difficulty of the effective evaluation of first-order queries, 
usually involving some form of quantifier elimination, has been largely 
neglected. 

The contribution of this paper is a discussion of various aspects that in- 
fluence the efficiency of the evaluation of queries expressible in first-order 
logic over the reals. We emphasize the importance of data structures and 
their effect on the complexity of quantifier-elimination. We also propose 
a novel data model that supports data exploration and visualization as 
well as efficient query evaluation. In this context, we introduce the con- 
cept of sample point query. Finally, we show that a particular kind of 
sample point query cannot be evaluated in polynomial sequential time 
by means of branching-parsimonious procedures. 



1 Introduction and Summary 

The framework of constraint databases was introduced in 1990 by Kanellakis, 
Kuper and Revesz [26] as an extension of the relational model that allows the 

* Research partially supported by the following Argentinian, Belgian, German and 
Spanish grants: UBACyT X198, PIP CONICET 2461, FW/PA/02-EIII/007, ALA 
01-E3/02 and DGCyT BFM 2000-0349. 



B. Kuijpers and P. Revesz (Eds.): CDB 2004, LNCS 3074, pp. 1—24, 2004. 
(c) Springer- Verlag Berlin Heidelberg 2004 



2 



Joos Heintz and Bart Kuijpers 



use of possibly infinite, but first-order definable relations rather than just fi- 
nite relations. It provides an elegant and powerful model for applications that 
deal with infinite sets of points in some real affine space IR™. In the setting of 
the constraint model, infinite relations are finitely represented as boolean com- 
binations of polynomial equalities and inequalities, which we interpret, in this 
paper, over the real and exceptionally over the complex numbers. The case of the 
interpretation over the real numbers has applications in spatial databases [31]. 

Various aspects of the constraint model are well-studied by now (for an 
overview of research results we refer to [28] and the textbook [33]). The relational 
calculus augmented with polynomial constraints, or equivalently, first-order logic 
over the reals augmented with relation predicates to address the database re- 
lations Ri, . . . ,R S , F0(+, x, <, 0, 1, i?i, . . . , R s ) for short, is the standard first- 
order query language for constraint databases. The expressive power of first- 
order logic over the reals, as a constraint database query language, has been 
studied extensively. However, the difficulty of the efficient evaluation of first- 
order queries, usually involving some form of quantifier elimination, has been 
largely neglected. The existing constraint database systems are based on gen- 
eral purpose quantifier-elimination algorithms and are, in most cases, restricted 
to work with linear data, i.e. , they use first-order logic over the reals without 
multiplication [28, Part IV]. 

The intrinsic inefficiency of quantifier elimination represents a bottle-neck 
for real-world implementations of constraint database systems. General purpose 
elimination algorithms (such as, e.g., [6, 12, 18, 23, 32]) and standard data struc- 
tures prevent query evaluation to become efficient. The fact that the knapsack 
problem can be formulated in this setting indicates that geometric elimination 
may be intrinsically hard. Another example for this complexity phenomenon is 
given by the permanent of a generic n x n matrix, which appears as the output 
of a suitable first-order query (see [22] for details on both examples). 

In the literature of constraint databases, the data model proposed to de- 
scribe geometric figures in 1R™ is based on quantifier-free first-order formulas 
over the reals [28, 33]. The data structures needed to implement this data model 
are left largely unspecified. It is widely understood that these formulas should be 
represented by explicitly giving disjunctive normal forms using dense or sparse 
encoding of polynomials. However, disjunctive normal forms may be unnecessar- 
ily large and the sparse representation of elimination polynomials may be very 
inefficient. For example, the sparse representation of the determinant of a generic 
n x n matrix contains n\ terms. On the other hand, the determinant can be rep- 
resented by an 0(n 3 ) arithmetic boolean circuit (with divisions) which describes 
the Gaussian elimination algorithm. This suggests the use of alternative data 
structures for the representation of the classical data model of constraint data- 
bases. Indeed, the use of arithmetic boolean circuits as alternative data structure 
allows the design of a new generation of elimination algorithms which produce 
an exponential time complexity gain compared to the most efficient algorithms 
using traditional data structures (see [36] for a definition of the notion of arith- 
metic boolean circuit and [22] for this kind of complexity results). Nevertheless, 



Constraint Databases, Data Structures and Efficient Query Evaluation 



3 



in terms of the syntactical size of the input, the worst-case sequential time com- 
plexity of the elimination of a single existential quantifier block by means of 
the new algorithms remains still exponential. However, when we measure the 
input in a semantic (i.e., geometric) way, as is achieved, e.g., by the system 
degree , elimination of a single existential quantifier block becomes polynomial 
in this quantity (see e.g. [3,4,13,15]). Unfortunately, this does not suffice for 
the design of algorithms able to evaluate purely existential queries in sequential 
time which is polynomial in the number of bounded variables. In fact, an non- 
polynomial lower bound for the underlying elimination problem can be deduced 
from the P]r NP® conjecture in the Blum-Shub-Smale complexity model over 
the real numbers [8] . 

Another shortcoming of the classical data model for constraint databases 
is that it does not support data exploration and local visualization. Indeed, a 
quantifier-free formula in disjunctive normal form, describing the output of a 
query, allows the answering of, for instance, the membership question, but it 
is does not allow an easy exhibition of the output, by, e.g., the production of 
sample points , or, for low dimensions, a visualization of the output. To increase 
the tangibility of the output, we suggest considering a new type of query that 
produces sample points. Furthermore, it could be desirable to support an ex- 
ploration of the neighborhood of such a sample point. This could be achieved 
by representing the output by a cell decomposition consisting of cells which are 
non-empty open subsets of smooth real varieties. In this way, starting from any 
sample point, its neighborhood within its cell may be explored. In this sense, 
we propose a novel data model for constraint databases, consisting of smooth 
cells accompanied by sample points. The known most efficient elimination pro- 
cedures produce naturally such output representations, a typical example being 
CAD [12]. Therefore, when constraint database theory invokes quantifier elimina- 
tion in query evaluation, it should also incorporate these features of the existing 
elimination algorithms. 

In this context, we extend the concept of sample point query to queries that 
give rationally parameterized families of polynomial functions as output. Such 
queries will be called extended sample point queries. The main outcome of the 
paper is a proof that extended sample point queries, associated to first-order for- 
mulas containing a fixed number of quantifier alternations, cannot be evaluated 
in polynomial sequential time by so-called “branching-parsimonious algorithms” . 
This lower bound result suggest that further research on the complexity of query 
evaluation in constraint database theory should be directed towards the identi- 
fication of database and query classes that have a strongly improved complexity 
behavior. As a pattern for the development of such a theory, we may consider 
a new type of elimination algorithms which are based on the notion of system 
degree and use non-conventional data structures (see [2-4,13,15,17,20,21,24, 
30,34]). 

This paper introduces a number of new concepts for constraint database the- 
ory that sometimes require certain notions from algebraic complexity theory, 
algebraic geometry and commutative algebra. These notions can be found in