Query Processing in Partly Inaccessible Databases
Survey
One important characteristic of distributed database management systems
(DBMS) is
that due to network or machine failure the environment may become
partitioned into sub-environments that cannot communicate with each
other. However, there are various application areas where the individual
sub-environments should remain operable in such situations . In
particular queries to the database should be processed in an
appropriate way.
To this end, the final and all intermediate results of queries in
distributed DBMS must be regarded as potentially vague or
incomplete. As a consequence, we have to deal with vague values and
vague collections during query processing. Whereas vague values have
been addressed in various papers dealing with null values, vague
collections stemming from inaccessibility represent a novel research
topic. To define a closed approach for vague collections, adequate
representations are needed. Furthermore the semantics of the operators
usually provided by query languages have to be redefined for these
representations.
To become more precise, let us consider an example. Assume a
distributed DBMS supporting
links to represent relationships between objects. Further assume the
example database sketched in the figure below. The solid arrows
between the rivers and the countries depict flows_through
relationships, while the dashed arrows depict has_capital
relationships.
The objects (rivers, countries, and capitals) are distributed
over three segments, segment 2 being currently inaccessible.
If we have to
evaluate a query in this situation the answer to this query can be
based only on the information stored on the accessible segments 1 and 3.
However, the potential error in the result of the
query should be as small as possible.
The correct answer to the query Q: Select those capitals of
countries crossed by the river Rhine or the river Rhone which have
more than 3 million inhabitants! would be the set
{Berlin, Paris}.
Due to inaccessibility, the usual inside-out
evaluation of the query, starting from the two specified rivers and
then navigating via the countries to the capitals, reaches
only Berlin as a result object. Although Paris itself is
accessible, it is - due to unreachability - missing in the
result! My approach uses a logical predicate to describe those missing
elements.
The usefulness of this logical predicate can be demonstrated as
follows.
Assume a second query R had collected the capitals
Amsterdam, Paris, and Bern. If we compute
the intersection S of Q and R, we check the three
capitals of R against the logical predicate of Q.
Because both Amsterdam and Bern have too few
inhabitants they cannot be in Q and hence not in S. For
Paris we cannot decide, if it was in Q and so it belongs to
the so-called explicit part of the upper bound of S.
To sum up,
my representation of a vague set consists of three parts:
- the explicit lower bound,
- the explicit upper bound, and
- the logical predicate.
The logical predicate is useful, for in general
there are missing elements which are nevertheless accessible. If we
had not used a logical predicate in the above example, Amsterdam and
Bern would have had to be inserted into S because of the
incompleteness of Q.
Vague multisets are handled quite analogously. In addition, in a vague
multiset the number of occurrences of an element can be vague.
Vague lists are more complicated, because when sorting a vague set
or a vague multiset, even the arising order can be vague. My approach comprises
an adequate representation for vague lists, too.
Although my concepts for the dealing with partial inaccessibility are
independent of a particular query language and even of a certain datamodel,
they are validated by being implemented into the
query language NTT, which is based upon
PCTE, the ISO standard for an open repository.
Literature relevant to the topic:
- O. Haase.
Mengenorientierte Anfragen auf partiell zugreifbaren Datenbanken.
PhD thesis, Universität Gesamthochschule Siegen, 1997.
German Text, to appear.
abstract
short version (PostScript, 378 KB)
- O. Haase and A. Henrich.
Query Processing Techniques for Partly Inaccessible Distributed
Databases.
In C. Small, P. Douglas, R. Johnson, P. King,
and N. Martin, editors,
Advances in Databases; Proceedings of the 15th British National
Conference on Databases, BNCOD 15, volume 1271 of Lecture Notes in
Computer Science, pages 123 - 125, London, UK, July 1997. Springer.
- O. Haase and A. Henrich.
A Hybrid Representation of Vague Collections for Distributed Object
Management Systems.
Submitted for publication.
abstract
- O. Haase and A. Henrich.
Error Propagation in Distributed Databases.
In N. Pissinou, A. Silberschatz, E. K. Park, and K. Makki, editors,
Proceedings of the Fourth International Conference on Information and
Knowledge Management (CIKM'95), pages 387-394. Baltimore, USA, November 28
- December 2 1995. ACM press.
abstract
- O. Haase and A. Henrich.
Error Propagation in Distributed Databases - Mathematical
Foundations.
Technical Report 95/6, Universität - GHS - Siegen, D-57068
Siegen, 1995.
For further information, comments, etc. please contact
me by email
or have a look at
my homepage.