An expressive model for comparing tree-structured
Technical Report, Stanford University Database
We study the problem of comparing tree-structured data. This problem has applications in several diverse
fields, including molecular biology, natural language processing, image and pattern matching, program
analysis, document management, and change management in databases [SZ90, NBR88, WZJS94, WCM+94,
Yan91, WSC+97, C3]. Our interest in this problem stems from implementing a subscription service that
detects database changes by periodically querying an external database and checking the results for new
changes [CGL+97]. For example, consider a Web site listing local entertainment events [EG]. Our prototype
allows a user to subscribe to certain changes e.g., the cancellation or rescheduling of an event. The prototype
periodically queries the event listings, and uses a tree-differencing algorithm to detect changes in the results.
Our goal is to find a compact representation of the changes between two trees. If we use linear edit
scripts, our goal is to find a "minimum-cost" script that transforms the first tree into one that is isomorphic
to the second. We assign costs to operations and look for a minimum-cost script to ensure that the script
does not do more work than needed. Unfortunately, this traditional method of describing changes using
linear edit scripts has several problems when used with subtree operations such as moves and copies.