The SiDiff Project

Today it is hard to imagine industrial workflows without technical documents, particularly in the planning phases of software-intensive systems. Examples of technical documents are CAD documents, VLSI documents, or numerous variants of software specification documents, e.g. UML specifications, Matlab/Simulink documents or domain-specific documents.

Services of Model Repositories

Complex technical documents are developed by teams in projects over long periods of time and are collected in repositories. Documents exist in multiple versions, which may be revisions or variants. This leads to several practical problems:
  • Differencing and Merging: Developers must be able to see the difference between two documents. Since many people work on the same models, teamwork needs to be supported. Synchronising and merging the changes made by different persons become crucial for successful development.
  • Analysis of Document Histories: Large systems often have variants in practical use, e.g. old versions which cannot be upgraded, or customer-specific variants. All variants must be maintained in parallel. If a defect is detected in one variant, the same defect is likely to occur in other variants, so one has to search for it in all other variants.
  • Clone Detection: The architecture of large systems typically degenerates during their evolution, and the cost of further maintenance increases significantly. One typical problem is the emergence of clones, i.e. identical or similar parts of the system, which should be unified and merged. This problem is particularly important for system families.
All the above practical problems require tools which are able to compare documents or document parts. Difference tools such as GNU diff work properly on textual documents and source code, but they fail to work correctly with technical documents which are conceptually a graph and which typically have a graphical representation.
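
The limitation of line-based tools can be illustrated with a small sketch. The toy serialization format ("node X", "edge X -> Y") and the helper names below are assumptions made for illustration only; they are not a format used by SiDiff. Two files describe the same graph but list the nodes in a different order: a line-based diff reports changes, while a comparison of the underlying graph structure does not.

```python
import difflib

# Two serializations of the same tiny graph; only the order of the lines differs.
# The "node"/"edge" text format is a made-up example format.
doc_a = ["node A", "node B", "edge A -> B"]
doc_b = ["node B", "node A", "edge A -> B"]

def textual_changes(a, b):
    """Return the added/removed lines reported by a line-based diff."""
    lines = difflib.unified_diff(a, b, lineterm="", n=0)
    return [l for l in lines
            if l.startswith(("+", "-")) and not l.startswith(("+++", "---"))]

def as_graph(lines):
    """Parse the toy format into a (nodes, edges) pair of sets."""
    nodes, edges = set(), set()
    for line in lines:
        kind, rest = line.split(" ", 1)
        if kind == "node":
            nodes.add(rest)
        else:
            src, dst = rest.split(" -> ")
            edges.add((src, dst))
    return nodes, edges

# The line-based diff sees changes although nothing changed conceptually.
print(textual_changes(doc_a, doc_b))
# The graph view is order-independent and reports the documents as equal.
print(as_graph(doc_a) == as_graph(doc_b))
```

Real technical documents add further complications, such as layout information and tool-specific identifiers, which this sketch ignores.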

The SiDiff project aims at building difference and analysis tools for technical documents which are conceptually a graph. This goal poses several challenges regarding technical problems as well as research issues.

Similarity-based Differencing of Models

A range of technologies is available for differencing models. The technologies differ in their prerequisites, their efficiency, and the quality of the computed differences. One example is a closed development environment in which all model editors and other tools that modify models assign a persistent identifier to each model element and maintain it. In such a context, one can efficiently compute differences on the basis of persistent identifiers. However, one cannot compare models which are not versions of one another, and the quality of the computed differences cannot be controlled or even measured.
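
A minimal sketch of identifier-based differencing is given below. It assumes that a model is represented as a dictionary mapping persistent identifiers to attribute dictionaries; this representation and the function name `diff_by_id` are hypothetical and chosen only for illustration.

```python
def diff_by_id(old, new):
    """Compare two models given as {persistent_id: attributes} dictionaries.

    Elements are paired purely by identifier; no similarity is computed.
    """
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(i for i in old.keys() & new.keys() if old[i] != new[i])
    return {"added": added, "removed": removed, "changed": changed}

# Two revisions of the same model: identifiers are preserved by the editor.
v1 = {"e1": {"name": "Order"}, "e2": {"name": "Customer"}}
v2 = {"e1": {"name": "Order"}, "e2": {"name": "Client"}, "e3": {"name": "Invoice"}}
print(diff_by_id(v1, v2))
# {'added': ['e3'], 'removed': [], 'changed': ['e2']}

# A model created independently: it also contains an "Order" element,
# but under a different identifier, so no correspondence is found.
other = {"x1": {"name": "Order"}}
print(diff_by_id(v1, other))
# {'added': ['x1'], 'removed': ['e1', 'e2'], 'changed': []}
```

The second call shows the limitation described above: for models that are not versions of one another, identifiers carry no information, and every element is reported as added or removed.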

In contrast, SiDiff is a metamodel-independent approach to model comparison. It is primarily based on the notion of similarity between model elements, but also covers other approaches such as ID-based or signature-based model comparison. The main advantage of SiDiff is that it offers a highly configurable environment and is therefore easily adaptable to any model type. This text gives the reader an overview of the basic concepts behind SiDiff and makes no claim to completeness: the framework contains additional functionality and optimizations which are not mentioned here. Furthermore, the text abstracts from concrete implementation concepts and technical details wherever this does not impair comprehensibility.
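
The general idea of similarity-based matching can be sketched as follows. This is not the SiDiff algorithm: the element representation (dictionaries with `type` and `name`), the string-based similarity measure, the greedy matching strategy, and the threshold value are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Similarity in [0, 1]: elements of different type never match,
    otherwise the names are compared as strings."""
    if a["type"] != b["type"]:
        return 0.0
    return SequenceMatcher(None, a["name"], b["name"]).ratio()

def match(model_a, model_b, threshold=0.6):
    """Greedily pair the most similar elements; each element is used at most once."""
    candidates = sorted(
        ((similarity(a, b), i, j)
         for i, a in enumerate(model_a)
         for j, b in enumerate(model_b)),
        key=lambda c: -c[0],
    )
    used_a, used_b, pairs = set(), set(), []
    for score, i, j in candidates:
        if score < threshold:
            break  # all remaining candidates are even less similar
        if i in used_a or j in used_b:
            continue
        used_a.add(i)
        used_b.add(j)
        pairs.append((i, j))
    return sorted(pairs)

model_a = [{"type": "Class", "name": "Customer"},
           {"type": "Class", "name": "Order"}]
model_b = [{"type": "Class", "name": "Orders"},
           {"type": "Class", "name": "Customers"},
           {"type": "Class", "name": "Invoice"}]

print(match(model_a, model_b))
# [(0, 1), (1, 0)]
```

No persistent identifiers are needed here: "Customer" is paired with "Customers" and "Order" with "Orders" because of their similarity, while "Invoice" stays unmatched and would be reported as an added element.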

Application Domains

The SiDiff differencing engine has been used for computing differences of models in several application domains, including
  • different types of UML diagrams used in software development
  • Matlab/Simulink models used in the development of embedded systems
  • metabolic reaction chains used to model chemical processes in bio-informatics
All model types mentioned above can be considered as graphs, but their structure differs in many obvious or subtle details. Nonetheless, they are similar enough to make a generic approach reasonable: the SiDiff differencing engine is a generic, highly configurable algorithm.

The generic algorithm must be configured for each type of model. A configuration consists of a transformation from the original document into an internal structure, the definition of the similarity of elements, a specification of the output, and further details.
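
To make the three parts of a configuration more concrete, the following sketch bundles them into one object. It does not reflect the actual SiDiff configuration format; the class `Configuration`, its field names, the function `run`, and the toy document type (a comma-separated list of names) are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Configuration:
    """Hypothetical bundle of the three configuration parts named in the text."""
    to_internal: Callable[[Any], list]         # original document -> internal elements
    similarity: Callable[[dict, dict], float]  # similarity of two elements in [0, 1]
    render: Callable[[list], str]              # matched pairs -> output

def run(config, doc_a, doc_b, threshold=0.5):
    """Generic part: independent of the model type, driven only by the configuration."""
    elems_a = config.to_internal(doc_a)
    elems_b = config.to_internal(doc_b)
    pairs = [(a, b) for a in elems_a for b in elems_b
             if config.similarity(a, b) >= threshold]
    return config.render(pairs)

# Configuration for a toy document type: a comma-separated list of names.
name_list_config = Configuration(
    to_internal=lambda text: [{"name": n.strip()} for n in text.split(",")],
    similarity=lambda a, b: 1.0 if a["name"].lower() == b["name"].lower() else 0.0,
    render=lambda pairs: "; ".join(f'{a["name"]} ~ {b["name"]}' for a, b in pairs),
)

print(run(name_list_config, "Order,Customer", "customer,Invoice"))
# Customer ~ customer
```

Supporting another model type would then mean supplying a different `Configuration` while `run` stays unchanged, which is the separation between generic algorithm and configuration that the text describes.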

More details about the features of SiDiff, information on some of the currently available configurations, and a small collection of examples can be found on separate pages: