The SiDiff Project
Today it is hard to imagine industrial workflows without technical documents, particularly in the planning phases of software-intensive systems. Examples of technical documents are CAD documents, VLSI documents, or numerous variants of software specification documents, e.g. UML specifications, Matlab/Simulink documents or domain-specific documents.
Complex technical documents are developed by teams in projects over long periods of time and are collected in repositories. Documents exist in multiple versions, which may be revisions or variants. This leads to several practical problems:
- Differencing and Merging: Developers must be able to see the differences between two documents. Since many people work on the same models, teamwork needs to be supported. Synchronisation and merging of the changes made by different persons become crucial for successful development.
- Analysis of Document Histories: Large systems often have variants in practical use, e.g. old versions which cannot be upgraded, or customer-specific variants. All variants must be maintained in parallel. If a defect is detected in one variant, the same defect is likely to occur in other variants, so one has to search for it in all of them.
- Clone Detection: The architecture of large systems typically degenerates during their evolution, and the cost of further maintenance increases significantly. One typical problem is the emergence of clones, i.e. identical or similar parts of the system, which should be unified and merged. This problem is particularly important for system families.
The SiDiff project aims at building difference and analysis tools for technical documents that have a graphical representation. This goal poses both technical challenges and research questions.
We have investigated the problem of computing differences of documents in several application domains, including:
- different types of UML diagrams used in software development
- Matlab/Simulink models used in the development of embedded systems
- metabolic reaction chains used to model chemical processes in bio-informatics
The generic algorithm must be configured for each type of technical document. A configuration consists of a transformation from the original document into an internal structure, the definition of the similarity of elements, a specification of the output, and further details.
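To make the parts of a configuration more concrete, the following is a minimal sketch in Python. It is not SiDiff's actual API or algorithm: the names `Config`, `similarity`, `diff`, the flat attribute-map internal structure, the weighted attribute comparison, and the greedy matching strategy are all assumptions chosen for illustration. The sketch only shows how a transformation, a similarity definition, and an output specification could be plugged into one generic differencing routine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Assumed internal structure: each element is a flat map of attributes.
Element = Dict[str, str]


@dataclass
class Config:
    """Hypothetical configuration for one document type."""
    transform: Callable[[dict], List[Element]]  # original document -> internal elements
    weights: Dict[str, float]                   # similarity definition: attribute -> weight
    threshold: float                            # minimum similarity for two elements to match
    label_attr: str                             # output specification: attribute used as label


def similarity(a: Element, b: Element, weights: Dict[str, float]) -> float:
    """Weighted share of equal attributes; elements of different types never match."""
    if a.get("type") != b.get("type"):
        return 0.0
    total = sum(weights.values())
    equal = sum(w for attr, w in weights.items() if a.get(attr) == b.get(attr))
    return equal / total if total else 0.0


def diff(doc_a: dict, doc_b: dict, cfg: Config) -> List[Tuple[str, str]]:
    """Generic routine: transform both documents, match elements, report differences."""
    left, right = cfg.transform(doc_a), cfg.transform(doc_b)

    # Greedy matching (an assumption): take the most similar pairs first.
    candidates = sorted(
        ((similarity(a, b, cfg.weights), i, j)
         for i, a in enumerate(left) for j, b in enumerate(right)),
        key=lambda c: -c[0],
    )
    match: Dict[int, int] = {}
    used_right = set()
    for score, i, j in candidates:
        if score < cfg.threshold:
            break  # all remaining pairs are too dissimilar
        if i not in match and j not in used_right:
            match[i] = j
            used_right.add(j)

    report: List[Tuple[str, str]] = []
    for i, a in enumerate(left):
        if i not in match:
            report.append(("removed", a[cfg.label_attr]))
        elif a != right[match[i]]:
            report.append(("changed", a[cfg.label_attr]))
    for j, b in enumerate(right):
        if j not in used_right:
            report.append(("added", b[cfg.label_attr]))
    return report


# Toy configuration for a made-up class-diagram format.
UML_CONFIG = Config(
    transform=lambda doc: [{"type": "class", **c} for c in doc["classes"]],
    weights={"name": 0.6, "visibility": 0.2, "abstract": 0.2},
    threshold=0.5,
    label_attr="name",
)

DOC_A = {"classes": [
    {"name": "Customer", "visibility": "public", "abstract": "false"},
    {"name": "Order", "visibility": "public", "abstract": "false"},
]}
DOC_B = {"classes": [
    {"name": "Customer", "visibility": "public", "abstract": "true"},
    {"name": "Invoice", "visibility": "public", "abstract": "false"},
]}

print(diff(DOC_A, DOC_B, UML_CONFIG))
# [('changed', 'Customer'), ('removed', 'Order'), ('added', 'Invoice')]
```

In this sketch, supporting another document type would only require a different `Config` value (another transformation, other weights, another label attribute), while `diff` itself stays unchanged. That separation is the point being illustrated; the concrete matching strategy is deliberately simplistic.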
More details about the features of the currently available configurations and a small collection of examples can be found on separate pages:
- Examples and Screenshots
- Description of UML configurations
- Description of the Simulink configuration
- Difference Viewer Plugin for FUJABA
We are cooperating with several external partners that use SiDiff technologies to solve problems in different application domains:


