The SiDiff Components
The general approach to comparing two models is as follows. Initially, the models are usually available as textual or binary documents, often specified in XMI or a proprietary vendor format. If they do not already exist as instances of EMF/Ecore-based domain models, SiDiff reads and transforms the input documents for further processing. Once the models are accessible to the SiDiff kernel, they can be compared, and the resulting difference information is either visualized for the user or used as input for further applications. The picture below shows the process of a typical model comparison as well as the main components involved. The workflow and the individual components are described in more detail in the following paragraphs.
The SiDiff Workflow
Annotation
Before the actual comparison process is triggered, the models to be compared can be enriched with additional information by means of annotations. Every element of the model is visited once, and one or more annotations are computed and attached to it. Which annotations are computed is defined in a configuration file. This mechanism can be used to calculate static, derived information about elements for use in the comparison workflow, such as software metrics, hash values and path information.
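The annotation pass can be illustrated with a minimal Python sketch. It is not SiDiff code: the `Element` class, the `annotate` function and the `ANNOTATORS` dictionary (standing in for the configuration file) are hypothetical names chosen for this example only.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical stand-in for a model element (not a SiDiff or EMF class)."""
    name: str
    kind: str
    children: list = field(default_factory=list)
    annotations: dict = field(default_factory=dict)


def annotate(element, annotators, parent_path=""):
    """Visit every element once and attach each configured annotation."""
    path = f"{parent_path}/{element.name}"
    for key, compute in annotators.items():
        element.annotations[key] = compute(element, path)
    for child in element.children:
        annotate(child, annotators, path)


# Assumed stand-in for the configuration file: which annotations to compute.
ANNOTATORS = {
    "path": lambda e, path: path,                      # path information
    "child_count": lambda e, path: len(e.children),    # a trivial metric
    "name_hash": lambda e, path: hashlib.sha1(
        f"{e.kind}:{e.name}".encode()).hexdigest(),    # hash value
}

model = Element("shop", "Package",
                [Element("Order", "Class", [Element("total", "Attribute")])])
annotate(model, ANNOTATORS)
```

After the call, each element carries its own `annotations` dictionary, which later matching steps could read instead of recomputing the values.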
Matching
In the second step of the model comparison, correspondences between elements are established. SiDiff supports the three major state-of-the-art matching strategies: ID-based, signature-based and similarity-based matching. The approaches can also be combined where applicable, e.g. by using ID-based matching first and then applying signature-based matching to the remaining unmatched model elements. This makes it possible to exploit suitable features of a given meta model without sacrificing applicability. The different strategies are now discussed in more detail.
ID Based Matching
This matching approach is the simplest and fastest. Every node has, by definition, a persistent and unique ID; nodes with the same ID are deemed corresponding and are therefore matched. Note, however, that not every domain and tool is designed to support such IDs. This approach also usually fails when tools from different vendors are used in the development of the models.
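As a rough sketch of matching by ID, assume each element is a plain dictionary with an `id` key. This representation and the function name `match_by_id` are assumptions made for this example and are not taken from SiDiff.

```python
def match_by_id(elements_a, elements_b):
    """Pair elements that carry the same ID; also return the leftovers."""
    index_b = {e["id"]: e for e in elements_b}
    pairs, unmatched_a = [], []
    for a in elements_a:
        b = index_b.pop(a["id"], None)  # remove so each B element matches once
        if b is not None:
            pairs.append((a, b))
        else:
            unmatched_a.append(a)
    unmatched_b = list(index_b.values())
    return pairs, unmatched_a, unmatched_b


model_a = [{"id": "1", "name": "Order"}, {"id": "2", "name": "Customer"}]
model_b = [{"id": "2", "name": "Client"}, {"id": "3", "name": "Invoice"}]
pairs, only_a, only_b = match_by_id(model_a, model_b)
```

Returning the unmatched elements of both sides makes it straightforward to hand them to a second strategy afterwards.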
Signature Based Matching
In some application contexts, individual elements do not have a unique ID but have characteristics that can serve as a definite discriminator, so signature-based matching can be applied. The signature of each element is usually computed during the annotation phase as a hash value over the relevant characteristics. If the signature can be used like a unique ID, elements that share the same signature are matched immediately. If an equal signature is only a necessary condition for a match, it can still be used to reduce the search space significantly, so that only promising candidates undergo further similarity analysis.
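Both uses of a signature can be sketched in a few lines of Python. The choice of `kind` and `name` as the relevant characteristics, the dictionary representation of elements and the function names are assumptions for illustration only.

```python
import hashlib
from collections import defaultdict


def signature(element, keys=("kind", "name")):
    """Hash over the characteristics assumed to be relevant in this example."""
    raw = "|".join(str(element[k]) for k in keys)
    return hashlib.sha256(raw.encode()).hexdigest()


def match_by_signature(elements_a, elements_b):
    """Match unique signatures directly; keep ambiguous ones as candidate sets."""
    buckets_a, buckets_b = defaultdict(list), defaultdict(list)
    for e in elements_a:
        buckets_a[signature(e)].append(e)
    for e in elements_b:
        buckets_b[signature(e)].append(e)

    matches, candidates = [], []
    for sig, group_a in buckets_a.items():
        group_b = buckets_b.get(sig, [])
        if len(group_a) == 1 and len(group_b) == 1:
            matches.append((group_a[0], group_b[0]))   # signature acts like an ID
        elif group_b:
            candidates.append((group_a, group_b))      # reduced search space
    return matches, candidates


a = [{"kind": "Class", "name": "Order"},
     {"kind": "Attribute", "name": "id", "owner": "Order"},
     {"kind": "Attribute", "name": "id", "owner": "Customer"}]
b = [{"kind": "Class", "name": "Order"},
     {"kind": "Attribute", "name": "id", "owner": "Customer"},
     {"kind": "Attribute", "name": "id", "owner": "Order"}]
matches, candidates = match_by_signature(a, b)
```

In this example the class `Order` is matched immediately, while the two `id` attributes share a signature and end up in one candidate set that a similarity analysis would have to resolve.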
Similarity Based Matching
Besides the rather trivial approaches to model comparison already mentioned, SiDiff can also establish matches between elements based on the notion of similarity. This is an advantage because not every application context allows the simplifications described above to be used to establish correspondences between elements. In a similarity-based approach, the similarity between two elements is usually determined either by local attributes or by elements in close proximity, e.g. referenced elements or child elements. SiDiff offers a wide range of functions for computing similarity values based on such characteristics. New functions can be added easily if an application context warrants it.
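The following Python sketch shows one possible way to combine a local attribute (the name) with the near proximity (the child elements) into a similarity value. The compare functions, the equal weights, the threshold and the greedy assignment are assumptions for this example and do not reflect SiDiff's actual functions or configuration.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Local attribute: string similarity of the element names."""
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def children_similarity(a, b):
    """Near proximity: Jaccard overlap of the child element names."""
    ca, cb = set(a["children"]), set(b["children"])
    if not ca and not cb:
        return 1.0
    return len(ca & cb) / len(ca | cb)


def similarity(a, b, w_name=0.5, w_children=0.5):
    """Weighted sum of the individual similarity values (weights are assumed)."""
    return w_name * name_similarity(a, b) + w_children * children_similarity(a, b)


def match_by_similarity(elements_a, elements_b, threshold):
    """Greedily pair the most similar elements whose score reaches the threshold."""
    scored = sorted(
        ((similarity(a, b), i, j)
         for i, a in enumerate(elements_a)
         for j, b in enumerate(elements_b)),
        reverse=True,
    )
    used_a, used_b, matches = set(), set(), []
    for score, i, j in scored:
        if score < threshold:
            break
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            matches.append((elements_a[i], elements_b[j], score))
    return matches


a = [{"name": "Customer", "children": ["name", "address", "email"]},
     {"name": "Order", "children": ["total", "date"]}]
b = [{"name": "Client", "children": ["name", "address", "email", "phone"]},
     {"name": "Order", "children": ["total", "date"]}]
matches = match_by_similarity(a, b, threshold=0.4)
```

Here `Customer` and `Client` are paired even though their names differ, because most of their child elements agree. Adding a new criterion would amount to writing another function like `children_similarity` and including it in the weighted sum.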
Difference Engine
The difference engine takes the two input models and the computed correspondences and creates a unified XML document. This document can be seen as the output of the comparison. It contains all elements of the two original models (corresponding elements are contained only once) as well as the difference information computed during the comparison process. The differences between the two models can be easily deduced:
- structural difference: Elements that have no correspondence in the document, i.e. insertions and deletions
- attribute difference: Two elements that correspond, but differ in their attribute values
- reference difference: Two elements that correspond, but differ in their references
- move difference: Two elements that correspond, but have different parent elements
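The four kinds of difference listed above can be derived from two models and a set of correspondences as in the following Python sketch. The dictionary-based model representation and the function name `classify_differences` are assumptions for illustration; the sketch returns a plain list and does not produce the unified XML document described above.

```python
def classify_differences(model_a, model_b, correspondences):
    """Derive difference kinds from two models and their correspondences.

    Each model maps an element key to a dict with "attrs", "refs" and "parent".
    correspondences is a list of (key_in_a, key_in_b) pairs.
    """
    a_to_b = dict(correspondences)
    matched_b = set(a_to_b.values())
    diffs = []

    # Structural differences: elements without a correspondence.
    for key in model_a:
        if key not in a_to_b:
            diffs.append(("structural", "deleted", key))
    for key in model_b:
        if key not in matched_b:
            diffs.append(("structural", "inserted", key))

    for ka, kb in a_to_b.items():
        ea, eb = model_a[ka], model_b[kb]
        if ea["attrs"] != eb["attrs"]:
            diffs.append(("attribute", ka, kb))
        # Translate A-side targets to their B-side counterparts before comparing.
        if {a_to_b.get(r) for r in ea["refs"]} != set(eb["refs"]):
            diffs.append(("reference", ka, kb))
        if a_to_b.get(ea["parent"]) != eb["parent"]:
            diffs.append(("move", ka, kb))
    return diffs


model_a = {
    "a1": {"attrs": {"name": "shop"}, "refs": set(), "parent": None},
    "a2": {"attrs": {"name": "Order"}, "refs": {"a3"}, "parent": "a1"},
    "a3": {"attrs": {"name": "Customer"}, "refs": set(), "parent": "a1"},
    "a4": {"attrs": {"name": "total"}, "refs": set(), "parent": "a2"},
    "a5": {"attrs": {"name": "Legacy"}, "refs": set(), "parent": "a1"},
}
model_b = {
    "b1": {"attrs": {"name": "shop"}, "refs": set(), "parent": None},
    "b2": {"attrs": {"name": "Order"}, "refs": set(), "parent": "b1"},
    "b3": {"attrs": {"name": "Client"}, "refs": set(), "parent": "b1"},
    "b4": {"attrs": {"name": "total"}, "refs": set(), "parent": "b3"},
    "b6": {"attrs": {"name": "Invoice"}, "refs": set(), "parent": "b1"},
}
correspondences = [("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")]
diffs = classify_differences(model_a, model_b, correspondences)
```

Note that references and parents live in different models, so the sketch maps them through the correspondences before comparing them; comparing the raw keys would report every element as moved.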