Smart Differencer™ tools

Developers frequently need to determine differences between various versions of text files comprising an application system's source code. Among others, such differences facilitate reviewing, debugging, and testing newly changed source code. Ideally, developers would like to be told about differences in terms that make sense with respect to the type of source code and its constructs, e.g., "delete statement", "insert expression", "move block", "rename identifier".

Conventional differencing tools (e.g., diff) compute differences based on source lines of text, using line-based models of editing like "insert line", "delete line", or "replace line". These tools are very useful for arbitrary text, but are not cognizant of the structure of the programming language in which source code is written. When used on source code, the reported differences often do not obey the boundaries of the underlying language constructs, e.g., a fragment of a statement, or the suffix of one statement and the prefix of another, may be reported as a change simply because of how the code happens to be split into lines. Worse, simple reformatting or changes in comments will produce many apparent changes without any actual semantic impact on the source code. This is conceptually jarring to the developer, who thinks of changes in terms of program structures and abstract editing operations manipulating such structures.
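The contrast can be illustrated with a small sketch. It is not how the Smart Differencer works internally; it uses Python's standard `difflib` and `tokenize` modules as stand-ins, and the sample statements and helper names (`significant_tokens`, `changed_lines`) are made up for illustration. A statement is reformatted and loses its comment: the line-based comparison reports every line as changed, while a comparison of language tokens reports nothing.

```python
import difflib
import io
import tokenize

# Same statement twice: the second copy is reformatted and drops the comment.
OLD = "total = compute(price, tax)  # add tax\n"
NEW = "total = compute(\n    price,\n    tax\n)\n"

# Token kinds with no effect on meaning: comments and line breaks
# that occur inside a bracketed expression.
IGNORED = {tokenize.COMMENT, tokenize.NL}


def significant_tokens(source):
    """Return (type, text) pairs for the tokens that carry meaning."""
    readline = io.StringIO(source).readline
    return [
        (tok.type, tok.string)
        for tok in tokenize.generate_tokens(readline)
        if tok.type not in IGNORED
    ]


def changed_lines(old, new):
    """Return the lines a line-based diff marks as deleted or inserted."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


if __name__ == "__main__":
    for line in changed_lines(OLD, NEW):
        print(line)
    print("tokens identical:", significant_tokens(OLD) == significant_tokens(NEW))
```

Comparing tokens is only the first step toward structure-aware differencing; reporting edits in terms of statements and expressions additionally requires a parser for the language.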

The Smart Differencer

The SD SmartDifferencer shows the differences between two versions of source code in terms of abstract editing operations applied to programming language constructs. The language constructs are discovered by parsing the code using a production language parser (and depending on language, determining scopes and symbol tables). Editing operations include insert, delete, copy, merge, and rename (globally, across a scope, or pointwise). Language constructs include primitives like identifiers, numbers, string literals, etc., as well as compound phrases like declarations, statements, expressions, etc. The editing operations are not bound to source lines but may only affect part of a line. They ignore comments, irrelevant whitespace, and actual formatting of numbers (radix, leading zeros) and string literals (equivalent escape sequences, etc.). Note that whitespace within string literals is considered as relevant and taken into account when determining differences.
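As a rough illustration of what ignoring the formatting of literals means, the sketch below compares literals by value rather than by spelling. It relies on Python's literal syntax and `ast.literal_eval` as a stand-in for a real language front end, and the helper name `same_literal` is hypothetical.

```python
import ast


def same_literal(left, right):
    """Return True if two literal spellings denote the same value and type."""
    a = ast.literal_eval(left)
    b = ast.literal_eval(right)
    # Compare types too, so that the integer 16 and the float 16.0 differ.
    return type(a) is type(b) and a == b


if __name__ == "__main__":
    print(same_literal("0x10", "16"))        # different radix
    print(same_literal("1e3", "1000.0"))     # different float spelling
    print(same_literal("'\\x41'", '"A"'))    # equivalent escape sequence
    print(same_literal("'a b'", "'a  b'"))   # whitespace inside a string
```

The first three pairs compare as equal, while the last pair does not, because whitespace inside a string literal is part of its value.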

By default, the SmartDifferencer produces output intended for consumption by a developer. This output is kept compact by summarizing multiple adjacent edits of the same kind into a single edit. Each of these edits is followed by the actual program fragments involved in the respective edit.
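The summarizing step can be sketched as follows. The `Edit` record, its field names, and the edit kinds are hypothetical and do not reflect the tool's actual output format; "adjacent" is taken here to mean consecutive in the list of edits.

```python
from itertools import groupby
from typing import NamedTuple


class Edit(NamedTuple):
    kind: str      # hypothetical label, e.g. "insert" or "delete"
    line: int      # line where the edit applies
    fragment: str  # program text involved in the edit


def summarize(edits):
    """Merge each run of consecutive edits of the same kind into one entry.

    Every entry is (kind, first_line, last_line, fragments).
    """
    summary = []
    for kind, run in groupby(edits, key=lambda edit: edit.kind):
        run = list(run)
        fragments = [edit.fragment for edit in run]
        summary.append((kind, run[0].line, run[-1].line, fragments))
    return summary


if __name__ == "__main__":
    edits = [
        Edit("delete", 10, "x = 1"),
        Edit("delete", 11, "y = 2"),
        Edit("insert", 11, "z = 3"),
    ]
    for kind, first, last, fragments in summarize(edits):
        print(f"{kind} at lines {first}-{last}:")
        for fragment in fragments:
            print(f"    {fragment}")
```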

Alternatively or in addition, the SmartDifferencer can produce output intended for consumption by another tool, e.g. by a display tool visually displaying the compared source codes to a developer with the changes being highlighted by different colors depending on the kind of the respective edit. This output includes further details of the edits that are usually not of direct interest to a developer.

Benefits include enhanced developer productivity, both individually and during code reviews. The tool suppresses semantically irrelevant changes (formatting, comments, whitespace, representation of numbers and strings, etc.), focuses the developer's attention on changes that are semantically coherent with respect to the language and thus meaningful, and describes differences in terms of edits over the underlying language constructs. Integrating such a differencer into a source code control system will also aid developers.

What about Semantic Differencing?

Everybody wants a semantic differencing tool that determines whether two programs do the same thing and, if not, where they differ. It is a wonderful concept, but impossible to implement fully in practice because of the Halting Problem, which implies that no tool can decide nontrivial behavioral questions about arbitrary computer code, let alone compare two blocks of code to see whether they do the same thing.

What can be done is to provide some interesting approximations of semantic differencing, and the Smart Differencer does that. First, it uses the language structure to compare code; if the language structures match, they likely do the same thing, although it is easy to construct counterexamples that fail due to context. The renamed-identifier check is semantic: in most languages, changing the name of an identifier consistently within its scope has, by design, no impact at all. The SmartDifferencer tool doesn't quite do this; it verifies that an identifier is renamed consistently within a block, but not necessarily within a scope. We expect future versions to do this accurately and report such changes. Often, the order of declarations in a file (e.g., members of a Java class) has no semantic impact, and the ideal Semantic Differencer would ignore such reshuffling or report such changes as "moved but no semantic import". The current versions do not, but we expect future versions to do so.
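A block-level consistent-renaming check of the kind described above might look like the following sketch. It is an assumption-laden illustration rather than the tool's algorithm: it works on Python tokens from the standard `tokenize` module, knows nothing about scopes, and the helper names are invented. Two blocks match if they are token-for-token identical except for identifiers, and the identifiers are related by a one-to-one mapping.

```python
import io
import keyword
import tokenize

# Token kinds ignored when comparing: comments and non-logical line breaks.
SKIPPED = {tokenize.COMMENT, tokenize.NL}


def block_tokens(source):
    """Tokenize a block of Python source, dropping ignorable tokens."""
    readline = io.StringIO(source).readline
    return [t for t in tokenize.generate_tokens(readline) if t.type not in SKIPPED]


def consistent_renaming(old, new):
    """Return {old_name: new_name} if `new` is `old` with identifiers renamed
    consistently and one-to-one; return None otherwise."""
    old_tokens, new_tokens = block_tokens(old), block_tokens(new)
    if len(old_tokens) != len(new_tokens):
        return None
    forward, backward = {}, {}
    for a, b in zip(old_tokens, new_tokens):
        both_identifiers = (
            a.type == b.type == tokenize.NAME
            and not keyword.iskeyword(a.string)
            and not keyword.iskeyword(b.string)
        )
        if not both_identifiers:
            # Everything that is not an identifier must match exactly.
            if (a.type, a.string) != (b.type, b.string):
                return None
            continue
        # Each old name must always map to the same new name, and vice versa.
        if forward.setdefault(a.string, b.string) != b.string:
            return None
        if backward.setdefault(b.string, a.string) != a.string:
            return None
    return {name: renamed for name, renamed in forward.items() if name != renamed}


if __name__ == "__main__":
    before = "n = 0\nfor x in xs:\n    n = n + x\n"
    after = "total = 0\nfor x in xs:\n    total = total + x\n"
    print(consistent_renaming(before, after))
```

Requiring the mapping to be one-to-one rejects a rename that merges two distinct variables into one. A scope-aware version would additionally need symbol table information from the language front end.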

Smart Differencer Typical Features

  • Compares two files for a specific language
  • Understands target language syntax precisely:
    • Whitespace and comments (ignored)
    • Keywords and identifiers
    • Integer and floating values and their equivalent but variant possible spellings
    • Strings and their equivalents according to escaping conventions
    • Full syntax structure of the language (using DMS language Front Ends)
  • Output in terms of language syntax elements: statements, expressions, blocks, identifiers
  • More succinct output than a conventional string diff tool, with coherent explanations and code display rather than simple string dumps
  • Detects consistent renaming within a block of code
  • Generates deltas in two forms:
    • Human-readable format, showing delta types (language nonterminals), locations (line, column), and before and after text
    • Summary form, showing just a succinct summary of delta types and locations

Available for the Following Languages

Download an evaluation copy

Unusual Requirements?

Is your language not listed? Does it run in an unusual environment, or do you have some custom need? SD can configure a Smart Differencer tool for you! These tools are based on DMS, and inherit DMS's language agility and scalability.

Other Tools

Semantic Designs offers a variety of other software tools.

For more information: [email protected]    Follow us at Twitter: @SemanticDesigns

