DMS Domain Specifications

Program analysis and transformation tools, to be very general, must apply to a wide variety of languages. Since it is impractical to build "all languages" into a single tool, one must be able to specify a particular language (and even dialect) to such a tool quickly for it to be widely useful.

The DMS Software Reengineering Toolkit is designed to let the "domain" (language) engineer specify those language elements quickly and accurately, so that she may spend most of her attention on the actual program analysis or transformation of interest.

These pages discuss such specifications in some detail and show them applied to Niklaus Wirth's Oberon language as an example. This should give the would-be DMS domain engineer a feel for DMS. (Yes, SD has tools for Oberon based on the definitions shown here.) The tutorials and reference documentation provided with DMS itself are far more extensive and detailed.

For a simpler but holistic view of DMS domain specifications working together, see Algebra as a DMS Domain.

DMS Domain Definition Elements Necessary

The following formal descriptions are minimally needed for the DMS Software Reengineering Toolkit to parse and analyze a programming language:

Implementing these for DMS is quite similar to using a traditional parser generator, but much easier: DMS offers more sophisticated means for specifying lexers, and simpler yet more powerful means for specifying grammars, backed by a GLR parsing engine that avoids most of the headaches associated with traditional parsing engines. In addition, with just a DMS grammar, the domain engineer gets (abstract syntax) "trees for free"; other parsing engines require the domain engineer to specify in excruciating detail how to build trees. This makes the DMS engineer much more productive if she is starting from scratch.
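To make the "trees for free" idea concrete, here is a minimal Python sketch. It is not DMS notation and it does not use GLR parsing; it is a simple backtracking recursive-descent parser over a hypothetical toy grammar (`GRAMMAR`, `tokenize`, `parse`, and `leaves` are all names invented for this illustration). The point it tries to show is only that the grammar carries no tree-building actions, yet every successful parse yields a tree whose nodes are labeled by the rules that matched.

```python
import re

# Hypothetical toy grammar: nonterminal -> list of alternatives (symbol sequences).
# No tree-building actions are attached to any rule.
GRAMMAR = {
    "sum":  [["term", "+", "sum"], ["term"]],
    "term": [["NUMBER"], ["IDENT"], ["(", "sum", ")"]],
}

TOKEN = re.compile(r"\s*(?:(?P<NUMBER>\d+)|(?P<IDENT>[A-Za-z_]\w*)|(?P<PUNCT>[+()]))")

def tokenize(text):
    """Split text into (kind, value) pairs; punctuation uses itself as its kind."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        kind = match.lastgroup
        value = match.group(kind)
        tokens.append((value if kind == "PUNCT" else kind, value))
        pos = match.end()
    return tokens

def parse(symbol, tokens, pos=0):
    """Return (tree, next_pos), or None if symbol does not match at pos."""
    if symbol not in GRAMMAR:                      # terminal: compare token kind
        if pos < len(tokens) and tokens[pos][0] == symbol:
            return (symbol, tokens[pos][1]), pos + 1
        return None
    for alternative in GRAMMAR[symbol]:            # try alternatives in order
        children, cursor = [], pos
        for part in alternative:
            result = parse(part, tokens, cursor)
            if result is None:
                break                              # this alternative failed
            child, cursor = result
            children.append(child)
        else:
            # Every part matched: the node is built from the rule alone.
            return (symbol, children), cursor
    return None

def leaves(tree):
    """Collect the token texts at the leaves of a tree, left to right."""
    label, body = tree
    if isinstance(body, str):
        return [body]
    return [text for child in body for text in leaves(child)]

tokens = tokenize("a + (b + 1)")
tree, end = parse("sum", tokens)
```

A traditional parser generator would typically require a semantic action on each rule to produce the equivalent of `tree`; here the only input is the grammar itself. A real GLR engine additionally handles ambiguity and left recursion, which this sketch does not attempt.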

As a practical matter, if one is to analyze or manipulate source code, one must provide for Life After Parsing. This means providing DMS with definitions for the following (optional, but strongly encouraged):

  • A Prettyprinter: this defines how to print an AST instance as valid, compilable source text, complete with comments
  • Static Analysis via Attribute Grammars: how to specify information collection across an AST easily
  • Symbol Tables: building a mapping from identifiers to their definition sites, types, and usage instances
  • Control Flow Analysis: constructing a control flow graph for the micro-semantics of the program
  • Data Flow Analysis: determining how data flows across the program, controlled by the control flow graph. One may need special support to handle indirect references.
  • Source to Source Rewrites: defining transformations over trees in terms of surface syntax familiar to the programmer
  • Data Flow Pattern Matching: writing surface syntax patterns that can match dataflows rather than syntax
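Three of the items above (prettyprinting, attribute-style analysis, and rewriting) can be sketched on a toy expression tree. The following Python is an illustration only, under stated assumptions: the node classes `Num`, `Var`, `Add`, `Mul` and the functions `prettyprint`, `used_vars`, and `simplify` are hypothetical names, and DMS expresses all of these in its own specification languages rather than in hand-written code like this.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Num:
    value: int

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"

@dataclass(frozen=True)
class Mul:
    left: "Expr"
    right: "Expr"

Expr = Union[Num, Var, Add, Mul]

def prettyprint(e):
    """Prettyprinter: regenerate source text, parenthesizing sums under products."""
    if isinstance(e, Num):
        return str(e.value)
    if isinstance(e, Var):
        return e.name
    if isinstance(e, Add):
        return f"{prettyprint(e.left)} + {prettyprint(e.right)}"
    def operand(x):
        text = prettyprint(x)
        return f"({text})" if isinstance(x, Add) else text
    return f"{operand(e.left)} * {operand(e.right)}"

def used_vars(e):
    """Synthesized attribute: the set of variable names, computed bottom-up."""
    if isinstance(e, Var):
        return {e.name}
    if isinstance(e, Num):
        return set()
    return used_vars(e.left) | used_vars(e.right)

def simplify(e):
    """Rewrites x + 0 -> x and x * 1 -> x (and mirror images), applied bottom-up."""
    if isinstance(e, (Num, Var)):
        return e
    left, right = simplify(e.left), simplify(e.right)
    if isinstance(e, Add):
        if right == Num(0):
            return left
        if left == Num(0):
            return right
        return Add(left, right)
    if right == Num(1):
        return left
    if left == Num(1):
        return right
    return Mul(left, right)

tree = Mul(Add(Var("x"), Num(0)), Add(Var("y"), Num(1)))
before = prettyprint(tree)             # "(x + 0) * (y + 1)"
after = prettyprint(simplify(tree))    # "x * (y + 1)"
names = used_vars(tree)                # {"x", "y"}
```

The rules in `simplify` are written against tree node classes. The source-to-source rewrites described above would instead state the same rules in the surface syntax of the language being transformed, which is what keeps them readable to the programmer.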

DMS is available with sets of definitions for the above for a wide variety of languages.
