DMS Symbol Tables

The DMS Software Reengineering Toolkit is designed to allow the "domain" (language) engineer specify those languages quickly and accurately, so that she may spend most of her attention on the actual program analysis or transformation of interest.

At DMS Domains, we give some background on the necessary elements to build a DMS language processing domain. We have already described how to easily specify a grammar to DMS, automatically acquiring a parser and syntax tree builder. Usually a symbol table is needed to be able to semantically interpret the resulting tree.

DMS provide sophisticated symbol-table support facilities for constructing, navigating, inspecting, and updating scopes for

  • global and local symbols, with classic lexical scoping rules
  • namespace trees
  • inherited, overloaded and other language-dependent name lookup rules
The facilities provide complete support for languages as complex as C++17.

Scopes and Symbol Spaces

DMS provides a parallel-safe symbol table package, implemented in PARLANSE, that supports a set of related symbol spaces, which collectively represent how names are interpreted across a program. Members of the set are often linked by various references to one another. The complete set of symbol spaces is the "symbol table".

Each symbol space is a mapping from Uncode-based identifiers to arbitrary domain-specific associated information.

This information may be:

  • A domain-specific type or set of types
  • A constant or a program element (function, subroutine, ...) associated with the identifier
  • A reference (refers in the diagram) to another symbol space
  • Lists of mentions of the symbol in an AST
  • Other useful information needed to support analysis or transformation of the language
Optionally associated with each symbol space as a whole are:
  • A PARLANSE procedure for finding an identifier in the symbol space. A default procedure simply finds the corresponding entry in the symbol space, using a scalable hash table lookup of the identifier. Custom domain-specific procedures can implement overload and/or other complex lookups inside the symbol space. (Our C++ name resolver uses this capability heavily).
  • A set of references to parent symbol spaces with integer priorities; this supports handling multiple inheritance. Classic lexical scoping rules are realized by providing the "lexical scope" symbol space with a reference to a single parent symbol space (with arbitary priority)
  • An arbitrary search behavior procedure to continue an identifier lookup search if an identifier is not found in the symbol space. A standard lexical-scope-search behavior is used to implement lexical nesting using the unique parent space reference captured with a plain lexical scope. Other custom search behaviors can realize extremely complex lookups.

A scope is a set of program locations which have the same interpretation of identifiers. Often these correspond to source code blocks (and corresponding specific subtrees in the AST that represent those blocks). For each scope, DMS associates a single symbol space to represent the mapping. The symbol space itself may contain the complete map; more often, the transitive closure of the symbol space and those symbol spaces determined by its associated search behavior procedure define the complete mapping.

Scope Definition and Symbol Lookup

How a scope is defined is completely a function of the target domain. This is implemented as a custom PARLANSE function that maps an AST node (representing a program location) to a symbol space that represents the scope. Identifier lookup then uses the symbol-space search behavior procedure for searching that space or any associated spaces to either find a matching identifier, or declare that no such identifier is found.

It is straightfoward to connect DMS Rewrite Rules to PARLANSE symbol lookup procedures, allowing rewrite rules to consult type or other information associated with a rule-matched identifier.

An example symbol table for C++

This diagram shows a small C++ function on the left, and the set of related symbol spaces on the right.

C++ has a variety of other symbol spaces and search behaviors to realize the complexities of C++ overloads and Koenig lookup. This specific example is taken from our C++ Front End.

Symbol Table Construction

DMS symbol tables are relatively easily constructed during a name resolution step that follows parsing, using DMS Attribute Grammars. Each attribute grammar rule accepts a reference to a "current" symbol space from its parent, can look up identifiers in that space, updates that symbol space by side effect, and/or creates child symbol spaces to pass down to its children, as appropriate for name resolution for the target language.

The parallel-safe nature of DMS symbol tables allows one to often harness parallelism of attribute grammars to construct the symbol table in parallel. Treating symbol spaces and entries as events allows symbol table construction and lookup to occur on demand while other computations are occurring, even if the information is not yet available.

Next: DMS Rewrite Rules

For more information: [email protected]    Follow us at Twitter: @SemanticDesigns

DMS
Symbol Tables