C++ Parser (Front End)

The C++ parser (front end) enables the construction of C++ custom compilers, analysis tools, or source transformation tools. It is a member of SD's family of language front ends, based on first-class infrastructure (DMS) for implementing such custom tools. The C++ front end includes:

  • Full Lexical analysis
    • Character sets include ASCII, ISO 8859-1, UTF-8, UTF-16, Shift-JIS and many other standard and Microsoft character encodings
    • Conversion of literal values (numbers, escaped strings) into native values to enable easy computation over literal values
    • String literals represented internally in Unicode to support 16-bit characters
  • Explicit grammar directly implements the following standards and dialects:
    • ISO/IEC 14882:1998 standard
    • C++11 (14882:2011 revised standard)
    • C++14 (ISO/IEC 14882:2014 standard)
    • C++17 (draft ISO/IEC 14882:2017 standard)
    • Option for Microsoft Visual6 C++ dialect
    • Option for Microsoft Managed C++ dialect
    • Option for Microsoft XBOX 360 C++ dialect
    • Option for Microsoft VS 2013 C++ dialect
    • Option for GNU C++ dialects (GCC2/GCC3/GCC4/GCC5.0 with vector extensions)
    • Option for SystemC 2.1
    • Pragma handling for OpenMP versions 2, 3, 4.5
  • Preprocessor support
    • Controllable include directory paths
    • Option to fully expand preprocessor directives
    • Option to parse include files for definitions
    • Option to parse preserving preprocessor conditional directives, macros and include directives
  • Automatic construction of complete abstract syntax tree
    • Capture of comments and formats (shape) of literal values
    • Capture of ambiguous parses during parsing
    • Ability to parse large systems of files into same workspace, enabling interprocedural and cross-file analysis/transformation
    • Ability to parse different languages into same workspace, enabling cross-language analysis/transformation
  • Facilities to process syntax trees
    • Complete procedural API to visit/query/update/construct/print syntax trees
    • Source regeneration by prettyprinting and/or fidelity printing of syntax trees with comments and lexical formats
    • Automatically generated source-to-source transformation system
    • Ability to define custom attribute-grammar-based analyzers
  • Name and Type resolution
    • Type representation system for all C++ types defined
    • All identifiers resolved to their C++-defined type and stored in symbol tables
    • Automatic deletion of erroneous alternatives of ambiguous parses
  • Control Flow Extraction
    • Includes constructors, destructors, (assignment) operators and user-defined conversions
    • Includes explicit throw, catch (but not yet implicit exceptions)
  • Function-level Data Flow Analysis
    • Use-def chains
    • Local must-alias analysis for pointers
  • Transformation
  • Flexibility
    • Complete access to underlying DMS capabilities
    • Means to manage multiple language dialects with highly shared common core
    • Available as source code to enable complete customization
    • Robustness due to careful testing and application across many customers

Many of these facilities are a direct consequence of the front end being built on top of DMS.
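The lexical-analysis items above mention converting literals into native values while also capturing their original format. As a rough illustration of that idea only, the sketch below pairs a decoded integer value with the radix it was written in, so a tool could compute on the value and still reprint the literal in its original style. The names `IntLiteral`, `parse_int_literal`, and `print_int_literal` are hypothetical, invented for this example; they are not part of the DMS API. The sketch assumes valid, unsuffixed integer literals and does no error handling.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical record: the decoded value plus the "shape" of its spelling.
struct IntLiteral {
    std::uint64_t value;  // native value, usable for computation
    int radix;            // 2, 8, 10, or 16, as written in the source
};

// Map one digit character to its numeric value (-1 if it is not a digit).
constexpr int digit_value(char ch) {
    return (ch >= '0' && ch <= '9') ? ch - '0'
         : (ch >= 'a' && ch <= 'f') ? ch - 'a' + 10
         : (ch >= 'A' && ch <= 'F') ? ch - 'A' + 10
         : -1;
}

// Decode spellings such as "42", "0x2A", "052", "0b101010", or "1'000".
// Assumption: the text is already known to be a valid literal.
inline IntLiteral parse_int_literal(const std::string& text) {
    int radix = 10;
    std::size_t pos = 0;
    if (text.size() > 2 && text[0] == '0' && (text[1] == 'x' || text[1] == 'X')) {
        radix = 16; pos = 2;
    } else if (text.size() > 2 && text[0] == '0' && (text[1] == 'b' || text[1] == 'B')) {
        radix = 2; pos = 2;
    } else if (text.size() > 1 && text[0] == '0') {
        radix = 8; pos = 1;
    }
    std::uint64_t value = 0;
    for (; pos < text.size(); ++pos) {
        if (text[pos] == '\'') continue;  // skip C++14 digit separators
        value = value * radix + digit_value(text[pos]);
    }
    return {value, radix};
}

// Reprint a literal in the radix it was originally written in.
inline std::string print_int_literal(const IntLiteral& lit) {
    static const char digits[] = "0123456789abcdef";
    std::string body;
    std::uint64_t v = lit.value;
    do {
        body.insert(body.begin(), digits[v % lit.radix]);
        v /= lit.radix;
    } while (v != 0);
    const char* prefix = lit.radix == 16 ? "0x"
                       : lit.radix == 2  ? "0b"
                       : lit.radix == 8  ? "0"
                       : "";
    return prefix + body;
}
```

With this representation, a transformation could add one to the literal `0x2A` and print the result as a hexadecimal literal rather than as decimal, because the radix travels with the value.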

Parsing

We provide here an example of parsing an extremely difficult bit of code: one of the many dreaded C++ most vexing parses. What is difficult at parse time is determining whether the line in the main program is a declaration or an expression.

template<bool> struct a_t;

template<> struct a_t<true> {
   template<int> struct b {};
};

template<> struct a_t<false> {
   enum { b };
};

typedef a_t<sizeof(void*)==sizeof(int)> a;

enum { c, d };
int main() {
    a::b<c>d; // declaration or expression?
}

Classic C++ parsers tangle parsing and symbol table building to try to resolve this during parsing, which is why this is vexing. DMS's C++ front end isolates name resolution from parsing. It instead produces the following parse DAG, which represents the multiple syntactic interpretations of that line as shared subtrees (see lower right corner with crossed arcs). The C++ name resolver later builds symbol tables by walking the DAG, propagating constraints across the DAG and deleting the incorrectly-typed subtree, producing the final correct AST.
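To make the two interpretations concrete, the following sketch forces each one in turn by fixing the template argument, instead of letting `sizeof(void*)==sizeof(int)` decide. It uses only standard C++ and says nothing about how DMS represents the alternatives internally. Two details are assumptions of this sketch rather than part of the original sample: the function names are invented, and `c` and `d` are plain integer constants instead of enumerators, which keeps the comparison free of mixed-enumeration warnings.

```cpp
#include <type_traits>

template<bool> struct a_t;
template<> struct a_t<true>  { template<int> struct b {}; };
template<> struct a_t<false> { enum { b }; };

// Stand-ins for the original enumerators c and d (assumption of this sketch).
constexpr int c = 0;
constexpr int d = 1;

// What the original typedef selects on the platform compiling this code.
constexpr bool original_reads_as_declaration = sizeof(void*) == sizeof(int);

// Reading 1: a::b names a class template, so the line is a declaration.
inline bool reads_as_declaration() {
    typedef a_t<true> a;
    a::b<c> d;  // declares a local d of type a_t<true>::b<0>, hiding ::d
    return std::is_same<decltype(d), a_t<true>::b<0>>::value;
}

// Reading 2: a::b names an enumerator, so the same tokens form an expression.
inline bool expression_reading_value() {
    typedef a_t<false> a;
    return a::b<c>d;  // parsed as (a::b < c) > d, i.e. (0 < 0) > 1
}
```

The token sequence `a::b<c>d` is identical in both functions; only the meaning of `a::b` differs, which is exactly the information a parser does not have unless it consults symbol tables.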

Control and Data Flow Analysis

Reasoning about C++ programs realistically requires control and data flow analysis of the code. The DMS C++ Front End builds fine-grain control and data flow graphs and makes them available to a tool builder. The control flow graphs include all semantic primitives of the C++ program, broken out as separate elements, including (nested) constructor/destructor calls, and the traditional flow through statements and exceptions. It is easy to navigate back and forth from the AST to the control flow graph. The data flow tracks values from point of definition to point of use, and is attached as additional arcs to the (semantic) control flow nodes that produce or consume them.

You can see an (SVG) example of the control and dataflow for the following program:

struct T {
   T();
   T(const T&);
   virtual ~T();
   T &operator=(const T&);
   int a, b;

   void foo(T *t) throw() {
     this->a = t->b;
   }
   
};

T t1o, t2o;
int x;

int main() throw() {
   T *t1 = &t1o;
   T *t2 = &t2o;

   t2->b = 4;

   t1->foo(t2); // writes t1->a; reads t2->b

   x = t1->a + t1->b + t2->a + t2->b;
}

Orange nodes are the control flow nodes, orange arcs are control flow arcs, dashed brown arcs are expression value flows, and blue arcs are data flow arcs. The alternating tan and white bands are regions.
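The idea of linking each use of a value back to its definition can be sketched for the straight-line body of main() above. This is a deliberately simplified illustration, not the DMS representation: the `Stmt` and `UseDef` types and the function names are hypothetical, the read and write sets are written by hand rather than computed, reads of the pointer variables t1 and t2 are omitted, and there is no handling of branches, loops, or aliasing.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified statement: the storage locations it reads and writes.
struct Stmt {
    std::string text;
    std::vector<std::string> reads;
    std::vector<std::string> writes;
};

constexpr int kNoDef = -1;  // no earlier statement in the sequence wrote the location

// One use-def link: statement `use` reads `location`, last written by statement `def`.
struct UseDef {
    std::size_t use;
    std::string location;
    int def;
};

// Build use-def links for a straight-line sequence by remembering the most
// recent writer of each location.
inline std::vector<UseDef> build_use_def(const std::vector<Stmt>& stmts) {
    std::map<std::string, int> last_def;
    std::vector<UseDef> links;
    for (std::size_t i = 0; i < stmts.size(); ++i) {
        for (const std::string& loc : stmts[i].reads) {
            auto it = last_def.find(loc);
            links.push_back({i, loc, it == last_def.end() ? kNoDef : it->second});
        }
        for (const std::string& loc : stmts[i].writes) {
            last_def[loc] = static_cast<int>(i);
        }
    }
    return links;
}

// Hand-written read/write sets for the three effectful statements of main().
inline std::vector<Stmt> example_main_body() {
    std::vector<Stmt> body;
    body.push_back(Stmt{"t2->b = 4;", {}, {"t2o.b"}});
    body.push_back(Stmt{"t1->foo(t2);", {"t2o.b"}, {"t1o.a"}});
    body.push_back(Stmt{"x = t1->a + t1->b + t2->a + t2->b;",
                        {"t1o.a", "t1o.b", "t2o.a", "t2o.b"}, {"x"}});
    return body;
}
```

Calling `build_use_def(example_main_body())` links the read of `t2o.b` inside the call to foo back to the assignment `t2->b = 4`, and links the read of `t1o.a` in the final sum back to the call. Reads of `t1o.b` and `t2o.a` get `kNoDef`, because nothing in this sequence writes them.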

Side effects analysis

The C++ front end provides deep analysis capabilities that can be used to support custom tools. Knowing the side effects of a method is extremely useful when reasoning about a program. The following code sample shows the kind of complexity possible in such an analysis: The side effects are computed by the C++ front end and stored as inspectable data structures associated with the various semantic elements of the control flow graph. A debugging utility can dump them. The side effects reported for foo are:

Parameters: 2
  Location:
    Path: <This>
  Location:
    Path: t#16
Reads: 1
  Location:
    Path: ?{::T}.b
    Pointed to by value number: 0
      Location:
        Path: t#16
Writes: 1
  Location:
    Path: ?{::T}.a
    Pointed to by value number: 0
      Location:
        Path: <This>

i.e., foo reads t->b and writes this->a (with t being an explicit parameter and this being the implicit object parameter).
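One way a tool might use such a summary is to rewrite it in terms of the actual arguments at a call site. The sketch below does this for the dump above. It is an assumption-laden illustration, not the DMS data structure or API: `Access`, `SideEffects`, `foo_summary`, and `instantiate` are hypothetical names, and the summary is transcribed by hand from the dump rather than computed.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical summary entry: a field accessed through a formal parameter.
struct Access {
    std::string base;   // formal parameter the access goes through ("this" or "t")
    std::string field;  // field that is read or written
};

struct SideEffects {
    std::vector<Access> reads;
    std::vector<Access> writes;
};

// Summary of foo, transcribed by hand from the dump: reads t->b, writes this->a.
inline SideEffects foo_summary() {
    SideEffects fx;
    fx.reads.push_back(Access{"t", "b"});
    fx.writes.push_back(Access{"this", "a"});
    return fx;
}

// Rewrite summarized accesses using the actual arguments of one call site.
// Throws std::out_of_range if a formal parameter has no actual bound to it.
inline std::vector<std::string> instantiate(
        const std::vector<Access>& accesses,
        const std::map<std::string, std::string>& actuals) {
    std::vector<std::string> result;
    for (const Access& acc : accesses) {
        result.push_back(actuals.at(acc.base) + "->" + acc.field);
    }
    return result;
}
```

For the call `t1->foo(t2)`, binding `this` to `t1` and `t` to `t2` yields the write `t1->a` and the read `t2->b`, which matches the comment on that call in the earlier example program.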

Tools for C++

Here are some sample tools (many offered by SD as products) built using the C++ front end:

Your organization may use DMS with the C++ front end to implement and deploy your own custom tools. The sample tools can be obtained in source form as part of the C++ front end for customization. Semantic Designs is also willing to build custom tools under contract.

