DMS PrettyPrinters (aka "Trees to Text")

The DMS Software Reengineering Toolkit is designed to allow the "domain" (language) engineer specify those languages quickly and accurately, so that she may spend most of her attention on the actual program analysis or transformation of interest.

At DMS Domains, we give some background on the necessary elements to build a DMS language processing domain. We have already described how to easily specify a grammar to DMS, automatically acquiring a parser and syntax tree builder. Other parts of DMS allow the engineer to transform the tree... at which point there is a need to regenerate (prettyprint) valid, compilable source text from that modified tree.

On this page, we show how prettyprinting is specified and executed by DMS.

We use Nicholas Wirth's Oberon language as an example, as it is a real, practical language yet simple enough so the entire parser and prettyprinter definitions can be easily be displayed and understood here.

PrettyPrinting Concepts [1]

The problem of converting an AST back into source code is generally called "prettyprinting". There are two subtle variations: regenerating the text matching the original as much as possible (fidelity printing), and explicitly-controlled(nice prettyprinting), which generates nicely formatted text. How text is printed depends on whether the original programmres will be working on the regenerated code (they often want fidelity printing) or not (at which point any nice prettyprinting is good for the next person who must read the code).

To do prettyprinting well requires usually more information than a classic parser collects. Parsers that collect enough information to do this well (as DMS does) are re-engineering parsers; virtually none of the classic parser generators collect or store the necessary information for good prettyprintig.

Tree Visitor Paradigm

The fundamental way prettyprinting is accomplished when there is an AST, is by walking the AST (classically called Visitor pattern) and generating text based on the AST node content begin visited. The basic approach node visitor is: call children nodes left-to-right (assuming that's the order of the original text) to generate the text they represent, interspersing additional text as appropriate for this AST node type. To prettyprint a curly-braced block of statements requires the following PrettyPrintBlock visitor, and a visitor for the statements that are part of the block:


 PrettyPrintBlock(Node):
     Print("{"}; PrintNewline();
     Call PrettyPrint(Node.children[1]); // prints out statements in block
     Print("}"); PrintNewline();
     return;

 PrettyPrintStatements(Node):
     do i=1,number_of_children
         // print one statement:
         Call PrettyPrint(Node.children[i]); Print(";"); PrintNewline();
     endo
     return;

Note that this spits out text on the as the tree is visited in-order.

Printing terminals

There's a number of details that need to be managed. A key problem is that for any artifact to be printed, there may be multiple valid ways to print it. The prettyprinter needs to choose the best way, balancing readability and preservation of formats. For end-users to accept prettyprinted results, the tool may have to preserve some properties of the source text that one wouldn't normally think to store. DMS does this by capturing (in ways defined by the domain engineer in the lexer), format (or shape) properties of the original lexeme, such as character case, number radix, etc. and preserving this formatting information in the constructed (or modified!) AST; this formatting information is then used to shape the prettyprinted result. (DMS can resort to the original text in the original file if necessary, but that's very expensive). Here are some specific issues:

  • Keywords: Sometimes keywords have multiple spellings or abbreviations, or allow different alphabetic case. One may be able to choose a canonical spelling/case, or, depending on the audience, one may have to preserve information to enable regeneration of the original keyword.
  • Literal values: For AST nodes carrying literal values, one has to regenerate the literal value. This is harder than it looks if the answer is to be accurate.
    • Numbers: Printing floating point numbers without losing any precision is difficult, and scientists hate it when the value of Pi gets damaged even in the last digit. Often one must regenerate the radix of the number; coders having entered a number as a hex literal are not happy when you regenerate the decimal equivalent even though it means exactly the same thing. Even leading digits and the presence of an a egregious + may be necessary to preserve.
    • Strings: For string literals, one must often regenerate the original quotes and the string literal content, including escape sequences for characters that have to be escaped. (For PHP, which quote was used, and determines which characters in the string literal has to be escaped; PHP strings have lot of other interesting formatting quirks because they are really best treated as implicit string concatenation expressions).
    • Identifiers: Modern languages such as Java and C# allow arbitrary Unicode identifiers. When prettyprinting such identifiers to non-Unicode encodings, they must be properly escaped. Case differences in variable names may matter or not, depending on the type of the variable PHP allows funny characters in different type of identifiers (e.g., "$") but it isn't always there (see $ variables embedded in literal strings).
  • Spacing: some languages require whitespace in critical places. The two tokens ABC17 42 better not be printed as ABC1742, but it is generally ok for the tokens ABC17 ( 42 ) to be printed as ABC17(42). One way to solve this problem is to put a space wherever it is legal, but recipients of such code often don't like the result: too much whitespace. This is not an issue if all one is going to do with the code is compile it.
  • Newlines: languages that allow arbitrary whitespace can technically be regenerated as a single line of text. People hate this, even if you are going to compile the result; sometimes you have to look at the generated code and this makes it impossible. So you need a way to introduce newlines for AST nodes representing major language elements (statements, blocks, methods, classes, etc.). This isn't usually hard; when visiting a node representing such a construct, print out the construct and append a newline.
  • Comments: Most standard parsers throw comments away completely. This is unacceptabel for a prettyprinted answer of transformed code. DMS captures comments in the AST, so that AST transformations can inspect/generate comments too.
  • Fidelity Data: Often people want the original layout formatting of code where possible. To do this the tool must store column-number information for concrete tokens, and have prettyprinting rules about when to use that column-number data to position prettyprinted text where in the same column when possible, and what to do if the so-far-prettyprinted line is filled past that column.
All of this "extra" information is collected by a good reenginering parser. Conventional parsers usually don't collect any of it, which makes printing acceptable ASTs difficult. DMS provides specific support for capturing format information at lexing-time, and inspecting format information at terminal prettyprinting time. No other program generation tool we are familiar with does this.

A more principled approach distinguishes prettyprinting whose purpose is nice formatting, from fidelity printing whose purpose is to regenerate the text to match the original source to a maximal extent. It should be clear that at the level of the terminals, you pretty much want fidelity printing. Depending on your purpose, you can pretty print with nice formatting, or fidelity printing. The strategy DMS uses as default is to do fidelity printing where the AST hasn't been changed, and prettyprinting where it has (because often the change machinery doesn't have any information about column numbers or number radixes, etc.). The transformations stamp the AST nodes that are newly generated as "no fidelity data present".

Printing Code Blocks as Boxes

An organized approach to prettyprinting nicely is to understand that virtually all text-based programming language are rendered nicely in terms of rectangular blocks of text. (Knuth's TeX document generator has this idea, too). If you have some set of text boxes representing pieces of the regenerated code (e.g., primitive boxes generated directly for the terminal tokens), you can then imagine operators for composing those boxes: Horizontal composition (stack one box to the right of another), Vertical (stack boxes on top of each other; this in effect replaces printing newlines), Indent (Horizontal composition with a box of blanks), etc. Then you can construct your prettyprinter by building and composing text boxes:


 PrettyPrintBlock:
     Box1=PrimitiveBox("{");
     Box2=PrimitiveBox("}");
     ChildBox=PrettyPrint(Node.children[1]); // gets box for statements in block
     ResultBox=VerticalBox(Box1,Indent(3,ChildBox),Box2);
     return ResultBox;

PrettyPrintStatements:
     ResultBox=EmptyBox();
     do i=1,number_of_children
         ResultBox=
           VerticalBox(ResultBox,
                       HorizontalBox(PrettyPrint(Node.children[i])),PrimitiveBox(";"));
     enddo
     return;

The real value in this is any node can compose the text boxes produced by its children in arbitrary order with arbitrary intervening text. You can rearrange huge blocks of text this way (imagine VBox'ing the methods of class in method-name order). No text is spit out as encountered; only when the root is reached, or some AST node where it is known that all the children boxes have been generated correctly.

The DMS PrettyPrinter Box Language

Our DMS Software Reengineering Toolkit uses this approach to prettyprint all the languages it can parse (including PHP, Java, C#, etc.). Instead of attaching the box computations to AST nodes via visitors, we use box computations in a domain-specific text-box notation:

  • H(...) for Horizontal boxes (with H* meaning "applied to all RHS elements")
  • V(....) for vertical boxes (with V* meaning "applied to all RHS elements")
  • I(...) for indented boxes)
Such box computations are attached directly to the grammar rules, allowing us to succinctly express the grammar (parser) and the prettyprinter ("anti-parser") in one place. The prettyprinter box rules are compiled automatically by DMS into a visitor producing a text stream. The prettyprinter machinery has to be smart enough to understand how comments play into this, and that's frankly a bit arcane but you only have to do it once. An DMS example:

block = '{' statements '}' ; -- grammar rule to recognize block of statements
<<PrettyPrinter>>: { V('{',I(statements),'}'); };

You can see a bigger example of how this is done at Simple Language Example using DMS showing how grammar rules and prettyprinting rules are combined. The PHP Front looks like this but its a lot bigger, obviously. A more complex way to do prettyprinting is to build a syntax-directed translator (means, walk the tree and build text or other data structures in tree-visted order) to produce text-boxes in a special text-box AST. The text-box AST is then prettyprinted by another tree walk, but the actions for it are basically trivial: print the text boxes. See this technical paper: Pretty-printing for software reengineering

An additional point: you can of course go build all this machinery yourself. But the same reason that you choose to use a parser generator (its a lot of work to make one, and that work doesn't contribute to your goal in an interesting way) is the same reason you want to choose an off-the-shelf prettyprinter generator. There are lots of parser generators around. Not many prettyprinter generators. [DMS is one of the few that has both built in.]

A Grammar for Wirth's Oberon
with PrettyPrinter declarations

Here is the complete grammar for the Oberon language, covering the original design by Wirth and the 2007 dialect. The standard DMS approach is to builds parser and prettyprinter at the same time, weaving the parsing and prettying rules. The grammar show previously left these prettyprinting rules out.

Explanation of the prettyprinter aspects can be found below.


--   Oberon.atg: DMS Grammar Definition for Oberon
--   Copyright (C) 2014-2015 Semantic Designs, Inc.; All Rights Reserved
--

PRETTYPRINTER
{
    -- Do not put spaces between these sets of symbols while emitting the refactored code.
	separator { string, ident, integer, hexint, real, 'NIL' } none { '(', ')' };
    optional separator { '(', ')' } none { string, ident, integer, hexint, real, 'NIL' };
    separator * none { ',', '.', ';' };
    separator { '.' } none *;

#IF Oberon
    separator { charconst } none { '(', ')' };
    separator { '(', ')' } none { charconst };
#ELSIF Oberon07
    separator { 'TRUE', 'FALSE' } none { '(', ')' };
    separator { '(', ')' } none { 'TRUE', 'FALSE' };
#ELSE
    #FAIL
#ENDIF

    -- Hex integers require custom formatting.
    lexeme hexint function "FormatOberonHexNatural";
}

------------------------------------------------------------------------------------------
---(1)------------ Module
------------------------------------------------------------------------------------------

module  =  'MODULE' ident ';' DeclarationSequence 'END' ident '.' ;
    <<PrettyPrinter>>: { V(H('MODULE', ident, ';'), DeclarationSequence,
                                 H('END', ident, '.')); }
module  =  'MODULE' ident ';' DeclarationSequence ProgramBody ident '.' ;
    <<PrettyPrinter>>: { V(H('MODULE', ident, ';'), DeclarationSequence,
                                 H(ProgramBody, ident, '.')); }
module  =  'MODULE' ident ';' 'IMPORT' ImportList ';' DeclarationSequence 'END'
                                 ident '.' ;
    <<PrettyPrinter>>: { V(H('MODULE', ident, ';'), H('IMPORT', ImportList, ';'),
                                 DeclarationSequence, H('END', ident, '.')); }
module  =  'MODULE' ident ';' 'IMPORT' ImportList ';' DeclarationSequence ProgramBody
                                 ident '.' ;
    <<PrettyPrinter>>: { V(H('MODULE', ident, ';'), H('IMPORT', ImportList, ';'),
                                 DeclarationSequence, H(ProgramBody, ident, '.')); }

ProgramBody  =  'BEGIN' StatementSequence 'END' ;
    <<PrettyPrinter>>: { V(NL<1>, 'BEGIN', I(StatementSequence), 'END'); }

ImportList  =  import ;
ImportList  =  ImportList ',' import ;
    <<PrettyPrinter>>: { H*;  }

import  =  ident ;
import  =  ident ':=' ident ;
    <<PrettyPrinter>>: { H*;  }

#IF Oberon
    -- Original version of the language as it was introduced in 1986.

    DeclarationSequence  =  ConstTypeVarDeclSequence ProcedureDeclSequence;
        <<PrettyPrinter>>: { V*;  }

    ConstTypeVarDeclSequence  =  ;
    ConstTypeVarDeclSequence  =  ConstTypeVarDeclSequence 'CONST' ConstantDeclarationList ;
        <<PrettyPrinter>>: { V(ConstTypeVarDeclSequence, 'CONST', I(ConstantDeclarationList)); }
    ConstTypeVarDeclSequence  =  ConstTypeVarDeclSequence 'TYPE' TypeDeclarationList     ;
        <<PrettyPrinter>>: { V(ConstTypeVarDeclSequence, 'TYPE', I(TypeDeclarationList)); }
    ConstTypeVarDeclSequence  =  ConstTypeVarDeclSequence 'VAR' VariableDeclarationList ;
        <<PrettyPrinter>>: { V(ConstTypeVarDeclSequence, 'VAR', I(VariableDeclarationList)); }

    ProcedureDeclSequence  =  ;
    ProcedureDeclSequence  =  ProcedureDeclSequence ForwardDeclaration   ';' ;
        <<PrettyPrinter>>: { V(ProcedureDeclSequence, NL<1>, H(ForwardDeclaration, ';')); }
    ProcedureDeclSequence  =  ProcedureDeclSequence ProcedureDeclaration ';' ;
        <<PrettyPrinter>>: { V(ProcedureDeclSequence, NL<1>, H(ProcedureDeclaration, ';')); }

#ELSIF Oberon07
    -- Dialect of the original language. Revision 1.10.2013/10.3.2014.

    DeclarationSequence  =  ;
    DeclarationSequence  =  ConstTypeVarDeclaration ;
    DeclarationSequence  =  DeclarationSequence ProcedureDeclaration ';' ;
        <<PrettyPrinter>>: { V(DeclarationSequence, NL<1>, H(ProcedureDeclaration, ';')); }

    ConstTypeVarDeclaration  =  'VAR' VariableDeclarationList ;
        <<PrettyPrinter>>: { V('VAR', I(VariableDeclarationList)); }
    ConstTypeVarDeclaration  =  'TYPE' TypeDeclarationList ;
        <<PrettyPrinter>>: { V('TYPE', I(TypeDeclarationList)); }
    ConstTypeVarDeclaration  =  'TYPE' TypeDeclarationList 'VAR' VariableDeclarationList ;
        <<PrettyPrinter>>: { V('TYPE', I(TypeDeclarationList), 'VAR',
                                   I(VariableDeclarationList)); }
    ConstTypeVarDeclaration  =  'CONST' ConstantDeclarationList ;
        <<PrettyPrinter>>: { V('CONST', I(ConstantDeclarationList)); }
    ConstTypeVarDeclaration  =  'CONST' ConstantDeclarationList 'VAR' VariableDeclarationList ;
        <<PrettyPrinter>>: { V('CONST', I(ConstantDeclarationList), 'VAR',
                                   I(VariableDeclarationList)); }
    ConstTypeVarDeclaration  =  'CONST' ConstantDeclarationList 'TYPE'
                                   TypeDeclarationList ;
        <<PrettyPrinter>>: { V('CONST', I(ConstantDeclarationList), 'TYPE',
                                   I(TypeDeclarationList)); }
    ConstTypeVarDeclaration  =  'CONST' ConstantDeclarationList 'TYPE' TypeDeclarationList
                                  'VAR' VariableDeclarationList ;
        <<PrettyPrinter>>: { V('CONST', I(ConstantDeclarationList), 'TYPE',
                                  I(TypeDeclarationList), 'VAR', I(VariableDeclarationList)); }

#ELSE
    #FAIL
#ENDIF

------------------------------------------------------------------------------------------
---(2)------------ Declarations
------------------------------------------------------------------------------------------

ConstantDeclarationList  =  ;
ConstantDeclarationList  =  ConstantDeclarationList IdentDef
                                         '=' ConstantExpression ';' ;
    <<PrettyPrinter>>:  { V(ConstantDeclarationList, H(IdentDef,
                                         '=', ConstantExpression, ';')); }

#IF Oberon
    TypeDeclarationList  =  ;
    TypeDeclarationList  =  TypeDeclarationList IdentDef '=' Type ';' ;
        <<PrettyPrinter>>:  { V(TypeDeclarationList, H(IdentDef, '=',
                                         Type, ';')); }
#ELSIF Oberon07
    TypeDeclarationList  =  ;
    TypeDeclarationList  =  TypeDeclarationList IdentDef '=' StructType ';' ;
        <<PrettyPrinter>>:  { V(TypeDeclarationList, H(IdentDef, '=',
                                         StructType, ';')); }
#ELSE
    #FAIL
#ENDIF

Type  =  QualIdent  ;
Type  =  StructType ;

StructType  =  ArrayType     ;
StructType  =  RecordType    ;
StructType  =  PointerType   ;
StructType  =  ProcedureType ;

ArrayType  =  'ARRAY' LengthList 'OF' Type ;
    <<PrettyPrinter>>:  { H*; }

LengthList  =  ConstantExpression ;
LengthList  =  LengthList ',' ConstantExpression ;
    <<PrettyPrinter>>:  { H*; }

RecordType  =  'RECORD' 'END' ;
    <<PrettyPrinter>>:  { H*; }
RecordType  =  'RECORD' FieldListSequence 'END' ;
    <<PrettyPrinter>>:  { V('RECORD', I(FieldListSequence), 'END'); }
RecordType  =  'RECORD' '(' BaseType ')' 'END' ;
    <<PrettyPrinter>>:  { H('RECORD', SP<1>, '(', BaseType, ')', 'END'); }
RecordType  =  'RECORD' '(' BaseType ')' FieldListSequence 'END' ;
    <<PrettyPrinter>>:  { V(H('RECORD', SP<1>, '(', BaseType, ')'),
                                 I(FieldListSequence), 'END'); }

BaseType  =  QualIdent ;

FieldListSequence  =  FieldList ;
FieldListSequence  =  FieldListSequence ';' FieldList ;
    <<PrettyPrinter>>:  { V(H(FieldListSequence, ';'), FieldList); }

FieldList  =  FieldNameList ':' Type ;
    <<PrettyPrinter>>:  { H*; }

FieldNameList  =  IdentDef ;
FieldNameList  =  FieldNameList ',' IdentDef ;
    <<PrettyPrinter>>:  { H*; }

PointerType  =  'POINTER' 'TO' Type ;
    <<PrettyPrinter>>:  { H*; }

ProcedureType  =  'PROCEDURE' ;
ProcedureType  =  'PROCEDURE' FormalParameters ;
    <<PrettyPrinter>>:  { H*; }

VariableDeclarationList  =  ;
VariableDeclarationList  =  VariableDeclarationList VariableDefList ':' Type ';' ;
    <<PrettyPrinter>>:  { V(VariableDeclarationList, H(VariableDefList, ':',
                                     Type, ';')); }

VariableDefList  =  IdentDef ;
VariableDefList  =  VariableDefList ',' IdentDef ;
    <<PrettyPrinter>>:  { H*; }

QualIdent  =  ident ;
QualIdent  =  ident '.' ident ;
    <<PrettyPrinter>>:  { H*; }

IdentDef  =  ident ;
IdentDef  =  ident '*' ;
    <<PrettyPrinter>>:  { H*; }

------------------------------------------------------------------------------------------
---(3)------------ Procedures
------------------------------------------------------------------------------------------

ProcedureDeclaration  =  ProcedureHeading ';' ProcedureBody ident ;
    <<PrettyPrinter>>: { V(H(ProcedureHeading, ';'), H(ProcedureBody, ident)); }

ProcedureHeading  =  'PROCEDURE' IdentDef ;
    <<PrettyPrinter>>:  { H*; }
ProcedureHeading  =  'PROCEDURE' IdentDef FormalParameters ;
    <<PrettyPrinter>>:  { H*; }

#IF Oberon
    ProcedureHeading  =  'PROCEDURE' '*' IdentDef ;
       <<PrettyPrinter>>:  { H*; }
    ProcedureHeading  =  'PROCEDURE' '*' IdentDef FormalParameters ;
       <<PrettyPrinter>>:  { H*; }
#ELSIF Oberon07
#ELSE
    #FAIL
#ENDIF

FormalParameters  =  '(' ')' ;
    <<PrettyPrinter>>:  { H*; }
FormalParameters  =  '(' ')' ':' QualIdent ;
    <<PrettyPrinter>>:  { H*; }
FormalParameters  =  '(' FormalParameterList ')' ;
    <<PrettyPrinter>>:  { H*; }
FormalParameters  =  '(' FormalParameterList ')' ':' QualIdent ;
    <<PrettyPrinter>>:  { H*; }

FormalParameterList  =  FormalParameterSection ;
FormalParameterList  =  FormalParameterList ';' FormalParameterSection ;
    <<PrettyPrinter>>:  { H*; }

FormalParameterSection  =  FormalParameterNameList ':' FormalParameterType ;
    <<PrettyPrinter>>:  { H*; }
FormalParameterSection  =  'VAR' FormalParameterNameList ':' FormalParameterType ;
    <<PrettyPrinter>>:  { H*; }

FormalParameterNameList  =  ident ;
FormalParameterNameList  =  FormalParameterNameList ',' ident ;
    <<PrettyPrinter>>:  { H*; }

FormalParameterType  =  QualIdent ;
FormalParameterType  =  ArrayOfSequence QualIdent ;
    <<PrettyPrinter>>:  { H*; }

#IF Oberon
    FormalParameterType  =  ProcedureType ;
    FormalParameterType  =  ArrayOfSequence ProcedureType ;
        <<PrettyPrinter>>:  { H*; }
#ELSIF Oberon07
#ELSE
    #FAIL
#ENDIF

ArrayOfSequence  =  'ARRAY' 'OF' ;
    <<PrettyPrinter>>:  { H*; }
ArrayOfSequence  =  ArrayOfSequence 'ARRAY' 'OF' ;
    <<PrettyPrinter>>:  { H*; }

ProcedureBody  =  DeclarationSequence 'END' ;
    <<PrettyPrinter>>: { V(DeclarationSequence, 'END'); }
ProcedureBody  =  DeclarationSequence 'BEGIN' StatementSequence 'END' ;
    <<PrettyPrinter>>: { V(DeclarationSequence, 'BEGIN', I(StatementSequence), 'END'); }

#IF Oberon
    ForwardDeclaration  =  'PROCEDURE' '^' ident ;
        <<PrettyPrinter>>:  { H*; }
    ForwardDeclaration  =  'PROCEDURE' '^' ident FormalParameters ;
        <<PrettyPrinter>>:  { H*; }
    ForwardDeclaration  =  'PROCEDURE' '^' ident '*' ;
        <<PrettyPrinter>>:  { H*; }
    ForwardDeclaration  =  'PROCEDURE' '^' ident '*' FormalParameters ;
        <<PrettyPrinter>>:  { H*; }
#ELSIF Oberon07
    ProcedureBody  =  DeclarationSequence 'RETURN' Expression 'END' ;
        <<PrettyPrinter>>: { V(DeclarationSequence, H('RETURN', Expression), 'END'); }
    ProcedureBody  =  DeclarationSequence 'BEGIN' StatementSequence 'RETURN'
                                     Expression 'END' ;
        <<PrettyPrinter>>: { V(DeclarationSequence, 'BEGIN', I(StatementSequence),
                                     H('RETURN', Expression), 'END'); }
#ELSE
    #FAIL
#ENDIF

------------------------------------------------------------------------------------------
---(4)------------ Statements
------------------------------------------------------------------------------------------

StatementSequence  =  Statement ;
StatementSequence  =  StatementSequence ';' Statement ;
    <<PrettyPrinter>>:  { V(H(StatementSequence, ';'), Statement); }

Statement  =  ;
Statement  =  Assignment      ;
Statement  =  ProcedureCall   ;
Statement  =  IfStatement     ;
Statement  =  CaseStatement   ;
Statement  =  WhileStatement  ;
Statement  =  RepeatStatement ;

#IF Oberon
	Statement  =  LoopStatement ;
	Statement  =  WithStatement ;
	Statement  =  'EXIT'   ;
	Statement  =  'RETURN' ;
	Statement  =  'RETURN' Expression ;
	   <<PrettyPrinter>>:  { H*; }
#ELSIF Oberon07
	Statement  =  ForStatement ;
#ELSE
    #FAIL
#ENDIF

Assignment  =  designator ':=' Expression ;
    <<PrettyPrinter>>:  { H*; }

ProcedureCall  =  designator ;
ProcedureCall  =  designator ActualParameters ;
    <<PrettyPrinter>>:  { H*; }

IfStatement  =  'IF' Expression 'THEN' StatementSequence 'END' ;
    <<PrettyPrinter>>: { V(H('IF', Expression, 'THEN'), I(StatementSequence), 'END'); }
IfStatement  =  'IF' Expression 'THEN' StatementSequence 'ELSE' StatementSequence 'END' ;
    <<PrettyPrinter>>: { V(H('IF', Expression, 'THEN'), I(StatementSequence), 'ELSE',
                                     I(StatementSequence), 'END'); }
IfStatement  =  'IF' Expression 'THEN' StatementSequence ElseIfSequence 'END' ;
    <<PrettyPrinter>>: { V(H('IF', Expression, 'THEN'), I(StatementSequence),
                                     ElseIfSequence, 'END'); }
IfStatement  =  'IF' Expression 'THEN' StatementSequence ElseIfSequence 'ELSE'
                                     StatementSequence 'END' ;
    <<PrettyPrinter>>: { V(H('IF', Expression, 'THEN'), I(StatementSequence), ElseIfSequence,
                                     'ELSE', I(StatementSequence), 'END'); }

ElseIfSequence  =  'ELSIF' Expression 'THEN' StatementSequence ;
    <<PrettyPrinter>>: { V(H('ELSIF', Expression, 'THEN'), I(StatementSequence)); }
ElseIfSequence  =  ElseIfSequence 'ELSIF' Expression 'THEN' StatementSequence ;
    <<PrettyPrinter>>: { V(ElseIfSequence, H('ELSIF', Expression, 'THEN'),
                                     I(StatementSequence)); }

CaseStatement  =  'CASE' Expression 'OF' CaseSequence 'END' ;
    <<PrettyPrinter>>:  { V(H('CASE', Expression, 'OF'), I(CaseSequence), 'END'); }

#IF Oberon
	CaseStatement  =  'CASE' Expression 'OF' CaseSequence 'ELSE' StatementSequence 'END' ;
        <<PrettyPrinter>>:  { V(H('CASE', Expression, 'OF'), I(CaseSequence), 'ELSE',
                                     I(StatementSequence), 'END'); }
#ELSIF Oberon07
#ELSE
    #FAIL
#ENDIF

CaseSequence  =  CaseCase ;
CaseSequence  =  CaseSequence '|' CaseCase ;
    <<PrettyPrinter>>:  { V(CaseSequence, H('|', CaseCase)); }

CaseCase  =  ;
CaseCase  =  CaseLabelList ':' StatementSequence ;
    <<PrettyPrinter>>:  { V(H(CaseLabelList, ':'), I(StatementSequence)); }

CaseLabelList  =  CaseLabels ;
CaseLabelList  =  CaseLabels ',' CaseLabels ;
    <<PrettyPrinter>>:  { H*; }

CaseLabels  =  label ;
CaseLabels  =  label '..' label ;
    <<PrettyPrinter>>:  { H*; }

#IF Oberon
	label  =  ConstantExpression ;
#ELSIF Oberon07
	label  =  integer ;
    label  =  hexint  ;
	label  =  string  ;
	label  =  ident   ;
#ELSE
    #FAIL
#ENDIF

WhileStatement  =  'WHILE' Expression 'DO' StatementSequence 'END' ;
    <<PrettyPrinter>>:  { V(H('WHILE', Expression, 'DO'), I(StatementSequence), 'END'); }

#IF Oberon
#ELSIF Oberon07
	WhileStatement  =  'WHILE' Expression 'DO' StatementSequence ElseIfWhileSequence 'END' ;
        <<PrettyPrinter>>:  { V(H('WHILE', Expression, 'DO'), I(StatementSequence),
                                 ElseIfWhileSequence, 'END'); }
	ElseIfWhileSequence  =  'ELSIF' Expression 'DO' StatementSequence ;
        <<PrettyPrinter>>:  { V(H('ELSIF', Expression, 'DO'), I(StatementSequence)); }
	ElseIfWhileSequence  =  ElseIfWhileSequence 'ELSIF' Expression 'DO' StatementSequence ;
        <<PrettyPrinter>>:  { V(ElseIfWhileSequence, H('ELSIF', Expression, 'DO'),
                                 I(StatementSequence)); }
#ELSE
    #FAIL
#ENDIF

RepeatStatement  =  'REPEAT' StatementSequence 'UNTIL' Expression ;
    <<PrettyPrinter>>:  { V('REPEAT', I(StatementSequence), H('UNTIL', Expression)); }

#IF Oberon
	LoopStatement  =  'LOOP' StatementSequence 'END' ; 
        <<PrettyPrinter>>:  { V('LOOP', I(StatementSequence), 'END'); }
	WithStatement  =  'WITH' QualIdent ':' QualIdent 'DO' StatementSequence 'END' ;
        <<PrettyPrinter>>:  { V(H('WITH', QualIdent, ':', QualIdent, 'DO'),
                                 I(StatementSequence), 'END'); }
#ELSIF Oberon07
	ForStatement   =  'FOR' ident ':=' Expression 'TO' Expression
                                 'DO' StatementSequence 'END' ;
        <<PrettyPrinter>>:  { V(H('FOR', ident, ':=', Expression, 'TO', Expression, 'DO'),
                                 I(StatementSequence), 'END'); }
	ForStatement   =  'FOR' ident ':=' Expression 'TO' Expression 'BY' ConstantExpression
                                 'DO' StatementSequence 'END' ;
        <<PrettyPrinter>>:  { V(H('FOR', ident, ':=', Expression, 'TO', Expression, 'BY',
                                 ConstantExpression, 'DO'), I(StatementSequence), 'END'); }
#ELSE
    #FAIL
#ENDIF

------------------------------------------------------------------------------------------
---(5)------------ Expressions
------------------------------------------------------------------------------------------

Expression  =  SimpleExpression ;
Expression  =  Expression RelationOperator SimpleExpression ;
    <<PrettyPrinter>>:  { H*; }

ConstantExpression  =  Expression ;

RelationOperator  =  '='  ;
RelationOperator  =  '#'  ;
RelationOperator  =  '<'  ;
RelationOperator  =  '<=' ;
RelationOperator  =  '>'  ;
RelationOperator  =  '>=' ;
RelationOperator  =  'IN' ;
RelationOperator  =  'IS' ;

SimpleExpression  =  term ;
SimpleExpression  =  '+' term ;
    <<PrettyPrinter>>:  { H*; }
SimpleExpression  =  '-' term ;
    <<PrettyPrinter>>:  { H*; }
SimpleExpression  =  SimpleExpression AddOperator term ;
    <<PrettyPrinter>>:  { H*; }

AddOperator  =  '+'  ;
AddOperator  =  '-'  ;
AddOperator  =  'OR' ;

term  =  factor ;
term  =  term MulOperator factor ;
    <<PrettyPrinter>>:  { H*; }

MulOperator  =  '*'   ;
MulOperator  =  '/'   ;
MulOperator  =  'MOD' ;
MulOperator  =  'DIV' ;
MulOperator  =  '&'   ;

factor  =  number ;
factor  =  string ;
factor  =  'NIL'  ;

#IF Oberon
    factor  =  charconst ;
#ELSIF Oberon07
    factor  =  'TRUE'  ;
    factor  =  'FALSE' ;
#ELSE
    #FAIL
#ENDIF

factor  =  Set ;
factor  =  designator ;
factor  =  designator ActualParameters ;
    <<PrettyPrinter>>:  { H*; }
factor  =  '(' Expression ')' ;
    <<PrettyPrinter>>:  { H*; }
factor  =  '~' factor ;
    <<PrettyPrinter>>:  { H*; }

number  =  integer ;
number  =  hexint ;
number  =  real ;

Set  =  '{' '}' ;
    <<PrettyPrinter>>:  { H*; }
Set  =  '{' SetElementList '}' ;
    <<PrettyPrinter>>:  { H*; }

SetElementList  = SetElement ;
SetElementList  = SetElementList ',' SetElement ;
    <<PrettyPrinter>>:  { H*; }

SetElement  =  Expression ;
SetElement  =  Expression '..' Expression ;
    <<PrettyPrinter>>:  { H*; }

designator  =  QualIdent ;
designator  =  designator '.' ident ;
    <<PrettyPrinter>>:  { H*; }
designator  =  designator '[' IndexList ']' ;
    <<PrettyPrinter>>:  { H*; }
designator  =  designator '(' QualIdent ')' ;
    <<PrettyPrinter>>:  { H*; }
designator  =  designator '^' ;
    <<PrettyPrinter>>:  { H*; }

IndexList  =  Expression ;
IndexList  =  IndexList ',' Expression ;
    <<PrettyPrinter>>:  { H*; }

ActualParameters  =  '(' ')' ;
    <<PrettyPrinter>>:  { H*; }
ActualParameters  =  '(' ExpressionList ')' ;
    <<PrettyPrinter>>:  { H*; }

ExpressionList  =  Expression ;
ExpressionList  =  ExpressionList ',' Expression ;
    <<PrettyPrinter>>:  { H*; }

Description of the Oberon PrettyPrinting Grammar

The grammar rules here are identical to those bare DMS grammar for Oberon. For each grammar rule, a prettyprinting rule <<PrettyPrinter:>> { ... } has been added where needed (unit productions don't generally need rules; they print the same box as their right hand side).

The PRETTYPRINTER declaration at the top of the grammar file tells the prettyprinter whether to generate whitespace between various terminals of the grammar. (Note that grammar preprocessor conditionals can apply to prettyprinter declarations.)

It is instructive to examine the prettyprinting rule for while in detail. This rule combines the while and do keyword with the condition in one line, indents the body of the loop, and vertially alignes the while clause with the end keyword.

PrettyPrinting Terminals

A lexer converts characters into tokens. A prettyprinter, in converting trees back into text, must convert leaf nodes into text. Non-value carrying leaves (produced from #token 'foo' ... have a text equivalent of foo; the DMS PrettyPrinting engineautomatically handles such cases as a default.

Where the spelling of the token might be different than its quoted name, or where the token carries some kind of (binary) literal value, the corresponding string must be regenerated. In the same way that DMS Lexers use custom code to convert recognized text to a binary value, DMS PrettyPrinters use custom code convert binary values carried on leaves back to text. In general, one may write an arbitrary procedure. DMS provides, through a prettyprinter library, standard procedures to convert standard binary data types with format information collected by the lexer, to produce a nicely formatted string.

The DMS lexer for Oberon collects hexadecimal numbers, and converts them to binary. The custom code for the Oberon prettyprinter converts the binary number back to string using this custom code:


(define HexadecimalToString
   (lambda Literal-prettyprinter-function-type
      (let (;; (= [literal Graph:HGHandling:NaturalAndFormat]
                  (Graph:HGHandling:GetNaturalAndFormat token))
               (= [format LiteralFormat:NaturalLiteralFormat]
                  (LiteralFormat:ExpandCompactToNaturalLiteralFormat literal:Format))
               (= [result string] `')
           );;
           (value (;; (= result
                        (PrettyPrinterFormattingSupport:FormatNaturalToString (. format)
                        literal:Value
                        PrettyPrinterFormattingSupport:NotACharacter
                        PrettyPrinterFormattingSupport:NotACharacter))=
		      (ifthen (== format:Radix 16)
			  (;; (ifthen (~ (&& (>= (ordinal result:1) "0")
                                             (<= (ordinal result:1) "9")))
				      (= result (concatenate `0') result)
			      )ifthen
			      (return (append result "H"))
                          );;
                      )ifthen
                  );;
                  result
              )case
           )value
      )let
   )lambda
)define

Example: PrettyPrinting Oberon Source

DMS can parse the following badly formatted Oberon source file, using the Oberon DMS Parser, and produce an internal AST.

    MODULE
Buffer; CONST N
    = 100; VAR nonempty*, nonfull*
  : BOOLEAN; in, out, n: INTEGER; buf: ARRAY N OF INTEGER; PROCEDURE Put(x
: INTEGER);  BEGIN (* Procedure Body *) IF n < N THEN
      buf[in] := x; in := (in+1) MOD N;   INC(n); nonfull := n < N; nonempty := TRUE
    END  END put;  (* End of
               the procedure *) PROCEDURE Get*(VAR
         x: INTEGER); BEGIN  IF n > 0 THEN x := buf[out]; out :=
 (out+1) MOD N;  DEC(n); nonempty := n > 0;
      nonfull := TRUE  END END
  get; BEGIN
  n := 0; in := 0; out := 0; nonempty := FALSE; nonfull := TRUE END Buffer.

DMS can prettyprint that AST using the Oberon PrettyPrinter to produce the following legal Oberon output: This is essentially the same case as prettyprinting an AST that has been generated or transformed ASTs, and contains no useful positioning information.


MODULE Buffer;
CONST
  N = 100;
VAR
  nonempty *, nonfull * : BOOLEAN;
  in, out, n : INTEGER;
  buf : ARRAY N OF INTEGER;

PROCEDURE Put(x : INTEGER);
BEGIN
  (* Procedure Body *)
  IF n < N THEN
    buf [ in ] := x;
    in := (in + 1) MOD N;
    INC(n);
    nonfull := n < N;
    nonempty := TRUE
  END
END put;

    (* End of
the procedure *)
PROCEDURE Get * ( VAR x : INTEGER);
BEGIN
  IF n > 0 THEN
    x := buf [ out ];
    out := (out + 1) MOD N;
    DEC(n);
    nonempty := n > 0;
    nonfull := TRUE
  END
END get;

BEGIN
  n := 0;
  in := 0;
  out := 0;
  nonempty := FALSE;
  nonfull := TRUE
END Buffer.

Next: DMS Attribute Grammars

Note: text in this section is an enhanced version of that originally published by Semantic Designs/Ira Baxter at StackOverflow. SD claim copyrights, and thus the right to use it here with arbitrary changes of our choice.

For more information: [email protected]    Follow us at Twitter: @SemanticDesigns

DMS
Parsers