DMS Lexer Specifications

This page shows how DMS specifies language lexical syntax, and gets a full, efficient lexer for arbitrary character encodings, producing a stream of lexemes, for that language, with not much effort. This is just part of what is needed to describe a language to DMS.

While one can build software manipulation tools using just lexemes, normally the lexeme stream is fed to a DMS-based parser defined by a grammar over the lexemes.

We use Nicholas Wirth's Oberon language as an example, as it is a real, practical language yet simple enough so the entire lexer can be easily displayed and understood here. More complex languages obviously have more complex lexers, and some real languages such as C++ have remarkably complex lexers, even with DMS.

Character Encodings

DMS operates on streams of Unicode characters. File open/read logic for DMS handles implicit conversion from various character encodings (7-Bit ASCII; ISO-8859-1; UTF-8; Shift-JIS; EBCDIC, etc.) to Unicode so that lexical definitions are purely in terms of Unicode characters. This easily handles legacy languages, as well as modern languages with national character sets quite easily. The specification of the character encoding of a file is done outside the lexer so is not addressed further, here. (See a Character Encodings for what DMS can presently handle out of the box.)

DMS Token Syntax

DMS defines tokens (or lexemes) which make up the elements of a desired input language. Tokens are defined by

  • a token name,
  • a regular expression over Unicode that determines the allowable sequence of characters for the token,
  • an internal binary representation for the token value (if any), enabling efficient processing of values
  • a conversion procedure to generate that internal representation (if any)

Token definitions are provided in the format:

   #token token_name [internal_representation] "regular_expression"
      << conversion_code >>

DMS regular expressions are fairly traditional but also allow macro substititions; there is an extensive library of macros defining various useful subsets of Unicode.

By convention, DMS domain design methodology strongly suggests using token names directly from any available standards document for the target language. We have done that for Oberon (e.g., ident).

Literal tokens, those that have essentially only one spelling, are by convention named 'token_spelling', e.g., '+', '(' and even 'then'. This is both easy to read, and gives the prettyprinter the default needed to print the token at a later time. Really simple tokens are written giving the token name and a regexp specifying the shape of the token:


#token 'and'   "and"
#token '+'     "\+"
#token circularintegral "\u2A15" --  ⨕  circular integral character

If a token has variant spellings which are important, they can be captured as (type-declared) value attached to the token. To allow DMS to be efficient at scale, these values are represented as native computer binary values, and can be BOOLEAN, CHARACTER (any 16 bit Unicode "code point"), INTEGER or NATURAL (up to 32 bits), RATIONAL (with infinite precision numerators and denomoninators) or STRING. While a regexp is used to detect the token, custom PARLANSE code is used to convert the recognized text into the corresponding binary value. Collecting a token value also collects its format ("shape of the literal"), enabling the token string to be reconstructed from the the binary value later. The process of scanning the recognized string and collecting the format is generally pretty standard; DMS offers a variety of binary conversion routines to enable this to be specified easily in most cases:


#token identifier [STRING] " (#[0-9])*"
   <<
    (ConvertTokenStringToString (. ?:Lexeme:Literal:String:Value) ? 0 0)
    (= ?:Lexeme:Literal:String:Format 0)
  >>

Unlike traditional lexers, DMS retains comments; this is necessary if one is to transform source code and avoid offending its original author(s) or owners. Special tokens distinguish whitespace (skip, preskip and postskip) and comments (precomment and postcomment). The lexer handles the recognition and capture of these largely automatically, once specified. These make it easy for the domain engineer to specify what is completely uninteresting whitespace, and what are comments, for capture and processing by later DMS stages such as parsing.

Note how we defined ident using a DMS library-provided definition to match our desired text model. Such tokens need code to convert the characters comprising the instance token into machine-native data structures such as binary numbers, float values, and/or actual character strings. The conversion code must also capture lexical formatting data (leading zero count on decimal numbers, number radix) that will be needed to regenerate text for that lexeme by the prettyprinter. The conversion code is written in DMS's PARLANSE programming language. Note that a library of predefined conversions is defined by DMS, that covers most interesting cases.

Regular expressions are used to define valid Unicode character sequence for tokens, and whitespace, pre- and post- comments. They include character specification by writing the character directly; the entire Unicode characterset is mostly directly available this way. Some characters (the regexp operators following) need to be escaped with \ to be specified as literals.

One can also specify character sets in [...] or their negation ~[...]. A common idiom is to write [aA] to match a letter regardless of capitialization. Character sets can be composed by conjunction (&) or disjunction (#). Lexical definition expressions can be compose by the usual set of regexp operators: sequence (via juxtaposition), intersection &&, optionality ?, alternation |(...) operators found in traditional lexers.

Unusually a ~~ operator provides is a "regular expression" complement operator: ~~X matches any string, that X would not match. This turns out to be remarkably useful especially in conjunction with intersection &&. One can write a regexp that picks up too much, and "subtract away" the inconvenient part. One can define identifiers that don't contain the substring "hello" quite easily.

Lexical Syntax for Wirth's Oberon

This section contains the untouched definition of Oberon lexical syntax for DMS. There is some discussion of details following this definition.


--   Oberon.lex: Lexical Specification for Oberon.
--   Copyright (C) 2014-2015 Semantic Designs; All Rights Reserved
--   Oberon programming language.
--   Started in October 2014 by KK.
--

%%

#macro  letter    "[\_A-Za-z]"
#macro  digit     "[0-9]"
#macro  hexDigit  "<digit>|[A-Fa-f]"

%% Main -- The name of the lexer mode is Main.

-- Skip this type of white space in front of any lexeme.
#skip  "[\s\t\r\n]+"

-- Scan the (* ... *) style comments and associate them with the next lexeme.
#precomment "\(\* (~~(~[]* \*\) ~[]*)) \*\)"
#preskip    "[\s\t\r\n]+"

#IF Oberon
    -- This is the original version of the language.
    #token CharConst  [CHARACTER]    "(\"~[\"\r\n]\")|(<digit>(<hexDigit>*)X)"
       <<
	  (= ?:Lexeme:Literal:Character:Value 
             (ConvertTokenStringToCharacter ? 1))
	  (= ?:Lexeme:Literal:Character:Format 0)
       >>
    #token string     [STRING]    "(\"~[\"\r\n]*\")"
       <<
	  (ConvertTokenStringToString 
            (. ?:Lexeme:Literal:String:Value) ? 0 0)
	  (= ?:Lexeme:Literal:String:Format 0)
       >>
#ELSIF Oberon07
    -- Obreron07 dialect.
    #token string     [STRING]    "(\"~[\"\r\n]*\")|(<digit>(<hexDigit>*)X)"
       <<
	  (ConvertTokenStringToString
              (. ?:Lexeme:Literal:String:Value) ? 0 0)
	  (= ?:Lexeme:Literal:String:Format 0)
       >>
#ELSE
    #FAIL
#ENDIF

#token  '.'     "\."
#token  ':'     "\:"
#token  ','     "\,"
#token  ';'     "\;"
#token  '('     "\("
#token  ')'     "\)"
#token  '['     "\["
#token  ']'     "\]"
#token  '{'     "\{"
#token  '}'     "\}"

#token  '~'     "\~"

#token  '*'     "\*"
#token  '/'     "\/"
#token  '&'     "\&"
#token  '+'     "\+"
#token  '-'     "\-"

#token  '='     "\="
#token  '#'     "\#"
#token  '<'     "\<"
#token  '>'     "\>"
#token  '<='    "\<\="
#token  '>='    "\>\="

#token  '^'     "\^"
#token  '|'     "\|"
#token  '..'    "\.\."
#token  ':='    "\:\="

#token  'ARRAY'     "ARRAY"
#token  'BEGIN'     "BEGIN"
#token  'CASE'      "CASE"
#token  'CONST'     "CONST"
#token  'DIV'       "DIV"
#token  'DO'        "DO"
#token  'ELSE'      "ELSE"
#token  'ELSIF'     "ELSIF"
#token  'END'       "END"
#token  'IF'        "IF"

#token  'IMPORT'    "IMPORT"
#token  'IN'        "IN"
#token  'IS'        "IS"
#token  'MOD'       "MOD"
#token  'MODULE'    "MODULE"
#token  'NIL'       "NIL"
#token  'OF'        "OF"
#token  'OR'        "OR"
#token  'POINTER'   "POINTER"
#token  'PROCEDURE' "PROCEDURE"
#token  'RECORD'    "RECORD"
#token  'REPEAT'    "REPEAT"
#token  'RETURN'    "RETURN"

#token  'THEN'      "THEN"
#token  'TO'        "TO"
#token  'TYPE'      "TYPE"
#token  'UNTIL'     "UNTIL"
#token  'VAR'       "VAR"
#token  'WHILE'     "WHILE"

#IF Oberon
	-- This is the original version of the language.
	#token  'EXIT'     "EXIT"
	#token  'LOOP'     "LOOP"
	#token  'WITH'     "WITH"
#ELSIF Oberon07
	-- Obreron07 dialect.
	#token  'BY'       "BY"
	#token  'FALSE'    "FALSE"
	#token  'FOR'      "FOR"
	#token  'TRUE'     "TRUE"
#ELSE
    #FAIL
#ENDIF

-- These definitions should go after definitions of the keywords.
#token ident   [STRING]    "<letter>(<letter>|<digit>)*"
   <<
      (ConvertTokenStringToString (. ?:Lexeme:Literal:String:Value) ? 0 0)
      (= ?:Lexeme:Literal:String:Format 0)
   >>
#token integer [INTEGER]   "<digit>+"
   <<
      (local
	 [format LiteralFormat:IntegerLiteralFormat]
	 (;; 
	     (= ?:Lexeme:Literal:Integer:Value
                (ConvertDecimalTokenStringToInteger (. format) ? 0 0))
	     (= ?:Lexeme:Literal:Integer:Format
                (LiteralFormat:MakeCompactIntegerLiteralFormat format))
	 );;
      )local
   >>
#token integer [INTEGER]   "<digit>(<hexDigit>*)H"
   <<
      (local
	 [format LiteralFormat:IntegerLiteralFormat]
	 (;; 
	     (= ?:Lexeme:Literal:Integer:Value
                (ConvertHexadecimalTokenStringToInteger (. format) ? 0 1))
	     (= ?:Lexeme:Literal:Integer:Format
                (LiteralFormat:MakeCompactIntegerLiteralFormat format))
	 );;
      )local
   >>
#token real    [FLOAT]     "<digit>+\.<digit>*{(E|D){\+|\-}<digit>+}"
   <<
      (local
	 [format LiteralFormat:FloatLiteralFormat]
	 (;;
	     (= ?:Lexeme:Literal:Float:Value
                (ConvertDecimalTokenStringToFloat (. format) ? 0 0))
	     (= ?:Lexeme:Literal:Float:Format 
                (LiteralFormat:MakeCompactFloatLiteralFormat format))
	 );;
      )local
   >>
%%
%%

Some additional details

The following discusses the various parts of the lexical description in the order they are provided in the Oberon lexical specification.

  • Comment lines start with --.
  • %% by itself marks the global declarations part of the lexical description. Any macros or other defintions provided in the global declarations are visible in other lexical modes.
  • #macro name defines a regular expression macro. The macro body may be used by writing ";ltname" in other token or macro definitions. Defining macros for common character classes such as letter and digii is typical of a DMS lexical specification.
  • %%name (e.g, %% Main) introduces a named mode lexical mode, in which one set of tokens is defined. A DMS Lexer may have multiple modes to handle different parts of the language. Modern languages such as Java (and Oberon) need only one mode, but other "modern" languages such as C++ need a variety of modes to handle issues such as preprocessor scanning, user-defined literals, etc. Mode names are used to transition from one mode to another in an action; a sophisticated use of mode names occurs when code DMS rewrite rules, where it is sometimes necessary to start a parser "in the middle" of a program and thus in some odd lexical mode. In the absence of any other indication, the first mode encountered in a lexical specification is assumed to be the mode in which the lexer starts when provided a source file.
  • #skip provides a definition of whitespace for a mode. Any character sequence recognized is simply discarded.
  • #precomment defines what comments look like, if they are allowed to precede a token. DMS insists that comments belong to some token, either as pre- or post- (token) comments. There may be many of each on a single token. Oberon allows only (*...*) style comments, but they can be anywhere, including before a token.
  • #preskip defines what whitespace is allowed between precomments.
  • #postskip defines what whitespace is allowed between postcomments.
  • #postcomment defines what comments look like following (trailing) a token. We have chosen to allow exactly 1 Oberon-style comment as a trailing comment. This works pretty well in practice.
  • #IF, #ELSE, #ENDIF are compile-time directives that enable or disable parts of the lexical specification. For DMS, thiese are used to control the aspects of a lexer that vary between dialects of a domain. In our Oberon example, we conditionally select either the original version of Oberon, or the 2007 revision (Oberon07). #FAIL causes the lexer generator to object if some other dialect is specified..

Given a sequence of token definitions, the constructed lexer will claim a token match for the longest character sequence matched by any regexp used by those token definitions. If two token definitions have identical longest matches, the definition which occurs earlier in lexical specification is selected. It is common to define the same token multiple times, each with different regexps, to define variant spellings requiring different conversions (e.g., integer token declarations above)

Additional lexical capabilities

A simple example is not sufficient to show the full power of DMS lexers. Here we breifly sketch additional capabilities.

  • #ifnotoken enables checks for character sequences not defined by any regexp.
  • #ifendstream enables checks for end of stream or end of file
  • Token string suffix rejection: Custom PARLANSE action code can reject any suffix of a character string recognized by a token regexp. This allows the lexer to "look ahead" ahead to make decisions.
  • Token string extension: Custom PARLANSE action code can extend the characters collected by a regexp match.
  • Token redefinition: Custom PARLANSE code can decide that the declared token is not the one seen, and issue a different token instead
  • Parser state queries: Custom PARLANSE code may inquire what tokens a parser might desire next, and select a token accordingingly. This allows easy implementation of keywords-in-context: one defines an identifier token, checks the parser status, and if the parser wants a keyword, uses token redefinition to emit the keyword instead.
  • Multiple token generation: Sometimes it is simply unclear locally which token should be chosen. In this case, the lexer may emit several different tokens, all interpreted as having identical places in the input stream. DMS's GLR parser can then try all the alternatives and choose the one that works best. This makes it easy to handle local ambiguity at the lexical level.
  • Stacked Input Streams: Many languages have the concept of include files. DMS Lexers can stack a new input stream, process its content, and have the old restored automatically when the new stream reaches EOF.
  • Mode transitions: (gotoname ?), (gosubname ?), (goback ?) are special PARLANSE actions that enable transfers between lexical modes. The lexer modes themselves form a push-down automaton. This enables one to lex a nested construct such as a regexp or an expression found inside a string literal.

It is perhaps unsurprising that the DMS Lexical description is parsed by DMS, using a lexical description of itself. DMS is remarkably meta/recursive in its construction.

Oberon lexing example

We show a small Oberon program and the lexemes produced by a DMS lexical specification debugging tool. What follows is an Oberon source file:


(* Buffer.oberon

*)
MODULE Buffer;

  CONST N = 100;
  VAR nonempty*, nonfull*: BOOLEAN;
      in, out, n: INTEGER;
      buf: ARRAY N OF INTEGER;

  PROCEDURE Put(x: INTEGER);
  BEGIN
    (* Procedure Body *)
    IF n < N THEN
      buf[in] := x; in := (in+1) MOD N;
      INC(n); nonfull := n < N; nonempty := TRUE
    END
  END put;  (* End of
               the procedure *)

  PROCEDURE Get*(VAR x: INTEGER);
  BEGIN
    IF n > 0 THEN
      x := buf[out];
      out := (out+1) MOD N;
      DEC(n);
      nonempty := n > 0;
      nonfull := TRUE
    END
  END get;

BEGIN
  n := 0; in := 0; out := 0;
  nonempty := FALSE; nonfull := TRUE
END Buffer.

What follows is the trace of the DMS lexical scanner, constructed from only the above specification, applied to this file. This output is used by the domain engineer to check that his lexer is operating properly. It interleaves

  • original source lines
  • lexical mode switches (used in a trivial way by the Oberon lexer
  • detected tokens, their actual source line positions
  • any errors detected during the lexing process (none in this example)
In practice, a stream of structures containing these tokens is passed by the construction DMS lexer, directly to a parser to be using in constructing an AST.


Lexer Stream Display 1.5.1
!! Lexer:ResetLexicalModeStack
!! after Lexer:PushLexicalMode:
Lexical Mode Stack:
1 Main
File "C:/DMS/Domains/Oberon/Examples/Buffer.Oberon", line 1: (* Buffer.oberon
File "C:/DMS/Domains/Oberon/Examples/Buffer.Oberon", line 2: 
File "C:/DMS/Domains/Oberon/Examples/Buffer.Oberon", line 3: *)
File "C:/DMS/Domains/Oberon/Examples/Buffer.Oberon", line 4: MODULE Buffer;
!! Lexeme @ Line 4 Col 1 ELine 4 ECol 7 Token 43: 'MODULE' [VOID]=0000
  <<< PreComments:
Comment 1 Type 1 Line 1 Column 1 `(* Buffer.oberon~#d~~#a~~#d~~#a~*)'
!! Lexeme @ Line 4 Col 8 ELine 4 ECol 14 Token 62: ident [STRING]=`Buffer'
!! Lexeme @ Line 4 Col 14 ELine 4 ECol 15 Token 6: ';' [VOID]=0000
File "C:/DMS/Domains/Oberon/Examples/Buffer.Oberon", line 5: 
File "C:/DMS/Domains/Oberon/Examples/Buffer.Oberon", line 6:   CONST N = 100;
!! Lexeme @ Line 6 Col 3 ELine 6 ECol 8 Token 32: 'CONST' [VOID]=0000
!! Lexeme @ Line 6 Col 9 ELine 6 ECol 10 Token 62: ident [STRING]=`N'
!! Lexeme @ Line 6 Col 11 ELine 6 ECol 12 Token 19: '=' [VOID]=0000
!! Lexeme @ Line 6 Col 13 ELine 6 ECol 16 Token 63: integer [INTEGER]=+100
!! Lexeme @ Line 6 Col 16 ELine 6 ECol 17 Token 6: ';' [VOID]=0000
File "C:/DMS/Domains/Oberon/Examples/Buffer.Oberon", line 7:   VAR nonempty*, nonfull*: BOOLEAN;
!! Lexeme @ Line 7 Col 3 ELine 7 ECol 6 Token 56: 'VAR' [VOID]=0000
!! Lexeme @ Line 7 Col 7 ELine 7 ECol 15 Token 62: ident [STRING]=`nonempty'
!! Lexeme @ Line 7 Col 15 ELine 7 ECol 16 Token 14: '*' [VOID]=0000
!! Lexeme @ Line 7 Col 16 ELine 7 ECol 17 Token 5: ',' [VOID]=0000
!! Lexeme @ Line 7 Col 18 ELine 7 ECol 25 Token 62: ident [STRING]=`nonfull'
!! Lexeme @ Line 7 Col 25 ELine 7 ECol 26 Token 14: '*' [VOID]=0000
!! Lexeme @ Line 7 Col 26 ELine 7 ECol 27 Token 4: ':' [VOID]=0000
!! Lexeme @ Line 7 Col 28 ELine 7 ECol 35 Token 62: ident [STRING]=`BOOLEAN'
!! Lexeme @ Line 7 Col 35 ELine 7 ECol 36 Token 6: ';' [VOID]=0000
File "C:/DMS/Domains/Oberon/Examples/Buffer.Oberon", line 8:       in, out, n: INTEGER;
!! Lexeme @ Line 8 Col 7 ELine 8 ECol 9 Token 62: ident [STRING]=`in'
!! Lexeme @ Line 8 Col 9 ELine 8 ECol 10 Token 5: ',' [VOID]=0000
!! Lexeme @ Line 8 Col 11 ELine 8 ECol 14 Token 62: ident [STRING]=`out'
!! Lexeme @ Line 8 Col 14 ELine 8 ECol 15 Token 5: ',' [VOID]=0000
!! Lexeme @ Line 8 Col 16 ELine 8 ECol 17 Token 62: ident [STRING]=`n'
!! Lexeme @ Line 8 Col 17 ELine 8 ECol 18 Token 4: ':' [VOID]=0000
!! Lexeme @ Line 8 Col 19 ELine 8 ECol 26 Token 62: ident [STRING]=`INTEGER'
!! Lexeme @ Line 8 Col 26 ELine 8 ECol 27 Token 6: ';' [VOID]=0000
File "C:/DMS/Domains/Oberon/Examples/Buffer.Oberon", line 9:       buf: ARRAY N OF INTEGER;
!! Lexeme @ Line 9 Col 7 ELine 9 ECol 10 Token 62: ident [STRING]=`buf'
!! Lexeme @ Line 9 Col 10 ELine 9 ECol 11 Token 4: ':' [VOID]=0000
!! Lexeme @ Line 9 Col 12 ELine 9 ECol 17 Token 29: 'ARRAY' [VOID]=0000
!! Lexeme @ Line 9 Col 18 ELine 9 ECol 19 Token 62: ident [STRING]=`N'
!! Lexeme @ Line 9 Col 20 ELine 9 ECol 22 Token 45: 'OF' [VOID]=0000
!! Lexeme @ Line 9 Col 23 ELine 9 ECol 30 Token 62: ident [STRING]=`INTEGER'
!! Lexeme @ Line 9 Col 30 ELine 9 ECol 31 Token 6: ';' [VOID]=0000
File "C:/DMS/Domains/Oberon/Examples/Buffer.Oberon", line 10: 
File "C:/DMS/Domains/Oberon/Examples/Buffer.Oberon", line 11:   PROCEDURE Put(x: INTEGER);
!! Lexeme @ Line 11 Col 3 ELine 11 ECol 12 Token 48: 'PROCEDURE' [VOID]=0000
!! Lexeme @ Line 11 Col 13 ELine 11 ECol 16 Token 62: ident [STRING]=`Put'
!! Lexeme @ Line 11 Col 16 ELine 11 ECol 17 Token 7: '(' [VOID]=0000
!! Lexeme @ Line 11 Col 17 ELine 11 ECol 18 Token 62: ident [STRING]=`x'
!! Lexeme @ Line 11 Col 18 ELine 11 ECol 19 Token 4: ':' [VOID]=0000
!! Lexeme @ Line 11 Col 20 ELine 11 ECol 27 Token 62: ident [STRING]=`INTEGER'
!! Lexeme @ Line 11 Col 27 ELine 11 ECol 28 Token 8: ')' [VOID]=0000
!! Lexeme @ Line 11 Col 28 ELine 11 ECol 29 Token 6: ';' [VOID]=0000
File "C:/DMS/Domains/Oberon/Examples/Buffer.Oberon", line 12:   BEGIN
!! Lexeme @ Line 12 Col 3 ELine 12 ECol 8 Token 30: 'BEGIN' [VOID]=0000
File "C:/DMS/Domains/Oberon/Examples/Buffer.Oberon", line 13:     (* Procedure Body *)
File "C:/DMS/Domains/Oberon/Examples/Buffer.Oberon", line 14:     IF n < N THEN
!! Lexeme @ Line 14 Col 5 ELine 14 ECol 7 Token 38: 'IF' [VOID]=0000
  <<< PreComments:
Comment 1 Type 1 Line 13 Column 5 `(* Procedure Body *)'
!! Lexeme @ Line 14 Col 8 ELine 14 ECol 9 Token 62: ident [STRING]=`n'
!! Lexeme @ Line 14 Col 10 ELine 14 ECol 11 Token 21: '<' [VOID]=0000
!! Lexeme @ Line 14 Col 12 ELine 14 ECol 13 Token 62: ident [STRING]=`N'
!! Lexeme @ Line 14 Col 14 ELine 14 ECol 18 Token 52: 'THEN' [VOID]=0000
File "C:/DMS/Domains/Oberon/Examples/Buffer.Oberon", line 15:       buf[in] := x; in := (in+1) MOD N;
!! Lexeme @ Line 15 Col 7 ELine 15 ECol 10 Token 62: ident [STRING]=`buf'
!! Lexeme @ Line 15 Col 10 ELine 15 ECol 11 Token 9: '[' [VOID]=0000
!! Lexeme @ Line 15 Col 11 ELine 15 ECol 13 Token 62: ident [STRING]=`in'
!! Lexeme @ Line 15 Col 13 ELine 15 ECol 14 Token 10: ']' [VOID]=0000
!! Lexeme @ Line 15 Col 15 ELine 15 ECol 17 Token 28: ':=' [VOID]=0000
!! Lexeme @ Line 15 Col 18 ELine 15 ECol 19 Token 62: ident [STRING]=`x'
!! Lexeme @ Line 15 Col 19 ELine 15 ECol 20 Token 6: ';' [VOID]=0000
!! Lexeme @ Line 15 Col 21 ELine 15 ECol 23 Token 62: ident [STRING]=`in'
!! Lexeme @ Line 15 Col 24 ELine 15 ECol 26 Token 28: ':=' [VOID]=0000
!! Lexeme @ Line 15 Col 27 ELine 15 ECol 28 Token 7: '(' [VOID]=0000
!! Lexeme @ Line 15 Col 28 ELine 15 ECol 30 Token 62: ident [STRING]=`in'
!! Lexeme @ Line 15 Col 30 ELine 15 ECol 31 Token 17: '+' [VOID]=0000
!! Lexeme @ Line 15 Col 31 ELine 15 ECol 32 Token 63: integer [INTEGER]=+1
!! Lexeme @ Line 15 Col 32 ELine 15 ECol 33 Token 8: ')' [VOID]=0000
!! Lexeme @ Line 15 Col 34 ELine 15 ECol 37 Token 42: 'MOD' [VOID]=0000
!! Lexeme @ Line 15 Col 38 ELine 15 ECol 39 Token 62: ident [STRING]=`N'
!! Lexeme @ Line 15 Col 39 ELine 15 ECol 40 Token 6: ';' [VOID]=0000
File "C:/DMS/Domains/Oberon/Examples/Buffer.Oberon", line 16:       INC(n); nonfull := n < N; nonempty := TRUE
!! Lexeme @ Line 16 Col 7 ELine 16 ECol 10 Token 62: ident [STRING]=`INC'
!! Lexeme @ Line 16 Col 10 ELine 16 ECol 11 Token 7: '(' [VOID]=0000
!! Lexeme @ Line 16 Col 11 ELine 16 ECol 12 Token 62: ident [STRING]=`n'
!! Lexeme @ Line 16 Col 12 ELine 16 ECol 13 Token 8: ')' [VOID]=0000
!! Lexeme @ Line 16 Col 13 ELine 16 ECol 14 Token 6: ';' [VOID]=0000
!! Lexeme @ Line 16 Col 15 ELine 16 ECol 22 Token 62: ident [STRING]=`nonfull'
!! Lexeme @ Line 16 Col 23 ELine 16 ECol 25 Token 28: ':=' [VOID]=0000
!! Lexeme @ Line 16 Col 26 ELine 16 ECol 27 Token 62: ident [STRING]=`n'
!! Lexeme @ Line 16 Col 28 ELine 16 ECol 29 Token 21: '<' [VOID]=0000
!! Lexeme @ Line 16 Col 30 ELine 16 ECol 31 Token 62: ident [STRING]=`N'
!! Lexeme @ Line 16 Col 31 ELine 16 ECol 32 Token 6: ';' [VOID]=0000
!! Lexeme @ Line 16 Col 33 ELine 16 ECol 41 Token 62: ident [STRING]=`nonempty'
!! Lexeme @ Line 16 Col 42 ELine 16 ECol 44 Token 28: ':=' [VOID]=0000
!! Lexeme @ Line 16 Col 45 ELine 16 ECol 49 Token 61: 'TRUE' [VOID]=0000
File "C:/DMS/Domains/Oberon/Examples/Buffer.Oberon", line 17:     END
!! Lexeme @ Line 17 Col 5 ELine 17 ECol 8 Token 37: 'END' [VOID]=0000
File "C:/DMS/Domains/Oberon/Examples/Buffer.Oberon", line 18:   END put;  (* End of
!! Lexeme @ Line 18 Col 3 ELine 18 ECol 6 Token 37: 'END' [VOID]=0000
!! Lexeme @ Line 18 Col 7 ELine 18 ECol 10 Token 62: ident [STRING]=`put'
!! Lexeme @ Line 18 Col 10 ELine 18 ECol 11 Token 6: ';' [VOID]=0000
File "C:/DMS/Domains/Oberon/Examples/Buffer.Oberon", line 19:                the procedure *)
File "C:/DMS/Domains/Oberon/Examples/Buffer.Oberon", line 20: 
File "C:/DMS/Domains/Oberon/Examples/Buffer.Oberon", line 21:   PROCEDURE Get*(VAR x: INTEGER);
!! Lexeme @ Line 21 Col 3 ELine 21 ECol 12 Token 48: 'PROCEDURE' [VOID]=0000
  <<< PreComments:
Comment 1 Type 1 Line 18 Column 13 `(* End of~#d~~#a~               the procedure *)'
!! Lexeme @ Line 21 Col 13 ELine 21 ECol 16 Token 62: ident [STRING]=`Get'
!! Lexeme @ Line 21 Col 16 ELine 21 ECol 17 Token 14: '*' [VOID]=0000
!! Lexeme @ Line 21 Col 17 ELine 21 ECol 18 Token 7: '(' [VOID]=0000
!! Lexeme @ Line 21 Col 18 ELine 21 ECol 21 Token 56: 'VAR' [VOID]=0000
!! Lexeme @ Line 21 Col 22 ELine 21 ECol 23 Token 62: ident [STRING]=`x'
!! Lexeme @ Line 21 Col 23 ELine 21 ECol 24 Token 4: ':' [VOID]=0000
!! Lexeme @ Line 21 Col 25 ELine 21 ECol 32 Token 62: ident [STRING]=`INTEGER'
!! Lexeme @ Line 21 Col 32 ELine 21 ECol 33 Token 8: ')' [VOID]=0000
!! Lexeme @ Line 21 Col 33 ELine 21 ECol 34 Token 6: ';' [VOID]=0000
File "C:/DMS/Domains/Oberon/Examples/Buffer.Oberon", line 22:   BEGIN
!! Lexeme @ Line 22 Col 3 ELine 22 ECol 8 Token 30: 'BEGIN' [VOID]=0000
File "C:/DMS/Domains/Oberon/Examples/Buffer.Oberon", line 23:     IF n > 0 THEN
!! Lexeme @ Line 23 Col 5 ELine 23 ECol 7 Token 38: 'IF' [VOID]=0000
!! Lexeme @ Line 23 Col 8 ELine 23 ECol 9 Token 62: ident [STRING]=`n'
!! Lexeme @ Line 23 Col 10 ELine 23 ECol 11 Token 22: '>' [VOID]=0000
!! Lexeme @ Line 23 Col 12 ELine 23 ECol 13 Token 63: integer [INTEGER]=+0
!! Lexeme @ Line 23 Col 14 ELine 23 ECol 18 Token 52: 'THEN' [VOID]=0000
File "C:/DMS/Domains/Oberon/Examples/Buffer.Oberon", line 24:       x := buf[out];
!! Lexeme @ Line 24 Col 7 ELine 24 ECol 8 Token 62: ident [STRING]=`x'
!! Lexeme @ Line 24 Col 9 ELine 24 ECol 11 Token 28: ':=' [VOID]=0000
!! Lexeme @ Line 24 Col 12 ELine 24 ECol 15 Token 62: ident [STRING]=`buf'
!! Lexeme @ Line 24 Col 15 ELine 24 ECol 16 Token 9: '[' [VOID]=0000
!! Lexeme @ Line 24 Col 16 ELine 24 ECol 19 Token 62: ident [STRING]=`out'
!! Lexeme @ Line 24 Col 19 ELine 24 ECol 20 Token 10: ']' [VOID]=0000
!! Lexeme @ Line 24 Col 20 ELine 24 ECol 21 Token 6: ';' [VOID]=0000
File "C:/DMS/Domains/Oberon/Examples/Buffer.Oberon", line 25:       out := (out+1) MOD N;
!! Lexeme @ Line 25 Col 7 ELine 25 ECol 10 Token 62: ident [STRING]=`out'
!! Lexeme @ Line 25 Col 11 ELine 25 ECol 13 Token 28: ':=' [VOID]=0000
!! Lexeme @ Line 25 Col 14 ELine 25 ECol 15 Token 7: '(' [VOID]=0000
!! Lexeme @ Line 25 Col 15 ELine 25 ECol 18 Token 62: ident [STRING]=`out'
!! Lexeme @ Line 25 Col 18 ELine 25 ECol 19 Token 17: '+' [VOID]=0000
!! Lexeme @ Line 25 Col 19 ELine 25 ECol 20 Token 63: integer [INTEGER]=+1
!! Lexeme @ Line 25 Col 20 ELine 25 ECol 21 Token 8: ')' [VOID]=0000
!! Lexeme @ Line 25 Col 22 ELine 25 ECol 25 Token 42: 'MOD' [VOID]=0000
!! Lexeme @ Line 25 Col 26 ELine 25 ECol 27 Token 62: ident [STRING]=`N'
!! Lexeme @ Line 25 Col 27 ELine 25 ECol 28 Token 6: ';' [VOID]=0000
File "C:/DMS/Domains/Oberon/Examples/Buffer.Oberon", line 26:       DEC(n);
!! Lexeme @ Line 26 Col 7 ELine 26 ECol 10 Token 62: ident [STRING]=`DEC'
!! Lexeme @ Line 26 Col 10 ELine 26 ECol 11 Token 7: '(' [VOID]=0000
!! Lexeme @ Line 26 Col 11 ELine 26 ECol 12 Token 62: ident [STRING]=`n'
!! Lexeme @ Line 26 Col 12 ELine 26 ECol 13 Token 8: ')' [VOID]=0000
!! Lexeme @ Line 26 Col 13 ELine 26 ECol 14 Token 6: ';' [VOID]=0000
File "C:/DMS/Domains/Oberon/Examples/Buffer.Oberon", line 27:       nonempty := n > 0;
!! Lexeme @ Line 27 Col 7 ELine 27 ECol 15 Token 62: ident [STRING]=`nonempty'
!! Lexeme @ Line 27 Col 16 ELine 27 ECol 18 Token 28: ':=' [VOID]=0000
!! Lexeme @ Line 27 Col 19 ELine 27 ECol 20 Token 62: ident [STRING]=`n'
!! Lexeme @ Line 27 Col 21 ELine 27 ECol 22 Token 22: '>' [VOID]=0000
!! Lexeme @ Line 27 Col 23 ELine 27 ECol 24 Token 63: integer [INTEGER]=+0
!! Lexeme @ Line 27 Col 24 ELine 27 ECol 25 Token 6: ';' [VOID]=0000
File "C:/DMS/Domains/Oberon/Examples/Buffer.Oberon", line 28:       nonfull := TRUE
!! Lexeme @ Line 28 Col 7 ELine 28 ECol 14 Token 62: ident [STRING]=`nonfull'
!! Lexeme @ Line 28 Col 15 ELine 28 ECol 17 Token 28: ':=' [VOID]=0000
!! Lexeme @ Line 28 Col 18 ELine 28 ECol 22 Token 61: 'TRUE' [VOID]=0000
File "C:/DMS/Domains/Oberon/Examples/Buffer.Oberon", line 29:     END
!! Lexeme @ Line 29 Col 5 ELine 29 ECol 8 Token 37: 'END' [VOID]=0000
File "C:/DMS/Domains/Oberon/Examples/Buffer.Oberon", line 30:   END get;
!! Lexeme @ Line 30 Col 3 ELine 30 ECol 6 Token 37: 'END' [VOID]=0000
!! Lexeme @ Line 30 Col 7 ELine 30 ECol 10 Token 62: ident [STRING]=`get'
!! Lexeme @ Line 30 Col 10 ELine 30 ECol 11 Token 6: ';' [VOID]=0000
File "C:/DMS/Domains/Oberon/Examples/Buffer.Oberon", line 31: 
File "C:/DMS/Domains/Oberon/Examples/Buffer.Oberon", line 32: BEGIN
!! Lexeme @ Line 32 Col 1 ELine 32 ECol 6 Token 30: 'BEGIN' [VOID]=0000
File "C:/DMS/Domains/Oberon/Examples/Buffer.Oberon", line 33:   n := 0; in := 0; out := 0;
!! Lexeme @ Line 33 Col 3 ELine 33 ECol 4 Token 62: ident [STRING]=`n'
!! Lexeme @ Line 33 Col 5 ELine 33 ECol 7 Token 28: ':=' [VOID]=0000
!! Lexeme @ Line 33 Col 8 ELine 33 ECol 9 Token 63: integer [INTEGER]=+0
!! Lexeme @ Line 33 Col 9 ELine 33 ECol 10 Token 6: ';' [VOID]=0000
!! Lexeme @ Line 33 Col 11 ELine 33 ECol 13 Token 62: ident [STRING]=`in'
!! Lexeme @ Line 33 Col 14 ELine 33 ECol 16 Token 28: ':=' [VOID]=0000
!! Lexeme @ Line 33 Col 17 ELine 33 ECol 18 Token 63: integer [INTEGER]=+0
!! Lexeme @ Line 33 Col 18 ELine 33 ECol 19 Token 6: ';' [VOID]=0000
!! Lexeme @ Line 33 Col 20 ELine 33 ECol 23 Token 62: ident [STRING]=`out'
!! Lexeme @ Line 33 Col 24 ELine 33 ECol 26 Token 28: ':=' [VOID]=0000
!! Lexeme @ Line 33 Col 27 ELine 33 ECol 28 Token 63: integer [INTEGER]=+0
!! Lexeme @ Line 33 Col 28 ELine 33 ECol 29 Token 6: ';' [VOID]=0000
File "C:/DMS/Domains/Oberon/Examples/Buffer.Oberon", line 34:   nonempty := FALSE; nonfull := TRUE
!! Lexeme @ Line 34 Col 3 ELine 34 ECol 11 Token 62: ident [STRING]=`nonempty'
!! Lexeme @ Line 34 Col 12 ELine 34 ECol 14 Token 28: ':=' [VOID]=0000
!! Lexeme @ Line 34 Col 15 ELine 34 ECol 20 Token 59: 'FALSE' [VOID]=0000
!! Lexeme @ Line 34 Col 20 ELine 34 ECol 21 Token 6: ';' [VOID]=0000
!! Lexeme @ Line 34 Col 22 ELine 34 ECol 29 Token 62: ident [STRING]=`nonfull'
!! Lexeme @ Line 34 Col 30 ELine 34 ECol 32 Token 28: ':=' [VOID]=0000
!! Lexeme @ Line 34 Col 33 ELine 34 ECol 37 Token 61: 'TRUE' [VOID]=0000
File "C:/DMS/Domains/Oberon/Examples/Buffer.Oberon", line 35: END Buffer.
!! Lexeme @ Line 35 Col 1 ELine 35 ECol 4 Token 37: 'END' [VOID]=0000
!! Lexeme @ Line 35 Col 5 ELine 35 ECol 11 Token 62: ident [STRING]=`Buffer'
!! Lexeme @ Line 35 Col 11 ELine 35 ECol 12 Token 3: '.' [VOID]=0000
File "C:/DMS/Domains/Oberon/Examples/Buffer.Oberon", line 36: 
!! Lexeme @ Line 36 Col 1 ELine 36 ECol 1 Token 0: EndOfFile
36 lines read.
156 lexemes processed.
0 lexical errors detected.
Exiting with final status 0.

Next: DMS Parsers.

For more information: [email protected]    Follow us at Twitter: @SemanticDesigns

DMS Lexical Specifications