Add documentation for cfg-parser.
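
A minimal usage sketch of the form documented by this change, for
orientation only. The token groups (value-tokens, op-tokens), the
token names (NUM, PLUS, EOF), the non-terminal expr, and the helper
make-gen are hypothetical names chosen for illustration and do not
appear in the documentation itself. The grammar is deliberately
ambiguous, which parser from parser-tools/yacc would report as a
conflict but cfg-parser is documented to accept.

```racket
#lang racket/base
(require parser-tools/lex          ; define-tokens, define-empty-tokens
         parser-tools/cfg-parser)

;; Hypothetical token groups for a tiny addition language.
(define-tokens value-tokens (NUM))
(define-empty-tokens op-tokens (PLUS EOF))

;; expr -> NUM | expr PLUS expr   (ambiguous: either grouping is valid)
(define parse
  (cfg-parser
   (tokens value-tokens op-tokens)
   (start expr)                    ; a single start non-terminal
   (end EOF)
   ;; No src-pos clause, so the error function takes 3 arguments.
   (error (lambda (tok-ok? tok-name tok-value)
            (error 'parse "unexpected token: ~a" tok-name)))
   (grammar
    (expr [(NUM) $1]
          [(expr PLUS expr) (+ $1 $3)]))))

;; Hypothetical helper: wraps a list of tokens in the zero-argument
;; generator that the parse function expects, ending with EOF.
(define (make-gen toks)
  (lambda ()
    (if (null? toks)
        (token-EOF)
        (begin0 (car toks)
                (set! toks (cdr toks))))))

;; Parse the token stream for "1 + 2 + 3".
(parse (make-gen (list (token-NUM 1) (token-PLUS)
                       (token-NUM 2) (token-PLUS)
                       (token-NUM 3))))
```

Because addition is associative, both possible groupings of the input
yield the same sum, so the sketch does not depend on which parse
cfg-parser selects.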

original commit: e9c5c78468e1564bff9ca3966dfddc3cc4adc6e8
12 years ago · cafb8a5c62
parent e39cbed86c
commit cafb8a5c62
1 changed files with 170 additions and 2 deletions
--- a/collects/parser-tools/parser-tools.scrbl
+++ b/collects/parser-tools/parser-tools.scrbl
@@ -4,7 +4,8 @@
                     scheme/contract
                     parser-tools/lex
                     (prefix-in : parser-tools/lex-sre)
-                     parser-tools/yacc))
+                     parser-tools/yacc
+                     parser-tools/cfg-parser))

@title{Parser Tools: @exec{lex} and @exec{yacc}-style Parsing}

@@ -555,6 +556,10 @@ the right choice when using @racket[lexer] in other situations.
      @racketidfont{$}@math{i}@racketidfont{-start-pos} and
      @racketidfont{$}@math{i}@racketidfont{-end-pos}).

+      An @deftech{error production} can be defined by providing
+      a production of the form @racket[(error α)], where α is a
+      string of grammar symbols, possibly empty.
+      
      All of the productions for a given non-terminal must be grouped
      with it. That is, no @racket[non-terminal-id] may appear twice
      on the left hand side in a parser.}
@@ -662,7 +667,7 @@ the right choice when using @racket[lexer] in other situations.
    The @racket[_parse] function returns the value associated with the
    parse tree by the semantic actions.  If the parser encounters an
    error, after invoking the supplied error function, it will try to
-    use error productions to continue parsing.  If it cannot, it
+    use @tech{error production}s to continue parsing.  If it cannot, it
    raises @racket[exn:fail:read].

    If multiple non-terminals are provided in @racket[start], the
@@ -677,6 +682,169 @@ the right choice when using @racket[lexer] in other situations.
    place the parser into a module and compile the module to a
    @filepath{.zo} bytecode file.}

+                                            
+                                            
+                                            
+@section{Ambiguous parsing}
+
+@section-index["cfg-parser"]
+
+@defmodule[parser-tools/cfg-parser]
+
+@racketmodname[parser-tools/cfg-parser] provides another parser
+generator as an alternative to @racketmodname[parser-tools/yacc].
+Unlike @racket[parser], @racket[cfg-parser] can consume ambiguous grammars.
+Its interface is a subset of that of @racket[parser].
+
+@defform/subs[#:literals (grammar tokens start end precs src-pos
+                          suppress debug yacc-output prec)
+              (cfg-parser clause ...)
+              ([clause (grammar (non-terminal-id 
+                                 ((grammar-id ...) maybe-prec expr)
+                                 ...)
+                                ...)
+                       (tokens group-id ...)
+                       (start non-terminal-id)
+                       (end token-id ...)
+                       (@#,racketidfont{error} expr)
+                       (src-pos)])]{
+    Creates a parser.  The clauses may be in any order, as long as there
+    are no duplicates and all non-@italic{OPTIONAL} declarations are
+    present:
+
+    @itemize[
+
+      @item{@racketblock0[(grammar (non-terminal-id
+                                    ((grammar-id ...) maybe-prec expr)
+                                    ...)
+                                   ...)]
+
+      Declares the grammar to be parsed.  Each @racket[grammar-id] can
+      be a @racket[token-id] from a @racket[group-id] named in a
+      @racket[tokens] declaration, or it can be a
+      @racket[non-terminal-id] declared in the @racket[grammar]
+      declaration.  The @racket[expr] is a
+      ``semantic action,'' which is evaluated when the input is found
+      to match its corresponding production.
+
+      Each action is Racket code that has the same scope as its
+      parser's definition, except that the variables @racket[$1], ...,
+      @racketidfont{$}@math{i} are bound, where @math{i} is the number
+      of @racket[grammar-id]s in the corresponding production. Each
+      @racketidfont{$}@math{k} is bound to the result of the action
+      for the @math{k}@superscript{th} grammar symbol on the right of
+      the production, if that grammar symbol is a non-terminal, or the
+      value stored in the token if the grammar symbol is a terminal.
+      If the @racket[src-pos] option is present in the parser, then
+      variables @racket[$1-start-pos], ...,
+      @racketidfont{$}@math{i}@racketidfont{-start-pos} and
+      @racket[$1-end-pos], ...,
+      @racketidfont{$}@math{i}@racketidfont{-end-pos} are also
+      available, and they refer to the position structures
+      corresponding to the start and end of the corresponding
+      @racket[grammar-symbol]. Grammar symbols defined as empty-tokens
+      have no @racketidfont{$}@math{k} associated, but do have
+      @racketidfont{$}@math{k}@racketidfont{-start-pos} and
+      @racketidfont{$}@math{k}@racketidfont{-end-pos}.
+      Also, @racketidfont{$n-start-pos} and @racketidfont{$n-end-pos}
+      are bound to the largest start and end positions (i.e.,
+      @racketidfont{$}@math{i}@racketidfont{-start-pos} and
+      @racketidfont{$}@math{i}@racketidfont{-end-pos}).
+
+      An @tech{error production} can be defined by providing
+      a production of the form @racket[(error α)], where α is a
+      string of grammar symbols, possibly empty.
+      
+      All of the productions for a given non-terminal must be grouped
+      with it. That is, no @racket[non-terminal-id] may appear twice
+      on the left hand side in a parser.}
+
+
+      @item{@racket[(tokens group-id ...)]
+
+      Declares that all of the tokens defined in each
+      @racket[group-id]---as bound by @racket[define-tokens] or
+      @racket[define-empty-tokens]---can be used by the parser in the
+      @racket[grammar] declaration.}
+
+
+      @item{@racket[(start non-terminal-id)]
+
+      Declares a starting non-terminal for the grammar.
+      
+      Note: unlike @racket[parser], @racket[cfg-parser] does not
+      currently support multiple starting non-terminals
+      for the grammar.}
+
+
+      @item{@racket[(end token-id ...)]
+
+      Specifies a set of tokens from which some member must follow any
+      valid parse.  For example, an EOF token would be specified for a
+      parser that parses entire files and a newline token for a parser
+      that parses entire lines individually.}
+
+
+      @item{@racket[(@#,racketidfont{error} expr)]
+
+      The @racket[expr] should evaluate to a function which will be
+      executed for its side-effect whenever the parser encounters an
+      error.
+
+      If the @racket[src-pos] declaration is present, the function
+      should accept 5 arguments:
+
+      @racketblock[(lambda (tok-ok? tok-name tok-value _start-pos _end-pos) 
+                     ....)]
+
+      Otherwise it should accept 3:
+
+      @racketblock[(lambda (tok-ok? tok-name tok-value) 
+                     ....)]
+
+      The first argument will be @racket[#f] if and only if the error
+      is that an invalid token was received.  The second and third
+      arguments will be the name and the value of the token at which
+      the error was detected.  The fourth and fifth arguments, if
+      present, provide the source positions of that token.}
+
+
+      @item{@racket[(src-pos)] @italic{OPTIONAL}
+
+      Causes the generated parser to expect input in the form
+      @racket[(make-position-token _token _start-pos _end-pos)] instead
+      of simply @racket[_token].  Include this option when using the
+      parser with a lexer generated with @racket[lexer-src-pos].}
+    ]
+
+    The result of a @racket[cfg-parser] expression is a function,
+    @racket[_parse], that takes one
+    argument.  This argument must be a zero-argument function,
+    @racket[_gen], that produces successive tokens of the input each
+    time it is called.  If desired, the @racket[_gen] may return
+    symbols instead of tokens, and the parser will treat symbols as
+    tokens of the corresponding name (with @racket[#f] as a value, so
+    it is usual to return symbols only in the case of empty tokens).
+    The @racket[_parse] function returns the value associated with the
+    parse tree by the semantic actions.  If the parser encounters an
+    error, after invoking the supplied error function, it will try to
+    use @tech{error production}s to continue parsing.  If it cannot, it
+    raises @racket[exn:fail:read].
+
+    Because @racket[cfg-parser] does not currently support multiple
+    starting non-terminals, a @racket[cfg-parser] expression always
+    produces a single parsing function rather than a list of them,
+    which is what @racket[parser] produces when several
+    @racket[start] non-terminals are provided.
+
+    Each time the Racket code for a @racket[cfg-parser] is compiled
+    (e.g., when a @filepath{.rkt} file containing a @racket[cfg-parser] form
+    is loaded), the parser generator is run.  To avoid this overhead,
+    place the parser into a module and compile the module to a
+    @filepath{.zo} bytecode file.
+                                             }
+                                            
+                                            
@; ----------------------------------------------------------------------

@section{Converting @exec{yacc} or @exec{bison} Grammars}