Add documentation for cfg-parser.

original commit: e9c5c78468e1564bff9ca3966dfddc3cc4adc6e8
12 years ago · cafb8a5c62
parent e39cbed86c
commit cafb8a5c62
1 changed files with 170 additions and 2 deletions
--- a/collects/parser-tools/parser-tools.scrbl
+++ b/collects/parser-tools/parser-tools.scrbl
@ -4,7 +4,8 @@
                     scheme/contract
                     parser-tools/lex
                     (prefix-in : parser-tools/lex-sre)
-                     parser-tools/yacc))
+                     parser-tools/yacc
                     parser-tools/cfg-parser))
@title{Parser Tools: @exec{lex} and @exec{yacc}-style Parsing}
@ -555,6 +556,10 @@ the right choice when using @racket[lexer] in other situations.
      @racketidfont{$}@math{i}@racketidfont{-start-pos} and
      @racketidfont{$}@math{i}@racketidfont{-end-pos}).
      An @deftech{error production} can be defined by providing
      a production of the form @racket[(error α)], where α is a
      string of grammar symbols, possibly empty.
      All of the productions for a given non-terminal must be grouped
      with it. That is, no @racket[non-terminal-id] may appear twice
      on the left hand side in a parser.}
@ -662,7 +667,7 @@ the right choice when using @racket[lexer] in other situations.
    The @racket[_parse] function returns the value associated with the
    parse tree by the semantic actions.  If the parser encounters an
    error, after invoking the supplied error function, it will try to
-    use error productions to continue parsing.  If it cannot, it
+    use @tech{error production}s to continue parsing.  If it cannot, it
    raises @racket[exn:fail:read].
    If multiple non-terminals are provided in @racket[start], the
@ -677,6 +682,169 @@ the right choice when using @racket[lexer] in other situations.
    place the parser into a module and compile the module to a
    @filepath{.zo} bytecode file.}
@section{Ambiguous parsing}
@section-index["cfg-parser"]
@defmodule[parser-tools/cfg-parser]
@racketmodname[parser-tools/cfg-parser] provides another parser
 generator as an alternative to @racketmodname[parser-tools/yacc].
 Unlike @racket[parser], @racket[cfg-parser] can consume ambiguous grammars.
 Its interface is a subset of the @racket[parser] interface from
 @racketmodname[parser-tools/yacc].
@defform/subs[#:literals (grammar tokens start end precs src-pos
                          suppress debug yacc-output prec)
              (cfg-parser clause ...)
              ([clause (grammar (non-terminal-id 
                                 ((grammar-id ...) maybe-prec expr)
                                 ...)
                                ...)
                       (tokens group-id ...)
                       (start non-terminal-id)
                       (end token-id ...)
                       (@#,racketidfont{error} expr)
                       (src-pos)])]{
    Creates a parser.  The clauses may be in any order, as long as there
    are no duplicates and all non-@italic{OPTIONAL} declarations are
    present:
    @itemize[
      @item{@racketblock0[(grammar (non-terminal-id
                                    ((grammar-id ...) maybe-prec expr)
                                    ...)
                                   ...)]
      Declares the grammar to be parsed.  Each @racket[grammar-id] can
      be a @racket[token-id] from a @racket[group-id] named in a
      @racket[tokens] declaration, or it can be a
      @racket[non-terminal-id] declared in the @racket[grammar]
      declaration.  The @racket[expr] is a
      ``semantic action,'' which is evaluated when the input is found
      to match its corresponding production.
      Each action is Racket code that has the same scope as its
      parser's definition, except that the variables @racket[$1], ...,
      @racketidfont{$}@math{i} are bound, where @math{i} is the number
      of @racket[grammar-id]s in the corresponding production. Each
      @racketidfont{$}@math{k} is bound to the result of the action
      for the @math{k}@superscript{th} grammar symbol on the right of
      the production, if that grammar symbol is a non-terminal, or the
      value stored in the token if the grammar symbol is a terminal.
      If the @racket[src-pos] option is present in the parser, then
      variables @racket[$1-start-pos], ...,
      @racketidfont{$}@math{i}@racketidfont{-start-pos} and
      @racket[$1-end-pos], ...,
      @racketidfont{$}@math{i}@racketidfont{-end-pos} are also
      available, and they refer to the position structures
      corresponding to the start and end of the corresponding
      @racket[grammar-id]. Grammar symbols defined as empty tokens
      have no @racketidfont{$}@math{k} associated, but do have
      @racketidfont{$}@math{k}@racketidfont{-start-pos} and
      @racketidfont{$}@math{k}@racketidfont{-end-pos}.
      Also @racketidfont{$n-start-pos} and @racketidfont{$n-end-pos}
      are bound to the largest start and end positions (i.e.,
      @racketidfont{$}@math{i}@racketidfont{-start-pos} and
      @racketidfont{$}@math{i}@racketidfont{-end-pos}).
      An @tech{error production} can be defined by providing
      a production of the form @racket[(error α)], where α is a
      string of grammar symbols, possibly empty.
      All of the productions for a given non-terminal must be grouped
      with it. That is, no @racket[non-terminal-id] may appear twice
      on the left hand side in a parser.}
      @item{@racket[(tokens group-id ...)]
      Declares that all of the tokens defined in each
      @racket[group-id]---as bound by @racket[define-tokens] or
      @racket[define-empty-tokens]---can be used by the parser in the
      @racket[grammar] declaration.}
      @item{@racket[(start non-terminal-id)]
      Declares a starting non-terminal for the grammar.
      Note: unlike @racket[parser], @racket[cfg-parser] does not
      currently support multiple starting non-terminals
      for the grammar.}
      @item{@racket[(end token-id ...)]
      Specifies a set of tokens from which some member must follow any
      valid parse.  For example, an EOF token would be specified for a
      parser that parses entire files and a newline token for a parser
      that parses entire lines individually.}
      @item{@racket[(@#,racketidfont{error} expr)]
      The @racket[expr] should evaluate to a function which will be
      executed for its side-effect whenever the parser encounters an
      error.
      If the @racket[src-pos] declaration is present, the function
      should accept 5 arguments:
      @racketblock[(lambda (tok-ok? tok-name tok-value _start-pos _end-pos) 
                     ....)]
      Otherwise it should accept 3:
      @racketblock[(lambda (tok-ok? tok-name tok-value) 
                     ....)]
      The first argument will be @racket[#f] if and only if the error
      is that an invalid token was received.  The second and third
      arguments will be the name and the value of the token at which
      the error was detected.  The fourth and fifth arguments, if
      present, provide the source positions of that token.}
      @item{@racket[(src-pos)] @italic{OPTIONAL}
      Causes the generated parser to expect input in the form
      @racket[(make-position-token _token _start-pos _end-pos)] instead
      of simply @racket[_token].  Include this option when using the
      parser with a lexer generated with @racket[lexer-src-pos].}
    ]
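
The clauses above can be combined as in the following minimal sketch. The token names (@racket[NUM], @racket[PLUS], @racket[EOF]), the parser name @racket[parse-sum], and the helper @racket[list->generator] are illustrative assumptions, not part of the library. The grammar is deliberately ambiguous, since a sum of three numbers can be grouped either way.

```racket
#lang racket/base
(require parser-tools/lex
         parser-tools/cfg-parser)

;; NUM carries a value; PLUS and EOF are empty tokens.
(define-tokens value-tokens (NUM))
(define-empty-tokens op-tokens (PLUS EOF))

;; An ambiguous grammar: "1 + 2 + 3" may be grouped as (1 + 2) + 3
;; or as 1 + (2 + 3).
(define parse-sum
  (cfg-parser
   (tokens value-tokens op-tokens)
   (start expr)
   (end EOF)
   ;; No src-pos clause, so the error function takes 3 arguments.
   (error (lambda (tok-ok? tok-name tok-value)
            (error 'parse-sum "unexpected token ~a (~a)"
                   tok-name tok-value)))
   (grammar
    (expr [(NUM) $1]
          [(expr PLUS expr) (+ $1 $3)]))))

;; Hypothetical helper: wrap a list of tokens in a zero-argument
;; generator that yields EOF once the list is exhausted.
(define (list->generator toks)
  (lambda ()
    (if (null? toks)
        (token-EOF)
        (let ([t (car toks)])
          (set! toks (cdr toks))
          t))))

(define result
  (parse-sum
   (list->generator
    (list (token-NUM 1) (token-PLUS)
          (token-NUM 2) (token-PLUS)
          (token-NUM 3)))))
```

Because addition is associative, the value of @racket[result] does not depend on which of the two groupings the parser selects, which makes this a convenient smoke test for an ambiguous grammar.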
    The result of a @racket[cfg-parser] expression is a function,
    @racket[_parse], that takes one argument.  This argument must be a
    zero-argument function,
    @racket[_gen], that produces successive tokens of the input each
    time it is called.  If desired, the @racket[_gen] may return
    symbols instead of tokens, and the parser will treat symbols as
    tokens of the corresponding name (with @racket[#f] as a value, so
    it is usual to return symbols only in the case of empty tokens).
    The @racket[_parse] function returns the value associated with the
    parse tree by the semantic actions.  If the parser encounters an
    error, after invoking the supplied error function, it will try to
    use @tech{error production}s to continue parsing.  If it cannot, it
    raises @racket[exn:fail:read].
    Because @racket[cfg-parser] accepts only one @racket[start]
    non-terminal, it always produces a single parsing function, rather
    than the list of parsing functions that @racket[parser] produces
    when several @racket[start] non-terminals are given.
    Each time the Racket code for a @racket[cfg-parser] is compiled
    (e.g. when a @filepath{.rkt} file containing a @racket[cfg-parser] form
    is loaded), the parser generator is run.  To avoid this overhead,
    place the parser into a module and compile the module to a
    @filepath{.zo} bytecode file.
                                             }
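
As a hedged illustration of the @racket[src-pos] option and of how a generated parser is driven, the sketch below pairs a @racket[lexer-src-pos] lexer with a @racket[cfg-parser]. The names @racket[lex], @racket[parse-sum], and @racket[parse-string], along with the token names, are assumptions made for this example only.

```racket
#lang racket/base
(require parser-tools/lex
         (prefix-in : parser-tools/lex-sre)
         parser-tools/cfg-parser)

(define-tokens value-tokens (NUM))
(define-empty-tokens op-tokens (PLUS EOF))

;; lexer-src-pos wraps each token in a position-token, which is the
;; input form a parser with (src-pos) expects.
(define lex
  (lexer-src-pos
   [(:+ numeric) (token-NUM (string->number lexeme))]
   ["+" (token-PLUS)]
   ;; Skip whitespace without wrapping the recursive result again.
   [whitespace (return-without-pos (lex input-port))]
   [(eof) (token-EOF)]))

(define parse-sum
  (cfg-parser
   (tokens value-tokens op-tokens)
   (start expr)
   (end EOF)
   (src-pos)
   ;; With (src-pos), the error function takes 5 arguments.
   (error (lambda (tok-ok? tok-name tok-value pos-start pos-end)
            (error 'parse-sum "unexpected ~a at offset ~a"
                   tok-name (position-offset pos-start))))
   (grammar
    (expr [(NUM) $1]
          [(expr PLUS NUM) (+ $1 $3)]))))

;; The parser is called with a zero-argument function that returns
;; the next token on each call.
(define (parse-string str)
  (define in (open-input-string str))
  (parse-sum (lambda () (lex in))))

(define result (parse-string "1 + 20 + 300"))
```

Placing definitions like these in a module and compiling it to a @filepath{.zo} file avoids rerunning the parser generator on each load, as described above.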
@; ----------------------------------------------------------------------
@section{Converting @exec{yacc} or @exec{bison} Grammars}