#lang scribble/manual
@(require scribble/eval
          racket/date
          file/md5
          (for-label racket
                     brag/support
                     brag/lexer-support
                     brag/examples/nested-word-list
                     (only-in parser-tools/lex lexer-src-pos)
                     (only-in syntax/parse syntax-parse ~literal)))


@(define (lookup-date filename [default ""])
   (cond
     [(file-exists? filename)
      (define modify-seconds (file-or-directory-modify-seconds filename))
      (define a-date (seconds->date modify-seconds))
      (date->string a-date)]
     [else
      default]))

@(define (compute-md5sum filename [default ""])
   (cond [(file-exists? filename)
          (bytes->string/utf-8 (call-with-input-file filename md5 #:mode 'binary))]
         [else
          default]))


@title{brag: the Beautiful Racket AST Generator}
@author["Danny Yoo (95%)" "Matthew Butterick (5%)"]

@defmodulelang[brag]

@section{Quick start}

@(define my-eval (make-base-eval))
@(my-eval '(require brag/examples/nested-word-list 
                          racket/list
                          racket/match))

Suppose we're given the
following string:
@racketblock["(radiant (humble))"]


How would we turn this string into a structured value?  That is, how would we @emph{parse} it? (Let's also suppose we've never heard of @racket[read].)

First, we need to consider the structure of the things we'd like to parse. The
string above looks like a nested list of words. Good start.

Second, how might we describe this formally — meaning, in a way that a computer could understand? A common notation to describe the structure of these things is @link["http://en.wikipedia.org/wiki/Backus%E2%80%93Naur_Form"]{Backus-Naur Form} (BNF). So let's try to notate the structure of nested word lists in BNF.

@nested[#:style 'code-inset]{
@verbatim{
nested-word-list: WORD
                | LEFT-PAREN nested-word-list* RIGHT-PAREN
}}

What we intend by this notation is this: @racket[nested-word-list] is either a @racket[WORD], or a parenthesized list of @racket[nested-word-list]s. We use the character @litchar{*} to represent zero or more repetitions of the previous thing. We treat the uppercased @racket[LEFT-PAREN], @racket[RIGHT-PAREN], and @racket[WORD] as placeholders for @emph{tokens} (a @tech{token} being the smallest meaningful item in the parsed string).

Here are a few examples of tokens:
@interaction[#:eval my-eval
(require brag/support)
(token 'LEFT-PAREN)
(token 'WORD "crunchy" #:span 7)
(token 'RIGHT-PAREN)]

This BNF description is also known as a @deftech{grammar}. Just as it does in a natural language like English or French, a grammar describes something in terms of what elements can fit where.

Have we made progress?  We have a valid grammar. But we're still missing a @emph{parser}: a function that can use that description to make structures out of a sequence of tokens.

Meanwhile, it's clear that we don't yet have a valid program because there's no @litchar{#lang} line. Let's add one: put @litchar{#lang brag} at the top of the grammar, and save it as a file called @filepath{nested-word-list.rkt}.

@filebox["nested-word-list.rkt"]{
@verbatim{
#lang brag
nested-word-list: WORD
                | LEFT-PAREN nested-word-list* RIGHT-PAREN
}}

Now it's a proper program. But what does it do?

@interaction[#:eval my-eval
@eval:alts[(require "nested-word-list.rkt") (void)]
parse
]

It gives us a @racket[parse] function. Let's investigate what @racket[parse]
does. What happens if we pass it a sequence of tokens?

@interaction[#:eval my-eval
             (define a-parsed-value
               (parse (list (token 'LEFT-PAREN "(")
                            (token 'WORD "some")
                            (token 'LEFT-PAREN "[") 
                            (token 'WORD "pig")
                            (token 'RIGHT-PAREN "]") 
                            (token 'RIGHT-PAREN ")"))))
             a-parsed-value]

Those who have messed around with macros will recognize this as a @seclink["stx-obj" #:doc '(lib "scribblings/guide/guide.scrbl")]{syntax object}.

@interaction[#:eval my-eval
(syntax->datum a-parsed-value)
]

That's @racket[(some [pig])], essentially.

What happens if we pass our @racket[parse] function a bigger source of tokens?

@interaction[#:eval my-eval
@code:comment{tokenize: string -> (sequenceof token-struct?)}
@code:comment{Generate tokens from a string:}
(define (tokenize s)
  (for/list ([str (regexp-match* #px"\\(|\\)|\\w+" s)])
    (match str
      ["("
       (token 'LEFT-PAREN str)]
      [")"
       (token 'RIGHT-PAREN str)]
      [else
       (token 'WORD str)])))

@code:comment{For example:}
(define token-source (tokenize "(welcome (to (((brag)) ())))"))
(define v (parse token-source))
(syntax->datum v)
]

Welcome to @tt{brag}.


@;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
@;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

@section{Introduction}

@tt{brag} is a parser generator designed to be easy
to use:

@itemize[

@item{It provides a @litchar{#lang} for writing BNF grammars.
A module written in @litchar{#lang brag} automatically generates a
parser. The output of this parser tries to follow
@link["http://en.wikipedia.org/wiki/How_to_Design_Programs"]{HTDP}
guidelines. The structure of the grammar informs the structure of the
Racket syntax objects it generates.}

@item{The language uses a few conventions to simplify the expression of
grammars. The first rule in the grammar is assumed to be the
starting production. Identifiers in @tt{UPPERCASE} are treated as
terminal tokens. All other identifiers are treated as nonterminals.}

@item{Tokenizers can be developed independently of parsers.
@tt{brag} takes a liberal view on tokens: they can be strings,
symbols, or instances constructed with @racket[token]. Tokens can optionally provide source location, in which case a syntax object generated by the parser will too.}

@item{The parser can usually handle ambiguous grammars.}

@item{It integrates with the rest of the Racket
@link["http://docs.racket-lang.org/guide/languages.html"]{language toolchain}.}

]


@subsection{Example: a small DSL for ASCII diagrams}

@margin-note{This example is
@link["http://stackoverflow.com/questions/12345647/rewrite-this-script-by-designing-an-interpreter-in-racket"]{derived from a question}  on Stack Overflow.}  

To understand @tt{brag}'s design, let's look
at a toy problem. We'd like to define a language for
drawing simple ASCII diagrams. So if we write something like this:

@nested[#:style 'inset]{
@verbatim|{
3 9 X;
6 3 b 3 X 3 b;
3 9 X;
}|}

It should generate the following picture:

@nested[#:style 'inset]{
@verbatim|{
XXXXXXXXX
XXXXXXXXX
XXXXXXXXX
   XXX   
   XXX   
   XXX   
   XXX   
   XXX   
   XXX   
XXXXXXXXX
XXXXXXXXX
XXXXXXXXX
}|}


@subsection{Syntax and semantics}

We're being somewhat casual with what we mean by the program above. Let's try to nail down some meanings. 

Each line of the program has a semicolon at the end, and describes the output of several @emph{rows} of the line drawing. Let's look at two of the lines in the example:

@itemize[
@item{@litchar{3 9 X;}: ``Repeat the following 3 times: print @racket["X"] nine times, followed by
a newline.''}

@item{@litchar{6 3 b 3 X 3 b;}: ``Repeat the following 6 times: print @racket[" "] three times, 
followed by @racket["X"] three times, followed by @racket[" "] three times, followed by a newline.''}
]

Then each line consists of a @emph{repeat} number, followed by pairs of
(number, character) @emph{chunks}. We'll assume here that the intent of the lowercased character @litchar{b} is to represent the printing of a 1-character whitespace @racket[" "], and for uppercase letters to represent the printing of themselves.

By understanding the pieces of each line, we can more easily capture that meaning in a grammar. Once we have each instruction of our ASCII DSL in a structured format, we should be able to parse it.

Here's a first pass at expressing the structure of these line-drawing programs.

@subsection{Parsing the concrete syntax}

@filebox["simple-line-drawing.rkt"]{
@verbatim|{
#lang brag
drawing: rows*
rows: repeat chunk+ ";"
repeat: INTEGER
chunk: INTEGER STRING
}|
}

@margin-note{@secref{brag-syntax} describes @tt{brag}'s syntax in more detail.}
We write a @tt{brag} program as a BNF grammar, where patterns can be:
@itemize[
@item{the names of other rules (e.g. @racket[chunk])}
@item{literal and symbolic token names (e.g. @racket[";"], @racket[INTEGER])}
@item{quantified patterns (e.g. @litchar{+} to represent one-or-more repetitions)}
]
The result of a @tt{brag} program is a module with a @racket[parse] function
that can parse tokens and produce a syntax object as a result.

Let's exercise this function:
@interaction[#:eval my-eval
(require brag/support)
@eval:alts[(require "simple-line-drawing.rkt") 
           (require brag/examples/simple-line-drawing)]
(define stx
  (parse (list (token 'INTEGER 6) 
               (token 'INTEGER 2)
               (token 'STRING " ")
               (token 'INTEGER 3)
               (token 'STRING "X")
               ";")))
(syntax->datum stx)
]

A @emph{token} is the smallest meaningful element of a source program. Tokens can be strings, symbols, or instances of the @racket[token] data structure. (Plus a few other special cases, which we'll discuss later.) Usually, a token holds a single character from the source program. But sometimes it makes sense to package a sequence of characters into a single token, if the sequence has an indivisible meaning.

If possible, we also want to attach source location information to each token. Why? Because this information will be incorporated into the syntax objects produced by @racket[parse].
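
As a minimal sketch, the keyword arguments of @racket[token] can carry that location by hand. The positions below are assumptions chosen for illustration: they pretend the tokens came from the one-line source @litchar{6 2 b;}. The names @racket[located-tokens] and @racket[located-stx] are ours, not part of @tt{brag}, and @racket[parse] is assumed to be the function from @filepath{simple-line-drawing.rkt}.

@racketblock[
(require brag/support)
@code:comment{Each token records where it began in the imagined source text.}
(define located-tokens
  (list (token 'INTEGER 6 #:line 1 #:column 0 #:offset 1 #:span 1)
        (token 'INTEGER 2 #:line 1 #:column 2 #:offset 3 #:span 1)
        (token 'STRING " " #:line 1 #:column 4 #:offset 5 #:span 1)
        (token ";" ";" #:line 1 #:column 5 #:offset 6 #:span 1)))
@code:comment{The parser can then carry these locations into its result.}
(define located-stx (parse located-tokens))
(syntax-line located-stx)
]

If the locations are propagated as described, the last expression should report line 1.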

A parser often works in conjunction with a helper function called a @emph{lexer} that converts the raw code of the source program into tokens. The @racketmodname[parser-tools/lex] library can help us write a position-sensitive
tokenizer:

@interaction[#:eval my-eval
(require parser-tools/lex)
(define (tokenize ip)
  (port-count-lines! ip)
  (define my-lexer
    (lexer-src-pos 
      [(repetition 1 +inf.0 numeric)
       (token 'INTEGER (string->number lexeme))]
      [upper-case
       (token 'STRING lexeme)]
      ["b"
       (token 'STRING " ")]
      [";"
       (token ";" lexeme)]
      [whitespace
       (token 'WHITESPACE lexeme #:skip? #t)]
      [(eof)
       (void)]))
  (define (next-token) (my-lexer ip))
  next-token)

(define a-sample-input-port (open-input-string "6 2 b 3 X;"))
(define token-thunk (tokenize a-sample-input-port))
@code:comment{Now we can pass token-thunk to the parser:}
(define another-stx (parse token-thunk))
(syntax->datum another-stx)
@code:comment{The syntax object has location information:}
(syntax-line another-stx)
(syntax-column another-stx)
(syntax-span another-stx)
]


Note also from this lexer example: 

@itemize[

@item{@racket[parse] accepts as input either a sequence of tokens, or a
function that produces tokens (which @racket[parse] will call repeatedly to get the next token).}

@item{As an alternative to the basic @racket[token] structure, a token can also be an instance of the @racket[position-token] structure (also found in @racketmodname[parser-tools/lex]). In that case, the token will try to derive its position from that of the position-token.}

@item{@racket[parse] will stop if it gets @racket[void] (or @racket['EOF]) as a token.}

@item{@racket[parse] will skip any token that has its
@racket[#:skip?] attribute set to @racket[#t]. For instance, tokens representing comments often use @racket[#:skip?].}

]
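
To make the first and last of these points concrete, here is a small sketch that hands @racket[parse] a plain list rather than a thunk. It assumes the @racket[parse] function from @filepath{simple-line-drawing.rkt} above; the @racket['COMMENT] token type and the name @racket[tokens-with-comment] are hypothetical. Because the comment token is marked with @racket[#:skip?], the parser should pass over it even though the grammar never mentions @racket[COMMENT], just as it passes over the @racket['WHITESPACE] tokens produced by the lexer.

@racketblock[
(define tokens-with-comment
  (list (token 'INTEGER 3)
        @code:comment{Skipped by the parser, so it never reaches the grammar:}
        (token 'COMMENT "a comment" #:skip? #t)
        (token 'INTEGER 9)
        (token 'STRING "X")
        ";"))
(syntax->datum (parse tokens-with-comment))
]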


@subsection{From parsing to interpretation}

We now have a parser for programs written in this simple-line-drawing language.
Our parser will return syntax objects:

@interaction[#:eval my-eval
(define parsed-program
  (parse (tokenize (open-input-string "3 9 X; 6 3 b 3 X 3 b; 3 9 X;"))))
(syntax->datum parsed-program)
]

Better still, these syntax objects will have a predictable
structure that follows the grammar:

@racketblock[
    (drawing (rows (repeat <number>)
                   (chunk <number> <string>) ... ";")
             ...)
]

where @racket[drawing], @racket[rows], @racket[repeat], and @racket[chunk]
should be treated literally, and everything else will be numbers or strings.


Still, these syntax-object values are just inert structures. How do we
interpret them, and make them @emph{print}?  We claimed at the beginning of
this section that these syntax objects should be easy to interpret. So let's do it.

@margin-note{This is a very quick-and-dirty treatment of @racket[syntax-parse].
See the @racketmodname[syntax/parse] documentation for a gentler guide to its
features.}  Racket provides a special form called @racket[syntax-parse] in the
@racketmodname[syntax/parse] library. @racket[syntax-parse] lets us do a
structural case-analysis on syntax objects: we provide it a set of patterns to
parse and actions to perform when those patterns match.


As a simple example, we can write a function that looks at a syntax object and
says @racket[#t] if it's the literal @racket[yes], and @racket[#f] otherwise:

@interaction[#:eval my-eval
(require syntax/parse)
@code:comment{yes-syntax-object?: syntax-object -> boolean}
@code:comment{Returns true if the syntax-object is yes.}
(define (yes-syntax-object? stx)
  (syntax-parse stx
    [(~literal yes)
     #t]
    [else
     #f]))
(yes-syntax-object? #'yes)
(yes-syntax-object? #'nooooooooooo)
]

Here, we use @racket[~literal] to let @racket[syntax-parse] know that
@racket[yes] should show up literally in the syntax object. The patterns can
also have some structure to them, such as:
@racketblock[({~literal drawing} rows-stxs ...)]
which matches on syntax objects that begin, literally, with @racket[drawing],
followed by any number of rows (which are syntax objects too).


Now that we know a little bit more about @racket[syntax-parse], 
we can use it to do a case analysis on the syntax
objects that our @racket[parse] function gives us.
We start by defining a function on syntax objects of the form @racket[(drawing
rows-stx ...)].
@interaction[#:eval my-eval
(define (interpret-drawing drawing-stx)
  (syntax-parse drawing-stx
    [({~literal drawing} rows-stxs ...)

     (for ([rows-stx (syntax->list #'(rows-stxs ...))])
       (interpret-rows rows-stx))]))]

When we encounter a syntax object of the form @racket[(drawing rows-stx
...)], we call @racket[interpret-rows] on each @racket[rows-stx].

@;The pattern we
@;express in @racket[syntax-parse] above marks what things should be treated
@;literally, and the @racket[...] is a a part of the pattern matching language
@;known by @racket[syntax-parse] that lets us match multiple instances of the
@;last pattern.


Let's define @racket[interpret-rows] now:
@interaction[#:eval my-eval
(define (interpret-rows rows-stx)
  (syntax-parse rows-stx
    [({~literal rows}
      ({~literal repeat} repeat-number)
      chunks ... ";")

     (for ([i (syntax-e #'repeat-number)])
       (for ([chunk-stx (syntax->list #'(chunks ...))])
         (interpret-chunk chunk-stx))
       (newline))]))]

For a @racket[rows], we extract the @racket[repeat-number] out of the
syntax object and use it as the range of the @racket[for] loop. The inner loop
walks across each @racket[chunk-stx] and calls @racket[interpret-chunk] on it.


Finally, we need to write a definition for @racket[interpret-chunk]. We want
it to extract out the @racket[chunk-size] and @racket[chunk-string] portions,
and print to standard output:

@interaction[#:eval my-eval
(define (interpret-chunk chunk-stx)
  (syntax-parse chunk-stx
    [({~literal chunk} chunk-size chunk-string)

     (for ([k (syntax-e #'chunk-size)])
       (display (syntax-e #'chunk-string)))]))
]


@margin-note{Here are the definitions in a single file:
@link["examples/simple-line-drawing/interpret.rkt"]{interpret.rkt}.}
With these definitions in hand, we can now pass them syntax objects
that we construct directly by hand:

@interaction[#:eval my-eval
(interpret-chunk #'(chunk 3 "X"))
(interpret-drawing #'(drawing (rows (repeat 5) (chunk 3 "X") ";")))
]

or we can pass it the result generated by our parser:
@interaction[#:eval my-eval
(define parsed-program
  (parse (tokenize (open-input-string "3 9 X; 6 3 b 3 X 3 b; 3 9 X;"))))
(interpret-drawing parsed-program)]

And now we've got an interpreter!
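
Because the interpreter writes to the current output port, its output can also be captured as a string, which is convenient for testing. The following is a sketch rather than part of the example's source: @racket[drawing->string] is a hypothetical helper built on @racket[with-output-to-string] from @racketmodname[racket/port], and it assumes the @racket[interpret-drawing] definition above.

@racketblock[
(require racket/port)
@code:comment{drawing->string: syntax-object -> string}
@code:comment{Runs the interpreter and collects everything it prints.}
(define (drawing->string drawing-stx)
  (with-output-to-string
    (lambda () (interpret-drawing drawing-stx))))
(drawing->string #'(drawing (rows (repeat 2) (chunk 3 "X") ";")))
]

The last expression should evaluate to @racket["XXX\nXXX\n"]: two rows of three @racket["X"]s, each followed by a newline.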


@subsection{From interpretation to compilation}

@margin-note{For a gentler tutorial on writing @litchar{#lang} extensions, see:
@link["http://hashcollision.org/brainfudge"]{F*dging up a Racket}.}  (Just as a
warning: the following material is slightly more advanced, but shows how
writing a compiler for the line-drawing language reuses the ideas for the
interpreter.)

Wouldn't it be nice to be able to write something like:

@nested[#:style 'inset]{
@verbatim|{
3 9 X;
6 3 b 3 X 3 b;
3 9 X;
}|}

and have Racket automatically compile this down to something like this?
@racketblock[
(for ([i 3])
  (for ([k 9]) (display "X"))
  (newline))

(for ([i 6])
  (for ([k 3]) (display " "))
  (for ([k 3]) (display "X"))
  (for ([k 3]) (display " "))
  (newline))

(for ([i 3])
  (for ([k 9]) (display "X"))
  (newline))
]

Well, of course it won't work: we don't have a @litchar{#lang} line.

Let's add one.

@filebox["letter-i.rkt"]{
@verbatim|{
#lang brag/examples/simple-line-drawing
3 9 X;
6 3 b 3 X 3 b;
3 9 X;
}|
}

Now @filepath{letter-i.rkt} is a program.


How does this work?  From the previous sections, we've seen how to take the
contents of a file and interpret it. What we want to do now is teach Racket
how to compile programs labeled with this @litchar{#lang} line. We'll do two
things:

@itemize[
@item{Tell Racket to use the @tt{brag}-generated parser and lexer we defined
earlier whenever it sees a program written with
@litchar{#lang brag/examples/simple-line-drawing}.}

@item{Define transformation rules for @racket[drawing], @racket[rows], and
      @racket[chunk] to rewrite these into standard Racket forms.}
]

The second part, the writing of the transformation rules, will look very
similar to the definitions we wrote for the interpreter, but the transformation
will happen at compile-time. (We @emph{could} just resort to simply calling
into the interpreter we just wrote up, but this section is meant to show that
compilation is also viable.)


We do the first part by defining a @emph{module reader}: a
@link["http://docs.racket-lang.org/guide/syntax_module-reader.html"]{module
reader} tells Racket how to parse and compile a file. Whenever Racket sees a
@litchar{#lang <name>}, it looks for a corresponding module reader in
@filepath{<name>/lang/reader}.

Here's the definition for
@filepath{brag/examples/simple-line-drawing/lang/reader.rkt}:

@filebox["brag/examples/simple-line-drawing/lang/reader.rkt"]{
@codeblock|{
#lang s-exp syntax/module-reader
brag/examples/simple-line-drawing/semantics
#:read my-read
#:read-syntax my-read-syntax
#:whole-body-readers? #t

(require brag/examples/simple-line-drawing/lexer
         brag/examples/simple-line-drawing/grammar)

(define (my-read in)
  (syntax->datum (my-read-syntax #f in)))

(define (my-read-syntax src ip)
  (list (parse src (tokenize ip))))
}|
}

We use a helper module @racketmodname[syntax/module-reader], which provides
utilities for creating a module reader. It uses the lexer and
@tt{brag}-generated parser we defined earlier, and also tells Racket that it should compile the forms in the syntax
object using a module called @filepath{semantics.rkt}.

@margin-note{For a systematic treatment of capturing the semantics of
a language, see @link["http://cs.brown.edu/~sk/Publications/Books/ProgLangs/"]{Programming Languages: Application and
Interpretation}.}

Let's look into @filepath{semantics.rkt} and see what's involved in
compilation:
@filebox["brag/examples/simple-line-drawing/semantics.rkt"]{
@codeblock|{
#lang racket/base
(require (for-syntax racket/base syntax/parse))

(provide #%module-begin
         ;; We reuse Racket's treatment of raw datums, specifically
         ;; for strings and numbers:
         #%datum
         
         ;; And otherwise, we provide definitions of these three forms.
         ;; During compilation, Racket uses these definitions to 
         ;; rewrite into for loops, displays, and newlines.
         drawing rows chunk)

;; Define a few compile-time functions to do the syntax rewriting:
(begin-for-syntax
  (define (compile-drawing drawing-stx)
    (syntax-parse drawing-stx
      [({~literal drawing} rows-stxs ...)

       (syntax/loc drawing-stx
         (begin rows-stxs ...))]))

  (define (compile-rows rows-stx)
    (syntax-parse rows-stx
      [({~literal rows}
        ({~literal repeat} repeat-number)
        chunks ... 
        ";")

       (syntax/loc rows-stx
         (for ([i repeat-number])
           chunks ...
           (newline)))]))

  (define (compile-chunk chunk-stx)
    (syntax-parse chunk-stx
      [({~literal chunk} chunk-size chunk-string)

       (syntax/loc chunk-stx
         (for ([k chunk-size])
           (display chunk-string)))])))


;; Wire up the use of "drawing", "rows", and "chunk" to these
;; transformers:
(define-syntax drawing compile-drawing)
(define-syntax rows compile-rows)
(define-syntax chunk compile-chunk)
}|
}

The semantics hold definitions for @racket[compile-drawing],
@racket[compile-rows], and @racket[compile-chunk], similar to what we had for
interpretation with @racket[interpret-drawing], @racket[interpret-rows], and
@racket[interpret-chunk]. However, compilation is not the same as
interpretation: each definition does not immediately execute the act of
drawing, but rather returns a syntax object whose evaluation will do the actual
work.

There are a few things to note:

@itemize[

@item{@tt{brag}'s native data structure is the syntax object because the
majority of Racket's language-processing infrastructure knows how to read and
write this structured value.}


@item{
@margin-note{By the way, we can just as easily rewrite the semantics so that
@racket[compile-rows] does explicitly call @racket[compile-chunk]. Often,
though, it's easier to write the transformation functions in this piecemeal way
and depend on the Racket macro expansion system to do the rewriting as it
encounters each of the forms.}
Unlike in interpretation, @racket[compile-rows] doesn't
compile each chunk by directly calling @racket[compile-chunk]. Rather, it
depends on the Racket macro expander to call each @racket[compile-XXX] function
as it encounters a @racket[drawing], @racket[rows], or @racket[chunk] in the
parsed value. The three statements at the bottom of @filepath{semantics.rkt} inform
the macro expansion system to do this:

@racketblock[
(define-syntax drawing compile-drawing)
(define-syntax rows compile-rows)
(define-syntax chunk compile-chunk)
]}
]
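
For comparison, here is a sketch of the explicit alternative mentioned in the margin note: a variant of @racket[compile-rows] that calls @racket[compile-chunk] itself instead of leaving each @racket[chunk] form for the macro expander. This is an illustration, not the code in @filepath{semantics.rkt}. The name @racket[compile-rows/explicit] is hypothetical, and the definition is assumed to sit inside the same @racket[begin-for-syntax] block as @racket[compile-chunk].

@racketblock[
(define (compile-rows/explicit rows-stx)
  (syntax-parse rows-stx
    [({~literal rows}
      ({~literal repeat} repeat-number)
      chunks ...
      ";")
     @code:comment{Compile every chunk now, rather than during later expansion:}
     (with-syntax ([(compiled-chunk ...)
                    (map compile-chunk (syntax->list #'(chunks ...)))])
       (syntax/loc rows-stx
         (for ([i repeat-number])
           compiled-chunk ...
           (newline))))]))
]

Both styles should expand to the same loops; the piecemeal version simply defers the rewriting of each chunk to the expander.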


Altogether, @tt{brag}'s intent is to be a parser generator for Racket
that's easy and fun to use. It's meant to fit naturally with the other tools
in the Racket language toolchain. Hopefully, it will reduce the friction in
making new languages with alternative concrete syntaxes.

The rest of this document describes the @tt{brag} language and the parsers it
generates.


@;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
@;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

@section{The language}

@subsection[#:tag "brag-syntax"]{Syntax and terminology}
A program in the @tt{brag} language consists of the language line
@litchar{#lang brag}, followed by a collection of @tech{rule}s and
@tech{line comment}s.

A @deftech{rule} is a sequence consisting of: a @tech{rule identifier}, a colon
@litchar{":"}, and a @tech{pattern}.

A @deftech{rule identifier} is an @tech{identifier} that is not in upper case.

A @deftech{token identifier} is an @tech{identifier} that is in upper case.

An @deftech{identifier} is a character sequence of letters, numbers, and
characters in @racket["-.!$%&/<=>?^_~@"]. It must not contain
@litchar{*} or @litchar{+}, as those characters are used to denote
quantification.


A @deftech{pattern} is one of the following:
@itemize[
@item{an implicit sequence of @tech{pattern}s separated by whitespace}
@item{a terminal: either a literal string or a @tech{token identifier}}
@item{a @tech{rule identifier}}
@item{a @deftech{choice pattern}: a sequence of @tech{pattern}s delimited with @litchar{|} characters.}
@item{a @deftech[#:key "quantifed pattern"]{quantified pattern}: a @tech{pattern} followed by either @litchar{*} (``zero or more'') or @litchar{+} (``one or more'')}
@item{an @deftech{optional pattern}: a @tech{pattern} surrounded by @litchar{[} and @litchar{]}}
@item{an explicit sequence: a @tech{pattern} surrounded by @litchar{(} and @litchar{)}}]

A @deftech{line comment} begins with either @litchar{#} or @litchar{;} and
continues till the end of the line.


For example, in the following program:
@nested[#:style 'inset
@verbatim|{
#lang brag
;; A parser for a silly language
sentence: verb optional-adjective object
verb: greeting
optional-adjective: ["happy" | "frumpy"]
greeting: "hello" | "hola" | "aloha"
object: "world" | WORLD
}|]

the elements @tt{sentence}, @tt{verb}, @tt{optional-adjective}, @tt{greeting}, and @tt{object} are rule
identifiers. The first rule, @litchar{sentence: verb optional-adjective
object}, is a rule whose right side is an implicit pattern sequence of three
sub-patterns. The uppercased @tt{WORLD} is a token identifier. The fourth rule in the program associates @tt{greeting} with a @tech{choice pattern}.


More examples:
@itemize[

@item{A
BNF for binary
strings that contain an equal number of zeros and ones.
@verbatim|{
#lang brag
equal: [zero one | one zero]   ;; equal number of "0"s and "1"s.
zero: "0" equal | equal "0"    ;; has an extra "0" in it.
one: "1" equal | equal "1"     ;; has an extra "1" in it.
}|
}

@item{A BNF for
@link["http://www.json.org/"]{JSON}-like structures.
@verbatim|{
#lang brag
json: number | string
    | array  | object
number: NUMBER
string: STRING
array: "[" [json ("," json)*] "]"
object: "{" [kvpair ("," kvpair)*] "}"
kvpair: ID ":" json
}|
}
]


@subsection{Syntax errors}

Besides the basic syntax errors that can occur with a malformed grammar, there
are a few other classes of situations that @litchar{#lang brag} will consider
as syntax errors.

@tt{brag} will raise a syntax error if the grammar:
@itemize[
@item{doesn't have any rules.}

@item{has a rule with the same left hand side as any other rule.}

@item{refers to rules that have not been defined. For example, the
following program:
@nested[#:style 'code-inset
@verbatim|{
#lang brag
foo: [bar]
}|
]
should raise an error because @tt{bar} has not been defined, even though
@tt{foo} refers to it in an @tech{optional pattern}.}


@item{uses the token name @racket[EOF]; the end-of-file token type is reserved
for internal use by @tt{brag}.}


@item{contains a rule that has no finite derivation. For example, the following
program:
@nested[#:style 'code-inset
@verbatim|{
#lang brag
infinite-a: "a" infinite-a
}|
]
should raise an error because no finite sequence of tokens will satisfy
@tt{infinite-a}.}

]

Otherwise, @tt{brag} should be fairly tolerant and permit even ambiguous
grammars.

@subsection{Semantics}
@declare-exporting[brag/examples/nested-word-list]

A program written in @litchar{#lang brag} produces a module that provides a few
bindings. The most important of these is @racket[parse]:

@defproc[(parse [source any/c #f] 
                [token-source (or/c (sequenceof token)
                                    (-> token))])
         syntax?]{

Parses the sequence of @tech{tokens} according to the rules in the grammar, using the
first rule as the start production. The parse must completely consume
@racket[token-source].

The @deftech{token source} can either be a sequence, or a 0-arity function that
produces @tech{tokens}.

A @deftech{token} in @tt{brag} can be any of the following values:
@itemize[
@item{a string}
@item{a symbol}
@item{an instance produced by @racket[token]}
@item{an instance produced by the token constructors of @racketmodname[parser-tools/lex]}
@item{an instance of @racketmodname[parser-tools/lex]'s @racket[position-token] whose 
      @racket[position-token-token] is a @tech{token}.}
]

A token whose type is either @racket[void] or @racket['EOF] terminates the
source.


If @racket[parse] succeeds, it will return a structured syntax object. The
structure of the syntax object follows the overall structure of the rules in
the BNF grammar. For each rule @racket[r] and its associated pattern @racket[p],
@racket[parse] generates a syntax object @racket[#'(r p-value)] where
@racket[p-value]'s structure follows a case analysis on @racket[p]:

@itemize[
@item{For implicit and explicit sequences of @tech{pattern}s @racket[p1],
      @racket[p2], ..., the corresponding values, spliced into the
      structure.}
@item{For terminals, the value associated to the token.}
@item{For @tech{rule identifier}s: the associated parse value for the rule.}
@item{For @tech{choice pattern}s: the associated parse value for one of the matching subpatterns.}
@item{For @tech[#:key "quantifed pattern"]{quantified pattern}s and @tech{optional pattern}s: the corresponding values, spliced into the structure.}
]

Consequently, it's only the presence of @tech{rule identifier}s in a rule's
pattern that tells the parser to introduce nested structure into the syntax
object.


If the grammar has ambiguity, @tt{brag} will choose and return a parse, though
it does not guarantee which one it chooses.


If the parse cannot be performed successfully, or if a token in the
@racket[token-source] uses a type that isn't mentioned in the grammar, then
@racket[parse] raises an instance of @racket[exn:fail:parsing].}


@defproc[(parse-tree [source any/c #f] 
                [token-source (or/c (sequenceof token)
                                    (-> token))])
         list?]{
Same as @racket[parse], but the result is converted into a visible parse tree. Useful for testing or debugging a parser.
}


@defform[#:id make-rule-parser
         (make-rule-parser name)]{
Constructs a parser for the @racket[name] of one of the non-terminals
in the grammar. 

For example, given the @tt{brag} program
@filepath{simple-arithmetic-grammar.rkt}:
@filebox["simple-arithmetic-grammar.rkt"]{
@verbatim|{
#lang brag
expr : term ('+' term)*
term : factor ('*' factor)*
factor : INT
}|
}
the following interaction shows how to extract a parser for @racket[term]s.
@interaction[#:eval my-eval
@eval:alts[(require "simple-arithmetic-grammar.rkt") 
                    (require brag/examples/simple-arithmetic-grammar)]
(define term-parse (make-rule-parser term))
(define tokens (list (token 'INT 3) 
                     "*" 
                     (token 'INT 4)))
(syntax->datum (parse tokens))
(syntax->datum (term-parse tokens))

(define another-token-sequence
  (list (token 'INT 1) "+" (token 'INT 2)
        "*" (token 'INT 3)))
(syntax->datum (parse another-token-sequence))
@code:comment{Note that term-parse will break on another-token-sequence}
@code:comment{as it does not know what to do with the "+"}
(term-parse another-token-sequence)
]

}


@defthing[all-token-types (setof symbol?)]{
A set of all the token types used in a grammar.

For example:
@interaction[#:eval my-eval
@eval:alts[(require "simple-arithmetic-grammar.rkt") 
                    (require brag/examples/simple-arithmetic-grammar)]
all-token-types
]

}


@section{Support API}

@defmodule[brag/support]

The @racketmodname[brag/support] module provides functions to interact with
@tt{brag} programs. The most useful is the @racket[token] function, which
produces tokens to be parsed.

@defproc[(token [type (or/c string? symbol?)]
                [val any/c #f]
                [#:line line (or/c positive-integer? #f) #f]
                [#:column column (or/c natural-number? #f) #f]
                [#:offset offset (or/c positive-integer? #f) #f]
                [#:span span (or/c natural-number? #f) #f]
                [#:skip? skip? boolean? #f]
                )
         token-struct?]{
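Creates an instance of @racket[token-struct].

In the syntax objects produced by a parse, the value @racket[val] appears in
place of the token name used in the grammar. The optional location arguments
record where the token came from in the source, and the parser carries that
information into the syntax objects it produces.

If @racket[#:skip?] is true, the parser skips over the token during a
parse.}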
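
A short sketch of the keyword arguments in use. The location numbers are
invented for illustration, and follow Racket's usual conventions (1-based lines
and offsets, 0-based columns):

@racketblock[
(require brag/support)
@code:comment{A WORD token carrying its value plus a made-up source location.}
(define word
  (token 'WORD "crunchy" #:line 1 #:column 0 #:offset 1 #:span 7))
@code:comment{A whitespace token that the parser will pass over.}
(define gap
  (token 'WHITESPACE " " #:skip? #t))
@code:comment{Fields are read back with the token-struct accessors.}
(token-struct-val word)
(token-struct-skip? gap)
]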


@defstruct[token-struct ([type symbol?]
                         [val any/c]
                         [offset (or/c positive-integer? #f)]
                         [line (or/c positive-integer? #f)]
                         [column (or/c natural-number? #f)]
                         [span (or/c natural-number? #f)]
                         [skip? boolean?])
                        #:transparent]{
The token structure type.

Rather than directly using the @racket[token-struct] constructor, please use
the helper function @racket[token] to construct instances.
}


@defstruct[(exn:fail:parsing exn:fail) 
           ([message string?]
            [continuation-marks continuation-mark-set?]
            [srclocs (listof srcloc?)])]{
The exception raised when parsing fails.

@racket[exn:fail:parsing] implements Racket's @racket[prop:exn:srcloc]
property, so if this exception reaches DrRacket's default error handler,
DrRacket should highlight the offending locations in the source.}
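
A program can also catch the exception itself and inspect its
@racket[srclocs] field. The following is only a sketch: it reuses the
@racket[term] parser scenario from the @racket[make-rule-parser] example, and
@racket[try-term-parse] is a hypothetical helper name.

@racketblock[
(require brag/support
         brag/examples/simple-arithmetic-grammar)
(define term-parse (make-rule-parser term))
@code:comment{try-term-parse (hypothetical helper): returns a datum on success,}
@code:comment{or the list of offending srclocs if parsing fails.}
(define (try-term-parse tokens)
  (with-handlers ([exn:fail:parsing?
                   (lambda (e) (exn:fail:parsing-srclocs e))])
    (syntax->datum (term-parse tokens))))
@code:comment{The "+" is not part of a term, so this call should take the handler branch.}
(try-term-parse (list (token 'INT 1) "+" (token 'INT 2)))
]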

@section{Lexer support API}

@defmodule[brag/lexer-support]

In addition to the exports shown below, the @racketmodname[brag/lexer-support] module also provides everything from @racketmodname[brag/support], and everything from @racketmodname[parser-tools/lex].

@defproc[(apply-tokenizer [tokenizer procedure?] 
                [source-string (or/c string?
                                    input-port?)])
         list?]{
Repeatedly apply @racket[tokenizer] to @racket[source-string], gathering the resulting tokens into a list. Useful for testing or debugging a tokenizer.
}


@defproc[(trim-delimiters [left-delimiter string?]
                          [str string?]
                          [right-delimiter string?])
         string?]{
Remove @racket[left-delimiter] from the left side of @racket[str], and @racket[right-delimiter] from its right side. Intended as a helper function for @racket[delimited-by].
}


@defform[(:* re ...)]{

Repetition of @racket[re] sequence 0 or more times.}

@defform[(:+ re ...)]{

Repetition of @racket[re] sequence 1 or more times.}

@defform[(:? re ...)]{

Zero or one occurrence of @racket[re] sequence.}

@defform[(:= n re ...)]{

Exactly @racket[n] occurrences of @racket[re] sequence, where
@racket[n] must be a literal exact, non-negative number.}

@defform[(:>= n re ...)]{

At least @racket[n] occurrences of @racket[re] sequence, where
@racket[n] must be a literal exact, non-negative number.}

@defform[(:** n m re ...)]{

Between @racket[n] and @racket[m] (inclusive) occurrences of
@racket[re] sequence, where @racket[n] must be a literal exact,
non-negative number, and @racket[m] must be literally either
@racket[#f], @racket[+inf.0], or an exact, non-negative number; a
@racket[#f] value for @racket[m] is the same as @racket[+inf.0].}

@defform[(:or re ...)]{

Same as @racket[(union re ...)].}

@deftogether[(
@defform[(:: re ...)]
@defform[(:seq re ...)]
)]{

Both forms concatenate the @racket[re]s.}

@defform[(:& re ...)]{

Intersects the @racket[re]s.}

@defform[(:- re ...)]{

The set difference of the @racket[re]s.}

@defform[(:~ re ...)]{

Character-set complement, in which each @racket[re] must match exactly
one character.}

@defform[(:/ char-or-string ...)]{

Character ranges, matching characters between successive pairs of
characters.}
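
To show how these operators combine, here is a small sketch of a lexer for
optionally signed integers. The names @racket[int-lexer] and @racket[INT] are
made up for the example; @racket[lexer], @racket[lexeme], and
@racket[whitespace] come from @racketmodname[parser-tools/lex], which this
module re-exports.

@racketblock[
(require brag/lexer-support)
(define int-lexer
  (lexer
   @code:comment{An optional minus sign, then one or more digits.}
   [(:: (:? "-") (:+ (:/ "0" "9")))
    (token 'INT (string->number lexeme))]
   @code:comment{Runs of whitespace become skippable tokens.}
   [(:+ whitespace)
    (token 'WHITESPACE lexeme #:skip? #t)]
   @code:comment{At end of input, return void, as in the tokenizer shown earlier.}
   [(eof) (void)]))
@code:comment{Read the first token from a sample input and look at its value.}
(token-struct-val (int-lexer (open-input-string "-42 7")))
]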

@defform[(delimited-by open close)]{

A string that is bounded by the @racket[open] and @racket[close] delimiters. Matching is non-greedy (meaning, it stops at the first occurrence of @racket[close]). The resulting lexeme includes the delimiters. To remove them, see @racket[trim-delimiters].}
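
As an illustration, a lexer rule for C-style block comments might pair
@racket[delimited-by] with @racket[trim-delimiters]. This is a sketch under
assumptions: the @racket[COMMENT] token type and the @racket[comment-lexer]
name are invented here.

@racketblock[
(require brag/lexer-support)
(define comment-lexer
  (lexer
   @code:comment{The lexeme includes both delimiters, so trim them off}
   @code:comment{before storing the comment text in the token.}
   [(delimited-by "/*" "*/")
    (token 'COMMENT (trim-delimiters "/*" lexeme "*/") #:skip? #t)]
   [(eof) (void)]))
(comment-lexer (open-input-string "/* first */ /* second */"))
]

Because matching is non-greedy, the call above should produce a token for the
first comment only, leaving the rest of the input on the port.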


@close-eval[my-eval]
-												add br/ragg

											
										
										
											8 years ago
+								#lang scribble/manual
 								@(require scribble/eval
 								          racket/date
 								          file/md5
 								          (for-label racket
-												rename `br/ragg` as `brag`

											
										
										
											8 years ago
+								                     brag/support
-												add `brag/lexer-support`

											
										
										
											8 years ago
+								                     brag/lexer-support
-												rename `br/ragg` as `brag`

											
										
										
											8 years ago
+								                     brag/examples/nested-word-list
-												add br/ragg

											
										
										
											8 years ago
+								                     (only-in parser-tools/lex lexer-src-pos)
 								                     (only-in syntax/parse syntax-parse ~literal)))
 								@(define (lookup-date filename [default ""])
 								   (cond
 								     [(file-exists? filename)
 								      (define modify-seconds (file-or-directory-modify-seconds filename))
 								      (define a-date (seconds->date modify-seconds))
 								      (date->string a-date)]
 								     [else
 								      default]))
 								@(define (compute-md5sum filename [default ""])
 								   (cond [(file-exists? filename)
 								          (bytes->string/utf-8 (call-with-input-file filename md5 #:mode 'binary))]
 								         [else
 								          default]))
-												rename `br/ragg` as `brag`

											
										
										
											8 years ago
+								@title{brag: the Beautiful Racket AST Generator}
-												update brag docs

											
										
										
											8 years ago
+								@author["Danny Yoo (95%)" "Matthew Butterick (5%)"]
-												add br/ragg

											
										
										
											8 years ago
-												rename `br/ragg` as `brag`

											
										
										
											8 years ago
+								@defmodulelang[brag]
-												add br/ragg

											
										
										
											8 years ago
-												more typo

											
										
										
											8 years ago
+								@section{Quick start}
-												add br/ragg

											
										
										
											8 years ago
 								@(define my-eval (make-base-eval))
-												rename `br/ragg` as `brag`

											
										
										
											8 years ago
+								@(my-eval '(require brag/examples/nested-word-list
-												add br/ragg

											
										
										
											8 years ago
+								                          racket/list
 								                          racket/match))
-												update brag docs

											
										
										
											8 years ago
+								Suppose we're given the
-												add br/ragg

											
										
										
											8 years ago
+								following string:
 								@racketblock["(radiant (humble))"]
-												update brag docs

											
										
										
											8 years ago
+								How would we turn this string into a structured value?  That is, how would we @emph{parse} it? (Let's also suppose we've never heard of @racket[read].)
-												add br/ragg

											
										
										
											8 years ago
-												update brag docs

											
										
										
											8 years ago
+								First, we need to consider the structure of the things we'd like to parse. The
 								string above looks like a nested list of words. Good start.
-												add br/ragg

											
										
										
											8 years ago
-												update brag docs

											
										
										
											8 years ago
+								Second, how might we describe this formally — meaning, in a way that a computer could understand? A common notation to describe the structure of these things is @link["http://en.wikipedia.org/wiki/Backus%E2%80%93Naur_Form"]{Backus-Naur Form} (BNF). So let's try to notate the structure of nested word lists in BNF.
-												add br/ragg

											
										
										
											8 years ago
 								@nested[#:style 'code-inset]{
 								@verbatim{
 								nested-word-list: WORD
 								                | LEFT-PAREN nested-word-list* RIGHT-PAREN
 								}}
-												typo

											
										
										
											8 years ago
+								What we intend by this notation is this: @racket[nested-word-list] is either a @racket[WORD], or a parenthesized list of @racket[nested-word-list]s. We use the character @litchar{*} to represent zero or more repetitions of the previous thing. We treat the uppercased @racket[LEFT-PAREN], @racket[RIGHT-PAREN], and @racket[WORD] as placeholders for @emph{tokens} (a @tech{token} being the smallest meaningful item in the parsed string):
-												add br/ragg

											
										
										
											8 years ago
 								Here are a few examples of tokens:
 								@interaction[#:eval my-eval
-												rename `br/ragg` as `brag`

											
										
										
											8 years ago
+								(require brag/support)
-												add br/ragg

											
										
										
											8 years ago
+								(token 'LEFT-PAREN)
 								(token 'WORD "crunchy" #:span 7)
 								(token 'RIGHT-PAREN)]
-												update brag docs

											
										
										
											8 years ago
+								This BNF description is also known as a @deftech{grammar}. Just as it does in a natural language like English or French, a grammar describes something in terms of what elements can fit where.
-												add br/ragg

											
										
										
											8 years ago
-												update brag docs

											
										
										
											8 years ago
+								Have we made progress?  We have a valid grammar. But we're still missing a @emph{parser}: a function that can use that description to make structures out of a sequence of tokens.
-												add br/ragg

											
										
										
											8 years ago
-												update brag docs

											
										
										
											8 years ago
+								Meanwhile, it's clear that we don't yet have a valid program because there's no @litchar{#lang} line. Let's add one: put @litchar{#lang brag} at the top of the grammar, and save it as a file called @filepath{nested-word-list.rkt}.
-												add br/ragg

											
										
										
											8 years ago
 								@filebox["nested-word-list.rkt"]{
 								@verbatim{
-												rename `br/ragg` as `brag`

											
										
										
											8 years ago
+								#lang brag
-												add br/ragg

											
										
										
											8 years ago
+								nested-word-list: WORD
 								                | LEFT-PAREN nested-word-list* RIGHT-PAREN
 								}}
-												update brag docs

											
										
										
											8 years ago
+								Now it's a proper program. But what does it do?
-												add br/ragg

											
										
										
											8 years ago
 								@interaction[#:eval my-eval
 								@eval:alts[(require "nested-word-list.rkt") (void)]
 								parse
 								]
-												update brag docs

											
										
										
											8 years ago
+								It gives us a @racket[parse] function. Let's investigate what @racket[parse]
 								does. What happens if we pass it a sequence of tokens?
-												add br/ragg

											
										
										
											8 years ago
 								@interaction[#:eval my-eval
 								             (define a-parsed-value
 								               (parse (list (token 'LEFT-PAREN "(")
 								                            (token 'WORD "some")
 								                            (token 'LEFT-PAREN "[")
 								                            (token 'WORD "pig")
 								                            (token 'RIGHT-PAREN "]")
 								                            (token 'RIGHT-PAREN ")"))))
 								             a-parsed-value]
-												more typo

											
										
										
											8 years ago
+								Those who have messed around with macros will recognize this as a @seclink["stx-obj" #:doc '(lib "scribblings/guide/guide.scrbl")]{syntax object}.
-												update brag docs

											
										
										
											8 years ago
-												add br/ragg

											
										
										
											8 years ago
+								@interaction[#:eval my-eval
 								(syntax->datum a-parsed-value)
 								]
 								That's @racket[(some [pig])], essentially.
-												update brag docs

											
										
										
											8 years ago
+								What happens if we pass our @racket[parse] function a bigger source of tokens?
-												add br/ragg

											
										
										
											8 years ago
+								@interaction[#:eval my-eval
 								@code:comment{tokenize: string -> (sequenceof token-struct?)}
 								@code:comment{Generate tokens from a string:}
 								(define (tokenize s)
 								  (for/list ([str (regexp-match* #px"\\(|\\)|\\w+" s)])
 								    (match str
 								      ["("
 								       (token 'LEFT-PAREN str)]
 								      [")"
 								       (token 'RIGHT-PAREN str)]
 								      [else
 								       (token 'WORD str)])))
 								@code:comment{For example:}
-												replace missing line

											
										
										
											8 years ago
+								(define token-source (tokenize "(welcome (to (((brag)) ())))"))
-												add br/ragg

											
										
										
											8 years ago
+								(define v (parse token-source))
 								(syntax->datum v)
 								]
-												rename `br/ragg` as `brag`

											
										
										
											8 years ago
+								Welcome to @tt{brag}.
-												add br/ragg

											
										
										
											8 years ago
 								@;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 								@;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 								@section{Introduction}
-												more typo

											
										
										
											8 years ago
+								@tt{brag} is a parser generator designed to be easy
-												update brag docs

											
										
										
											8 years ago
+								to use:
-												add br/ragg

											
										
										
											8 years ago
+								@itemize[
-												update brag docs

											
										
										
											8 years ago
+								@item{It provides a @litchar{#lang} for writing BNF grammars.
-												rename `br/ragg` as `brag`

											
										
										
											8 years ago
+								A module written in @litchar{#lang brag} automatically generates a
-												update brag docs

											
										
										
											8 years ago
+								parser. The output of this parser tries to follow
-												add br/ragg

											
										
										
											8 years ago
+								@link["http://en.wikipedia.org/wiki/How_to_Design_Programs"]{HTDP}
-												update brag docs

											
										
										
											8 years ago
+								guidelines. The structure of the grammar informs the structure of the
-												add br/ragg

											
										
										
											8 years ago
+								Racket syntax objects it generates.}
 								@item{The language uses a few conventions to simplify the expression of
-												update brag docs

											
										
										
											8 years ago
+								grammars. The first rule in the grammar is assumed to be the
 								starting production. Identifiers in @tt{UPPERCASE} are treated as
 								terminal tokens. All other identifiers are treated as nonterminals.}
-												add br/ragg

											
										
										
											8 years ago
-												update brag docs

											
										
										
											8 years ago
+								@item{Tokenizers can be developed independently of parsers.
-												rename `br/ragg` as `brag`

											
										
										
											8 years ago
+								@tt{brag} takes a liberal view on tokens: they can be strings,
-												update brag docs

											
										
										
											8 years ago
+								symbols, or instances constructed with @racket[token]. Tokens can optionally provide source location, in which case a syntax object generated by the parser will too.}
-												add br/ragg

											
										
										
											8 years ago
-												update brag docs

											
										
										
											8 years ago
+								@item{The parser can usually handle ambiguous grammars.}
-												add br/ragg

											
										
										
											8 years ago
-												update brag docs

											
										
										
											8 years ago
+								@item{It integrates with the rest of the Racket
-												add br/ragg

											
										
										
											8 years ago
+								@link["http://docs.racket-lang.org/guide/languages.html"]{language toolchain}.}
 								]
 								@subsection{Example: a small DSL for ASCII diagrams}
-												update brag docs

											
										
										
											8 years ago
+								@margin-note{This example is
 								@link["http://stackoverflow.com/questions/12345647/rewrite-this-script-by-designing-an-interpreter-in-racket"]{derived from a question}  on Stack Overflow.}
 								To understand @tt{brag}'s design, let's look
 								at a toy problem. We'd like to define a language for
 								drawing simple ASCII diagrams. So if we write something like this:
-												add br/ragg

											
										
										
											8 years ago
 								@nested[#:style 'inset]{
 								@verbatim|{
 9 X;
 3 b 3 X 3 b;
 9 X;
 								}|}
-												update brag docs

											
										
										
											8 years ago
+								It should generate the following picture:
-												add br/ragg

											
										
										
											8 years ago
 								@nested[#:style 'inset]{
 								@verbatim|{
 								XXXXXXXXX
 								XXXXXXXXX
 								XXXXXXXXX
 								   XXX
 								   XXX
 								   XXX
 								   XXX
 								   XXX
 								   XXX
 								XXXXXXXXX
 								XXXXXXXXX
 								XXXXXXXXX
 								}|}
 								@subsection{Syntax and semantics}
-												update brag docs

											
										
										
											8 years ago
-												more typo

											
										
										
											8 years ago
+								We're being somewhat casual with what we mean by the program above. Let's try to nail down some meanings.
-												update brag docs

											
										
										
											8 years ago
 								Each line of the program has a semicolon at the end, and describes the output of several @emph{rows} of the line drawing. Let's look at two of the lines in the example:
-												add br/ragg

											
										
										
											8 years ago
 								@itemize[
 								@item{@litchar{3 9 X;}: ``Repeat the following 3 times: print @racket["X"] nine times, followed by
 								a newline.''}
 								@item{@litchar{6 3 b 3 X 3 b;}: ``Repeat the following 6 times: print @racket[" "] three times,
 								followed by @racket["X"] three times, followed by @racket[" "] three times, followed by a newline.''}
 								]
 								Then each line consists of a @emph{repeat} number, followed by pairs of
-												update brag docs

											
										
										
											8 years ago
+								(number, character) @emph{chunks}. We'll assume here that the intent of the lowercased character @litchar{b} is to represent the printing of a 1-character whitespace @racket[" "], and for other uppercase letters to represent the printing of themselves.
-												add br/ragg

											
										
										
											8 years ago
-												update brag docs

											
										
										
											8 years ago
+								By understanding the pieces of each line, we can more easily capture that meaning in a grammar. Once we have each instruction of our ASCII DSL in a structured format, we should be able to parse it.
-												add br/ragg

											
										
										
											8 years ago
-												update brag docs

											
										
										
											8 years ago
+								Here's a first pass at expressing the structure of these line-drawing programs.
-												add br/ragg

											
										
										
											8 years ago
 								@subsection{Parsing the concrete syntax}
-												update brag docs

											
										
										
											8 years ago
-												add br/ragg

											
										
										
											8 years ago
+								@filebox["simple-line-drawing.rkt"]{
 								@verbatim|{
-												rename `br/ragg` as `brag`

											
										
										
											8 years ago
+								#lang brag
-												add br/ragg

											
										
										
											8 years ago
+								drawing: rows*
 								rows: repeat chunk+ ";"
 								repeat: INTEGER
 								chunk: INTEGER STRING
 								}|
 								}
-												rename `br/ragg` as `brag`

											
										
										
											8 years ago
+								@margin-note{@secref{brag-syntax} describes @tt{brag}'s syntax in more detail.}
-												update brag docs

											
										
										
											8 years ago
+								We write a @tt{brag} program as an BNF grammar, where patterns can be:
-												add br/ragg

											
										
										
											8 years ago
+								@itemize[
 								@item{the names of other rules (e.g. @racket[chunk])}
 								@item{literal and symbolic token names (e.g. @racket[";"], @racket[INTEGER])}
 								@item{quantified patterns (e.g. @litchar{+} to represent one-or-more repetitions)}
 								]
-												rename `br/ragg` as `brag`

											
										
										
											8 years ago
+								The result of a @tt{brag} program is a module with a @racket[parse] function
-												add br/ragg

											
										
										
											8 years ago
+								that can parse tokens and produce a syntax object as a result.
 								Let's exercise this function:
 								@interaction[#:eval my-eval
-												rename `br/ragg` as `brag`

											
										
										
											8 years ago
+								(require brag/support)
-												add br/ragg

											
										
										
											8 years ago
+								@eval:alts[(require "simple-line-drawing.rkt")
-												rename `br/ragg` as `brag`

											
										
										
											8 years ago
+								           (require brag/examples/simple-line-drawing)]
-												add br/ragg

											
										
										
											8 years ago
+								(define stx
 								  (parse (list (token 'INTEGER 6)
 								               (token 'INTEGER 2)
 								               (token 'STRING " ")
 								               (token 'INTEGER 3)
 								               (token 'STRING "X")
 								               ";")))
 								(syntax->datum stx)
 								]
-												update brag docs

											
										
										
											8 years ago
+								A @emph{token} is the smallest meaningful element of a source program. Tokens can be  strings, symbols, or instances of the @racket[token] data structure. (Plus a few other special cases, which we'll discuss later.) Usually, a token holds a single character from the source program. But sometimes it makes sense to package a sequence of characters into a single token, if the sequence has an indivisible meaning.
-												add br/ragg

											
										
										
											8 years ago
-												update brag docs

											
										
										
											8 years ago
+								If possible, we also want to attach source location information to each token. Why? Because this informatino will be incorporated into the syntax objects produced by @racket[parse].
-												add br/ragg

											
										
										
											8 years ago
-												update brag docs

											
										
										
											8 years ago
+								A parser often works in conjunction with a helper function called a @emph{lexer} that converts the raw code of the source program into tokens. The @racketmodname[parser-tools/lex] library can help us write a position-sensitive
-												add br/ragg

											
										
										
											8 years ago
+								tokenizer:
 								@interaction[#:eval my-eval
 								(require parser-tools/lex)
 								(define (tokenize ip)
 								  (port-count-lines! ip)
 								  (define my-lexer
 								    (lexer-src-pos
 								      [(repetition 1 +inf.0 numeric)
 								       (token 'INTEGER (string->number lexeme))]
 								      [upper-case
 								       (token 'STRING lexeme)]
 								      ["b"
 								       (token 'STRING " ")]
 								      [";"
 								       (token ";" lexeme)]
 								      [whitespace
 								       (token 'WHITESPACE lexeme #:skip? #t)]
 								      [(eof)
 								       (void)]))
 								  (define (next-token) (my-lexer ip))
 								  next-token)
 								(define a-sample-input-port (open-input-string "6 2 b 3 X;"))
 								(define token-thunk (tokenize a-sample-input-port))
 								@code:comment{Now we can pass token-thunk to the parser:}
 								(define another-stx (parse token-thunk))
 								(syntax->datum another-stx)
 								@code:comment{The syntax object has location information:}
 								(syntax-line another-stx)
 								(syntax-column another-stx)
 								(syntax-span another-stx)
 								]
-												update brag docs

											
										
										
											8 years ago
+								Note also from this lexer example:
-												add br/ragg

											
										
										
											8 years ago
+								@itemize[
-												update brag docs

											
										
										
											8 years ago
+								@item{@racket[parse] accepts as input either a sequence of tokens, or a
 								function that produces tokens (which @racket[parse] will call repeatedly to get the next token).}
-												add br/ragg

											
										
										
											8 years ago
-												update brag docs

											
										
										
											8 years ago
+								@item{As an alternative to the basic @racket[token] structure, a token can also be an instance of the @racket[position-token] structure (also found in @racketmodname[parser-tools/lex]). In that case, the token will try to derive its position from that of the position-token.}
-												add br/ragg

											
										
										
											8 years ago
-												update brag docs

											
										
										
											8 years ago
+								@item{@racket[parse] will stop if it gets @racket[void] (or @racket['eof]) as a token.}
-												add br/ragg

											
										
										
											8 years ago
-												update brag docs

											
										
										
											8 years ago
+								@item{@racket[parse] will skip any token that has
 								@racket[#:skip?] attribute set to @racket[#t]. For instance, tokens representing comments often use @racket[#:skip?].}
-												add br/ragg

											
										
										
											8 years ago
 								]
 								@subsection{From parsing to interpretation}
 								We now have a parser for programs written in this simple-line-drawing language.
-												update brag docs

											
										
										
											8 years ago
+								Our parser will return syntax objects:
-												add br/ragg

											
										
										
											8 years ago
+								@interaction[#:eval my-eval
 								(define parsed-program
 								  (parse (tokenize (open-input-string "3 9 X; 6 3 b 3 X 3 b; 3 9 X;"))))
 								(syntax->datum parsed-program)
 								]
-												update brag docs

											
										
										
											8 years ago
+								Better still, these syntax objects will have a predictable
 								structure that follows the grammar:
-												add br/ragg

											
										
										
											8 years ago
 								@racketblock[
 								    (drawing (rows (repeat <number>)
 								                   (chunk <number> <string>) ... ";")
 								             ...)
 								]
 								where @racket[drawing], @racket[rows], @racket[repeat], and @racket[chunk]
 								should be treated literally, and everything else will be numbers or strings.
-												update brag docs

											
										
										
											8 years ago
+								Still, these syntax-object values are just inert structures. How do we
 								interpret them, and make them @emph{print}?  We claimed at the beginning of
 								this section that these syntax objects should be easy to interpret. So let's do it.
-												add br/ragg

											
										
										
											8 years ago
 								@margin-note{This is a very quick-and-dirty treatment of @racket[syntax-parse].
 								See the @racketmodname[syntax/parse] documentation for a gentler guide to its
 								features.}  Racket provides a special form called @racket[syntax-parse] in the
-												update brag docs

											
										
										
											8 years ago
+								@racketmodname[syntax/parse] library. @racket[syntax-parse] lets us do a
-												add br/ragg

											
										
										
											8 years ago
+								structural case-analysis on syntax objects: we provide it a set of patterns to
 								parse and actions to perform when those patterns match.
 								As a simple example, we can write a function that looks at a syntax object and
 								says @racket[#t] if it's the literal @racket[yes], and @racket[#f] otherwise:
 								@interaction[#:eval my-eval
 								(require syntax/parse)
 								@code:comment{yes-syntax-object?: syntax-object -> boolean}
 								@code:comment{Returns true if the syntax-object is yes.}
 								(define (yes-syntax-object? stx)
 								  (syntax-parse stx
 								    [(~literal yes)
 								     #t]
 								    [else
 								     #f]))
 								(yes-syntax-object? #'yes)
 								(yes-syntax-object? #'nooooooooooo)
 								]
 								Here, we use @racket[~literal] to let @racket[syntax-parse] know that
-												update brag docs

											
										
										
											8 years ago
+								@racket[yes] should show up literally in the syntax object. The patterns can
-												add br/ragg

											
										
										
											8 years ago
+								also have some structure to them, such as:
 								@racketblock[({~literal drawing} rows-stxs ...)]
 								which matches on syntax objects that begin, literally, with @racket[drawing],
 								followed by any number of rows (which are syntax objects too).
 								Now that we know a little bit more about @racket[syntax-parse],
 								we can use it to do a case analysis on the syntax
 								objects that our @racket[parse] function gives us.
 								We start by defining a function on syntax objects of the form @racket[(drawing
 								rows-stx ...)].
 								@interaction[#:eval my-eval
 								(define (interpret-drawing drawing-stx)
 								  (syntax-parse drawing-stx
 								    [({~literal drawing} rows-stxs ...)
 								     (for ([rows-stx (syntax->list #'(rows-stxs ...))])
 								       (interpret-rows rows-stx))]))]
 								When we encounter a syntax object with @racket[(drawing rows-stx
 								...)], then @racket[interpret-rows] each @racket[rows-stx].
 								@;The pattern we
 								@;express in @racket[syntax-parse] above marks what things should be treated
 								@;literally, and the @racket[...] is a a part of the pattern matching language
 								@;known by @racket[syntax-parse] that lets us match multiple instances of the
 								@;last pattern.
 								Let's define @racket[interpret-rows] now:
 								@interaction[#:eval my-eval
 								(define (interpret-rows rows-stx)
 								  (syntax-parse rows-stx
 								    [({~literal rows}
 								      ({~literal repeat} repeat-number)
 								      chunks ... ";")
 								     (for ([i (syntax-e #'repeat-number)])
 								       (for ([chunk-stx (syntax->list #'(chunks ...))])
 								         (interpret-chunk chunk-stx))
 								       (newline))]))]
 								For a @racket[rows], we extract the @racket[repeat-number] from the
 								syntax object and use it as the range of the @racket[for] loop. The inner loop
 								walks across each @racket[chunk-stx] and calls @racket[interpret-chunk] on it.

 								Finally, we need to write a definition for @racket[interpret-chunk]. We want
 								it to extract the @racket[chunk-size] and @racket[chunk-string] portions,
 								and print to standard output:
 								@interaction[#:eval my-eval
 								(define (interpret-chunk chunk-stx)
 								  (syntax-parse chunk-stx
 								    [({~literal chunk} chunk-size chunk-string)
 								     (for ([k (syntax-e #'chunk-size)])
 								       (display (syntax-e #'chunk-string)))]))
 								]
 								@margin-note{Here are the definitions in a single file:
 								@link["examples/simple-line-drawing/interpret.rkt"]{interpret.rkt}.}
 								With these definitions in hand, we can now pass the interpreter syntax objects
 								that we construct by hand:
 								@interaction[#:eval my-eval
 								(interpret-chunk #'(chunk 3 "X"))
 								(interpret-drawing #'(drawing (rows (repeat 5) (chunk 3 "X") ";")))
 								]
 								or we can pass it the result generated by our parser:
 								@interaction[#:eval my-eval
 								(define parsed-program
 								  (parse (tokenize (open-input-string "3 9 X; 6 3 b 3 X 3 b; 3 9 X;"))))
 								(interpret-drawing parsed-program)]
 								And now we've got an interpreter!
 								@subsection{From interpretation to compilation}
 								@margin-note{For a gentler tutorial on writing @litchar{#lang} extensions, see:
 								@link["http://hashcollision.org/brainfudge"]{F*dging up a Racket}.}  (Just as a
 								warning: the following material is slightly more advanced, but shows how
 								writing a compiler for the line-drawing language reuses the ideas for the
 								interpreter.)
 								Wouldn't it be nice to be able to write something like:
 								@nested[#:style 'inset]{
 								@verbatim|{
 								3 9 X;
 								6 3 b 3 X 3 b;
 								3 9 X;
 								}|}
 								and have Racket automatically compile this down to something like this?
 								@racketblock[
 								(for ([i 3])
 								  (for ([k 9]) (display "X"))
 								  (newline))
 								(for ([i 6])
 								  (for ([k 3]) (display " "))
 								  (for ([k 3]) (display "X"))
 								  (for ([k 3]) (display " "))
 								  (newline))
 								(for ([i 3])
 								  (for ([k 9]) (display "X"))
 								  (newline))
 								]
 								Well, of course it won't work: we don't have a @litchar{#lang} line.
 								Let's add one.
 								@filebox["letter-i.rkt"]{
 								@verbatim|{
 								#lang brag/examples/simple-line-drawing
 								3 9 X;
 								6 3 b 3 X 3 b;
 								3 9 X;
 								}|
 								}
 								Now @filepath{letter-i.rkt} is a program.
 								How does this work?  From the previous sections, we've seen how to take the
 								contents of a file and interpret it. What we want to do now is teach Racket
 								how to compile programs labeled with this @litchar{#lang} line. We'll do two
 								things:
 								@itemize[
 								@item{Tell Racket to use the @tt{brag}-generated parser and lexer we defined
 								earlier whenever it sees a program written with
 								@litchar{#lang brag/examples/simple-line-drawing}.}
 								@item{Define transformation rules for @racket[drawing], @racket[rows], and
 								      @racket[chunk] to rewrite these into standard Racket forms.}
 								]
 								The second part, writing the transformation rules, will look very
 								similar to the definitions we wrote for the interpreter, but the transformation
 								will happen at compile time. (We @emph{could} simply call
 								into the interpreter we just wrote, but this section is meant to show that
 								compilation is also viable.)
 								We do the first part by defining a @emph{module reader}: a
 								@link["http://docs.racket-lang.org/guide/syntax_module-reader.html"]{module
 								reader} tells Racket how to parse and compile a file. Whenever Racket sees a
 								@litchar{#lang <name>}, it looks for a corresponding module reader in
 								@filepath{<name>/lang/reader}.
 								Here's the definition for
 								@filepath{brag/examples/simple-line-drawing/lang/reader.rkt}:
 								@filebox["brag/examples/simple-line-drawing/lang/reader.rkt"]{
 								@codeblock|{
 								#lang s-exp syntax/module-reader
 								brag/examples/simple-line-drawing/semantics
 								#:read my-read
 								#:read-syntax my-read-syntax
 								#:whole-body-readers? #t
 								(require brag/examples/simple-line-drawing/lexer
 								         brag/examples/simple-line-drawing/grammar)
 								(define (my-read in)
 								  (syntax->datum (my-read-syntax #f in)))
 								(define (my-read-syntax src ip)
 								  (list (parse src (tokenize ip))))
 								}|
 								}
 								We use a helper module @racketmodname[syntax/module-reader], which provides
 								utilities for creating a module reader. Our reader uses the lexer and
 								@tt{brag}-generated parser we defined earlier, and also tells Racket to
 								compile the forms in the syntax object using a module called
 								@filepath{semantics.rkt}.
 								@margin-note{For a systematic treatment on capturing the semantics of
 								a language, see @link["http://cs.brown.edu/~sk/Publications/Books/ProgLangs/"]{Programming Languages: Application and
 								Interpretation}.}
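To make the reader's job concrete, here is a rough sketch of the module that
@filepath{letter-i.rkt} becomes once the reader has run. It assumes that the
@tt{brag} examples are installed, that the parser produces the
@racket[drawing] structure used in the interpreter section, and that the lexer
turns @litchar{b} into a blank string. The submodule name @racket[letter-i] is
only illustrative.

@codeblock|{
#lang racket/base
;; Roughly what the module reader hands to Racket: a module whose
;; language is semantics.rkt and whose body is the parsed syntax object.
(module letter-i brag/examples/simple-line-drawing/semantics
  (drawing (rows (repeat 3) (chunk 9 "X") ";")
           (rows (repeat 6) (chunk 3 " ") (chunk 3 "X") (chunk 3 " ") ";")
           (rows (repeat 3) (chunk 9 "X") ";")))

;; Instantiating the submodule runs the compiled drawing.
(require 'letter-i)
}|

Because the module's language is the semantics module, the meaning of
@racket[drawing], @racket[rows], and @racket[chunk] comes entirely from the
bindings that module provides.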
 								Let's look into @filepath{semantics.rkt} and see what's involved in
 								compilation:
 								@filebox["brag/examples/simple-line-drawing/semantics.rkt"]{
 								@codeblock|{
 								#lang racket/base
 								(require (for-syntax racket/base syntax/parse))
 								(provide #%module-begin
 								         ;; We reuse Racket's treatment of raw datums, specifically
 								         ;; for strings and numbers:
 								         #%datum
 								         ;; And otherwise, we provide definitions of these three forms.
 								         ;; During compilation, Racket uses these definitions to
 								         ;; rewrite into for loops, displays, and newlines.
 								         drawing rows chunk)
 								;; Define a few compile-time functions to do the syntax rewriting:
 								(begin-for-syntax
 								  (define (compile-drawing drawing-stx)
 								    (syntax-parse drawing-stx
 								      [({~literal drawing} rows-stxs ...)
 								       (syntax/loc drawing-stx
 								         (begin rows-stxs ...))]))
 								  (define (compile-rows rows-stx)
 								    (syntax-parse rows-stx
 								      [({~literal rows}
 								        ({~literal repeat} repeat-number)
 								        chunks ...
 								        ";")
 								       (syntax/loc rows-stx
 								         (for ([i repeat-number])
 								           chunks ...
 								           (newline)))]))
 								  (define (compile-chunk chunk-stx)
 								    (syntax-parse chunk-stx
 								      [({~literal chunk} chunk-size chunk-string)
 								       (syntax/loc chunk-stx
 								         (for ([k chunk-size])
 								           (display chunk-string)))])))
 								;; Wire up the use of "drawing", "rows", and "chunk" to these
 								;; transformers:
 								(define-syntax drawing compile-drawing)
 								(define-syntax rows compile-rows)
 								(define-syntax chunk compile-chunk)
 								}|
 								}
 								The semantics module holds definitions for @racket[compile-drawing],
 								@racket[compile-rows], and @racket[compile-chunk], similar to what we had for
 								interpretation with @racket[interpret-drawing], @racket[interpret-rows], and
 								@racket[interpret-chunk]. However, compilation is not the same as
 								interpretation: each definition does not immediately perform the act of
 								drawing, but rather returns a syntax object whose evaluation will do the actual
 								work.
 								There are a few things to note:
 								@itemize[
 								@item{@tt{brag}'s native data structure is the syntax object because the
 								majority of Racket's language-processing infrastructure knows how to read and
 								write this structured value.}
 								@item{
 								@margin-note{By the way, we can just as easily rewrite the semantics so that
 								@racket[compile-rows] does explicitly call @racket[compile-chunk]. Often,
 								though, it's easier to write the transformation functions in this piecemeal way
 								and depend on the Racket macro expansion system to do the rewriting as it
 								encounters each of the forms.}
 								Unlike in interpretation, @racket[compile-rows] doesn't
 								compile each chunk by directly calling @racket[compile-chunk]. Rather, it
 								depends on the Racket macro expander to call each @racket[compile-XXX] function
 								as it encounters a @racket[drawing], @racket[rows], or @racket[chunk] in the
 								parsed value. The three statements at the bottom of @filepath{semantics.rkt} tell
 								the macro expansion system to do this:
 								@racketblock[
 								(define-syntax drawing compile-drawing)
 								(define-syntax rows compile-rows)
 								(define-syntax chunk compile-chunk)
 								]}
 								]
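A minimal, self-contained sketch of this piecemeal style follows. It is not
the code from @filepath{semantics.rkt}: it omits @racket[drawing], uses
wildcard patterns in place of @racket[~literal] to stay short, and captures
the printed text in a string so that the result can be inspected. The name
@racket[output] is illustrative.

@codeblock|{
#lang racket/base
(require (for-syntax racket/base syntax/parse)
         racket/port)

(begin-for-syntax
  ;; Each function maps a syntax object to a new syntax object.
  ;; Nothing is printed while these functions run.
  (define (compile-rows rows-stx)
    (syntax-parse rows-stx
      [(_ (_ repeat-number) chunks ... ";")
       (syntax/loc rows-stx
         (for ([i repeat-number])
           chunks ...          ; left as chunk forms for the expander
           (newline)))]))
  (define (compile-chunk chunk-stx)
    (syntax-parse chunk-stx
      [(_ chunk-size chunk-string)
       (syntax/loc chunk-stx
         (for ([k chunk-size])
           (display chunk-string)))])))

(define-syntax rows compile-rows)
(define-syntax chunk compile-chunk)

;; The expander rewrites rows first, then finds the chunk forms in
;; the result and rewrites those too. Printing happens at run time.
(define output
  (with-output-to-string
    (lambda ()
      (rows (repeat 2) (chunk 3 "X") (chunk 1 "!") ";"))))
}|

Note that @racket[compile-rows] never calls @racket[compile-chunk]; the
@racket[chunk] forms it leaves behind are expanded later by the macro expander.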
 								Altogether, @tt{brag}'s intent is to be a parser generator generator for Racket
 								that's easy and fun to use. It's meant to fit naturally with the other tools
 								in the Racket language toolchain. Hopefully, it will reduce the friction in
 								making new languages with alternative concrete syntaxes.

 								The rest of this document describes the @tt{brag} language and the parsers it
 								generates.
 								@;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 								@;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 								@section{The language}
 								@subsection[#:tag "brag-syntax"]{Syntax and terminology}
 								A program in the @tt{brag} language consists of the language line
 								@litchar{#lang brag}, followed by a collection of @tech{rule}s and
 								@tech{line comment}s.
 								A @deftech{rule} is a sequence consisting of: a @tech{rule identifier}, a colon
 								@litchar{":"}, and a @tech{pattern}.
 								A @deftech{rule identifier} is an @tech{identifier} that is not in upper case.
 								A @deftech{token identifier} is an @tech{identifier} that is in upper case.
 								An @deftech{identifier} is a character sequence of letters, numbers, and
 								characters in @racket["-.!$%&/<=>?^_~@"]. It must not contain
 								@litchar{*} or @litchar{+}, as those characters are used to denote
 								quantification.
 								A @deftech{pattern} is one of the following:
 								@itemize[
 								@item{an implicit sequence of @tech{pattern}s separated by whitespace}
 								@item{a terminal: either a literal string or a @tech{token identifier}}
 								@item{a @tech{rule identifier}}
 								@item{a @deftech{choice pattern}: a sequence of @tech{pattern}s delimited with @litchar{|} characters.}
 								@item{a @deftech[#:key "quantifed pattern"]{quantified pattern}: a @tech{pattern} followed by either @litchar{*} (``zero or more'') or @litchar{+} (``one or more'')}
 								@item{an @deftech{optional pattern}: a @tech{pattern} surrounded by @litchar{[} and @litchar{]}}
 								@item{an explicit sequence: a @tech{pattern} surrounded by @litchar{(} and @litchar{)}}]
 								A @deftech{line comment} begins with either @litchar{#} or @litchar{;} and
 								continues to the end of the line.
 								For example, in the following program:
 								@nested[#:style 'inset
 								@verbatim|{
 								#lang brag
 								;; A parser for a silly language
 								sentence: verb optional-adjective object
 								verb: greeting
 								optional-adjective: ["happy" | "frumpy"]
 								greeting: "hello" | "hola" | "aloha"
 								object: "world" | WORLD
 								}|]
 								the elements @tt{sentence}, @tt{verb}, @tt{optional-adjective},
 								@tt{greeting}, and @tt{object} are rule
 								identifiers. The first rule, @litchar{sentence: verb optional-adjective
 								object}, is a rule whose right side is an implicit pattern sequence of three
 								sub-patterns. The uppercased @tt{WORLD} is a token identifier. The fourth rule
 								in the program associates @tt{greeting} with a @tech{choice pattern}.
 								More examples:
 								@itemize[
 								@item{A BNF for binary strings that contain an equal number of zeros and ones.
 								@verbatim|{
 								#lang brag
 								equal: [zero one | one zero]   ;; equal number of "0"s and "1"s.
 								zero: "0" equal | equal "0"    ;; has an extra "0" in it.
 								one: "1" equal | equal "1"     ;; has an extra "1" in it.
 								}|
 								}
 								@item{A BNF for
 								@link["http://www.json.org/"]{JSON}-like structures.
 								@verbatim|{
 								#lang brag
 								json: number | string
 								    | array  | object
 								number: NUMBER
 								string: STRING
 								array: "[" [json ("," json)*] "]"
 								object: "{" [kvpair ("," kvpair)*] "}"
 								kvpair: ID ":" json
 								}|
 								}
 								]
 								@subsection{Syntax errors}
 								Besides the basic syntax errors that can occur with a malformed grammar, there
 								are a few other situations that @litchar{#lang brag} treats
 								as syntax errors.

 								@tt{brag} will raise a syntax error if the grammar:
 								@itemize[
 								@item{doesn't have any rules.}
 								@item{has a rule with the same left hand side as any other rule.}
 								@item{refers to rules that have not been defined. For example, the
 								following program:
 								@nested[#:style 'code-inset
 								@verbatim|{
 								#lang brag
 								foo: [bar]
 								}|
 								]
 								should raise an error because @tt{bar} has not been defined, even though
 								@tt{foo} refers to it in an @tech{optional pattern}.}
 								@item{uses the token name @racket[EOF]; the end-of-file token type is reserved
 								for internal use by @tt{brag}.}
 								@item{contains a rule that has no finite derivation. For example, the following
 								program:
 								@nested[#:style 'code-inset
 								@verbatim|{
 								#lang brag
 								infinite-a: "a" infinite-a
 								}|
 								]
 								should raise an error because no finite sequence of tokens will satisfy
 								@tt{infinite-a}.}
 								]
 								Otherwise, @tt{brag} should be fairly tolerant and permit even ambiguous
 								grammars.
 								@subsection{Semantics}
 								@declare-exporting[brag/examples/nested-word-list]
 								A program written in @litchar{#lang brag} produces a module that provides a few
 								bindings. The most important of these is @racket[parse]:
 								@defproc[(parse [source any/c #f]
 								                [token-source (or/c (sequenceof token)
 								                                    (-> token))])
 								         syntax?]{
 								Parses the sequence of @tech{tokens} according to the rules in the grammar, using the
 								first rule as the start production. The parse must completely consume
 								@racket[token-source].
 								The @deftech{token source} can either be a sequence, or a 0-arity function that
 								produces @tech{tokens}.
 								A @deftech{token} in @tt{brag} can be any of the following values:
 								@itemize[
 								@item{a string}
 								@item{a symbol}
 								@item{an instance produced by @racket[token]}
 								@item{an instance produced by the token constructors of @racketmodname[parser-tools/lex]}
 								@item{an instance of @racketmodname[parser-tools/lex]'s @racket[position-token] whose
 								      @racket[position-token-token] is a @tech{token}.}
 								]
 								A token whose type is either @racket[void] or @racket['EOF] terminates the
 								source.
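The two kinds of @tech{token source} can be sketched as follows. This example
assumes the sample grammar @racketmodname[brag/examples/simple-arithmetic-grammar]
used in the @racket[make-rule-parser] example, and the @racket[token]
constructor from @racketmodname[brag/support]. The helper
@racket[list->token-thunk] is hypothetical and exists only for illustration.

@codeblock|{
#lang racket/base
(require brag/examples/simple-arithmetic-grammar
         brag/support)

;; A token source given as a plain list (a sequence).
(define token-list
  (list (token 'INT 1) "+" (token 'INT 2)))

;; list->token-thunk is a hypothetical helper. It wraps a list in a
;; 0-arity procedure that returns one token per call and then (void),
;; which terminates the source.
(define (list->token-thunk tokens)
  (define remaining tokens)
  (lambda ()
    (cond
      [(null? remaining) (void)]
      [else
       (define next (car remaining))
       (set! remaining (cdr remaining))
       next])))

(define from-list
  (syntax->datum (parse token-list)))
(define from-thunk
  (syntax->datum (parse (list->token-thunk token-list))))
}|

Under these assumptions, both calls should produce the same parse, since they
describe the same token stream.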
 								If @racket[parse] succeeds, it will return a structured syntax object. The
 								structure of the syntax object follows the overall structure of the rules in
 								the BNF grammar. For each rule @racket[r] and its associated pattern @racket[p],
 								@racket[parse] generates a syntax object @racket[#'(r p-value)] where
 								@racket[p-value]'s structure follows a case analysis on @racket[p]:
 								@itemize[
 								@item{For implicit and explicit sequences of @tech{pattern}s @racket[p1],
 								      @racket[p2], ..., the corresponding values, spliced into the
 								      structure.}
 								@item{For terminals, the value associated to the token.}
 								@item{For @tech{rule identifier}s: the associated parse value for the rule.}
 								@item{For @tech{choice pattern}s: the associated parse value for one of the matching subpatterns.}
 								@item{For @tech[#:key "quantifed pattern"]{quantified pattern}s and @tech{optional pattern}s: the corresponding values, spliced into the structure.}
 								]
 								Consequently, it's only the presence of @tech{rule identifier}s in a rule's
 								pattern that tells the parser to introduce nested structure into the syntax
 								object.
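As a sketch of the splicing rule, assume again the sample grammar
@racketmodname[brag/examples/simple-arithmetic-grammar], whose first rule is
@litchar{expr : term ('+' term)*}. The names @racket[tokens] and @racket[tree]
are illustrative.

@codeblock|{
#lang racket/base
(require brag/examples/simple-arithmetic-grammar
         brag/support)

;; Tokens for the input 1 + 2 * 3.
(define tokens
  (list (token 'INT 1) "+" (token 'INT 2) "*" (token 'INT 3)))

;; The quantified group ('+' term)* is spliced directly into expr,
;; so "+" sits next to the term nodes rather than in its own list.
;; Only the rule identifiers term and factor add nesting.
(define tree (syntax->datum (parse tokens)))
}|

Here @racket[tree] is expected to be
@racket['(expr (term (factor 1)) "+" (term (factor 2) "*" (factor 3)))].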
 								If the grammar is ambiguous, @tt{brag} will choose and return one parse, though
 								it does not guarantee which one.
 								If the parse cannot be performed successfully, or if a token in the
 								@racket[token-source] uses a type that isn't mentioned in the grammar, then
 								@racket[parse] raises an instance of @racket[exn:fail:parsing].}
 								@defproc[(parse-tree [source any/c #f]
 								                [token-source (or/c (sequenceof token)
 								                                    (-> token))])
 								         list?]{
 								Same as @racket[parse], but the result is converted from a syntax object into a list, so the parse tree can be inspected directly. Useful for testing or debugging a parser.
 								}
 								@defform[#:id make-rule-parser
 								         (make-rule-parser name)]{
 								Constructs a parser for the @racket[name] of one of the non-terminals
 								in the grammar.

 								For example, given the @tt{brag} program
 								@filepath{simple-arithmetic-grammar.rkt}:
 								@filebox["simple-arithmetic-grammar.rkt"]{
 								@verbatim|{
 								#lang brag
 								expr : term ('+' term)*
 								term : factor ('*' factor)*
 								factor : INT
 								}|
 								}
 								the following interaction shows how to extract a parser for @racket[term]s.
 								@interaction[#:eval my-eval
 								@eval:alts[(require "simple-arithmetic-grammar.rkt")
 								                    (require brag/examples/simple-arithmetic-grammar)]
 								(define term-parse (make-rule-parser term))
 								(define tokens (list (token 'INT 3)
 								                     "*"
 								                     (token 'INT 4)))
 								(syntax->datum (parse tokens))
 								(syntax->datum (term-parse tokens))
 								(define another-token-sequence
 								  (list (token 'INT 1) "+" (token 'INT 2)
 								        "*" (token 'INT 3)))
 								(syntax->datum (parse another-token-sequence))
 								@code:comment{Note that term-parse will break on another-token-sequence}
 								@code:comment{as it does not know what to do with the "+"}
 								(term-parse another-token-sequence)
 								]
 								}
 								@defthing[all-token-types (setof symbol?)]{
 								A set of all the token types used in a grammar.
 								For example:
 								@interaction[#:eval my-eval
 								@eval:alts[(require "simple-arithmetic-grammar.rkt")
 								                    (require brag/examples/simple-arithmetic-grammar)]
 								all-token-types
 								]
 								}
 								@section{Support API}
 								@defmodule[brag/support]
 								The @racketmodname[brag/support] module provides functions to interact with
 								@tt{brag} programs. The most useful is the @racket[token] function, which
 								produces tokens to be parsed.
 								@defproc[(token [type (or/c string? symbol?)]
 								                [val any/c #f]
 								                [#:line line (or/c positive-integer? #f) #f]
 								                [#:column column (or/c natural-number? #f) #f]
 								                [#:offset offset (or/c positive-integer? #f) #f]
 								                [#:span span (or/c natural-number? #f) #f]
 								                [#:skip? skip? boolean? #f]
 								                )
 								         token-struct?]{
 								Creates instances of @racket[token-struct]s.
 								The syntax objects produced by a parse will inject the value @racket[val] in
 								place of the token name in the grammar.
 								If @racket[#:skip?] is true, then the parser will skip over the token during a
 								parse.}
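A brief sketch of constructing tokens and reading their fields back follows.
The token type names @racket['INT] and @racket['WHITESPACE] are arbitrary
choices for illustration, and the accessors are the ones implied by the
@racket[token-struct] structure type described in this section.

@codeblock|{
#lang racket/base
(require brag/support)

;; A token carrying a value and some source-location information.
(define int-token
  (token 'INT 42 #:line 1 #:column 0 #:span 2))

;; A token that the parser is asked to skip, for example whitespace.
(define space-token
  (token 'WHITESPACE " " #:skip? #t))

(token-struct-type int-token)
(token-struct-val int-token)
(token-struct-line int-token)
(token-struct-column int-token)
(token-struct-skip? space-token)
}|

Fields that are not supplied, such as the location of @racket[space-token],
default to @racket[#f] according to the signature above.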
 								@defstruct[token-struct ([type symbol?]
 								                         [val any/c]
 								                         [offset (or/c positive-integer? #f)]
 								                         [line (or/c positive-integer? #f)]
 								                         [column (or/c natural-number? #f)]
 								                         [span (or/c natural-number? #f)]
 								                         [skip? boolean?])
 								                        #:transparent]{
 								The token structure type.
 								Rather than directly using the @racket[token-struct] constructor, please use
 								the helper function @racket[token] to construct instances.
 								}
 								@defstruct[(exn:fail:parsing exn:fail)
 								           ([message string?]
 								            [continuation-marks continuation-mark-set?]
 								            [srclocs (listof srcloc?)])]{
 								The exception raised when parsing fails.
 								@racket[exn:fail:parsing] implements Racket's @racket[prop:exn:srcloc]
 								property, so if this exception reaches DrRacket's default error handler,
 								DrRacket should highlight the offending locations in the source.}
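Outside of DrRacket, a program can catch the exception itself. The sketch
below assumes the sample grammar
@racketmodname[brag/examples/simple-arithmetic-grammar] and deliberately feeds
it a token sequence that is not a valid @tt{expr}. The names
@racket[bad-tokens] and @racket[result] are illustrative.

@codeblock|{
#lang racket/base
(require brag/examples/simple-arithmetic-grammar
         brag/support)

;; "1 + +" is not a valid expr: a term must follow each "+".
(define bad-tokens
  (list (token 'INT 1) "+" "+"))

(define result
  (with-handlers ([exn:fail:parsing?
                   (lambda (e)
                     ;; Report the kinds of values the exception carries
                     ;; instead of the exact message text.
                     (list 'parse-failed
                           (string? (exn-message e))
                           (list? (exn:fail:parsing-srclocs e))))])
    (syntax->datum (parse bad-tokens))))
}|

The handler only runs when parsing fails, so a successful parse would return
the datum unchanged.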
-												add `brag/lexer-support`

											
										
										
											8 years ago
+								@section{Lexer support API}
 								@defmodule[brag/lexer-support]
 								In addition to the exports shown below, the @racketmodname[brag/lexer-support] module also provides everything from @racketmodname[brag/support], and everything from @racketmodname[parser-tools/lex].
 								@defproc[(apply-tokenizer [tokenizer procedure?]
 								                [source-string (or/c string?
 								                                    input-port?)])
 								         list?]{
 								Repeatedly apply @racket[tokenizer] to @racket[source-string], gathering the resulting tokens into a list. Useful for testing or debugging a tokenizer.
 								}
 								@defproc[(trim-delimiters [left-delimiter string?]
 								[str string?]
 								[right-delimiter string?])
 								         string?]{
 								Remove @racket[left-delimiter] from the left side of @racket[str], and @racket[right-delimiter] from its right side. Intended as a helper function for @racket[delimited-by].
 								}
 								@defform[(:* re ...)]{
 								Repetition of @racket[re] sequence 0 or more times.}
 								@defform[(:+ re ...)]{
 								Repetition of @racket[re] sequence 1 or more times.}
 								@defform[(:? re ...)]{
 								Zero or one occurrence of @racket[re] sequence.}
 								@defform[(:= n re ...)]{
 								Exactly @racket[n] occurrences of @racket[re] sequence, where
 								@racket[n] must be a literal exact, non-negative number.}
 								@defform[(:>= n re ...)]{
 								At least @racket[n] occurrences of @racket[re] sequence, where
 								@racket[n] must be a literal exact, non-negative number.}
 								@defform[(:** n m re ...)]{
 								Between @racket[n] and @racket[m] (inclusive) occurrences of
 								@racket[re] sequence, where @racket[n] must be a literal exact,
 								non-negative number, and @racket[m] must be literally either
 								@racket[#f], @racket[+inf.0], or an exact, non-negative number; a
 								@racket[#f] value for @racket[m] is the same as @racket[+inf.0].}
 								@defform[(:or re ...)]{
 								Same as @racket[(union re ...)].}
 								@deftogether[(
 								@defform[(:: re ...)]
 								@defform[(:seq re ...)]
 								)]{
 								Both forms concatenate the @racket[re]s.}
 								@defform[(:& re ...)]{
 								Intersects the @racket[re]s.}
 								@defform[(:- re ...)]{
 								The set difference of the @racket[re]s.}
 								@defform[(:~ re ...)]{
 								Character-set complement, where each @racket[re] must match exactly
 								one character.}
 								@defform[(:/ char-or-string ...)]{
 								Character ranges, matching characters between successive pairs of
 								characters.}
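The operators above can be combined into lexer patterns. The sketch below assumes they behave like the operators of @racketmodname[parser-tools/lex-sre] imported with a @litchar{:} prefix, and it requires @racketmodname[parser-tools/lex] directly so that it depends only on the standard distribution. The names @racket[word-lexer] and @racket[tokenize] and the tags @racket['WORD] and @racket['NUMBER] are hypothetical.

@codeblock|{
#lang racket
(require parser-tools/lex
         (prefix-in : parser-tools/lex-sre))

(define word-lexer
  (lexer
   ;; One or more lowercase letters.
   [(:+ (:/ #\a #\z)) (list 'WORD lexeme)]
   ;; An optional minus sign followed by one or more digits.
   [(:: (:? "-") (:+ (:/ #\0 #\9))) (list 'NUMBER lexeme)]
   ;; Skip whitespace by lexing again from the same port.
   [(:+ whitespace) (word-lexer input-port)]
   [(eof) eof]))

;; Collect every result of the lexer until it returns eof.
(define (tokenize str)
  (define in (open-input-string str))
  (for/list ([tok (in-producer (lambda () (word-lexer in)) eof-object?)])
    tok))

(tokenize "radiant 42 humble")
}|

Each result here is a plain list pairing a tag with the matched lexeme; in a real tokenizer for a brag grammar, the actions would more likely build tokens with @racket[token].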
 								@defform[(delimited-by open close)]{
 								A string that is bounded by the @racket[open] and @racket[close] delimiters. Matching is non-greedy (meaning, it stops at the first occurrence of @racket[close]). The resulting lexeme includes the delimiters. To remove them, see @racket[trim-delimiters].}
 								@close-eval[my-eval]