@@ -3,8 +3,8 @@
racket/date
file/md5
(for-label racket
br/ragg/support
br/ragg/examples/nested-word-list
brag/support
brag/examples/nested-word-list
(only-in parser-tools/lex lexer-src-pos)
(only-in syntax/parse syntax-parse ~literal)))
@@ -26,14 +26,15 @@
@title{ragg : a Racket AST Generator Generator}
@author+email["Danny Yoo" "dyoo@hashcollision.org"]
@title{brag: the Beautiful Racket AST Generator}
@author["Danny Yoo" "Matthew Butterick"]
@defmodulelang[brag]
@section{Informal quickstart}
@(define my-eval (make-base-eval))
@(my-eval '(require br/ragg/examples/nested-word-list
@(my-eval '(require brag/examples/nested-word-list
racket/list
racket/match))
@@ -66,11 +67,9 @@ or more repetitions of the previous thing, and we treat the uppercased
@racket[LEFT-PAREN], @racket[RIGHT-PAREN], and @racket[WORD] as placeholders
for atomic @emph{tokens}.
@margin-note{See @secref{install-ragg} for instructions on installing
@tt{ragg.}}
Here are a few examples of tokens:
@interaction[#:eval my-eval
(require br/ragg/support)
(require brag/support)
(token 'LEFT-PAREN)
(token 'WORD "crunchy" #:span 7)
(token 'RIGHT-PAREN)]
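Tokens do not have to come from @racketmodname[parser-tools/lex]. Here is a minimal sketch that assumes the grammar in @racketmodname[brag/examples/nested-word-list] and the @racket[parse] function it exports. It uses a regular expression to split a string into @racket[LEFT-PAREN], @racket[RIGHT-PAREN], and @racket[WORD] tokens. The helper @racket[simple-tokenize] is hypothetical and is not part of @racketmodname[brag/support]:

```racket
#lang racket/base
;; Sketch only: a regexp-based tokenizer for the nested-word-list grammar.
;; `simple-tokenize` is a hypothetical helper, not part of brag.
(require brag/support                      ; provides `token`
         brag/examples/nested-word-list)   ; provides `parse`

;; Split STR into "(", ")", or runs of word characters. Any other character
;; (whitespace, punctuation) is silently skipped because it never matches.
(define (simple-tokenize str)
  (for/list ([lexeme (in-list (regexp-match* #px"[()]|\\w+" str))])
    (cond
      [(string=? lexeme "(") (token 'LEFT-PAREN lexeme)]
      [(string=? lexeme ")") (token 'RIGHT-PAREN lexeme)]
      [else (token 'WORD lexeme)])))

;; A list is a valid token source, so it can be passed straight to `parse`.
(syntax->datum (parse (simple-tokenize "(welcome (to brag))")))
```

Skipping unrecognized characters keeps the sketch short. A real tokenizer would more likely report them as errors.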
@@ -82,12 +81,12 @@ use it to make structures out of a sequence of tokens.
It's clear that we don't yet have a program because there's no @litchar{#lang}
line. We should add one. Put @litchar{#lang br/ragg} at the top of the BNF
line. We should add one. Put @litchar{#lang brag} at the top of the BNF
description, and save it as a file called @filepath{nested-word-list.rkt}.
@filebox["nested-word-list.rkt"]{
@verbatim{
#lang br/ragg
#lang brag
nested-word-list: WORD
| LEFT-PAREN nested-word-list* RIGHT-PAREN
}}
@@ -135,12 +134,11 @@ What happens if we pass it a more substantial source of tokens?
(token 'WORD str)])))
@code:comment{For example:}
(define token-source (tokenize "(welcome (to (((ragg)) ())))"))
(define v (parse token-source))
(syntax->datum v)
]
Welcome to @tt{ragg}.
Welcome to @tt{brag}.
@@ -153,12 +151,12 @@ Welcome to @tt{ragg}.
@section{Introduction}
@tt{ragg} is a parsing framework for Racket with the design goal to be easy
@tt{brag} is a parsing framework for Racket with the design goal to be easy
to use. It includes the following features:
@itemize[
@item{It provides a @litchar{#lang} for writing extended BNF grammars.
A module written in @litchar{#lang br/ragg} automatically generates a
A module written in @litchar{#lang brag} automatically generates a
parser. The output of this parser tries to follow
@link["http://en.wikipedia.org/wiki/How_to_Design_Programs"]{HTDP}
doctrine; the structure of the grammar informs the structure of the
@@ -170,7 +168,7 @@ starting production. Identifiers in uppercase are assumed to represent
terminal tokens, and are otherwise the names of nonterminals.}
@item{Tokenizers can be developed completely independently of parsers.
@tt{ragg} takes a liberal view on tokens: they can be strings,
@tt{brag} takes a liberal view on tokens: they can be strings,
symbols, or instances constructed with @racket[token]. Furthermore,
tokens can optionally provide location: if tokens provide location, the
generated syntax objects will as well.}
@@ -182,38 +180,13 @@ generated syntax objects will as well.}
]
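Tokens can carry location, and a hand-built token list is enough to show the effect. The sketch below assumes the @racketmodname[brag/examples/nested-word-list] grammar and the single-line input @litchar{(hi)}, so every token gets @racket[#:line 1]. Positions are 1-based and columns 0-based, following Racket's usual source-location conventions. Rather than predicting the exact values, the sketch reads the location back from the resulting syntax object.

```racket
#lang racket/base
;; Sketch: tokens that carry source locations for the input "(hi)".
;; Assumes single-line input: positions are 1-based, columns 0-based.
(require brag/support                      ; provides `token`
         brag/examples/nested-word-list)   ; provides `parse`

(define toks
  (list (token 'LEFT-PAREN  "("  #:line 1 #:column 0 #:position 1 #:span 1)
        (token 'WORD        "hi" #:line 1 #:column 1 #:position 2 #:span 2)
        (token 'RIGHT-PAREN ")"  #:line 1 #:column 3 #:position 4 #:span 1)))

;; The parse result should carry location information derived from the tokens.
(define stx (parse toks))
(list (syntax-line stx) (syntax-column stx)
      (syntax-position stx) (syntax-span stx))
```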
@subsection[#:tag "install-ragg"]{Installation}
@itemize[
@item{@margin-note{At the time of this writing, Racket 5.3.2 is in
@link["http://pre.racket-lang.org/"]{pre-release}.} If you are using a version
of Racket > 5.3.1, then follow the instructions on the
@link["https://plt-etc.byu.edu:9004/info/ragg"]{PLaneT2 page}.}
@item{For those who are using Racket <= 5.3.1, you can download the following PLT package:
@nested[#:style 'inset]{@link["ragg.plt"]{ragg.plt} [md5sum: @compute-md5sum["ragg.plt" "ab79038b40e510a5cf13363825c4aef4"]]
Last updated: @lookup-date["ragg.plt" "Wednesday, January 16th, 2013"]
}
Once downloaded, either use DrRacket's package installation features
(@link["http://docs.racket-lang.org/drracket/Menus.html#(idx._(gentag._57._(lib._scribblings/drracket/drracket..scrbl)))"]{Install
PLT File...} under DrRacket's File menu), or use the command line:
@nested[#:style 'inset]{@tt{raco setup -A ragg.plt}}}
]
@subsection{Example: a small DSL for ASCII diagrams}
@margin-note{This is a
@link["http://stackoverflow.com/questions/12345647/rewrite-this-script-by-designing-an-interpreter-in-racket"]{restatement
of a question on Stack Overflow}.} To motivate @tt{ragg}'s design, let's look
of a question on Stack Overflow}.} To motivate @tt{brag}'s design, let's look
at the following toy problem: we'd like to define a language for
drawing simple ASCII diagrams. We'd like to be able write something like this:
@@ -276,7 +249,7 @@ programs.
@subsection{Parsing the concrete syntax}
@filebox["simple-line-drawing.rkt"]{
@verbatim|{
#lang br/ragg
#lang brag
drawing: rows*
rows: repeat chunk+ ";"
repeat: INTEGER
@@ -284,21 +257,21 @@ chunk: INTEGER STRING
}|
}
@margin-note{@secref{ragg-syntax} describes @tt{ragg}'s syntax in more detail.}
We write a @tt{ragg} program as an extended BNF grammar, where patterns can be:
@margin-note{@secref{brag-syntax} describes @tt{brag}'s syntax in more detail.}
We write a @tt{brag} program as an extended BNF grammar, where patterns can be:
@itemize[
@item{the names of other rules (e.g. @racket[chunk])}
@item{literal and symbolic token names (e.g. @racket[";"], @racket[INTEGER])}
@item{quantified patterns (e.g. @litchar{+} to represent one-or-more repetitions)}
]
The result of a @tt{ragg} program is a module with a @racket[parse] function
The result of a @tt{brag} program is a module with a @racket[parse] function
that can parse tokens and produce a syntax object as a result.
Let's exercise this function:
@interaction[#:eval my-eval
(require br/ragg/support)
(require brag/support)
@eval:alts[(require "simple-line-drawing.rkt")
(require br/ragg/examples/simple-line-drawing)]
(require brag/examples/simple-line-drawing)]
(define stx
(parse (list (token 'INTEGER 6)
(token 'INTEGER 2)
@@ -553,7 +526,7 @@ Let's add one.
@filebox["letter-i.rkt"]{
@verbatim|{
#lang br/ragg/examples/simple-line-drawing
#lang brag/examples/simple-line-drawing
3 9 X;
6 3 b 3 X 3 b;
3 9 X;
@@ -569,9 +542,9 @@ how to compile programs labeled with this @litchar{#lang} line. We'll do two
things:
@itemize[
@item{Tell Racket to use the @tt{ragg}-generated parser and lexer we defined
@item{Tell Racket to use the @tt{brag}-generated parser and lexer we defined
earlier whenever it sees a program written with
@litchar{#lang br/ragg/examples/simple-line-drawing}.}
@litchar{#lang brag/examples/simple-line-drawing}.}
@item{Define transformation rules for @racket[drawing], @racket[rows], and
@racket[chunk] to rewrite these into standard Racket forms.}
@@ -591,18 +564,18 @@ reader} tells Racket how to parse and compile a file. Whenever Racket sees a
@filepath{<name>/lang/reader}.
Here's the definition for
@filepath{br/ragg/examples/simple-line-drawing/lang/reader.rkt}:
@filepath{brag/examples/simple-line-drawing/lang/reader.rkt}:
@filebox["br/ragg/examples/simple-line-drawing/lang/reader.rkt"]{
@filebox["brag/examples/simple-line-drawing/lang/reader.rkt"]{
@codeblock|{
#lang s-exp syntax/module-reader
br/ragg/examples/simple-line-drawing/semantics
brag/examples/simple-line-drawing/semantics
#:read my-read
#:read-syntax my-read-syntax
#:whole-body-readers? #t
(require br/ragg/examples/simple-line-drawing/lexer
br/ragg/examples/simple-line-drawing/grammar)
(require brag/examples/simple-line-drawing/lexer
brag/examples/simple-line-drawing/grammar)
(define (my-read in)
(syntax->datum (my-read-syntax #f in)))
@@ -614,11 +587,7 @@ br/ragg/examples/simple-line-drawing/semantics
We use a helper module @racketmodname[syntax/module-reader], which provides
utilities for creating a module reader. It uses the lexer and
@tt{ragg}-generated parser we defined earlier (saved into
@link["http://hashcollision.org/ragg/examples/simple-line-drawing/lexer.rkt"]{lexer.rkt}
and
@link["http://hashcollision.org/ragg/examples/simple-line-drawing/grammar.rkt"]{grammar.rkt}
modules), and also tells Racket that it should compile the forms in the syntax
@tt{brag}-generated parser we defined earlier, and also tells Racket that it should compile the forms in the syntax
object using a module called @filepath{semantics.rkt}.
@margin-note{For a systematic treatment on capturing the semantics of
@@ -627,7 +596,7 @@ Interpretation}.}
Let's look into @filepath{semantics.rkt} and see what's involved in
compilation:
@filebox["br/ragg/examples/simple-line-drawing/semantics.rkt"]{
@filebox["brag/examples/simple-line-drawing/semantics.rkt"]{
@codeblock|{
#lang racket/base
(require (for-syntax racket/base syntax/parse))
@@ -692,7 +661,7 @@ There are a few things to note:
@itemize[
@item{@tt{ragg}'s native data structure is the syntax object because the
@item{@tt{brag}'s native data structure is the syntax object because the
majority of Racket's language-processing infrastructure knows how to read and
write this structured value.}
@@ -718,12 +687,12 @@ the macro expansion system to do this:
]
Altogether, @tt{ragg}'s intent is to be a parser generator generator for Racket
Altogether, @tt{brag}'s intent is to be a parser generator generator for Racket
that's easy and fun to use. It's meant to fit naturally with the other tools
in the Racket language toolchain. Hopefully, it will reduce the friction in
making new languages with alternative concrete syntaxes.
The rest of this document describes the @tt{ragg} language and the parsers it
The rest of this document describes the @tt{brag} language and the parsers it
generates.
@@ -732,9 +701,9 @@ generates.
@section{The language}
@subsection[#:tag "ragg-syntax"]{Syntax and terminology}
A program in the @tt{ragg} language consists of the language line
@litchar{#lang br/ragg}, followed by a collection of @tech{rule}s and
@subsection[#:tag "brag-syntax"]{Syntax and terminology}
A program in the @tt{brag} language consists of the language line
@litchar{#lang brag}, followed by a collection of @tech{rule}s and
@tech{line comment}s.
A @deftech{rule} is a sequence consisting of: a @tech{rule identifier}, a colon
@ -767,7 +736,7 @@ continues till the end of the line.
For example, in the following program:
@nested[#:style 'inset
@verbatim|{
#lang br/ragg
#lang brag
;; A parser for a silly language
sentence: verb optional-adjective object
verb: greeting
@@ -787,20 +756,20 @@ More examples:
@itemize[
@item{A
@link["http://hashcollision.org/ragg/examples/01-equal.rkt"]{BNF} for binary
BNF for binary
strings that contain an equal number of zeros and ones.
@verbatim|{
#lang br/ragg
#lang brag
equal: [zero one | one zero] ;; equal number of "0"s and "1"s.
zero: "0" equal | equal "0" ;; has an extra "0" in it.
one: "1" equal | equal "1" ;; has an extra "1" in it.
}|
}
@item{A @link["http://hashcollision.org/ragg/examples/baby-json.rkt"]{BNF} for
@item{A BNF for
@link["http://www.json.org/"]{JSON}-like structures.
@verbatim|{
#lang br/ragg
#lang brag
json: number | string
| array | object
number: NUMBER
@@ -812,20 +781,16 @@ kvpair: ID ":" json
}
]
The @link["https://github.com/dyoo/ragg"]{ragg github source repository}
includes
@link["https://github.com/dyoo/ragg/tree/master/ragg/examples"]{several more
examples}.
@subsection{Syntax errors}
Besides the basic syntax errors that can occur with a malformed grammar, there
are a few other classes of situations that @litchar{#lang br/ragg} will consider
are a few other classes of situations that @litchar{#lang brag} will consider
as syntax errors.
@tt{ragg} will raise a syntax error if the grammar:
@tt{brag} will raise a syntax error if the grammar:
@itemize[
@item{doesn't have any rules.}
@@ -835,7 +800,7 @@ as syntax errors.
following program:
@nested[#:style 'code-inset
@verbatim|{
#lang br/ragg
#lang brag
foo: [bar]
}|
]
@@ -844,14 +809,14 @@ should raise an error because @tt{bar} has not been defined, even though
@item{uses the token name @racket[EOF]; the end-of-file token type is reserved
for internal use by @tt{ragg}.}
for internal use by @tt{brag}.}
@item{contains a rule that has no finite derivation. e.g. the following
program:
@nested[#:style 'code-inset
@verbatim|{
#lang br/ragg
#lang brag
infinite-a: "a" infinite-a
}|
]
@@ -860,13 +825,13 @@ should raise an error because no finite sequence of tokens will satisfy
]
Otherwise, @tt{ragg} should be fairly tolerant and permit even ambiguous
Otherwise, @tt{brag} should be fairly tolerant and permit even ambiguous
grammars.
@subsection{Semantics}
@declare-exporting[br/ragg/examples/nested-word-list]
@declare-exporting[brag/examples/nested-word-list]
A program written in @litchar{#lang br/ragg} produces a module that provides a few
A program written in @litchar{#lang brag} produces a module that provides a few
bindings. The most important of these is @racket[parse]:
@defproc[(parse [source any/c #f]
@@ -881,7 +846,7 @@ first rule as the start production. The parse must completely consume
The @deftech{token source} can either be a sequence, or a 0-arity function that
produces @tech{tokens}.
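A token source does not have to be a list. The sketch below shows the 0-arity form. The hypothetical helper @racket[make-token-thunk] returns one token per call and then returns @racket[eof]. The sketch assumes that @racket[parse] treats @racket[eof] as end of input, following the usual convention for Racket lexers. It also assumes the @racketmodname[brag/examples/nested-word-list] grammar.

```racket
#lang racket/base
;; Sketch: a 0-arity token source. `make-token-thunk` is a hypothetical
;; helper that hands out one token per call, then signals end of input.
(require brag/support                      ; provides `token`
         brag/examples/nested-word-list)   ; provides `parse`

(define (make-token-thunk tokens)
  (define remaining tokens)
  (lambda ()
    (cond
      [(null? remaining) eof]              ; assumption: eof ends the input
      [else
       (let ([next (car remaining)])
         (set! remaining (cdr remaining))
         next)])))

(define stx
  (parse (make-token-thunk
          (list (token 'LEFT-PAREN "(")
                (token 'WORD "thunk")
                (token 'RIGHT-PAREN ")")))))
(syntax->datum stx)
```

A thunk like this is the natural shape when tokens are produced lazily, for example by calling a lexer on an input port one token at a time.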
A @deftech{token} in @tt{ragg} can be any of the following values:
A @deftech{token} in @tt{brag} can be any of the following values:
@itemize[
@item{a string}
@item{a symbol}
@@ -916,7 +881,7 @@ pattern that informs the parser to introduces nested structure into the syntax
object.
If the grammar has ambiguity, @tt{ragg} will choose and return a parse, though
If the grammar has ambiguity, @tt{brag} will choose and return a parse, though
it does not guarantee which one it chooses.
@@ -927,7 +892,7 @@ If the parse cannot be performed successfully, or if a token in the
It's often convenient to extract a parser for other non-terminal rules in the
grammar, and not just for the first rule. A @tt{ragg}-generated module also
grammar, and not just for the first rule. A @tt{brag}-generated module also
provides a form called @racket[make-rule-parser] to extract a parser for the
other non-terminals:
@@ -936,11 +901,11 @@ other non-terminals:
Constructs a parser for the @racket[name] of one of the non-terminals
in the grammar.
For example, given the @tt{ragg} program
For example, given the @tt{brag} program
@filepath{simple-arithmetic-grammar.rkt}:
@filebox["simple-arithmetic-grammar.rkt"]{
@verbatim|{
#lang br/ragg
#lang brag
expr : term ('+' term)*
term : factor ('*' factor)*
factor : INT
@@ -949,7 +914,7 @@ factor : INT
the following interaction shows how to extract a parser for @racket[term]s.
@interaction[#:eval my-eval
@eval:alts[(require "simple-arithmetic-grammar.rkt")
(require br/ragg/examples/simple-arithmetic-grammar)]
(require brag/examples/simple-arithmetic-grammar)]
(define term-parse (make-rule-parser term))
(define tokens (list (token 'INT 3)
"*"
@@ -977,7 +942,7 @@ A set of all the token types used in a grammar.
For example:
@interaction[#:eval my-eval
@eval:alts[(require "simple-arithmetic-grammar.rkt")
(require br/ragg/examples/simple-arithmetic-grammar)]
(require brag/examples/simple-arithmetic-grammar)]
all-token-types
]
@@ -989,10 +954,10 @@ all-token-types
@section{Support API}
@defmodule[br/ragg/support]
@defmodule[brag/support]
The @racketmodname[br/ragg/support] module provides functions to interact with
@tt{ragg} programs. The most useful is the @racket[token] function, which
The @racketmodname[brag/support] module provides functions to interact with
@tt{brag} programs. The most useful is the @racket[token] function, which
produces tokens to be parsed.
@defproc[(token [type (or/c string? symbol?)]
@@ -1043,65 +1008,4 @@ DrRacket should highlight the offending locations in the source.}
@;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
@;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
@section{Caveats and things to do}
Here are a few caveats and future aims for @tt{ragg}.
@itemize[
@item{@tt{ragg} doesn't currently have a good story about operator precedence.
Future versions of @tt{ragg} will support the specification of operator
precedence to deal with grammar ambiguity, probably by extending the BNF
grammar rules in @litchar{#lang br/ragg} with keyword arguments.}
@item{I currently depend on the lexer framework provided by
@racketmodname[parser-tools/lex], which has a steeper learning curve than I'd
like. A future version of @tt{ragg} will probably try to provide a nicer set
of tools for defining lexers.}
@item{The underlying parsing engine (an Earley-style parser) has not been fully
optimized, so it may exhibit degenerate parse times. A future version of
@tt{ragg} will guarantee @math{O(n^3)} time bounds so that at the very least,
parses will be polynomial-time.}
@item{@tt{ragg} doesn't yet have a good story on dealing with parser error
recovery. If a parse fails, it tries to provide the source location, but does
little else.}
@item{@tt{ragg} is slightly misnamed: what it really builds is a concrete
syntax tree rather than an abstract syntax tree. A future version of @tt{ragg}
will probably support annotations on patterns so that they can be omitted or
transformed in the parser output.}
]
@;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
@;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
@section{Miscellaneous and thanks}
Thanks to Matthew Flatt for pointing me to @racket[cfg-parser] from the
@racket[cfg-parser] library. Joe Politz gave me good advice and
feedback. Also, he suggested the name ``ragg''. Other alternatives I'd been
considering were ``autogrammar'' or ``chompy''. Thankfully, he is a better
namer than me. Daniel Patterson provided feedback that led to
@racket[make-rule-parser]. Robby Findler and Guillaume Marceau provided
steadfast suggestions to look into other parsing frameworks like
@link["http://en.wikipedia.org/wiki/Syntax_Definition_Formalism"]{SDF} and
@link["http://sablecc.org/"]{SableCC}. Special thanks to Shriram
Krishnamurthi, who convinced me that other people might find this package
useful.
@close-eval[my-eval]