allow "start" and "atok" as tokens (fixes #3)

pull/6/head
Matthew Butterick 7 years ago
parent 7aa81c36c3
commit 18c6119a26

@ -679,9 +679,12 @@ A @deftech{pattern} is one of the following:
@item{an implicit sequence of @tech{pattern}s separated by whitespace} @item{an implicit sequence of @tech{pattern}s separated by whitespace}
@item{a terminal: either a literal string or a @tech{symbolic token identifier}. @item{a terminal: either a literal string or a @tech{symbolic token identifier}.
When used in a pattern, both these terminals will match the same set of inputs. A literal string can match the string itself, or a @racket[token] whose type field contains that string (or its symbol form). So @racket["FOO"] would match @racket["FOO"], @racket[(token "FOO" "bar")], or @racket[(token 'FOO "bar")]. A symbolic token identifier can also match the string version of the identifier, or a @racket[token] whose type field is the symbol or string form of the identifier. So @racket[FOO] would also match @racket["FOO"], @racket[(token 'FOO "bar")], or @racket[(token "FOO" "bar")]. (In every case, the value of a token, like @racket["bar"], can be anything, and may or may not be the same as its type.) When used in a pattern, both these terminals will match the same set of inputs. A literal string can match the string itself, or a @racket[token] structure whose type field contains that string (or its symbol form). So @racket["FOO"] would match @racket["FOO"], @racket[(token "FOO" "bar")], or @racket[(token 'FOO "bar")]. A symbolic token identifier can also match the string version of the identifier, or a @racket[token] whose type field is the symbol or string form of the identifier. So @racket[FOO] would also match @racket["FOO"], @racket[(token 'FOO "bar")], or @racket[(token "FOO" "bar")]. (In every case, the value of a token, like @racket["bar"], can be anything, and may or may not be the same as its type.)
Because their underlying meanings are the same, the symbolic token identifier ends up being a notational convenience for readability inside a grammar pattern. Typically, the literal string @racket["FOO"] is used to connote ``match the string @racket["FOO"] exactly'' and the symbolic token identifier @racket[FOO] specially connotes ``match any token of type @racket['FOO]''.} Because their underlying meanings are the same, the symbolic token identifier ends up being a notational convenience for readability inside a grammar pattern. Typically, the literal string @racket["FOO"] is used to connote ``match the string @racket["FOO"] exactly'' and the symbolic token identifier @racket[FOO] specially connotes ``match any token of type @racket['FOO]''.
You @bold{cannot} use the literal string @racket["error"] as a terminal in a grammar, because it's reserved for @tt{brag}. You can, however, adjust your lexer to package it inside a token structure — say, @racket[(token ERROR "error")] — and then use the symbolic token identifier @racket[ERROR] in the grammar to match this token structure.
}
@item{a @tech{rule identifier}} @item{a @tech{rule identifier}}
@item{a @deftech{choice pattern}: a sequence of @tech{pattern}s delimited with @litchar{|} characters.} @item{a @deftech{choice pattern}: a sequence of @tech{pattern}s delimited with @litchar{|} characters.}

@ -705,10 +705,13 @@
(if src-pos? (if src-pos?
#'($1-start-pos $1-end-pos) #'($1-start-pos $1-end-pos)
#'(#f #f))]) #'(#f #f))])
#`(grammar (start [() null] ;; rename `start` and `atok` to `%start` and `%atok`
[(atok start) (cons $1 $2)]) ;; so that "start" and "atok" can be used as literal string tokens in a grammar.
(atok [(tok) (make-tok 'tok-id 'tok $e pos ...)] ...))) ;; not sure why this works, but it passes all tests.
#`(start start) #`(grammar (%start [() null]
[(%atok %start) (cons $1 $2)])
(%atok [(tok) (make-tok 'tok-id 'tok $e pos ...)] ...)))
#`(start %start)
parser-clauses)))] parser-clauses)))]
[(grammar . _) [(grammar . _)
(raise-syntax-error (raise-syntax-error
@ -745,30 +748,30 @@
(next success-k fail-k max-depth tasks)))] (next success-k fail-k max-depth tasks)))]
[fail-k (lambda (max-depth tasks) [fail-k (lambda (max-depth tasks)
(cond (cond
[(null? tok-list) [(null? tok-list)
(if error-proc (if error-proc
(error-proc #t (error-proc #t
'no-tokens 'no-tokens
#f #f
(make-position #f #f #f) (make-position #f #f #f)
(make-position #f #f #f)) (make-position #f #f #f))
(error (error
'cfg-parse 'cfg-parse
"no tokens"))] "no tokens"))]
[else [else
(let ([bad-tok (list-ref tok-list (let ([bad-tok (list-ref tok-list
(min (sub1 (length tok-list)) (min (sub1 (length tok-list))
max-depth))]) max-depth))])
(if error-proc (if error-proc
(error-proc #t (error-proc #t
(tok-orig-name bad-tok) (tok-orig-name bad-tok)
(tok-val bad-tok) (tok-val bad-tok)
(tok-start bad-tok) (tok-start bad-tok)
(tok-end bad-tok)) (tok-end bad-tok))
(error (error
'cfg-parse 'cfg-parse
"failed at ~a" "failed at ~a"
(tok-val bad-tok))))]))]) (tok-val bad-tok))))]))])
(#,start tok-list (#,start tok-list
;; we simulate a token at the very beginning with zero width ;; we simulate a token at the very beginning with zero width
;; for use with the position-generating code (*-start-pos, *-end-pos). ;; for use with the position-generating code (*-start-pos, *-end-pos).

Loading…
Cancel
Save