You cannot select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
support for codepoint escape sequences in strings (closes #29) (#31)
This improves the lexing of escape sequences within strings that appear in a grammar. It relies on Racket’s `read` to interpret these escape sequences rather than a hard-coded hash table. This gives strings in a grammar pretty much the same semantics as standard Racket strings, including support for octal and hex escape sequences for Unicode codepoints.
Though this passes all current tests, there are still some oddball corner cases that can be discovered by sticking together certain combinations of escape sequences (backslashes, double quotes, and codepoints). The better solution would be to peek into the input port for a double quote, and if it’s there, use the standard Racket lexer to pull out the string (this lexer already handles the weirdo cases). We can’t do this, however, because brag also supports single-quoted strings, which need to have the same semantics, and the Racket lexer won’t work with those. So I think we’re stuck with the homegrown solution (for consistency with both kinds of quotes) even at the expense of a few unresolved corner cases. Let’s leave that question for another day, as these cases haven’t surfaced in practical use thus far.
3 years ago
|
|
|
#lang racket/base
|
|
|
|
|
|
|
|
(module+ test
|
support for codepoint escape sequences in strings (closes #29) (#31)
This improves the lexing of escape sequences within strings that appear in a grammar. It relies on Racket’s `read` to interpret these escape sequences rather than a hard-coded hash table. This gives strings in a grammar pretty much the same semantics as standard Racket strings, including support for octal and hex escape sequences for Unicode codepoints.
Though this passes all current tests, there are still some oddball corner cases that can be discovered by sticking together certain combinations of escape sequences (backslashes, double quotes, and codepoints). The better solution would be to peek into the input port for a double quote, and if it’s there, use the standard Racket lexer to pull out the string (this lexer already handles the weirdo cases). We can’t do this, however, because brag also supports single-quoted strings, which need to have the same semantics, and the Racket lexer won’t work with those. So I think we’re stuck with the homegrown solution (for consistency with both kinds of quotes) even at the expense of a few unresolved corner cases. Let’s leave that question for another day, as these cases haven’t surfaced in practical use thus far.
3 years ago
|
|
|
|
|
|
|
(require yaragg/examples/codepoints
|
|
|
|
rackunit)
|
|
|
|
|
|
|
|
(check-equal? (parse-to-datum '("\"A\\" "'c\\" "*d\\\"\\ef\"" "hello world"))
|
|
|
|
'(start (A "\"A\\")
|
|
|
|
(c "'c\\")
|
|
|
|
(def "*d\\\"\\ef\"")
|
|
|
|
(hello-world "hello world"))))
|