12 KiB
Reader Extensions
+[missing] in [missing] provides more on reader extensions.
The reader layer of the Racket language can be extended through the
#reader
form. A reader extension is implemented as a module that is
named after #reader
. The module exports functions that parse raw
characters into a form to be consumed by the expander layer.
The syntax of #reader
is
#reader
>module-path< >reader-specific<
where >module-path< names a module that provides read
and
read-syntax
functions. The >reader-specific< part is a sequence of
characters that is parsed as determined by the read
and read-syntax
functions from >module-path<.
For example, suppose that file "five.rkt"
contains
"five.rkt"
#lang racket/base
(provide read read-syntax)
(define (read in) (list (read-string 5 in)))
(define (read-syntax src in) (list (read-string 5 in)))
Then, the program
#lang racket/base
'(1 #reader"five.rkt"234567 8)
is equivalent to
#lang racket/base
'(1 ("23456") 7 8)
because the read
and read-syntax
functions of "five.rkt"
both read
five characters from the input stream and put them into a string and
then a list. The reader functions from "five.rkt"
are not obliged to
follow Racket lexical conventions and treat the continuous sequence
234567
as a single number. Since only the 23456
part is consumed by
read
or read-syntax
, the 7
remains to be parsed in the usual
Racket way. Similarly, the reader functions from "five.rkt"
are not
obliged to ignore whitespace, and
#lang racket/base
'(1 #reader"five.rkt" 234567 8)
is equivalent to
#lang racket/base
'(1 (" 2345") 67 8)
since the first character immediately after "five.rkt"
is a space.
A #reader
form can be used in the REPL, too:
> '#reader"five.rkt"abcde
'("abcde")
1. Source Locations
The difference between read
and read-syntax
is that read
is meant
to be used for data while read-syntax
is meant to be used to parse
programs. More precisely, the read
function will be used when the
enclosing stream is being parsed by the Racket read
, and read-syntax
is used when the enclosing stream is being parsed by the Racket
read-syntax
function. Nothing requires read
and read-syntax
to
parse input in the same way, but making them different would confuse
programmers and tools.
The read-syntax
function can return the same kind of value as read
,
but it should normally return a syntax object that connects the parsed
expression with source locations. Unlike the "five.rkt"
example, the
read-syntax
function is typically implemented directly to produce
syntax objects, and then read
can use read-syntax
and strip away
syntax object wrappers to produce a raw result.
The following "arith.rkt"
module implements a reader to parse simple
infix arithmetic expressions into Racket forms. For example, 1*2+3
parses into the Racket form (+ (* 1 2) 3)
. The supported operators are
+
, -
, *
, and /
, while operands can be unsigned integers or
single-letter variables. The implementation uses port-next-location
to
obtain the current source location, and it uses datum->syntax
to turn
raw values into syntax objects.
"arith.rkt"
#lang racket
(require syntax/readerr)
(provide read read-syntax)
(define (read in)
(syntax->datum (read-syntax #f in)))
(define (read-syntax src in)
(skip-whitespace in)
(read-arith src in))
(define (skip-whitespace in)
(regexp-match #px"^\\s*" in))
(define (read-arith src in)
(define-values (line col pos) (port-next-location in))
(define expr-match
(regexp-match
; Match an operand followed by any number of
; operator–operand sequences, and prohibit an
; additional operator from following immediately:
#px"^([a-z]|[0-9]+)(?:[-+*/]([a-z]|[0-9]+))*(?![-+*/])"
in))
(define (to-syntax v delta span-str)
(datum->syntax #f v (make-srcloc delta span-str)))
(define (make-srcloc delta span-str)
(and line
(vector src line (+ col delta) (+ pos delta)
(string-length span-str))))
(define (parse-expr s delta)
(match (or (regexp-match #rx"^(.*?)([+-])(.*)$" s)
(regexp-match #rx"^(.*?)([*/])(.*)$" s))
[(list _ a-str op-str b-str)
(define a-len (string-length a-str))
(define a (parse-expr a-str delta))
(define b (parse-expr b-str (+ delta 1 a-len)))
(define op (to-syntax (string->symbol op-str)
(+ delta a-len) op-str))
(to-syntax (list op a b) delta s)]
[_ (to-syntax (or (string->number s)
(string->symbol s))
delta s)]))
(unless expr-match
(raise-read-error "bad arithmetic syntax"
src line col pos
(and pos (- (file-position in) pos))))
(parse-expr (bytes->string/utf-8 (car expr-match)) 0))
If the "arith.rkt"
reader is used in an expression position, then its
parse result will be treated as a Racket expression. If it is used in a
quoted form, however, then it just produces a number or a list:
> #reader"arith.rkt" 1*2+3
5
> '#reader"arith.rkt" 1*2+3
'(+ (* 1 2) 3)
The "arith.rkt"
reader could also be used in positions that make no
sense. Since the read-syntax
implementation tracks source locations,
syntax errors can at least refer to parts of the input in terms of their
original locations at the beginning of the error message
:
> (let #reader"arith.rkt" 1*2+3 8)
repl:1:27: let: bad syntax (not an identifier and expression
for a binding)
at: +
in: (let (+ (* 1 2) 3) 8)
2. Readtables
A reader extension’s ability to parse input characters in an arbitrary way can be powerful, but many cases of lexical extension call for a less general but more composable approach. In much the same way that the expander level of Racket syntax can be extended through macros, the reader level of Racket syntax can be composably extended through a readtable.
The Racket reader is a recursive-descent parser, and the readtable maps
characters to parsing handlers. For example, the default readtable maps
(
to a handler that recursively parses subforms until it finds a )
.
The current-readtable
parameter determines the readtable that is used
by read
or read-syntax
. Rather than parsing raw characters directly,
a reader extension can install an extended readtable and then chain to
read
or read-syntax
.
+See [missing] for an introduction to parameters.
The make-readtable
function constructs a new readtable as an extension
of an existing one. It accepts a sequence of specifications in terms of
a character, a type of mapping for the character, and (for certain
types of mappings) a parsing procedure. For example, to extend the
readtable so that $
can be used to start and end infix expressions,
implement a read-dollar
function and use:
(make-readtable (current-readtable)
#\$ 'terminating-macro read-dollar)
The protocol for read-dollar
requires the function to accept different
numbers of arguments depending on whether it is being used in read
or
read-syntax
mode. In read
mode, the parser function is given two
arguments: the character that triggered the parser function and the
input port that is being read. In read-syntax
mode, the function must
accept four additional arguments that provide the source location of the
character.
The following "dollar.rkt"
module defines a read-dollar
function in
terms of the read
and read-syntax
functions provided by
"arith.rkt"
, and it puts read-dollar
together with new read
and
read-syntax
functions that install the readtable and chain to Racket’s
read
or read-syntax
:
"dollar.rkt"
#lang racket
(require syntax/readerr
(prefix-in arith: "arith.rkt"))
(provide (rename-out [$-read read]
[$-read-syntax read-syntax]))
(define ($-read in)
(parameterize ([current-readtable (make-$-readtable)])
(read in)))
(define ($-read-syntax src in)
(parameterize ([current-readtable (make-$-readtable)])
(read-syntax src in)))
(define (make-$-readtable)
(make-readtable (current-readtable)
#\$ 'terminating-macro read-dollar))
(define read-dollar
(case-lambda
[(ch in)
(check-$-after (arith:read in) in (object-name in))]
[(ch in src line col pos)
(check-$-after (arith:read-syntax src in) in src)]))
(define (check-$-after val in src)
(regexp-match #px"^\\s*" in) ; skip whitespace
(let ([ch (peek-char in)])
(unless (equal? ch #\$) (bad-ending ch src in))
(read-char in))
val)
(define (bad-ending ch src in)
(let-values ([(line col pos) (port-next-location in)])
((if (eof-object? ch)
raise-read-error
raise-read-eof-error)
"expected a closing `$'"
src line col pos
(if (eof-object? ch) 0 1))))
With this reader extension, a single #reader
can be used at the
beginning of an expression to enable multiple uses of $
that switch to
infix arithmetic:
> #reader"dollar.rkt" (let ([a $1*2+3$] [b $5/6$]) $a+b$)
35/6