You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
typesetting/quad/qtest/mds/reader-extension.md

282 lines
12 KiB
Markdown

This file contains invisible Unicode characters!

This file contains invisible Unicode characters that may be processed differently from what appears below. If your use case is intentional and legitimate, you can safely ignore this warning. Use the Escape button to reveal hidden characters.

This file contains ambiguous Unicode characters that may be confused with others in your current locale. If your use case is intentional and legitimate, you can safely ignore this warning. Use the Escape button to highlight these characters.

# Reader Extensions
> +\[missing\] in \[missing\] provides more on reader extensions.
The reader layer of the Racket language can be extended through the
`#reader` form. A reader extension is implemented as a module that is
named after `#reader`. The module exports functions that parse raw
characters into a form to be consumed by the expander layer.
The syntax of `#reader` is
`#reader` >_module-path_< >_reader-specific_<
where >_module-path_< names a module that provides `read` and
`read-syntax` functions. The >_reader-specific_< part is a sequence of
characters that is parsed as determined by the `read` and `read-syntax`
functions from >_module-path_<.
For example, suppose that file `"five.rkt"` contains
`"five.rkt"`
```racket
#lang racket/base
(provide read read-syntax)
(define (read in) (list (read-string 5 in)))
(define (read-syntax src in) (list (read-string 5 in)))
```
Then, the program
```racket
#lang racket/base
'(1 #reader"five.rkt"234567 8)
```
is equivalent to
```racket
#lang racket/base
'(1 ("23456") 7 8)
```
because the `read` and `read-syntax` functions of `"five.rkt"` both read
five characters from the input stream and put them into a string and
then a list. The reader functions from `"five.rkt"` are not obliged to
follow Racket lexical conventions and treat the continuous sequence
`234567` as a single number. Since only the `23456` part is consumed by
`read` or `read-syntax`, the `7` remains to be parsed in the usual
Racket way. Similarly, the reader functions from `"five.rkt"` are not
obliged to ignore whitespace, and
```racket
#lang racket/base
'(1 #reader"five.rkt" 234567 8)
```
is equivalent to
```racket
#lang racket/base
'(1 (" 2345") 67 8)
```
since the first character immediately after `"five.rkt"` is a space.
A `#reader` form can be used in the REPL, too:
```racket
> '#reader"five.rkt"abcde
'("abcde")
```
## 1. Source Locations
The difference between `read` and `read-syntax` is that `read` is meant
to be used for data while `read-syntax` is meant to be used to parse
programs. More precisely, the `read` function will be used when the
enclosing stream is being parsed by the Racket `read`, and `read-syntax`
is used when the enclosing stream is being parsed by the Racket
`read-syntax` function. Nothing requires `read` and `read-syntax` to
parse input in the same way, but making them different would confuse
programmers and tools.
The `read-syntax` function can return the same kind of value as `read`,
but it should normally return a syntax object that connects the parsed
expression with source locations. Unlike the `"five.rkt"` example, the
`read-syntax` function is typically implemented directly to produce
syntax objects, and then `read` can use `read-syntax` and strip away
syntax object wrappers to produce a raw result.
The following `"arith.rkt"` module implements a reader to parse simple
infix arithmetic expressions into Racket forms. For example, `1*2+3`
parses into the Racket form `(+ (* 1 2) 3)`. The supported operators are
`+`, `-`, `*`, and `/`, while operands can be unsigned integers or
single-letter variables. The implementation uses `port-next-location` to
obtain the current source location, and it uses `datum->syntax` to turn
raw values into syntax objects.
`"arith.rkt"`
```racket
#lang racket
(require syntax/readerr)
(provide read read-syntax)
(define (read in)
(syntax->datum (read-syntax #f in)))
(define (read-syntax src in)
(skip-whitespace in)
(read-arith src in))
(define (skip-whitespace in)
(regexp-match #px"^\\s*" in))
(define (read-arith src in)
(define-values (line col pos) (port-next-location in))
(define expr-match
(regexp-match
; Match an operand followed by any number of
; operatoroperand sequences, and prohibit an
; additional operator from following immediately:
#px"^([a-z]|[0-9]+)(?:[-+*/]([a-z]|[0-9]+))*(?![-+*/])"
in))
(define (to-syntax v delta span-str)
(datum->syntax #f v (make-srcloc delta span-str)))
(define (make-srcloc delta span-str)
(and line
(vector src line (+ col delta) (+ pos delta)
(string-length span-str))))
(define (parse-expr s delta)
(match (or (regexp-match #rx"^(.*?)([+-])(.*)$" s)
(regexp-match #rx"^(.*?)([*/])(.*)$" s))
[(list _ a-str op-str b-str)
(define a-len (string-length a-str))
(define a (parse-expr a-str delta))
(define b (parse-expr b-str (+ delta 1 a-len)))
(define op (to-syntax (string->symbol op-str)
(+ delta a-len) op-str))
(to-syntax (list op a b) delta s)]
[_ (to-syntax (or (string->number s)
(string->symbol s))
delta s)]))
(unless expr-match
(raise-read-error "bad arithmetic syntax"
src line col pos
(and pos (- (file-position in) pos))))
(parse-expr (bytes->string/utf-8 (car expr-match)) 0))
```
If the `"arith.rkt"` reader is used in an expression position, then its
parse result will be treated as a Racket expression. If it is used in a
quoted form, however, then it just produces a number or a list:
```racket
> #reader"arith.rkt" 1*2+3
5
> '#reader"arith.rkt" 1*2+3
'(+ (* 1 2) 3)
```
The `"arith.rkt"` reader could also be used in positions that make no
sense. Since the `read-syntax` implementation tracks source locations,
syntax errors can at least refer to parts of the input in terms of their
original locations \(at the beginning of the error message\):
```racket
> (let #reader"arith.rkt" 1*2+3 8)
repl:1:27: let: bad syntax (not an identifier and expression
for a binding)
at: +
in: (let (+ (* 1 2) 3) 8)
```
## 2. Readtables
A reader extensions ability to parse input characters in an arbitrary
way can be powerful, but many cases of lexical extension call for a less
general but more composable approach. In much the same way that the
expander level of Racket syntax can be extended through macros, the
reader level of Racket syntax can be composably extended through a
_readtable_.
The Racket reader is a recursive-descent parser, and the readtable maps
characters to parsing handlers. For example, the default readtable maps
`(` to a handler that recursively parses subforms until it finds a `)`.
The `current-readtable` parameter determines the readtable that is used
by `read` or `read-syntax`. Rather than parsing raw characters directly,
a reader extension can install an extended readtable and then chain to
`read` or `read-syntax`.
> +See \[missing\] for an introduction to parameters.
The `make-readtable` function constructs a new readtable as an extension
of an existing one. It accepts a sequence of specifications in terms of
a character, a type of mapping for the character, and \(for certain
types of mappings\) a parsing procedure. For example, to extend the
readtable so that `$` can be used to start and end infix expressions,
implement a `read-dollar` function and use:
```racket
(make-readtable (current-readtable)
#\$ 'terminating-macro read-dollar)
```
The protocol for `read-dollar` requires the function to accept different
numbers of arguments depending on whether it is being used in `read` or
`read-syntax` mode. In `read` mode, the parser function is given two
arguments: the character that triggered the parser function and the
input port that is being read. In `read-syntax` mode, the function must
accept four additional arguments that provide the source location of the
character.
The following `"dollar.rkt"` module defines a `read-dollar` function in
terms of the `read` and `read-syntax` functions provided by
`"arith.rkt"`, and it puts `read-dollar` together with new `read` and
`read-syntax` functions that install the readtable and chain to Rackets
`read` or `read-syntax`:
`"dollar.rkt"`
```racket
#lang racket
(require syntax/readerr
(prefix-in arith: "arith.rkt"))
(provide (rename-out [$-read read]
[$-read-syntax read-syntax]))
(define ($-read in)
(parameterize ([current-readtable (make-$-readtable)])
(read in)))
(define ($-read-syntax src in)
(parameterize ([current-readtable (make-$-readtable)])
(read-syntax src in)))
(define (make-$-readtable)
(make-readtable (current-readtable)
#\$ 'terminating-macro read-dollar))
(define read-dollar
(case-lambda
[(ch in)
(check-$-after (arith:read in) in (object-name in))]
[(ch in src line col pos)
(check-$-after (arith:read-syntax src in) in src)]))
(define (check-$-after val in src)
(regexp-match #px"^\\s*" in) ; skip whitespace
(let ([ch (peek-char in)])
(unless (equal? ch #\$) (bad-ending ch src in))
(read-char in))
val)
(define (bad-ending ch src in)
(let-values ([(line col pos) (port-next-location in)])
((if (eof-object? ch)
raise-read-error
raise-read-eof-error)
"expected a closing `$'"
src line col pos
(if (eof-object? ch) 0 1))))
```
With this reader extension, a single `#reader` can be used at the
beginning of an expression to enable multiple uses of `$` that switch to
infix arithmetic:
```racket
> #reader"dollar.rkt" (let ([a $1*2+3$] [b $5/6$]) $a+b$)
35/6
```