General Macro Transformers
The define-syntax
form creates a transformer binding for an
identifier, which is a binding that can be used at compile time while
expanding expressions to be evaluated at run time. The compile-time
value associated with a transformer binding can be anything; if it is a
procedure of one argument, then the binding is used as a macro, and the
procedure is the macro transformer.
1 Syntax Objects
2 Macro Transformer Procedures
3 Mixing Patterns and Expressions: syntax-case
4 with-syntax and generate-temporaries
5 Compile and Run-Time Phases
6 General Phase Levels
6.1 Phases and Bindings
6.2 Phases and Modules
7 Syntax Taints
1. Syntax Objects
The input and output of a macro transformer (i.e., source and
replacement forms) are represented as syntax objects. A syntax object
contains symbols, lists, and constant values such as numbers that
essentially correspond to the quoted form of the expression. For
example, a representation of the expression (+ 1 2)
contains the
symbol '+
and the numbers 1
and 2
, all in a list. In addition to
this quoted content, a syntax object associates source-location and
lexical-binding information with each part of the form. The
source-location information is used when reporting syntax errors (for
example), and the lexical-binding information allows the macro system
to maintain lexical scope. To accommodate this extra information, the
representation of the expression (+ 1 2)
is not merely '(+ 1 2)
, but a
packaging of '(+ 1 2)
into a syntax object.
To create a literal syntax object, use the syntax
form:
> (syntax (+ 1 2))
#<syntax:eval:1:0 (+ 1 2)>
In the same way that '
abbreviates quote
, #'
abbreviates syntax
:
> #'(+ 1 2)
#<syntax:eval:1:0 (+ 1 2)>
A syntax object that contains just a symbol is an identifier syntax
object. Racket provides some additional operations specific to
identifier syntax objects, including the identifier?
operation to
detect identifiers. Most notably, free-identifier=?
determines
whether two identifiers refer to the same binding:
> (identifier? #'car)
#t
> (identifier? #'(+ 1 2))
#f
> (free-identifier=? #'car #'cdr)
#f
> (free-identifier=? #'car #'car)
#t
> (require (only-in racket/base [car also-car]))
> (free-identifier=? #'car #'also-car)
#t
To see the lists, symbols, numbers, etc. within a syntax object, use
syntax->datum
:
> (syntax->datum #'(+ 1 2))
'(+ 1 2)
The syntax-e
function is similar to syntax->datum
, but it unwraps a
single layer of source-location and lexical-context information, leaving
sub-forms that have their own information wrapped as syntax objects:
> (syntax-e #'(+ 1 2))
'(#<syntax:eval:1:0 +> #<syntax:eval:1:0 1> #<syntax:eval:1:0 2>)
The syntax-e
function always leaves syntax-object wrappers around
sub-forms that are represented via symbols, numbers, and other literal
values. The only time it unwraps extra sub-forms is when unwrapping a
pair, in which case the cdr
of the pair may be recursively unwrapped,
depending on how the syntax object was constructed.
The opposite of syntax->datum
is, of course, datum->syntax
. In
addition to a datum like '(+ 1 2)
, datum->syntax
needs an existing
syntax object to donate its lexical context, and optionally another
syntax object to donate its source location:
> (datum->syntax #'lex
'(+ 1 2)
#'srcloc)
#<syntax:eval:1:0 (+ 1 2)>
In the above example, the lexical context of #'lex
is used for the new
syntax object, while the source location of #'srcloc
is used.
When the second (i.e., the “datum”)
argument to datum->syntax
includes syntax objects, those syntax objects are preserved intact in
the result. That is, deconstructing the result with syntax-e
eventually produces the syntax objects that were given to
datum->syntax
.
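The round trip described above can be sketched as follows. This is a minimal illustration, not part of the original text; the names inner, outer, parts, and stripped are hypothetical and chosen only for this example.

```racket
#lang racket/base

;; An existing syntax object that will be embedded in a larger datum.
(define inner #'(+ 1 2))

;; The datum mixes plain symbols and numbers with the syntax object `inner`.
;; #'here donates its lexical context to the newly wrapped parts.
(define outer (datum->syntax #'here (list 'list inner 3)))

;; syntax-e unwraps one layer: the result is a list of syntax objects.
(define parts (syntax-e outer))

;; syntax->datum strips every layer, including the embedded syntax object.
(define stripped (syntax->datum outer))
```

Here stripped is the plain list '(list (+ 1 2) 3), while each element of parts is still a syntax object that can be passed to syntax-e or syntax->datum in turn.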
2. Macro Transformer Procedures
Any procedure of one argument can be a macro transformer. As it turns
out, the syntax-rules
form is a macro that expands to a procedure
form. For example, if you evaluate a syntax-rules
form directly instead of placing it on the right-hand side of a
define-syntax form, the
result is a procedure:
> (syntax-rules () [(nothing) something])
#<procedure>
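Because the result is an ordinary procedure, it can in principle be applied to a syntax object by hand, outside of any define-syntax form. The following is a small sketch under that assumption; the names transformer and expansion, and the flip form, are hypothetical.

```racket
#lang racket/base

;; A syntax-rules form evaluated as a plain expression yields a procedure.
(define transformer
  (syntax-rules ()
    [(_ a b) (list b a)]))

;; Apply it directly to a syntax object that stands in for a macro use.
(define expansion (transformer #'(flip 1 2)))

;; Look at the replacement form as a plain datum.
(define expansion-datum (syntax->datum expansion))
```

In this sketch expansion-datum is '(list 2 1): the procedure only rewrites syntax, and nothing is evaluated.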
Instead of using syntax-rules
, you can write your own macro
transformer procedure directly using lambda
. The argument to the
procedure is a syntax object that represents the source form, and the
result of the procedure must be a syntax object that represents the
replacement form:
> (define-syntax self-as-string
(lambda (stx)
(datum->syntax stx
(format "~s" (syntax->datum stx)))))
> (self-as-string (+ 1 2))
"(self-as-string (+ 1 2))"
The source form passed to a macro transformer represents an expression
in which its identifier is used in an application position (i.e., after
a parenthesis that starts an expression), or it represents the
identifier by itself if it is used in an expression position and not in
an application position. The procedure produced by syntax-rules
raises
a syntax error if its argument corresponds to a use of the identifier by
itself, which is why syntax-rules
does not implement an identifier
macro.
> (self-as-string (+ 1 2))
"(self-as-string (+ 1 2))"
> self-as-string
"self-as-string"
The define-syntax
form supports the same shortcut syntax for functions
as define
, so that the following self-as-string
definition is
equivalent to the one that uses lambda
explicitly:
> (define-syntax (self-as-string stx)
(datum->syntax stx
(format "~s" (syntax->datum stx))))
> (self-as-string (+ 1 2))
"(self-as-string (+ 1 2))"
3. Mixing Patterns and Expressions: syntax-case
The procedure generated by syntax-rules
internally uses syntax-e
to
deconstruct the given syntax object, and it uses datum->syntax
to
construct the result. The syntax-rules
form doesn’t provide a way to
escape from pattern-matching and template-construction mode into an
arbitrary Racket expression.
The syntax-case
form lets you mix pattern matching, template
construction, and arbitrary expressions:
(syntax-case stx-expr (literal-id ...)
[pattern expr]
...)
Unlike syntax-rules
, the syntax-case
form does not produce a
procedure. Instead, it starts with a stx-expr
expression that
determines the syntax object to match against the pattern
s. Also, each
syntax-case
clause has a pattern
and expr
, instead of a pattern
and template
. Within an expr
, the syntax
form—usually abbreviated
with #'
—shifts into template-construction mode; if the expr
of a
clause starts with #'
, then we have something like a syntax-rules
form:
> (syntax->datum
(syntax-case #'(+ 1 2) ()
[(op n1 n2) #'(- n1 n2)]))
'(- 1 2)
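The (literal-id ...) part of the form works as in syntax-rules: an identifier listed there matches only itself rather than acting as a pattern variable. A minimal sketch, assuming a hypothetical my-if macro that requires an else keyword:

```racket
#lang racket/base
(require (for-syntax racket/base))

;; `else` is listed as a literal, so the pattern matches only when that
;; keyword appears between the two branches.
(define-syntax (my-if stx)
  (syntax-case stx (else)
    [(_ test then-expr else else-expr)
     #'(if test then-expr else-expr)]))

(define result (my-if (> 1 2) 'yes else 'no))
```

Here result is 'no. A use that replaces the keyword with some other identifier matches no clause, so syntax-case reports a syntax error.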
We could write the swap
macro using syntax-case
instead of
define-syntax-rule
or syntax-rules
:
(define-syntax (swap stx)
(syntax-case stx ()
[(swap x y) #'(let ([tmp x])
(set! x y)
(set! y tmp))]))
One advantage of using syntax-case
is that we can provide better error
reporting for swap
. For example, with the define-syntax-rule
definition of swap
, (swap x 2)
produces a syntax error in terms
of set!
, because 2
is not an identifier. We can refine our
syntax-case
implementation of swap
to explicitly check the
sub-forms:
(define-syntax (swap stx)
(syntax-case stx ()
[(swap x y)
(if (and (identifier? #'x)
(identifier? #'y))
#'(let ([tmp x])
(set! x y)
(set! y tmp))
(raise-syntax-error #f
"not an identifier"
stx
(if (identifier? #'x)
#'y
#'x)))]))
With this definition, (swap x 2)
provides a syntax error originating
from swap
instead of set!
.
In the above definition of swap
, #'x
and #'y
are templates, even
though they are not used as the result of the macro transformer. This
example illustrates how templates can be used to access pieces of the
input syntax, in this case for checking the form of the pieces. Also,
the match for #'x
or #'y
is used in the call to
raise-syntax-error
, so that the syntax-error message can point
directly to the source location of the non-identifier.
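Templates used this way also make it easy to compute with the matched pieces at compile time. As a sketch (count-args is a hypothetical macro, not one from the text above), the following expands to the number of sub-forms it was given, without ever evaluating them:

```racket
#lang racket/base
(require (for-syntax racket/base))

(define-syntax (count-args stx)
  (syntax-case stx ()
    [(_ arg ...)
     ;; Ordinary Racket code runs here, at compile time.
     (let ([n (length (syntax->list #'(arg ...)))])
       ;; The replacement form is just the literal number.
       (datum->syntax stx n))]))

;; The arguments are only counted, so unbound names are harmless.
(define arg-count (count-args a b (no-such-function 1)))
```

Here arg-count is 3, because the expansion of the macro use is the constant 3.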
4. with-syntax and generate-temporaries
Since syntax-case
lets us compute with arbitrary Racket expressions,
we can more simply solve a problem that we had in writing
define-for-cbr (see [missing]), where we needed to generate a set
of names based on a sequence id ...
:
(define-syntax (define-for-cbr stx)
(syntax-case stx ()
[(_ do-f (id ...) body)
....
#'(define (do-f get ... put ...)
(define-get/put-id id get put) ...
body) ....]))
In place of the ....
s above, we need to bind get ...
and put ...
to lists of generated identifiers. We cannot use let
to bind get
and
put
, because we need bindings that count as pattern variables, instead
of normal local variables. The with-syntax
form lets us bind pattern
variables:
(define-syntax (define-for-cbr stx)
(syntax-case stx ()
[(_ do-f (id ...) body)
(with-syntax ([(get ...) ....]
[(put ...) ....])
#'(define (do-f get ... put ...)
(define-get/put-id id get put) ...
body))]))
Now we need an expression in place of ....
that generates as many
identifiers as there are id
matches in the original pattern. Since
this is a common task, Racket provides a helper function,
generate-temporaries
, that takes a sequence of identifiers and returns
a sequence of generated identifiers:
(define-syntax (define-for-cbr stx)
(syntax-case stx ()
[(_ do-f (id ...) body)
(with-syntax ([(get ...) (generate-temporaries #'(id ...))]
[(put ...) (generate-temporaries #'(id ...))])
#'(define (do-f get ... put ...)
(define-get/put-id id get put) ...
body))]))
This way of generating identifiers is normally easier to think about than tricking the macro expander into generating names with purely pattern-based macros.
In general, the left-hand side of a with-syntax
binding is a pattern,
just like in syntax-case
. In fact, a with-syntax
form is just a
syntax-case
form turned partially inside-out.
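Since define-for-cbr depends on definitions that are not shown here, a smaller self-contained sketch of the same technique may help. The pset! macro below is hypothetical: it evaluates all right-hand sides into generated temporaries before performing any assignment.

```racket
#lang racket/base
(require (for-syntax racket/base))

(define-syntax (pset! stx)
  (syntax-case stx ()
    [(_ [id expr] ...)
     ;; One fresh temporary per id; `tmp` is bound as a pattern variable.
     (with-syntax ([(tmp ...) (generate-temporaries #'(id ...))])
       #'(let ([tmp expr] ...)
           (set! id tmp) ...
           (void)))]))

(define a 1)
(define b 2)
(pset! [a b] [b a])   ; both right-hand sides are read before either set!
```

After the pset! form runs, a is 2 and b is 1, which shows that the temporaries captured the old values.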
5. Compile and Run-Time Phases
As sets of macros get more complicated, you might want to write your own
helper functions, like generate-temporaries
. For example, to provide
good syntax error messages, swap
, rotate
, and define-cbr
all
should check that certain sub-forms in the source form are identifiers.
We could use a check-ids
function to perform this checking everywhere:
(define-syntax (swap stx)
(syntax-case stx ()
[(swap x y) (begin
(check-ids stx #'(x y))
#'(let ([tmp x])
(set! x y)
(set! y tmp)))]))
(define-syntax (rotate stx)
(syntax-case stx ()
[(rotate a c ...)
(begin
(check-ids stx #'(a c ...))
#'(shift-to (c ... a) (a c ...)))]))
The check-ids
function can use the syntax->list
function to convert
a syntax-object wrapping a list into a list of syntax objects:
(define (check-ids stx forms)
(for-each
(lambda (form)
(unless (identifier? form)
(raise-syntax-error #f
"not an identifier"
stx
form)))
(syntax->list forms)))
If you define swap
and check-ids
in this way, however, it doesn’t
work:
> (let ([a 1] [b 2]) (swap a b))
check-ids: undefined;
cannot reference an identifier before its definition
in module: top-level
The problem is that check-ids
is defined as a run-time expression, but
swap
is trying to use it at compile time. In interactive mode, compile
time and run time are interleaved, but they are not interleaved within
the body of a module, and they are not interleaved across modules that
are compiled ahead-of-time. To help make all of these modes treat code
consistently, Racket separates the binding spaces for different phases.
To define a check-ids
function that can be referenced at compile time,
use begin-for-syntax
:
(begin-for-syntax
(define (check-ids stx forms)
(for-each
(lambda (form)
(unless (identifier? form)
(raise-syntax-error #f
"not an identifier"
stx
form)))
(syntax->list forms))))
With this for-syntax definition, swap
works:
> (let ([a 1] [b 2]) (swap a b) (list a b))
'(2 1)
> (swap a 1)
eval:13:0: swap: not an identifier
at: 1
in: (swap a 1)
When organizing a program into modules, you may want to put helper
functions in one module to be used by macros that reside in other
modules. In that case, you can write the helper function using define
:
"utils.rkt"
#lang racket
(provide check-ids)
(define (check-ids stx forms)
(for-each
(lambda (form)
(unless (identifier? form)
(raise-syntax-error #f
"not an identifier"
stx
form)))
(syntax->list forms)))
Then, in the module that implements macros, import the helper function
using (require (for-syntax "utils.rkt"))
instead of (require "utils.rkt")
:
#lang racket
(require (for-syntax "utils.rkt"))
(define-syntax (swap stx)
(syntax-case stx ()
[(swap x y) (begin
(check-ids stx #'(x y))
#'(let ([tmp x])
(set! x y)
(set! y tmp)))]))
Since modules are separately compiled and cannot have circular
dependencies, the "utils.rkt"
module’s run-time body can be compiled
before compiling the module that implements swap
. Thus, the
run-time definitions in "utils.rkt"
can be used to implement swap
,
as long as they are explicitly shifted into compile time by (require (for-syntax ....))
.
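The same arrangement can be tried in a single file by placing the helper in a submodule. This is a sketch under that assumption: the utils submodule stands in for "utils.rkt", and racket/base is used instead of racket, so racket/base is also imported for-syntax.

```racket
#lang racket/base

;; Helper written with plain `define`; it is a run-time definition
;; from the point of view of the `utils` submodule.
(module utils racket/base
  (provide check-ids)
  (define (check-ids stx forms)
    (for-each
     (lambda (form)
       (unless (identifier? form)
         (raise-syntax-error #f "not an identifier" stx form)))
     (syntax->list forms))))

;; Shift the helper (and racket/base) into compile time.
(require (for-syntax racket/base 'utils))

(define-syntax (swap stx)
  (syntax-case stx ()
    [(swap x y)
     (begin
       (check-ids stx #'(x y))
       #'(let ([tmp x])
           (set! x y)
           (set! y tmp)))]))

(define a 1)
(define b 2)
(swap a b)
```

After the swap form runs, a is 2 and b is 1. Changing the use to (swap a 1) makes the module fail to compile with the "not an identifier" error raised by check-ids.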
The racket
module provides syntax-case
, generate-temporaries
,
lambda
, if
, and more for use in both the run-time and compile-time
phases. That is why we can use syntax-case
in the racket
REPL both
directly and in the right-hand side of a define-syntax
form.
The racket/base
module, in contrast, exports those bindings only in
the run-time phase. If you change the module above that defines swap
so that it uses the racket/base
language instead of racket
, then it
no longer works. Adding (require (for-syntax racket/base))
imports
syntax-case
and more into the compile-time phase, so that the module
works again.
Suppose that define-syntax
is used to define a local macro in the
right-hand side of a define-syntax
form. In that case, the right-hand
side of the inner define-syntax
is in the meta-compile phase level,
also known as phase level 2. To import syntax-case
into that phase
level, you would have to use (require (for-syntax (for-syntax racket/base)))
or, equivalently, (require (for-meta 2 racket/base))
.
For example,
#lang racket/base
(require ;; This provides the bindings for the definition
;; of shell-game.
(for-syntax racket/base)
;; And this for the definition of
;; swap.
(for-syntax (for-syntax racket/base)))
(define-syntax (shell-game stx)
(define-syntax (swap stx)
(syntax-case stx ()
[(_ a b)
#'(let ([tmp a])
(set! a b)
(set! b tmp))]))
(syntax-case stx ()
[(_ a b c)
(let ([a #'a] [b #'b] [c #'c])
(when (= 0 (random 2)) (swap a b))
(when (= 0 (random 2)) (swap b c))
(when (= 0 (random 2)) (swap a c))
#`(list #,a #,b #,c))]))
(shell-game 3 4 5)
(shell-game 3 4 5)
(shell-game 3 4 5)
Negative phase levels also exist. If a macro uses a helper function that
is imported for-syntax
, and if the helper function returns
syntax-object constants generated by syntax
, then identifiers in the
syntax will need bindings at phase level -1, also known as the
template phase level, to have any binding at the run-time phase level
relative to the module that defines the macro.
For instance, the swap-stx
helper function in the example below is not
a syntax transformer—it’s just an ordinary function—but it produces
syntax objects that get spliced into the result of shell-game
.
Therefore, its containing helper
submodule needs to be imported at
shell-game
’s phase 1 with (require (for-syntax 'helper))
.
But from the perspective of swap-stx
, its results will ultimately be
evaluated at phase level -1, when the syntax returned by shell-game
is
evaluated. In other words, a negative phase level is a positive phase
level from the opposite direction: shell-game
’s phase 1 is
swap-stx
’s phase 0, so shell-game
’s phase 0 is swap-stx
’s phase
-1. And that’s why this example won’t work—the 'helper
submodule has
no bindings at phase -1.
#lang racket/base
(require (for-syntax racket/base))
(module helper racket/base
(provide swap-stx)
(define (swap-stx a-stx b-stx)
#`(let ([tmp #,a-stx])
(set! #,a-stx #,b-stx)
(set! #,b-stx tmp))))
(require (for-syntax 'helper))
(define-syntax (shell-game stx)
(syntax-case stx ()
[(_ a b c)
#`(begin
#,(swap-stx #'a #'b)
#,(swap-stx #'b #'c)
#,(swap-stx #'a #'c)
(list a b c))]))
(define x 3)
(define y 4)
(define z 5)
(shell-game x y z)
To repair this example, we add (require (for-template racket/base))
to
the 'helper
submodule.
#lang racket/base
(require (for-syntax racket/base))
(module helper racket/base
(require (for-template racket/base)) ; binds `let` and `set!` at phase -1
(provide swap-stx)
(define (swap-stx a-stx b-stx)
#`(let ([tmp #,a-stx])
(set! #,a-stx #,b-stx)
(set! #,b-stx tmp))))
(require (for-syntax 'helper))
(define-syntax (shell-game stx)
(syntax-case stx ()
[(_ a b c)
#`(begin
#,(swap-stx #'a #'b)
#,(swap-stx #'b #'c)
#,(swap-stx #'a #'c)
(list a b c))]))
(define x 3)
(define y 4)
(define z 5)
(shell-game x y z)
(shell-game x y z)
(shell-game x y z)
6. General Phase Levels
A phase can be thought of as a way to separate computations in a pipeline of processes where one produces code that is used by the next. (E.g., a pipeline that consists of a preprocessor process, a compiler, and an assembler.)
Imagine starting two Racket processes for this purpose. If you ignore inter-process communication channels like sockets and files, the processes will have no way to share anything other than the text that is piped from the standard output of one process into the standard input of the other. Similarly, Racket effectively allows multiple invocations of a module to exist in the same process but separated by phase. Racket enforces separation of such phases, where different phases cannot communicate in any way other than via the protocol of macro expansion, where the output of one phase is the code used in the next.
6.1. Phases and Bindings
Every binding of an identifier exists in a particular phase. The link
between a binding and its phase is represented by an integer phase
level. Phase level 0 is the phase used for “plain” or “runtime”
definitions, so
(define age 5)
adds a binding for age
into phase level 0. The identifier age
can
be defined at a higher phase level using begin-for-syntax
:
(begin-for-syntax
(define age 5))
With a single begin-for-syntax
wrapper, age
is defined at phase
level 1. We can easily mix these two definitions in the same module or
in a top-level namespace, and there is no clash between the two age
s
that are defined at different phase levels:
> (define age 3)
> (begin-for-syntax
(define age 9))
The age
binding at phase level 0 has a value of 3, and the age
binding at phase level 1 has a value of 9.
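The two bindings can also be observed side by side inside a module. In the following sketch, age-at-compile-time is a hypothetical macro; because its body runs at phase level 1, the age it mentions is the phase-1 binding.

```racket
#lang racket/base
(require (for-syntax racket/base))

(define age 3)            ; phase level 0

(begin-for-syntax
  (define age 9))         ; phase level 1

;; The transformer body is a phase-1 expression, so `age` here is 9.
;; The macro expands to that number as a literal.
(define-syntax (age-at-compile-time stx)
  (datum->syntax stx age))

(define both (list age (age-at-compile-time)))
```

Here both is '(3 9): the first element comes from the phase-0 binding, and the second is the phase-1 value spliced into the expansion.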
Syntax objects capture binding information as a first-class value. Thus,
#'age
is a syntax object that represents the age
binding—but since there are
two ages (one at phase level 0 and one at phase level 1), which one
does it capture? In fact, Racket imbues #'age
with lexical
information for all phase levels, so the answer is that #'age
captures
both.
The relevant binding of age
captured by #'age
is determined when
#'age
is eventually used. As an example, we bind #'age
to a pattern
variable so we can use it in a template, and then we evaluate the
template (we use eval here to demonstrate phases, but see [missing]
for caveats about eval):
> (eval (with-syntax ([age #'age])
#'(displayln age)))
3
The result is 3
because age
is used at phase level 0. We can try
again with the use of age
inside begin-for-syntax
:
> (eval (with-syntax ([age #'age])
#'(begin-for-syntax
(displayln age))))
9
In this case, the answer is 9
, because we are using age
at phase
level 1 instead of 0 (i.e., begin-for-syntax
evaluates its
expressions at phase level 1). So, you can see that we started with the
same syntax object, #'age
, and we were able to use it in two different
ways: at phase level 0 and at phase level 1.
A syntax object has a lexical context from the moment it first exists. A
syntax object that is provided from a module retains its lexical
context, and so it references bindings in the context of its source
module, not the context of its use. The following example defines
button
at phase level 0 and binds it to 0
, while see-button
binds
the syntax object for button
in module a
:
> (module a racket
(define button 0)
(provide (for-syntax see-button))
; Why not (define see-button #'button)? We explain later.
(define-for-syntax see-button #'button))
> (module b racket
(require 'a)
(define button 8)
(define-syntax (m stx)
see-button)
(m))
> (require 'b)
0
The result of the m
macro is the value of see-button
, which is
#'button
with the lexical context of the a
module. Even though
there is another button
in b
, the second button
will not confuse
Racket, because the lexical context of #'button
(the value bound to
see-button
) is a
.
Note that see-button
is bound at phase level 1 by virtue of defining
it with define-for-syntax
. Phase level 1 is needed because m
is a
macro, so its body executes at one phase higher than the context of its
definition. Since m
is defined at phase level 0, its body is at phase
level 1, so any bindings referenced by the body must be at phase level
1.
6.2. Phases and Modules
A phase level is a module-relative concept. When importing from another
module via require
, Racket lets us shift imported bindings to a phase
level that is different from the original one:
(require "a.rkt") ; import with no phase shift
(require (for-syntax "a.rkt")) ; shift phase by +1
(require (for-template "a.rkt")) ; shift phase by -1
(require (for-meta 5 "a.rkt")) ; shift phase by +5
That is, using for-syntax
in require
means that all of the bindings
from that module will have their phase levels increased by one. A
binding that is define
d at phase level 0 and imported with
for-syntax
becomes a phase-level 1 binding:
> (module c racket
(define x 0) ; defined at phase level 0
(provide x))
> (module d racket
(require (for-syntax 'c))
; has a binding at phase level 1, not 0:
#'x)
Let’s see what happens if we try to create a binding for the #'button
syntax object at phase level 0:
> (define button 0)
> (define see-button #'button)
Now both button
and see-button
are defined at phase 0. The lexical
context of #'button
will know that there is a binding for button
at
phase 0. In fact, it seems like things are working just fine if we try
to eval
see-button
:
> (eval see-button)
0
Now, let’s use see-button
in a macro:
> (define-syntax (m stx)
see-button)
> (m)
see-button: undefined;
cannot reference an identifier before its definition
in module: top-level
Clearly, see-button
is not defined at phase level 1, so we cannot
refer to it inside the macro body. Let’s try to use see-button
in
another module by putting the button definitions in a module and
importing it at phase level 1. Then, we will get see-button
at phase
level 1:
> (module a racket
(define button 0)
(define see-button #'button)
(provide see-button))
> (module b racket
(require (for-syntax 'a)) ; gets see-button at phase level 1
(define-syntax (m stx)
see-button)
(m))
eval:1:0: button: unbound identifier;
also, no #%top syntax transformer is bound
in: button
Racket says that button
is unbound now! When a
is imported at phase
level 1, we have the following bindings:
button at phase level 1
see-button at phase level 1
So the macro m
can see a binding for see-button
at phase level 1 and
will return the #'button
syntax object, which refers to the button
binding at phase level 1. But the use of m
is at phase level 0, and
there is no button
at phase level 0 in b
. That is why see-button
needs to be bound at phase level 1, as in the original a
. In the
original b
, then, we have the following bindings:
button at phase level 0
see-button at phase level 1
In this scenario, we can use see-button
in the macro, since
see-button
is bound at phase level 1. When the macro expands, it will
refer to a button
binding at phase level 0.
Defining see-button
with (define see-button #'button)
isn’t
inherently wrong; it depends on how we intend to use see-button
. For
example, we can arrange for m
to sensibly use see-button
because it
puts it in a phase level 1 context using begin-for-syntax
:
> (module a racket
(define button 0)
(define see-button #'button)
(provide see-button))
> (module b racket
(require (for-syntax 'a))
(define-syntax (m stx)
(with-syntax ([x see-button])
#'(begin-for-syntax
(displayln x))))
(m))
0
In this case, module b
has both button
and see-button
bound at
phase level 1. The expansion of the macro is
(begin-for-syntax
(displayln button))
which works, because button
is bound at phase level 1.
Now, you might try to cheat the phase system by importing a
at both
phase level 0 and phase level 1. Then you would have the following
bindings:
button at phase level 0
see-button at phase level 0
button at phase level 1
see-button at phase level 1
You might expect now that see-button
in a macro would work, but it
doesn’t:
> (module a racket
(define button 0)
(define see-button #'button)
(provide see-button))
> (module b racket
(require 'a
(for-syntax 'a))
(define-syntax (m stx)
see-button)
(m))
eval:1:0: button: unbound identifier;
also, no #%top syntax transformer is bound
in: button
The see-button
inside macro m
comes from the (for-syntax 'a)
import. For macro m
to work, it needs to have button
bound at phase
0. That binding exists—it’s implied by (require 'a)
. However,
(require 'a)
and (require (for-syntax 'a))
are different
instantiations of the same module. The see-button
at phase 1 only
refers to the button
at phase 1, not the button
bound at phase 0
from a different instantiation—even from the same source module.
This kind of phase-level mismatch between instantiations can be repaired
with syntax-shift-phase-level
. Recall that a syntax object like
#'button
captures lexical information at all phase levels. The
problem here is that see-button
is invoked at phase 1, but needs to
return a syntax object that can be evaluated at phase 0. By default,
see-button
is bound to #'button
at the same phase level. But with
syntax-shift-phase-level
, we can make see-button
refer to #'button
at a different relative phase level. In this case, we use a phase shift
of -1
to make see-button
at phase 1 refer to #'button
at phase 0.
(Because the phase shift happens at every level, it will also make
see-button
at phase 0 refer to #'button
at phase -1.)
Note that syntax-shift-phase-level
merely creates a reference across
phases. To make that reference work, we still need to instantiate our
module at both phases so the reference and its target have their
bindings available. Thus, in module 'b
, we still import module 'a
at
both phase 0 and phase 1—using (require 'a (for-syntax 'a))
—so we have
a phase-1 binding for see-button
and a phase-0 binding for button
.
Now macro m
will work.
> (module a racket
(define button 0)
(define see-button (syntax-shift-phase-level #'button -1))
(provide see-button))
> (module b racket
(require 'a (for-syntax 'a))
(define-syntax (m stx)
see-button)
(m))
> (require 'b)
0
By the way, what happens to the see-button
that’s bound at phase 0?
Its #'button
binding has likewise been shifted, but to phase -1. Since
button
itself isn’t bound at phase -1, if we try to evaluate
see-button
at phase 0, we get an error. In other words, we haven’t
permanently cured our mismatch problem—we’ve just shifted it to a less
bothersome location.
> (module a racket
(define button 0)
(define see-button (syntax-shift-phase-level #'button -1))
(provide see-button))
> (module b racket
(require 'a (for-syntax 'a))
(define-syntax (m stx)
see-button)
(m))
> (module b2 racket
(require 'a)
(eval see-button))
> (require 'b2)
button: undefined;
cannot reference an identifier before its definition
in module: top-level
Mismatches like the one above can also arise when a macro tries to match
literal bindings—using syntax-case
or syntax-parse
.
> (module x racket
(require (for-syntax syntax/parse)
(for-template racket/base))
(provide (all-defined-out))
(define button 0)
(define (make) #'button)
(define-syntax (process stx)
(define-literal-set locals (button))
(syntax-parse stx
[(_ (n (~literal button))) #'#''ok])))
> (module y racket
(require (for-meta 1 'x)
(for-meta 2 'x racket/base))
(begin-for-syntax
(define-syntax (m stx)
(with-syntax ([out (make)])
#'(process (0 out)))))
(define-syntax (p stx)
(m))
(p))
eval:2.0: process: expected the identifier `button'
at: button
in: (process (0 button))
In this example, make
is being used in y
at phase level 2, and it
returns the #'button
syntax object—which refers to button
bound at
phase level 0 inside x
and at phase level 2 in y
from (for-meta 2 'x)
. The process
macro is imported at phase level 1 from (for-meta 1 'x)
, and it knows that button
should be bound at phase level 1.
When the syntax-parse
is executed inside process
, it is looking for
button
bound at phase level 1 but it sees only a phase level 2 binding
and doesn’t match.
To fix the example, we can provide make
at phase level 1 relative to
x
, and then we import it at phase level 1 in y
:
> (module x racket
(require (for-syntax syntax/parse)
(for-template racket/base))
(provide (all-defined-out))
(define button 0)
(provide (for-syntax make))
(define-for-syntax (make) #'button)
(define-syntax (process stx)
(define-literal-set locals (button))
(syntax-parse stx
[(_ (n (~literal button))) #'#''ok])))
> (module y racket
(require (for-meta 1 'x)
(for-meta 2 racket/base))
(begin-for-syntax
(define-syntax (m stx)
(with-syntax ([out (make)])
#'(process (0 out)))))
(define-syntax (p stx)
(m))
(p))
> (require 'y)
'ok
7. Syntax Taints
A use of a macro can expand into a use of an identifier that is not exported from the module that binds the macro. In general, such an identifier must not be extracted from the expanded expression and used in a different context, because using the identifier in a different context may break invariants of the macro’s module.
For example, the following module exports a macro go
that expands to a
use of unchecked-go
:
"m.rkt"
#lang racket
(provide go)
(define (unchecked-go n x)
; to avoid disaster, n must be a number
(+ n 17))
(define-syntax (go stx)
(syntax-case stx ()
[(_ x)
#'(unchecked-go 8 x)]))
If the reference to unchecked-go
is extracted from the expansion of
(go 'a)
, then it might be inserted into a new expression,
(unchecked-go #f 'a)
, leading to disaster. The datum->syntax
procedure can be used similarly to construct references to an unexported
identifier, even when no macro expansion includes a reference to the
identifier.
To prevent such abuses of unexported identifiers, the go
macro must
explicitly protect its expansion by using syntax-protect
:
(define-syntax (go stx)
(syntax-case stx ()
[(_ x)
(syntax-protect #'(unchecked-go 8 x))]))
The syntax-protect
function causes any syntax object that is extracted
from the result of go
to be tainted. The macro expander rejects
tainted identifiers, so attempting to extract unchecked-go
from the
expansion of (go 'a)
produces an identifier that cannot be used to
construct a new expression (or, at least, not one that the macro
expander will accept). The syntax-rules
, syntax-id-rules
, and
define-syntax-rule
forms automatically protect their expansion
results.
More precisely, syntax-protect
arms a syntax object with a dye
pack. When a syntax object is armed, then syntax-e
taints any syntax
object in its result. Similarly, datum->syntax
taints its result when
its first argument is armed. Finally, if any part of a quoted syntax
object is armed, then the corresponding part is tainted in the resulting
syntax constant.
Of course, the macro expander itself must be able to disarm a taint on
a syntax object, so that it can further expand an expression or its
sub-expressions. When a syntax object is armed with a dye pack, the dye
pack has an associated inspector that can be used to disarm the dye
pack. A (syntax-protect stx)
function call is actually a shorthand for
(syntax-arm stx #f #t)
, which arms stx
using a suitable inspector.
The expander uses syntax-disarm
with its inspector on every
expression before trying to expand or compile it.
In much the same way that the macro expander copies properties from a
syntax transformer’s input to its output (see [missing]), the
expander copies dye packs from a transformer’s input to its output.
Building on the previous example,
"n.rkt"
#lang racket
(require "m.rkt")
(provide go-more)
(define y 'hello)
(define-syntax (go-more stx)
  (syntax-protect #'(go y)))
the expansion of (go-more)
introduces a reference to the unexported
y
in (go y)
, and the expansion result is armed so that y
cannot be
extracted from the expansion. Even if go
did not use syntax-protect
for its result (perhaps because it does not need to protect
unchecked-go
after all), the dye pack on (go y)
is propagated to
the final expansion (unchecked-go 8 y)
. The macro expander uses
syntax-rearm
to propagate dye packs from a transformer’s input to its
output.
7.1. Tainting Modes
In some cases, a macro implementor intends to allow limited
destructuring of a macro result without tainting the result. For
example, given the following define-like-y
macro,
"q.rkt"
#lang racket
(provide define-like-y)
(define y 'hello)
(define-syntax (define-like-y stx)
  (syntax-case stx ()
    [(_ id) (syntax-protect #'(define-values (id) y))]))
someone may use the macro in an internal definition:
(let ()
  (define-like-y x)
  x)
The implementor of the "q.rkt"
module most likely intended to allow
such uses of define-like-y
. To convert an internal definition into a
letrec
binding, however, the define
form produced by define-like-y
must be deconstructed, which would normally taint both the binding x
and the reference to y
.
Instead, the internal use of define-like-y
is allowed, because
syntax-protect
treats specially a syntax list that begins with
define-values
. In that case, instead of arming the overall expression,
each individual element of the syntax list is armed, pushing dye packs
further into the second element of the list so that they are attached to
the defined identifiers. Thus, define-values
, x
, and y
in the
expansion result (define-values (x) y)
are individually armed, and the
definition can be deconstructed for conversion to letrec
.
Just like syntax-protect
, the expander rearms a transformer result
that starts with define-values
, by pushing dye packs into the list
elements. As a result, define-like-y
could have been implemented to
produce (define id y)
, which uses define
instead of define-values
.
In that case, the entire define
form is at first armed with a dye
pack, but as the define
form is expanded to define-values
, the dye
pack is moved to the parts.
The macro expander treats syntax-list results starting with
define-syntaxes
in the same way that it treats results starting with
define-values
. Syntax-list results starting with begin
are treated
similarly, except that the second element of the syntax list is treated
like all the other elements (i.e., the immediate element is armed,
instead of its content). Furthermore, the macro expander applies this
special handling recursively, in case a macro produces a begin
form
that contains nested define-values
forms.
The default application of dye packs can be overridden by attaching a
'taint-mode
property to the resulting syntax
object of a macro transformer. If the property value is 'opaque
, then
the syntax object is armed and not its parts. If the property value is
'transparent
, then the syntax object’s parts are armed. If the
property value is 'transparent-binding
, then the syntax object’s parts
and the sub-parts of the second part (as for define-values
and
define-syntaxes
) are armed. The 'transparent
and
'transparent-binding
modes trigger recursive property checking at the
parts, so that armings can be pushed arbitrarily deeply into a
transformer’s result.
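As a rough illustration of attaching the property, the sketch below marks a transformer's result as 'transparent before protecting it. The show-y macro is hypothetical and exists only for this example; it assumes that syntax-property is the way the property is attached and that syntax-protect consults it, as the description of (syntax-arm stx #f #t) suggests.

```racket
#lang racket/base
(require (for-syntax racket/base))

(define y 'hello)

;; `show-y` is a hypothetical macro used only for illustration.
(define-syntax (show-y stx)
  (syntax-case stx ()
    [(_ e)
     ;; Request that the parts of the result be armed individually,
     ;; rather than the result as a whole.
     (syntax-protect
      (syntax-property #'(list y e) 'taint-mode 'transparent))]))

(show-y 1)
```

The expression (show-y 1) still expands to an ordinary call to list; the property only influences where dye packs are placed in the expansion.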
7.2. Taints and Code Inspectors
Tools that are intended to be privileged (such as a debugging transformer) must disarm dye packs in expanded programs. Privilege is granted through code inspectors. Each dye pack records an inspector, and a syntax object can be disarmed using a sufficiently powerful inspector.
When a module is declared, the declaration captures the current value of
the current-code-inspector
parameter. The captured inspector is used
when syntax-protect
is applied by a macro transformer that is defined
within the module. A tool can disarm the resulting syntax object by
supplying syntax-disarm
with an inspector that is the same as, or a
super-inspector of, the module’s inspector. Untrusted code is ultimately
run after setting current-code-inspector
to a less powerful inspector
(after trusted code, such as debugging tools, has been loaded).
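One way to arrange this is sketched below, under the assumption that the untrusted code lives in a module that has not yet been declared. The helper name load-untrusted is hypothetical; make-inspector creates a sub-inspector of the inspector it is given, so the new inspector is less powerful than the one in effect when trusted tools were loaded.

```racket
#lang racket/base

;; `load-untrusted` is a hypothetical helper, not a library function.
;; It declares and instantiates the module named by `module-path`
;; while a weaker code inspector is in effect.
(define (load-untrusted module-path)
  (parameterize ([current-code-inspector
                  ;; A fresh sub-inspector of the current code inspector.
                  (make-inspector (current-code-inspector))])
    (dynamic-require module-path #f)))
```

A tool that kept a reference to the original, more powerful inspector can then pass it to syntax-disarm when examining expansions produced by modules declared this way.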
With this arrangement, macro-generating macros require some care, since
the generating macro may embed syntax objects in the generated macro
that need to have the generating module’s protection level, rather than
the protection level of the module that contains the generated macro. To
avoid this problem, use the module’s declaration-time inspector, which
is accessible as (variable-reference->module-declaration-inspector (#%variable-reference))
, and use it to define a variant of
syntax-protect
.
For example, suppose that the go
macro is implemented through a macro:
#lang racket
(provide def-go)
(define (unchecked-go n x)
  (+ n 17))
(define-syntax (def-go stx)
  (syntax-case stx ()
    [(_ go)
     (syntax-protect
      #'(define-syntax (go stx)
          (syntax-case stx ()
            [(_ x)
             (syntax-protect #'(unchecked-go 8 x))])))]))
When def-go
is used inside another module to define go
, and when the
go
-defining module is at a different protection level than the
def-go
-defining module, the generated macro’s use of syntax-protect
is not right. The use of unchecked-go
should be protected at the
level of the def-go
-defining module, not the go
-defining module.
The solution is to define and use go-syntax-protect
, instead:
#lang racket
(provide def-go)
(define (unchecked-go n x)
  (+ n 17))
(define-for-syntax go-syntax-protect
  (let ([insp (variable-reference->module-declaration-inspector
               (#%variable-reference))])
    (lambda (stx) (syntax-arm stx insp))))
(define-syntax (def-go stx)
  (syntax-case stx ()
    [(_ go)
     (syntax-protect
      #'(define-syntax (go stx)
          (syntax-case stx ()
            [(_ x)
             (go-syntax-protect #'(unchecked-go 8 x))])))]))
7.3. Protected Exports
Sometimes, a module needs to export bindings to some modules—other
modules that are at the same trust level as the exporting module—but
prevent access from untrusted modules. Such exports should use the
protect-out
form in provide
. For example, ffi/unsafe
exports all
of its unsafe bindings as protected in this sense.
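A minimal sketch of a protected export follows. The module reuses the unchecked-go name from the earlier examples, and checked-go is a hypothetical wrapper added here for illustration: checked-go is exported normally, while unchecked-go is exported only to modules that are sufficiently trusted.

```racket
#lang racket/base

(provide checked-go
         ;; Accessible only to modules with a sufficiently powerful
         ;; code inspector.
         (protect-out unchecked-go))

(define (unchecked-go n x)
  (+ n 17))

;; Hypothetical safe entry point that any module may use.
(define (checked-go x)
  (unchecked-go 8 x))
```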
Code inspectors, again, provide the mechanism for determining which
modules are trusted and which are untrusted. When a module is declared,
the value of current-code-inspector
is associated with the module
declaration. When a module is instantiated (i.e., when the body of the
declaration is actually executed), a sub-inspector is created to guard
the module’s exports. Access to the module’s protected exports requires
a code inspector higher in the inspector hierarchy than the module’s
instantiation inspector; note that a module’s declaration inspector is
always higher than its instantiation inspector, so modules declared
with the same code inspector can access each other’s exports.
Syntax-object constants within a module, such as literal identifiers in a template, retain the inspector of their source module. In this way, a macro from a trusted module can be used within an untrusted module, and protected identifiers in the macro expansion still work, even though they ultimately appear in an untrusted module. Naturally, such identifiers should be armed, so that they cannot be extracted from the macro expansion and abused by untrusted code.
Compiled code from a ".zo"
file is inherently untrustworthy,
unfortunately, since it can be synthesized by means other than
compile
. When compiled code is written to a ".zo"
file,
syntax-object constants within the compiled code lose their inspectors.
All syntax-object constants within compiled code acquire the enclosing
module’s declaration-time inspector when the code is loaded.