Rule name identifiers maybe shouldn't have source locations
#34
Open
opened 3 years ago by jackfirth
·
2 comments
So given a grammar like this:

And an appropriate lexer-based tokenizer, using `(parse path (make-tokenizer port))` produces syntax objects that look like this:

All well and good. The source locations are even correct, assuming the lexer uses `lexer-srcloc`. Specifically, the following syntax objects have source locations:

`program` and `statement` identifiers has a source location

That last part seems off to me. The `program` identifier gets the same source location as the surrounding `(program ...)` syntax object. But the identifier itself is more of an implicitly-inserted thing from the user's perspective, like `#%app` or `#%datum`.

Where this matters to me is that I use the source locations of original syntax objects in my `resyntax` tool to figure out how to copy their original source code text into the refactored output code. So if one of those `program` or `statement` identifiers ends up in the output syntax object of my refactoring tool - perhaps because it was rearranging pieces of the enclosing `(program ...)` expression - the tool will duplicate the whole original expression when it tries to figure out how to render the output `program` identifier in refactored source code.

I think the rule name identifiers shouldn't have any source location information. Maybe they shouldn't even be `syntax-original?`, but that I'm less sure on.

Does this happen with `ragg` too, or just `brag`?

If `brag` handles source locations in a way that’s contrary to documentation or syntax-object norms, I welcome supporting evidence that this is so. Otherwise I would invoke the existing Racket norm against changing the behavior of a package in a backward-incompatible way.

`ragg` too.

It produces this syntax object:
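One way to inspect such a syntax object is sketched below. It assumes the program text is `#lang racket/base 42` read from a string port with line counting enabled; `read-module-syntax` and `describe` are hypothetical helper names introduced only for this illustration, and the sketch is not taken from the original comment.

```racket
#lang racket/base

;; Assumed program text, matching the one discussed in this comment.
(define source "#lang racket/base 42")

;; Hypothetical helper: read the program as a syntax object.
;; Reading a #lang line requires the reader-extension parameters to be on.
(define (read-module-syntax str)
  (define in (open-input-string str))
  (port-count-lines! in)
  (parameterize ([read-accept-reader #t]
                 [read-accept-lang #t])
    (read-syntax 'example in)))

;; Hypothetical helper: summarize datum, start position, span, originality.
(define (describe stx)
  (list (syntax->datum stx)
        (syntax-position stx)
        (syntax-span stx)
        (syntax-original? stx)))

(define module-stx (read-module-syntax source))

;; Report each top-level piece of the module form, then the whole form.
(for ([piece (in-list (syntax->list module-stx))])
  (printf "~s\n" (describe piece)))
(printf "~s\n" (describe module-stx))
```

Each printed list holds the datum, the 1-based start position, the span, and the result of `syntax-original?` for that piece.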
Both of the `module` and `anonymous-module` identifiers have a span of zero and are not original. The `racket/base` identifier and the `42` literal each have correct starts and spans, pointing to the `racket/base` and `42` substrings of `#lang racket/base 42`, and they're both original. The `#%module-begin` identifier is an odd one: it's not original but it does have a source location that is the same as the enclosing `(#%module-begin 42)` form. Due to the way the `module` and `anonymous-module` identifiers are handled, I suspect that's just a bug.

The whole form has a start position of 7 and a span of 14, pointing to the `racket/base 42` substring, and it is not original because it contains the unoriginal `module`, `anonymous-module`, and `#%module-begin` pieces. The `(#%module-begin 42)` form also isn't original and it has the same start location and span, which I suspect is another bug since it claims to represent the `racket/base 42` substring of the program code but the `(#%module-begin 42)` form doesn't actually contain the `racket/base` identifier. It should probably only claim to contain the `42` substring of the code.

It's a bit tricky to say for sure what the "intent" here is because source locations are tricky to produce and mistakes in them are rarely noticed. I think for syntax objects produced by a language's `read-syntax` function, these are some good guidelines:

`#lang` line is used for the module's initial bindings, it should be original and have a source location.
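To make the duplication concern from the issue concrete, here is a minimal sketch of what happens when a tool copies source text by source location. Everything in it is an assumption for illustration: the source string, the `located` and `source-text` helpers, and the idea that the whole string parses as one `(program ...)` form. It does not show how `brag`, `ragg`, or `resyntax` are actually implemented.

```racket
#lang racket/base

;; Hypothetical source text; assume the whole string is one (program ...) form.
(define source "x = 1;")

;; Hypothetical helper: build a syntax object with an explicit location.
;; Vector layout: source name, line, column, 1-based position, span.
(define (located datum pos span)
  (datum->syntax #f datum (vector 'example 1 (sub1 pos) pos span)))

;; Hypothetical helper: copy the text a syntax object claims to cover,
;; or return #f when it carries no position or span.
(define (source-text stx)
  (define pos (syntax-position stx))
  (define span (syntax-span stx))
  (and pos span (substring source (sub1 pos) (+ (sub1 pos) span))))

;; Behavior described in the issue: the rule-name identifier shares the
;; source location of the enclosing form.
(define located-head (located 'program 1 (string-length source)))

;; Proposed alternative: the rule-name identifier has no source location.
(define bare-head (datum->syntax #f 'program #f))

(source-text located-head) ; yields the text of the whole form
(source-text bare-head)    ; yields #f, so nothing is copied
```

With a location attached, the identifier alone pulls in the text of the entire form; without one, a tool can tell that the identifier has no text of its own. An identifier built this way is also not `syntax-original?`.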