A choice pattern seems to throw off the parser if followed later by a zero-or-more quantified pattern #18

Closed
opened 5 years ago by bkovitz · 8 comments
bkovitz commented 5 years ago (Migrated from github.com)

In the grammar below, once a `THING` containing the case marked by the comment has matched, the grammar returns an error on the next `THING`.

```
#lang brag

start : elem*

elem : THING "(" arg ("," arg)* ")"

arg : IDENTIFIER
    | IDENTIFIER ":" IDENTIFIER   ; Matching this case seems to throw
                                  ; everything off, but only if the
                                  ; ("," arg)* clause is included in elem.
```

I can't figure out how to attach a file in GitHub, but you can reproduce the error by pasting the text above into a file called `bug.brag` and the text below into a file called `bug.rkt` and running the latter.

```
#lang debug racket

(require brag/support br-parser-tools/lex
         (only-in "bug.brag" parse))

(define (tokenize ip)
  (port-count-lines! ip)
  (define my-lexer
    (lexer-src-pos
      [(char-set "(),:") lexeme]  ; single-character tokens
      ["thing" 'THING]
      [(:+ alphabetic) (token 'IDENTIFIER (string->symbol lexeme))]
      [whitespace (return-without-pos (my-lexer ip))]))
  (λ () #R (my-lexer ip)))

(define (p str)
  (parse (tokenize (open-input-string str))))

(define t0 "thing(a)")                      ; This works.
(define t1 "thing(a) thing(b)")             ; This works.
(define t2 "thing(a : Integer)")            ; This works.
(define t3 "thing(a) thing(b : Integer)")   ; This works.
(define t4 "thing(a : Integer) thing(b)")   ; This doesn't work. The parser
                                            ; says that the second "thing" is
                                            ; an error.
(p t0)
(p t1)
(p t2)
(p t3)
(p t4)
```
bkovitz commented 5 years ago (Migrated from github.com)

Here's a workaround:

```
#lang brag

start : elem*

elem : THING "(" arg ")"
     | THING "(" arg ("," arg)+ ")"   ; In lieu of * (i.e. zero-or-more)

arg : IDENTIFIER
    | IDENTIFIER ":" IDENTIFIER
```
mbutterick commented 5 years ago (Migrated from github.com)

Does this grammar fix the problem for you?

```
#lang brag

start : elem*

elem : THING "(" arg ("," arg)* ")"

arg : IDENTIFIER [":" IDENTIFIER]
```
mbutterick commented 5 years ago (Migrated from github.com)

(If so, it doesn’t negate the possibility of a bug, but I am interested in collecting information about its behavior.)

bkovitz commented 5 years ago (Migrated from github.com)

Nope, same error.

mbutterick commented 5 years ago (Migrated from github.com)

That’s strange, because it does fix the parse error for me. In any case, I can reproduce the original error (and have simplified it further). Also, I have reproduced it in the original `ragg` package that `brag` is based on, so it may take a little excavation to sort out.

mbutterick commented 5 years ago (Migrated from github.com)

Making a note of the minimal error case:

```
#lang br

@module/lang[parser]{
#lang brag
foo : ( (X | X Y) A* )*
}

(require 'parser)

(parse (list "X" "Y" "X"))
```
bkovitz commented 5 years ago (Migrated from github.com)

I just tried your alternate grammar with `arg : IDENTIFIER [":" IDENTIFIER]` again and it does work. Sorry, I must have done something wrong when I tried it the first time.

mbutterick commented 5 years ago (Migrated from github.com)

I changed the way the `*` quantifier works, which I believe fixes the problem.
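
For anyone who wants to confirm the fix locally, the minimal case from earlier in this thread can be turned into a small regression check. This is only a sketch: it assumes an installed `brag` that already includes the quantifier change, and it uses `rackunit` and `parse-to-datum` (exported by `brag` grammar modules), neither of which appears in the original report.

```racket
#lang br
(require rackunit)

;; Minimal grammar from this thread, embedded as a submodule.
@module/lang[parser]{
#lang brag
foo : ( (X | X Y) A* )*
}

(require 'parser)

;; Assumption: the installed brag includes the quantifier change.
;; Before that change, the second "X" was reported as a parse error.
(check-not-exn
 (lambda () (parse-to-datum (list "X" "Y" "X"))))
```

If the check passes silently, the input parses; on a version of `brag` without the change, the check should fail with the parse error described above.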

Reference: mbutterick/brag#18