brag cuts literal when it is not instructed to #8

Closed
opened 7 years ago by v-nys · 3 comments
v-nys commented 7 years ago (Migrated from github.com)

Hi,

I'm currently writing a grammar for a Prolog-like language. It contains the following rule:

abstract-list : /"[" [abstract-term ("," abstract-term)* ["|" (abstract-list | abstract-variable)]] /"]"
If I use this rule to parse a larger data structure that contains an abstract list, such as (mycompound([g1,g2])), the comma disappears from the parsed datum, even though the rule does not cut it. If I use a vertical bar instead, the bar *is* included in the parsed datum.

So, the whole parse tree, when I use a comma:

(at
  (treelabel
    ((abstract-atom "mycompound" (abstract-list (abstract-g-variable 1) (abstract-g-variable 2))))))

And the parse tree, when I use a bar:

(at
  (treelabel
    ((abstract-atom "mycompound" (abstract-list (abstract-g-variable 1) "|" (abstract-g-variable 2))))))

I figured this might have something to do with the cut also being applied in nested terms, so I rewrote the rule as follows:

abstract-list : /"[" [abstract-term term-tail ["|" (abstract-list | abstract-variable)]] /"]"
@term-tail : ("," abstract-term)*

But now the comma is included, so I get:

(at
  (treelabel
    ((abstract-atom "mycompound" (abstract-list (abstract-g-variable 1) "," (abstract-g-variable 2))))))

The latter is the result I want, but I fail to see why the original rule and the one with term-tail spliced in lead to different outcomes.

mbutterick commented 7 years ago (Migrated from github.com)

Here’s my attempt to make a test case that reproduces the problem:

parser.rkt

#lang brag
abstract-list : /"[" [abstract-term ("," abstract-term)* ["|" (abstract-list | abstract-variable)]] /"]"
abstract-term : "t"
abstract-variable : "v"

test.rkt

#lang br
(require "parser.rkt" brag/support)

(define (lex p)
  (λ () ((lexer
          [(eof) eof]
          [any-char lexeme]) p)))

(parse-to-datum (lex (open-input-string "[t,t]")))

Result:

'(abstract-list (abstract-term "t") "," (abstract-term "t"))

The comma is there. What am I missing?

v-nys commented 7 years ago (Migrated from github.com)

I whittled down my code until the comma reappeared. The comma is still missing when I use the following lexer, grammar and test:

#lang br
(require brag/support
         syntax/strip-context)
(define at-lexer
  (lexer-srcloc
   [(eof) (return-without-srcloc eof)]
   [(:+ whitespace) (token lexeme #:skip? #t)]
   [(:-
     (:seq (char-range "a" "z") (:* (:or (char-range "a" "z") (char-range "A" "Z") numeric  "_")))
     (:seq "g" (:+ numeric)))
    (token 'SYMBOL lexeme)]
   [(:or "(" ")" "[" "]" ",") (token lexeme lexeme)]
   [(:seq "g" (:+ numeric)) (token 'AVAR-G (string->number (substring lexeme 1)))]))
(provide at-lexer)

#lang brag
abstract-atom : SYMBOL [/"(" abstract-term (/"," abstract-term)* /")"]
@abstract-term : abstract-g-variable | abstract-list
abstract-g-variable : AVAR-G
abstract-list : /"[" [abstract-term ("," abstract-term)* ["|" (abstract-list | abstract-g-variable)]] /"]"

#lang br
(require brag/support rackunit "at-parser.rkt" "at-tokenizer.rkt")
 (check-equal?
  (parse-to-datum (apply-tokenizer make-tokenizer "myatom([g1,g2])"))
  '(abstract-atom "myatom" (abstract-list (abstract-g-variable 1) "," (abstract-g-variable 2))))

If I strip away the atom and try to parse "[g1,g2]", then the comma is there.

mbutterick commented 7 years ago (Migrated from github.com)

Thanks. I found the problem & fixed it.

Reference: mbutterick/beautiful-racket#8