Extended syntax in Pollen? (e.g. begin/end) #147

What capability is there for extending the Pollen syntax; e.g. adding LaTeX-style ◊begin{...}/◊end{...} syntax:

◊begin{verse}
In Xanadu did Kubla Khan 
A stately pleasure-dome decree: 
Where Alph, the sacred river, ran 
Through caverns measureless to man 
   Down to a sunless sea.
◊end{verse}

versus

◊verse{
In Xanadu did Kubla Khan 
A stately pleasure-dome decree: 
Where Alph, the sacred river, ran 
Through caverns measureless to man 
   Down to a sunless sea.
}

Or using more Unicode characters to create a more readable itemize syntax:

◊begin{itemize}
 • first;
 • second;
 • penultimate;
 • final.
◊end{itemize}

instead of

◊itemize{
    ◊item{first;}
    ◊item{second;}
    ◊item{penultimate;}
    ◊item{final.}
}

I wouldn’t call this extending the syntax, exactly. You’d just be using decode-elements to pick out the begin and end markers (named start and end in the code below) and gather the elements between them. For instance:

#lang pollen/mode racket
(require pollen/tag pollen/decode)

(define start (default-tag-function 'start))
(define end (default-tag-function 'end))

(define (listify elems)
  `(ul ,@(for/list ([elem (in-list elems)]
                    #:when (and (string? elem) (string-prefix? elem "• ")))
                   `(li ,(string-trim elem "• " #:right? #f)))))

(define (group-itemize-elements elems)
  (match-define-values
   (new-elems _ _)
   (for/fold ([elems null]
              [list-elems null]
              [found-itemize? #f])
             ([elem (in-list elems)])
     (cond
       [(equal? elem '(start "itemize"))
        (values elems list-elems #t)]
       [(equal? elem '(end "itemize"))
        (values (cons (listify (reverse list-elems)) elems) null #f)]
       [found-itemize?
        (values elems (cons elem list-elems) found-itemize?)]
       [else
        (values (cons elem elems) list-elems found-itemize?)])))
  (reverse new-elems))

(define-tag-function (root attrs elems)
  `(root ,attrs ,@(decode-elements elems
                                  #:txexpr-elements-proc group-itemize-elements)))

◊root{
      Not in the list.
          ◊start{itemize}
          • first;
          • second;
          • penultimate;
          • final.
          ◊end{itemize}
          Also not in the list.}

Result:

'(root
  ()
  ("Not in the list."
   "\n"
   "    "
   (ul (li "first;") (li "second;") (li "penultimate;") (li "final."))
   "\n"
   "    "
   "Also not in the list."))

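For another approach using whitespace instead of ◊item, see detect-list-items (http://docs.racket-lang.org/pollen-tfl/_pollen_rkt_.html?q=detect-list-items#%28def._%28%28lib._pollen-tfl%2Fpollen..rkt%29._detect-list-items%29%29) in the pollen-tfl sample project (http://docs.racket-lang.org/pollen-tfl).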

Thanks for the speedy reply! I'd thought (with mounting dread) that I'd need to extend Pollen's reader to pull something like this off – so it heartens me to find that it's actually rather easy. Thanks also for the link to the sample project. ;)

BTW you can generalize the start / end detection with a macro, e.g. —

#lang pollen/mode racket
(require pollen/tag pollen/decode)

(define start (default-tag-function 'start))
(define end (default-tag-function 'end))

;; Defines grouper-id: a function that replaces each run of elements between
;; ◊start{grouper-name} and ◊end{grouper-name} with (grouper-proc run).
(define-syntax-rule (define-grouper grouper-id grouper-name grouper-proc)
  (define (grouper-id elems)
    (match-define-values
     (new-elems _ _)
     (for/fold ([elems null]        ; output so far, in reverse order
                [group-elems null]  ; elements of the open group, in reverse order
                [in-group? #f])     ; are we between start and end markers?
               ([elem (in-list elems)])
       (cond
         [(equal? elem '(start grouper-name))
          (values elems group-elems #t)]
         [(equal? elem '(end grouper-name))
          (values (cons (grouper-proc (reverse group-elems)) elems) null #f)]
         [in-group?
          (values elems (cons elem group-elems) in-group?)]
         [else
          (values (cons elem elems) group-elems in-group?)])))
    (reverse new-elems)))

(define (listify elems)
  `(ul ,@(for/list ([elem (in-list elems)]
                    #:when (and (string? elem) (string-prefix? elem "• ")))
                   `(li ,(string-trim elem "• " #:right? #f)))))

(define-grouper group-itemize "itemize" listify)

(define (versify elems) `(verse ,@elems))

(define-grouper group-verse "verse" versify)

(define-tag-function (root attrs elems)
  `(root ,attrs ,@(decode-elements elems
                                  #:txexpr-elements-proc (compose1 group-verse group-itemize))))

◊root{
      ◊start{verse}
            Stately Pleasuredome.
      ◊end{verse}
      Not in the list.
          ◊start{itemize}
          • first;
          • second;
          • penultimate;
          • final.
          ◊end{itemize}
          Also not in the list.}

BTW2 the disadvantage of this approach vs. using tag functions is that it’s less composable. Meaning, when ◊itemize is a tag function, it can be used anywhere in the document. Whereas with this “flag” approach, the ◊start{itemize} / ◊end{itemize} markers rely on cooperation with the root tag (and thus could not be used inside other tags).

If you wanted every ◊start{foo} ··· ◊end{foo} block to be interpreted as ◊foo{···}, no matter where it appears, then you’d need to extend the reader.
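
For comparison, here is a minimal sketch of the composable alternative: a tag function that keeps the bullet syntax. It is plain Racket, so it runs without Pollen installed. The name itemize and the way the input is split into strings are assumptions for illustration; in a Pollen project the function would live in pollen.rkt and be exported with provide.

```racket
#lang racket

;; Hypothetical tag function. A Pollen tag function is an ordinary Racket
;; function that receives the tag's elements as its arguments.
(define (itemize . elems)
  `(ul ,@(for/list ([elem (in-list elems)]
                    #:when (and (string? elem) (string-prefix? elem "• ")))
           ;; drop the leading bullet, keep the rest of the line
           `(li ,(string-trim elem "• " #:right? #f)))))

;; Roughly what ◊itemize{...} would pass in, assuming one string per line
;; with the newlines arriving as separate elements.
(itemize "• first;" "\n" "• second;" "\n" "• final.")
```

Because ◊itemize{···} is then a normal tag, it can sit inside any other tag without help from root. The sketch only keeps items that are single strings, so an item containing a nested tag would need extra handling.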

Another way to write the loop:

#lang pollen/mode racket
(require pollen/tag pollen/decode)

(define start (default-tag-function 'start))
(define end (default-tag-function 'end))

(define (listify elems)
  `(ul ,@(for/list ([elem (in-list elems)]
                    #:when (and (string? elem) (string-prefix? elem "• ")))
                   `(li ,(string-trim elem "• " #:right? #f)))))

(define (group-itemize-elements elems)
  (let loop ([new-elems null][old-elems elems])
    (cond
      [(null? old-elems) (reverse new-elems)]
      [(equal? (car old-elems) '(start "itemize"))
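       (define-values (itemize-elems others)
         (splitf-at (cdr old-elems) (λ (e) (not (equal? e '(end "itemize"))))))
       ;; others begins with the end marker, or is empty if the marker is missing
       (loop (cons (listify itemize-elems) new-elems)
             (if (null? others) null (cdr others)))]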
      [else (loop (cons (car old-elems) new-elems) (cdr old-elems))])))

(define-tag-function (root attrs elems)
  `(root ,attrs ,@(decode-elements elems
                                   #:txexpr-elements-proc group-itemize-elements)))

◊root{
      Not in the list.
          ◊start{itemize}
          • first
          • second
          • third
          ◊end{itemize}
          Also not in the list.}

Okay, two questions then:

1. How hard would it be to add this to the reader? (And would I be able to extend it from outside the pollen source?)
2. Could the ◊root tag come from outside the source file? I.e., could I load a pollen file into a (root ...) txexpr? (So that the file wouldn't be cluttered with an all-encompassing ◊root{···}.)

> How hard would it be to add this to the reader?

No idea. Pollen relies on the @-reader (https://docs.racket-lang.org/scribble/reader-internals.html). Though I have a few minor quibbles with the @-reader, I’ve found it’s wiser to overlook these so that Pollen remains consistent with the @-reader (the same is true of X-expressions).

I don’t know how long you’ve used the @-reader, but if the answer is “not long”, I’d gently recommend giving it a try before attempting surgery. The idea of begin / end blocks goes against the grain of how S-expressions work.

> Could the ◊root tag come from outside the source file?

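Right, that’s ordinarily how Pollen works: the root wrapper is implicitly added (see https://docs.racket-lang.org/pollen/third-tutorial.html#%28part._.Creating_a_.Pollen_markup_file%29). In this example, I’m just including it for clarity.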
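
As a rough sketch of what that looks like in practice, the grouping code can move into pollen.rkt as a root function, so the source file contains no ◊root{···} at all. This version is plain Racket and, unlike the examples above, calls the grouper directly on the top-level elements instead of going through decode-elements. It also assumes the source leaves ◊start and ◊end undefined, so they arrive as plain X-expressions.

```racket
#lang racket
;; pollen.rkt (sketch). Assumption: ◊start{itemize} and ◊end{itemize} reach
;; root as the X-expressions '(start "itemize") and '(end "itemize").
(provide root)

(define (listify elems)
  `(ul ,@(for/list ([elem (in-list elems)]
                    #:when (and (string? elem) (string-prefix? elem "• ")))
           `(li ,(string-trim elem "• " #:right? #f)))))

(define (group-itemize-elements elems)
  (let loop ([new-elems null] [old-elems elems])
    (cond
      [(null? old-elems) (reverse new-elems)]
      [(equal? (car old-elems) '(start "itemize"))
       (define-values (itemize-elems others)
         (splitf-at (cdr old-elems) (λ (e) (not (equal? e '(end "itemize"))))))
       ;; skip the end marker if there is one
       (loop (cons (listify itemize-elems) new-elems)
             (if (null? others) null (cdr others)))]
      [else (loop (cons (car old-elems) new-elems) (cdr old-elems))])))

;; Called with the top-level elements of the whole document.
(define (root . elems)
  `(root ,@(group-itemize-elements elems)))
```

With this in place, the markup file would hold only the text and the start / end markers.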

BTW here’s an example of why begin / end blocks are difficult to get right. What should happen when two blocks overlap, as in the example below? With the @-reader, this can never happen, because braces always nest.

◊begin{foo}
···
◊begin{bar}
···
◊end{foo}
···
◊end{bar}
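
If you keep the marker approach anyway, one option is to reject overlapping blocks before grouping. The sketch below is plain Racket and only illustrates the idea: check-nesting is a hypothetical helper, and it assumes the markers arrive as (start "name") and (end "name") elements, as in the earlier examples. It tracks open blocks on a stack and raises an error when an end marker does not match the most recent start.

```racket
#lang racket

;; Hypothetical checker: walks a flat list of elements and raises an error
;; when start/end markers overlap or a block is left open.
(define (check-nesting elems)
  (define open
    (for/fold ([stack null])
              ([elem (in-list elems)])
      (match elem
        [(list 'start name) (cons name stack)]
        [(list 'end name)
         (unless (and (pair? stack) (equal? (car stack) name))
           (error 'check-nesting "unexpected end marker: ~a" name))
         (cdr stack)]
        [_ stack])))
  (unless (null? open)
    (error 'check-nesting "unclosed start marker: ~a" (car open)))
  #t)

;; Properly nested blocks pass.
(check-nesting '((start "foo") "x" (start "bar") "y" (end "bar") (end "foo")))

;; The overlapping case from the example above is rejected.
(with-handlers ([exn:fail? (λ (e) (displayln (exn-message e)))])
  (check-nesting '((start "foo") (start "bar") (end "foo") (end "bar"))))
```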

You're right in guessing that I haven't used the @-reader for long! (Though I'm used to S-expressions, coming from a Common Lisp background – it was in fact Pollen which drew me to Racket.) To me, the example you give above should just barf – but given that root is added implicitly anyway (allowing for these sorts of additions), that seems like a much cleaner solution than messing with a perfectly good reader.

Thanks for all your help on this! (I don't think I've ever seen such an engaged maintainer…)

Unlike many OSS maintainers, 1) I use Pollen every day 2) I make money from it 😉
