#lang scribble/manual

@(require scribble/eval pollen/decode pollen/world (prefix-in html: pollen/html) txexpr (for-label racket (except-in pollen #%module-begin) pollen/world pollen/cache pollen/decode txexpr xml pollen/html))

@(define my-eval (make-base-eval))
@(my-eval `(require pollen pollen/decode xml racket/list txexpr))


@title{Decode}

@defmodule[pollen/decode]

The @racket[doc] export of a Pollen markup file is a simple X-expression. @italic{Decoding} refers to any post-processing of this X-expression. The @racket[pollen/decode] module provides tools for creating decoders.

The decode step can happen separately from the compilation of the file. But you can also attach a decoder to the markup file's @racket[root] node, so that decoding happens automatically when the markup is compiled and the result is incorporated into @racket[doc]. (Following this approach, you could also attach multiple decoders to different tags within @racket[doc].)
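
Here's a rough sketch of the multiple-decoder approach. It assumes a hypothetical @racket[poem] tag whose contents should get linebreak detection, while the whole document gets smart quotes. Neither tag function is part of Pollen. They just illustrate where decoders can be attached:

@racketblock[
(require pollen/decode txexpr)
(code:comment @#,t{Hypothetical tag function: decodes poem and its contents only})
(define (poem . elements)
  (decode (make-txexpr 'poem '() elements)
          #:txexpr-elements-proc detect-linebreaks))
(code:comment @#,t{Tag functions run inside-out, so root decodes last, poem included})
(define (root . elements)
  (decode (make-txexpr 'root '() elements)
          #:string-proc smart-quotes))
]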

You can, of course, embed function calls within Pollen markup. But since markup is optimized for authors, decoding is useful for operations that can or should be moved out of the authoring layer. 

One example is presentation and layout. For instance, @racket[detect-paragraphs] is a decoder function that lets authors mark paragraphs in their source simply by separating them with a blank line (that is, two consecutive newlines).

Another example is conversion of output into a particular data format. Most Pollen functions are optimized for HTML output, but one could write a decoder that targets another format.
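
As a small illustration, a decoder aimed at another format might begin by renaming tags with @racket[_txexpr-tag-proc]. The target vocabulary here, @racket[emph] and @racket[bold], is hypothetical. A real converter would need to handle much more than tags:

@racketblock[
(require pollen/decode)
(code:comment @#,t{Map HTML-style tags onto a hypothetical target vocabulary})
(define (retag t)
  (case t
    [(em) 'emph]
    [(strong) 'bold]
    [else t]))
(decode '(p "Mind " (em "the") " " (strong "gap"))
        #:txexpr-tag-proc retag)
(code:comment @#,t{expected: '(p "Mind " (emph "the") " " (bold "gap"))})
]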


@defproc[
(decode
[tagged-xexpr txexpr?]
[#:txexpr-tag-proc txexpr-tag-proc (txexpr-tag? . -> . txexpr-tag?) (λ(tag) tag)]
[#:txexpr-attrs-proc txexpr-attrs-proc (txexpr-attrs? . -> . txexpr-attrs?) (λ(attrs) attrs)]
[#:txexpr-elements-proc txexpr-elements-proc (txexpr-elements? . -> . txexpr-elements?) (λ(elements) elements)]
[#:block-txexpr-proc block-txexpr-proc (block-txexpr? . -> . (or/c xexpr? (listof xexpr?))) (λ(tx) tx)]
[#:inline-txexpr-proc inline-txexpr-proc (txexpr? . -> . (or/c xexpr? (listof xexpr?))) (λ(tx) tx)]
[#:string-proc string-proc (string? . -> . (or/c xexpr? (listof xexpr?))) (λ(str) str)]
[#:entity-proc entity-proc ((or/c symbol? valid-char?) . -> . (or/c xexpr? (listof xexpr?))) (λ(ent) ent)]
[#:cdata-proc cdata-proc (cdata? . -> . (or/c xexpr? (listof xexpr?))) (λ(cdata) cdata)]
[#:exclude-tags tags-to-exclude (listof txexpr-tag?) null]
[#:exclude-attrs attrs-to-exclude txexpr-attrs? null]
)
(or/c xexpr/c (listof xexpr/c))]
Recursively process a @racket[_tagged-xexpr], usually the one exported from a Pollen source file as @racket[doc]. 

This function doesn't do much on its own. Rather, it provides the hooks upon which harder-working functions can be hung. 

Recall that in Pollen, all @secref["tags-are-functions"]. By default, the @racket[_tagged-xexpr] from a source file is tagged with @racket[root]. So the typical way to use @racket[decode] is to define a @racket[root] tag function that calls @racket[decode] with your decoding functions. Then your decoder will be applied automatically to every @racket[doc] during compilation.

For instance, here's how @racket[decode] is attached to @racket[root] in @link["http://practicaltypography.com"]{@italic{Butterick's Practical Typography}}. There's not much to it —

@racketblock[
(define (root . items)
  (decode (make-txexpr 'root '() items)
          #:txexpr-elements-proc detect-paragraphs
          #:block-txexpr-proc (compose1 hyphenate wrap-hanging-quotes)
          #:string-proc (compose1 smart-quotes smart-dashes)
          #:exclude-tags '(style script)))
          ]

@margin-note{The @racket[hyphenate] function is not part of Pollen, but rather the @link["http://github.com/mbutterick/hyphenate"]{@racket[hyphenate] package}, which you can install separately.}

This illustrates another important point: even though @racket[decode] presents an imposing list of arguments, you're unlikely to use all of them at once. These represent possibilities, not requirements. For instance, let's see what happens when @racket[decode] is invoked without any of its optional arguments.

@examples[#:eval my-eval
(define tx '(root "I wonder" (em "why") "this works."))
(decode tx)
]

Right — nothing. That's because the default value for the decoding arguments is the identity function, @racket[(λ(x)x)]. So all the input gets passed through intact unless another action is specified.

The @racket[_*-proc] arguments of @racket[decode] take procedures that are applied to specific categories of elements within @racket[_tagged-xexpr].

The @racket[_txexpr-tag-proc] argument is a procedure that handles X-expression tags.

@examples[#:eval my-eval
(define tx '(p "I'm from a strange" (strong "namespace")))
(code:comment @#,t{Tags are symbols, so a tag-proc should return a symbol})
(decode tx #:txexpr-tag-proc (λ(t) (string->symbol (format "ns:~a" t))))
]

The @racket[_txexpr-attrs-proc] argument is a procedure that handles lists of X-expression attributes. (The @racketmodname[txexpr] module, included at no extra charge with Pollen, includes useful helper functions for dealing with these attribute lists.)

@examples[#:eval my-eval
(define tx '(p [[id "first"]] "If I only had a brain."))
(code:comment @#,t{Attrs is a list, so cons is OK for simple cases})
(decode tx #:txexpr-attrs-proc (λ(attrs) (cons '[class "PhD"] attrs )))
]

Note that @racket[_txexpr-attrs-proc] is applied to the attribute list of every tagged X-expression, even those that don't have attributes. This is useful, because sometimes you want to add attributes where none existed before. But be careful, because this behavior may make your processing function overinclusive.

@examples[#:eval my-eval
(define tx '(div (p [[id "first"]] "If I only had a brain.") 
(p "Me too.")))
(code:comment @#,t{This will insert the new attribute everywhere})
(decode tx #:txexpr-attrs-proc (λ(attrs) (cons '[class "PhD"] attrs )))
(code:comment @#,t{This will add the new attribute only to non-null attribute lists})
(decode tx #:txexpr-attrs-proc 
(λ(attrs) (if (null? attrs) attrs (cons '[class "PhD"] attrs ))))
]


The @racket[_txexpr-elements-proc] argument is a procedure that operates on the list of elements that represents the content of each tagged X-expression. Note that each element of an X-expression passes through the decoder twice: first as a member of the list of elements, and again individually, through its type-specific decoder (i.e., @racket[_string-proc], @racket[_entity-proc], and so on).

@examples[#:eval my-eval
(define tx '(div "Double" "\n" "toil" amp "trouble")) 
(code:comment @#,t{Every element gets doubled ...})
(decode tx #:txexpr-elements-proc (λ(es) (append-map (λ(e) (list e e)) es)))
(code:comment @#,t{... but only strings get capitalized})
(decode tx #:txexpr-elements-proc (λ(es) (append-map (λ(e) (list e e)) es))
#:string-proc (λ(s) (string-upcase s)))
]

So why do you need @racket[_txexpr-elements-proc]? Because some kinds of decoding depend on context, so the elements have to be handled as a group. Paragraph detection is one example. Its behavior is not merely a @racket[map] across the elements, because elements are removed and altered depending on their neighbors:

@examples[#:eval my-eval
(define (paras tx) (decode tx #:txexpr-elements-proc detect-paragraphs))
(code:comment @#,t{Context matters. Trailing whitespace is ignored ...})
(paras '(body "The first paragraph." "\n\n")) 
(code:comment @#,t{... but whitespace between strings is converted to a break.})
(paras '(body "The first paragraph." "\n\n" "And another.")) 
(code:comment @#,t{A combination of both types})
(paras '(body "The first paragraph." "\n\n" "And another." "\n\n")) 
]
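
Here's a sketch of another context-dependent operation. The hypothetical @racket[trim-whitespace-elements] procedure below is not part of Pollen. It drops whitespace-only strings from the start and end of each list of elements while leaving interior whitespace alone. A per-element procedure like @racket[_string-proc] couldn't do this, because it can't see where an element sits among its siblings:

@racketblock[
(require pollen/decode racket/list)
(code:comment @#,t{Hypothetical: strip leading and trailing whitespace-only strings})
(define (trim-whitespace-elements elements)
  (define (ws-string? x) (and (string? x) (whitespace? x)))
  (dropf-right (dropf elements ws-string?) ws-string?))
(decode '(p "\n" "Keep" " " "this" "\n\n")
        #:txexpr-elements-proc trim-whitespace-elements)
(code:comment @#,t{expected: '(p "Keep" " " "this")})
]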


The @racket[_block-txexpr-proc] and @racket[_inline-txexpr-proc] arguments are procedures that operate on tagged X-expressions. If the X-expression meets the @racket[block-txexpr?] test, it's processed by @racket[_block-txexpr-proc]. Otherwise, it's inline, so it's processed by @racket[_inline-txexpr-proc]. (Careful, however: these aren't mutually exclusive, because @racket[_block-txexpr-proc] operates on all the elements of a block, including other tagged X-expressions within.)

Of course, if you want block and inline elements to be handled the same way, you can set @racket[_block-txexpr-proc] and @racket[_inline-txexpr-proc] to be the same procedure.

@examples[#:eval my-eval
(define tx '(div "Please" (em "mind the gap") (h1 "Tuesdays only"))) 
(define add-ns (λ(tx) (make-txexpr 
    (string->symbol (format "ns:~a" (get-tag tx))) 
    (get-attrs tx) 
    (get-elements tx))))
(code:comment @#,t{div and h1 are block elements, so this will only affect them})
(decode tx #:block-txexpr-proc add-ns)
(code:comment @#,t{em is an inline element, so this will only affect it})
(decode tx #:inline-txexpr-proc add-ns)
(code:comment @#,t{this will affect all elements})
(decode tx #:block-txexpr-proc add-ns #:inline-txexpr-proc add-ns)
]

The @racket[_string-proc], @racket[_entity-proc], and @racket[_cdata-proc] arguments are procedures that operate on X-expressions that are strings, entities, and CDATA, respectively. Deliberately, the output contracts for these procedures accept any kind of X-expression (meaning, the procedure can change the X-expression type).

@examples[#:eval my-eval
(code:comment @#,t{A div with string, entity, and cdata elements})
(define tx `(div "Moe" amp 62 ,(cdata #f #f "3 > 2;")))
(define rulify (λ(x) '(hr)))
(code:comment @#,t{The rulify function is selectively applied to each})
(print (decode tx #:string-proc rulify))
(print (decode tx #:entity-proc rulify))
(print (decode tx #:cdata-proc rulify))
] 

Note that entities come in two flavors — symbolic and numeric — and @racket[_entity-proc] affects both. If you only want to affect one or the other, you can add a test within @racket[_entity-proc]. Symbolic entities can be detected with @racket[symbol?], and numeric entities with @racket[valid-char?]:

@examples[#:eval my-eval
(define tx `(div amp 62))
(define symbolic-detonate (λ(x) (if (symbol? x) 'BOOM x)))
(print (decode tx #:entity-proc symbolic-detonate))
(define numeric-detonate (λ(x) (if (valid-char? x) 'BOOM x)))
(print (decode tx #:entity-proc numeric-detonate))
] 

The five previous procedures — @racket[_block-txexpr-proc], @racket[_inline-txexpr-proc], @racket[_string-proc], @racket[_entity-proc], and @racket[_cdata-proc] — can return either a single X-expression, or a list of X-expressions, which will be spliced into the parent at the same point.

For instance, earlier we saw how to double elements by using @racket[_txexpr-elements-proc]. But you can accomplish the same thing on a case-by-case basis by returning a list of values:

@examples[#:eval my-eval
(code:comment @#,t{A div with string, entity, and inline-txexpr elements})
(define tx `(div "Axl" amp (span "Slash")))
(define doubler (λ(x) (list x x)))
(code:comment @#,t{The doubler function is selectively applied to each type of element})
(print (decode tx #:string-proc doubler))
(print (decode tx #:entity-proc doubler))
(print (decode tx #:inline-txexpr-proc doubler))
] 

Caution: when returning list values, it's possible to trip over the unavoidable ambiguity between a @racket[txexpr?] and a list of @racket[xexpr?]s that happens to begin with a symbolic entity: 

@examples[#:eval my-eval
(code:comment @#,t{An ambiguous expression})
(define amb '(guitar "player-name"))
(and (txexpr-elements? amb) (txexpr? amb))
(code:comment @#,t{Ambiguity in context})
(define x '(gnr "Izzy" "Slash"))
(define rockit (λ(str) (list 'guitar str)))
(code:comment @#,t{Expecting '(gnr guitar "Izzy" guitar "Slash") from next line,
but return value will be treated as tagged X-expression})
(decode x #:string-proc rockit)
(code:comment @#,t{Changing the order makes it unambiguous})
(define rockit2 (λ(str) (list str 'guitar)))
(decode x #:string-proc rockit2)
] 

The @racket[_tags-to-exclude] argument is a list of tags that will be exempted from decoding. Though you could get the same result by testing the input within the individual decoding functions, that's tedious and potentially slower.

@examples[#:eval my-eval
(define tx '(p "I really think" (em "italics") "should be lowercase."))
(decode tx #:string-proc string-upcase)
(decode tx #:string-proc string-upcase #:exclude-tags '(em))
]

The @racket[_tags-to-exclude] argument is useful if you're decoding source that's destined to become HTML. According to the HTML spec, material within a @racket[<style>] or @racket[<script>] block needs to be preserved literally. In this example, if the CSS and JavaScript blocks are capitalized, they won't work. So exclude @racket['(style script)], and problem solved.

@examples[#:eval my-eval
(define tx '(body (h1 [[class "Red"]] "Let's visit Planet Telex.") 
(style [[type "text/css"]] ".Red {color: green;}")
(script [[type "text/javascript"]] "var area = h * w;")))
(decode tx #:string-proc string-upcase)
(decode tx #:string-proc string-upcase #:exclude-tags '(style script))
]

Finally, the @racket[_attrs-to-exclude] argument works the same way as @racket[_tags-to-exclude], but instead of excluding an element based on its tag, it excludes based on whether the element has a matching attribute/value pair.

@examples[#:eval my-eval
(define tx '(p (span "No attrs") (span ((id "foo")) "One attr")))
(decode tx #:string-proc string-upcase)
(decode tx #:string-proc string-upcase #:exclude-attrs '((id "foo")))
]

@defproc[
(decode-elements
[elements txexpr-elements?]
[#:txexpr-tag-proc txexpr-tag-proc (txexpr-tag? . -> . txexpr-tag?) (λ(tag) tag)]
[#:txexpr-attrs-proc txexpr-attrs-proc (txexpr-attrs? . -> . txexpr-attrs?) (λ(attrs) attrs)]
[#:txexpr-elements-proc txexpr-elements-proc (txexpr-elements? . -> . txexpr-elements?) (λ(elements) elements)]
[#:block-txexpr-proc block-txexpr-proc (block-txexpr? . -> . (or/c xexpr? (listof xexpr?))) (λ(tx) tx)]
[#:inline-txexpr-proc inline-txexpr-proc (txexpr? . -> . (or/c xexpr? (listof xexpr?))) (λ(tx) tx)]
[#:string-proc string-proc (string? . -> . (or/c xexpr? (listof xexpr?))) (λ(str) str)]
[#:entity-proc entity-proc ((or/c symbol? valid-char?) . -> . (or/c xexpr? (listof xexpr?))) (λ(ent) ent)]
[#:cdata-proc cdata-proc (cdata? . -> . (or/c xexpr? (listof xexpr?))) (λ(cdata) cdata)]
[#:exclude-tags tags-to-exclude (listof txexpr-tag?) null]
[#:exclude-attrs attrs-to-exclude txexpr-attrs? null]
)
txexpr-elements?]
Identical to @racket[decode], but takes @racket[txexpr-elements?] as input rather than a whole tagged X-expression, and likewise returns @racket[txexpr-elements?] rather than a tagged X-expression. A convenience variant for use inside tag functions.
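
For instance, a tag function can decode just its own contents and then rebuild its tag around the result. Here's a minimal sketch, assuming a hypothetical @racket[aside] tag that should get paragraph detection:

@racketblock[
(require pollen/decode txexpr)
(code:comment @#,t{Hypothetical tag function: decode the contents, then rewrap them})
(define (aside . elements)
  (make-txexpr 'aside '()
               (decode-elements elements
                                #:txexpr-elements-proc detect-paragraphs)))
(aside "First thought." "\n\n" "Second thought.")
]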

@section{Block}

Because it's convenient, Pollen puts tagged X-expressions into two categories: @italic{block} and @italic{inline}. Why is it convenient? When using @racket[decode], you often want to treat the two categories differently. Not that you have to. But this is how you can.

@defparam[project-block-tags block-tags (listof txexpr-tag?)]{
A parameter that defines the set of tags that @racket[decode] will treat as blocks. This parameter is initialized with the HTML block tags, namely:

@code[(format "~a" html:block-tags)]}
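
Because @racket[project-block-tags] is a parameter, you can also extend it temporarily with @racket[parameterize] instead of calling @racket[register-block-tag]. Here's a sketch, using a hypothetical @racket[bloq] tag and assuming @racket[block-txexpr?] reads the parameter each time it's called:

@racketblock[
(require pollen/decode)
(code:comment @#,t{Inside the parameterize, bloq counts as a block})
(parameterize ([project-block-tags (cons 'bloq (project-block-tags))])
  (block-txexpr? '(bloq "Inside")))
(code:comment @#,t{Outside, bloq is inline again, unless it was registered elsewhere})
(block-txexpr? '(bloq "Outside"))
]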


@defproc[
(register-block-tag
[tag txexpr-tag?])
void?]
Adds a tag to @racket[project-block-tags] so that @racket[block-txexpr?] will report it as a block, and @racket[decode] will process it with @racket[_block-txexpr-proc] rather than @racket[_inline-txexpr-proc].

Pollen tries to do the right thing without being told. But this is the rare case where you have to be explicit. If you introduce a tag into your markup that you want treated as a block, you @bold{must} use this function to identify it, or you will get spooky behavior later on.

For instance, @racket[detect-paragraphs] knows that block elements in the markup shouldn't be wrapped in a @racket[p] tag. So if you introduce a new block element called @racket[bloq] without registering it as a block, misbehavior will follow:

@examples[#:eval my-eval
(define (paras tx) (decode tx #:txexpr-elements-proc detect-paragraphs))
(paras '(body "I want to be a paragraph." "\n\n" (bloq "But not me."))) 
(code:comment @#,t{Wrong: bloq should not be wrapped})
]

But once you register @racket[bloq] as a block, order is restored:

@examples[#:eval my-eval
(define (paras tx) (decode tx #:txexpr-elements-proc detect-paragraphs))
(register-block-tag 'bloq)
(paras '(body "I want to be a paragraph." "\n\n" (bloq "But not me."))) 
(code:comment @#,t{Right: bloq is treated as a block})
]

If you find the idea of registering block tags unbearable, good news. The @racket[project-block-tags] parameter includes the standard HTML block tags by default. So if you just want to use things like @racket[div], @racket[p], and @racket[h1] through @racket[h6], you'll get the right behavior for free.

@examples[#:eval my-eval
(define (paras tx) (decode tx #:txexpr-elements-proc detect-paragraphs))
(paras '(body "I want to be a paragraph." "\n\n" (div "But not me."))) 
]


@defproc[
(block-txexpr?
[v any/c])
boolean?]
Predicate that tests whether @racket[_v] is a tagged X-expression, and if so, whether its tag is among the @racket[project-block-tags]. A tagged X-expression that fails the second test is treated as inline. To adjust how this test works, use @racket[register-block-tag].
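
Here are a few calls, assuming the default @racket[project-block-tags] with nothing extra registered. The first should return @racket[#t], because @racket[div] is an HTML block tag. The other two should return @racket[#f]:

@racketblock[
(require pollen/decode)
(code:comment @#,t{div is one of the default HTML block tags})
(block-txexpr? '(div "Block"))
(code:comment @#,t{em is not, so it counts as inline})
(block-txexpr? '(em "Inline"))
(code:comment @#,t{A bare string is not a tagged X-expression at all})
(block-txexpr? "Not a tagged X-expression")
]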


@section{Typography}

An assortment of typography & layout functions, designed to be used with @racket[decode]. These aren't hard to write. So if you like these, use them. If not, make your own.


@defproc[
(whitespace?
[v any/c])
boolean?]
A predicate that returns @racket[#t] for any stringlike @racket[_v] that's entirely whitespace, for the empty string, and for lists and vectors made only of @racket[whitespace?] members. Following the @racket[regexp-match] convention, @racket[whitespace?] does not return @racket[#t] for a nonbreaking space. If you prefer that behavior, use @racket[whitespace/nbsp?].


@examples[#:eval my-eval
(whitespace? "\n\n   ")
(whitespace? (string->symbol "\n\n   "))
(whitespace? "")
(whitespace? '("" "  " "\n\n\n" " \n"))
(define nonbreaking-space (format "~a" #\u00A0))
(whitespace? nonbreaking-space)
]

@defproc[
(whitespace/nbsp?
[v any/c])
boolean?]
Like @racket[whitespace?], but also returns @racket[#t] for nonbreaking spaces.


@examples[#:eval my-eval
(whitespace/nbsp? "\n\n   ")
(whitespace/nbsp? (string->symbol "\n\n   "))
(whitespace/nbsp? "")
(whitespace/nbsp? '("" "  " "\n\n\n" " \n"))
(define nonbreaking-space (format "~a" #\u00A0))
(whitespace/nbsp? nonbreaking-space)
]


@defproc[
(smart-quotes
[str string?])
string?]
Convert straight quotes in @racket[_str] to curly according to American English conventions.

@examples[#:eval my-eval
(define tricky-string 
"\"Why,\" she could've asked, \"are we in O‘ahu watching 'Mame'?\"")
(display tricky-string)
(display (smart-quotes tricky-string))
]

@defproc[
(smart-dashes
[str string?])
string?]
In @racket[_str], convert three hyphens to an em dash and two hyphens to an en dash, and remove surrounding spaces.

@examples[#:eval my-eval
(define tricky-string "I had a few --- OK, like 6--8 --- thin mints.")
(display tricky-string)
(display (smart-dashes tricky-string))
(code:comment @#,t{Monospaced font not great for showing dashes, but you get the idea})
]


@defproc[
(merge-newlines
[elements (listof xexpr?)])
(listof xexpr?)]
Within @racket[_elements], merge sequential newline characters (@racket["\n"]) into a single whitespace element. Helper function used by @racket[detect-paragraphs].

@examples[#:eval my-eval
(merge-newlines '(p "\n" "\n" "foo" "\n" "\n\n" "bar" 
  (em "\n" "\n" "\n")))]


@defproc[
(detect-linebreaks
[tagged-xexpr-elements (listof xexpr?)]
[#:separator linebreak-sep string? (world:current-linebreak-separator)]
[#:insert linebreak xexpr? '(br)])
(listof xexpr?)]
Within @racket[_tagged-xexpr-elements], convert occurrences of @racket[_linebreak-sep] (@racket["\n"] by default) to @racket[_linebreak], but only if @racket[_linebreak-sep] does not occur between blocks (see @racket[block-txexpr?]). Why? Because block-level elements automatically display on a new line, so adding @racket[_linebreak] would be superfluous. In that case, @racket[_linebreak-sep] just disappears.

@examples[#:eval my-eval
(detect-linebreaks '(div "Two items:" "\n" (em "Eggs") "\n" (em "Bacon")))
(detect-linebreaks '(div "Two items:" "\n" (div "Eggs") "\n" (div "Bacon")))
]

@defproc[
(detect-paragraphs
[elements (listof xexpr?)]
[#:separator paragraph-sep string? (world:current-paragraph-separator)]
[#:tag paragraph-tag symbol? 'p]
[#:linebreak-proc linebreak-proc ((listof xexpr?) . -> . (listof xexpr?)) detect-linebreaks]
[#:force? force-paragraph? boolean? #f])
(listof xexpr?)]
Find paragraphs within @racket[_elements] and wrap them with @racket[_paragraph-tag]. Also handle linebreaks using @racket[detect-linebreaks].

What counts as a paragraph? Any @racket[_elements] that are either a) explicitly set apart with @racket[_paragraph-sep], or b) adjacent to a @racket[block-txexpr?] (in which case the paragraph-ness is implied).

@examples[#:eval my-eval
(detect-paragraphs '("Explicit para" "\n\n" "Explicit para"))
(detect-paragraphs '("Explicit para" "\n\n" "Explicit para" "\n" "Explicit line"))
(detect-paragraphs '("Implied para" (div "Block") "Implied para"))
]

If an element is already a block, it will not be wrapped as a paragraph (because in that case, the wrapping would be superfluous). As a consequence, if @racket[_paragraph-sep] occurs between two blocks, it will be ignored (as in the example below using two sequential @racket[div] blocks). Likewise, @racket[_paragraph-sep] will also be ignored if it occurs between a block and a non-block (because a paragraph break is already implied).

@examples[#:eval my-eval
(code:comment @#,t{The explicit "\n\n" makes no difference in these cases})
(detect-paragraphs '((div "First block") "\n\n" (div "Second block")))
(detect-paragraphs '((div "First block") (div "Second block")))
(detect-paragraphs '("Para" "\n\n" (div "Block")))
(detect-paragraphs '("Para" (div "Block")))
]

The @racket[_paragraph-tag] argument sets the tag used to wrap paragraphs. 

@examples[#:eval my-eval
(detect-paragraphs '("First para" "\n\n" "Second para") #:tag 'ns:p)
]

The @racket[_linebreak-proc] argument lets you substitute a different linebreaking procedure for the default, @racket[detect-linebreaks].

@examples[#:eval my-eval
(detect-paragraphs '("First para" "\n\n" "Second para" "\n" "Second line")
#:linebreak-proc (λ(x) (detect-linebreaks x #:insert '(newline))))
]

The @racket[#:force?] option will wrap a paragraph tag around @racket[_elements], even if no explicit or implicit paragraph breaks are found. This is useful when you want to guarantee that you always get a list of blocks.

@examples[#:eval my-eval
(detect-paragraphs '("This" (span "will not be") "a paragraph"))
(detect-paragraphs '("But this" (span "will be") "a paragraph") #:force? #t)
]

@defproc[
(wrap-hanging-quotes
[tx txexpr?]
[#:single-prepend single-prepender txexpr-tag? 'squo]
[#:double-prepend double-prepender txexpr-tag? 'dquo]
)
txexpr?]
Find single or double quote marks at the beginning of @racket[_tx] and wrap them in an X-expression with the tag @racket[_single-prepender] or @racket[_double-prepender], respectively. The default values are @racket['squo] and @racket['dquo].

@examples[#:eval my-eval
(wrap-hanging-quotes '(p "No quote to hang."))
(wrap-hanging-quotes '(p "“What? We need to hang quotes?”"))
]

In pro typography, quotation marks at the beginning of a line or paragraph are often shifted into the margin slightly to make them appear more optically aligned with the left edge of the text. With a reflowable layout model like HTML, you don't know where your line breaks will be, so the only predictable target is a quote mark at the beginning of @racket[_tx].

This function will simply insert the @racket['squo] and @racket['dquo] tags, which provide hooks that let you do the actual hanging via CSS, like so (actual measurement can be refined to taste):

@verbatim{squo {margin-left: -0.25em;}
dquo {margin-left: -0.50em;}
}

Be warned: there are many edge cases this function does not handle well.

@examples[#:eval my-eval
(code:comment @#,t{Argh: this edge case is not handled properly})
(wrap-hanging-quotes '(p "“" (em "What?") "We need to hang quotes?”"))
]
-												doc updates

											
										
										
											11 years ago
+								#lang scribble/manual
-												documentation fixes

											
										
										
											11 years ago
+								@(require scribble/eval pollen/decode pollen/world (prefix-in html: pollen/html) txexpr (for-label racket (except-in pollen #%module-begin) pollen/world pollen/cache pollen/decode txexpr xml pollen/html))
-												doc updates

											
										
										
											11 years ago
 								@(define my-eval (make-base-eval))
-												doc updates

											
										
										
											11 years ago
+								@(my-eval `(require pollen pollen/decode xml racket/list txexpr))
-												updates

											
										
										
											11 years ago
-												documentation updates

											
										
										
											11 years ago
+								@title{Decode}
-												doc updates

											
										
										
											11 years ago
 								@defmodule[pollen/decode]
-												updates

											
										
										
											11 years ago
+								The @racket[doc] export of a Pollen markup file is a simple X-expression. @italic{Decoding} refers to any post-processing of this X-expression. The @racket[pollen/decode] module provides tools for creating decoders.
-												updates

											
										
										
											11 years ago
-												updates

											
										
										
											11 years ago
+								The decode step can happen separately from the compilation of the file. But you can also attach a decoder to the markup file's @racket[root] node, so the decoding happens automatically when the markup is compiled, and thus automatically incorporated into @racket[doc]. (Following this approach, you could also attach multiple decoders to different tags within @racket[doc].)
-												updates

											
										
										
											11 years ago
 								You can, of course, embed function calls within Pollen markup. But since markup is optimized for authors, decoding is useful for operations that can or should be moved out of the authoring layer.
-												updates

											
										
										
											11 years ago
+								One example is presentation and layout. For instance, @racket[detect-paragraphs] is a decoder function that lets authors mark paragraphs in their source simply by using two carriage returns.
-												updates

											
										
										
											11 years ago
-												updates

											
										
										
											11 years ago
+								Another example is conversion of output into a particular data format. Most Pollen functions are optimized for HTML output, but one could write a decoder that targets another format.
-												updates

											
										
										
											11 years ago
-												doc updates

											
										
										
											11 years ago
+								@defproc[
 								(decode
 								[tagged-xexpr txexpr?]
 								[#:txexpr-tag-proc txexpr-tag-proc (txexpr-tag? . -> . txexpr-tag?) (λ(tag) tag)]
 								[#:txexpr-attrs-proc txexpr-attrs-proc (txexpr-attrs? . -> . txexpr-attrs?) (λ(attrs) attrs)]
 								[#:txexpr-elements-proc txexpr-elements-proc (txexpr-elements? . -> . txexpr-elements?) (λ(elements) elements)]
-												allow empty lists in `decode` contracts

											
										
										
											9 years ago
+								[#:block-txexpr-proc block-txexpr-proc (block-txexpr? . -> . (or/c xexpr? (listof xexpr?))) (λ(tx) tx)]
 								[#:inline-txexpr-proc inline-txexpr-proc (txexpr? . -> . (or/c xexpr? (listof xexpr?))) (λ(tx) tx)]
 								[#:string-proc string-proc (string? . -> . (or/c xexpr? (listof xexpr?))) (λ(str) str)]
 								[#:entity-proc entity-proc ((or/c symbol? valid-char?) . -> . (or/c xexpr? (listof xexpr?))) (λ(ent) ent)]
 								[#:cdata-proc cdata-proc (cdata? . -> . (or/c xexpr? (listof xexpr?))) (λ(cdata) cdata)]
-												add #:exclude-attrs to `decode`

											
										
										
											10 years ago
+								[#:exclude-tags tags-to-exclude (listof txexpr-tag?) null]
 								[#:exclude-attrs attrs-to-exclude txexpr-attrs? null]
-												doc updates

											
										
										
											11 years ago
+								)
-												allow empty lists in `decode` contracts

											
										
										
											9 years ago
+								(or/c xexpr/c (listof xexpr/c))]
-												doc updates

											
										
										
											11 years ago
+								Recursively process a @racket[_tagged-xexpr], usually the one exported from a Pollen source file as @racket[doc].
-												updates

											
										
										
											11 years ago
-												doc updates

											
										
										
											11 years ago
+								This function doesn't do much on its own. Rather, it provides the hooks upon which harder-working functions can be hung.
-												doc update

											
										
										
											10 years ago
+								Recall that in Pollen, all @secref["tags-are-functions"]. By default, the @racket[_tagged-xexpr] from a source file is tagged with @racket[root]. So the typical way to use @racket[decode] is to attach your decoding functions to it, and then define @racket[root] to invoke your @racket[decode] function. Then it will be automatically applied to every @racket[doc] during compile.
-												doc updates

											
										
										
											11 years ago
-												documentation edits

											
										
										
											10 years ago
+								For instance, here's how @racket[decode] is attached to @racket[root] in @link["http://practicaltypography.com"]{@italic{Butterick's Practical Typography}}. There's not much to it —
-												updates

											
										
										
											11 years ago
-												add example of 'decode' to docs

											
										
										
											11 years ago
+								@racketblock[
 								(define (root . items)
 								  (decode (make-txexpr 'root '() items)
 								          #:txexpr-elements-proc detect-paragraphs
-												doc typo

											
										
										
											11 years ago
+								          #:block-txexpr-proc (compose1 hyphenate wrap-hanging-quotes)
-												add example of 'decode' to docs

											
										
										
											11 years ago
+								          #:string-proc (compose1 smart-quotes smart-dashes)
 								          #:exclude-tags '(style script)))
 								          ]
-												doc updates

											
										
										
											11 years ago
-												add note about hyphenate package

											
										
										
											11 years ago
+								@margin-note{The @racket[hyphenate] function is not part of Pollen, but rather the @link["http://github.com/mbutterick/hyphenate"]{@racket[hyphenate] package}, which you can install separately.}
-												updates

											
										
										
											11 years ago
+								This illustrates another important point: even though @racket[decode] presents an imposing list of arguments, you're unlikely to use all of them at once. These represent possibilities, not requirements. For instance, let's see what happens when @racket[decode] is invoked without any of its optional arguments.
-												doc updates

											
										
										
											11 years ago
 								@examples[#:eval my-eval
 								(define tx '(root "I wonder" (em "why") "this works."))
 								(decode tx)
 								]
Right: nothing. That's because the default value for each decoding argument is the identity function, @racket[(λ(x)x)]. So all the input gets passed through intact unless another action is specified.
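
Conversely, you opt into only the arguments you need. As a sketch, here's a variant of the @racket[root] decoder above that doesn't depend on the separately installed @racket[hyphenate] package: it drops that one call and keeps the rest, which are documented in this module (plus @racket[make-txexpr] from @racketmodname[txexpr]). Treat it as a starting point under those assumptions, not a recommended configuration.

@codeblock|{
#lang racket/base
;; Sketch: the root decoder from above, minus `hyphenate`,
;; so it needs only Pollen and the txexpr package.
(require pollen/decode txexpr)
(provide root)

(define (root . items)
  (decode (make-txexpr 'root '() items)
          ;; group loose elements into paragraphs
          #:txexpr-elements-proc detect-paragraphs
          ;; hang opening quotes on block elements only
          #:block-txexpr-proc wrap-hanging-quotes
          ;; curly quotes and real dashes in every string
          #:string-proc (compose1 smart-quotes smart-dashes)
          ;; leave CSS and JavaScript untouched
          #:exclude-tags '(style script)))
}|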

The @racket[_*-proc] arguments of @racket[decode] take procedures that are applied to specific categories of elements within @racket[_tagged-xexpr].
The @racket[_txexpr-tag-proc] argument is a procedure that handles X-expression tags.
@examples[#:eval my-eval
(define tx '(p "I'm from a strange" (strong "namespace")))
(code:comment @#,t{Tags are symbols, so a tag-proc should return a symbol})
(decode tx #:txexpr-tag-proc (λ(t) (string->symbol (format "ns:~a" t))))
]
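Because @racket[_txexpr-tag-proc] sees every tag, a procedure that should touch only some of them needs its own test. Here's a minimal sketch of such a procedure; @racket[rename-tag] and its mapping are hypothetical, chosen only for illustration. You'd pass it as @racket[(decode tx #:txexpr-tag-proc rename-tag)].

@codeblock|{
#lang racket/base
;; Hypothetical tag-proc: map a few presentational tags to semantic
;; ones. Every other tag is returned unchanged, so it remains a symbol.
(provide rename-tag)

(define (rename-tag tag)
  (case tag
    [(b) 'strong]
    [(i) 'em]
    [else tag]))
}|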
The @racket[_txexpr-attrs-proc] argument is a procedure that handles lists of X-expression attributes. (The @racketmodname[txexpr] module, included at no extra charge with Pollen, includes useful helper functions for dealing with these attribute lists.)
@examples[#:eval my-eval
(define tx '(p [[id "first"]] "If I only had a brain."))
(code:comment @#,t{Attrs is a list, so cons is OK for simple cases})
(decode tx #:txexpr-attrs-proc (λ(attrs) (cons '[class "PhD"] attrs)))
]
Note that @racket[_txexpr-attrs-proc] will change the attributes of every tagged X-expression, even those that don't have attributes. This is useful, because sometimes you want to add attributes where none existed before. But be careful, because the behavior may make your processing function overinclusive.
@examples[#:eval my-eval
(define tx '(div (p [[id "first"]] "If I only had a brain.")
  (p "Me too.")))
(code:comment @#,t{This will insert the new attribute everywhere})
(decode tx #:txexpr-attrs-proc (λ(attrs) (cons '[class "PhD"] attrs)))
(code:comment @#,t{This will add the new attribute only to non-null attribute lists})
(decode tx #:txexpr-attrs-proc
  (λ(attrs) (if (null? attrs) attrs (cons '[class "PhD"] attrs))))
]
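One limitation of the bare @racket[cons] approach: if an X-expression already has a @racket[class] attribute, the result contains two of them. The following is a sketch of a procedure that replaces rather than duplicates, assuming attributes have the usual shape, a list of @racket[(key "value")] pairs; @racket[set-class] is a hypothetical name. Combine it with the @racket[null?] test above if you only want to touch elements that already have attributes.

@codeblock|{
#lang racket/base
;; Hypothetical attrs-proc: set the `class` attribute, replacing any
;; existing `class` entry instead of adding a second one.
(provide set-class)

(define (set-class attrs [value "PhD"])
  (cons (list 'class value)
        ;; keep every attribute except an old `class`
        (filter (λ (attr) (not (eq? (car attr) 'class))) attrs)))
}|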
The @racket[_txexpr-elements-proc] argument is a procedure that operates on the list of elements that represents the content of each tagged X-expression. Note that each element of an X-expression is subject to two passes through the decoder: once now, as a member of the list of elements, and also later, through its type-specific decoder (i.e., @racket[_string-proc], @racket[_entity-proc], and so on).
@examples[#:eval my-eval
(define tx '(div "Double" "\n" "toil" amp "trouble"))
(code:comment @#,t{Every element gets doubled ...})
(decode tx #:txexpr-elements-proc (λ(es) (append-map (λ(e) (list e e)) es)))
(code:comment @#,t{... but only strings get capitalized})
(decode tx #:txexpr-elements-proc (λ(es) (append-map (λ(e) (list e e)) es))
  #:string-proc (λ(s) (string-upcase s)))
]
So why do you need @racket[_txexpr-elements-proc]? Because some types of element decoding depend on context, so the elements must be handled as a group. Paragraph detection is one example. Its behavior is not merely a @racket[map] across each element, because elements are removed and altered depending on their neighbors:
@examples[#:eval my-eval
(define (paras tx) (decode tx #:txexpr-elements-proc detect-paragraphs))
(code:comment @#,t{Context matters. Trailing whitespace is ignored ...})
(paras '(body "The first paragraph." "\n\n"))
(code:comment @#,t{... but whitespace between strings is converted to a break.})
(paras '(body "The first paragraph." "\n\n" "And another."))
(code:comment @#,t{A combination of both types})
(paras '(body "The first paragraph." "\n\n" "And another." "\n\n"))
]
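To see why context matters, consider another elements-level operation that can't be written as a per-element @racket[map]: joining runs of adjacent strings. The sketch below is hypothetical (it isn't part of Pollen) and uses only @racketmodname[racket/base]. Because it would also fold @racket["\n\n"] separators into neighboring text, it shouldn't run before @racket[detect-paragraphs].

@codeblock|{
#lang racket/base
;; Hypothetical txexpr-elements-proc: merge each run of adjacent strings
;; into one string. Whether an element merges depends on its neighbor,
;; so the whole list is needed, not one element at a time.
(provide merge-adjacent-strings)

(define (merge-adjacent-strings elements)
  (reverse
   (for/fold ([acc '()]) ([e (in-list elements)])
     (if (and (string? e) (pair? acc) (string? (car acc)))
         ;; previous output element is also a string: append to it
         (cons (string-append (car acc) e) (cdr acc))
         (cons e acc)))))
}|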
The @racket[_block-txexpr-proc] and @racket[_inline-txexpr-proc] arguments are procedures that operate on tagged X-expressions. If the X-expression meets the @racket[block-txexpr?] test, it's processed by @racket[_block-txexpr-proc]. Otherwise, it's inline, so it's processed by @racket[_inline-txexpr-proc]. (Careful, however: these aren't mutually exclusive, because @racket[_block-txexpr-proc] operates on all the elements of a block, including other tagged X-expressions within.)
Of course, if you want block and inline elements to be handled the same way, you can set @racket[_block-txexpr-proc] and @racket[_inline-txexpr-proc] to be the same procedure.
@examples[#:eval my-eval
(define tx '(div "Please" (em "mind the gap") (h1 "Tuesdays only")))
(define add-ns (λ(tx) (make-txexpr
    (string->symbol (format "ns:~a" (get-tag tx)))
    (get-attrs tx)
    (get-elements tx))))
(code:comment @#,t{div and h1 are block elements, so this will only affect them})
(decode tx #:block-txexpr-proc add-ns)
(code:comment @#,t{em is an inline element, so this will only affect it})
(decode tx #:inline-txexpr-proc add-ns)
(code:comment @#,t{this will affect all elements})
(decode tx #:block-txexpr-proc add-ns #:inline-txexpr-proc add-ns)
]
The @racket[_string-proc], @racket[_entity-proc], and @racket[_cdata-proc] arguments are procedures that operate on X-expressions that are strings, entities, and CDATA, respectively. Deliberately, the output contracts for these procedures accept any kind of X-expression (meaning, the procedure can change the X-expression type).

@examples[#:eval my-eval
(code:comment @#,t{A div with string, entity, and cdata elements})
(define tx `(div "Moe" amp 62 ,(cdata #f #f "3 > 2;")))
(define rulify (λ(x) '(hr)))
(code:comment @#,t{The rulify function is selectively applied to each type of element})
(print (decode tx #:string-proc rulify))
(print (decode tx #:entity-proc rulify))
(print (decode tx #:cdata-proc rulify))
]

Note that entities come in two flavors, symbolic and numeric, and @racket[_entity-proc] affects both. If you only want to affect one or the other, you can add a test within @racket[_entity-proc]. Symbolic entities can be detected with @racket[symbol?], and numeric entities with @racket[valid-char?]:
@examples[#:eval my-eval
(define tx `(div amp 62))
(define symbolic-detonate (λ(x) (if (symbol? x) 'BOOM x)))
(print (decode tx #:entity-proc symbolic-detonate))
(define numeric-detonate (λ(x) (if (valid-char? x) 'BOOM x)))
(print (decode tx #:entity-proc numeric-detonate))
]
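A type test can also do real work. For instance, here's a hypothetical @racket[_entity-proc] that rewrites a handful of symbolic entities as their numeric equivalents and leaves numeric entities alone. The lookup table is deliberately incomplete; it's an illustration, not a full entity list. Pass it as @racket[(decode tx #:entity-proc entity->numeric)].

@codeblock|{
#lang racket/base
;; Hypothetical entity-proc: symbolic entity -> numeric code point,
;; for a few common names only.
(provide entity->numeric)

(define symbolic->code-point
  (hasheq 'amp 38 'lt 60 'gt 62 'quot 34 'nbsp 160))

(define (entity->numeric ent)
  (if (symbol? ent)
      ;; unknown names fall back to the original symbol
      (hash-ref symbolic->code-point ent ent)
      ;; numeric entities are already numeric
      ent))
}|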
The five previous procedures (@racket[_block-txexpr-proc], @racket[_inline-txexpr-proc], @racket[_string-proc], @racket[_entity-proc], and @racket[_cdata-proc]) can return either a single X-expression or a list of X-expressions, which will be spliced into the parent at the same point.

For instance, earlier we saw how to double elements by using @racket[_txexpr-elements-proc]. But you can accomplish the same thing on a case-by-case basis by returning a list of values:

@examples[#:eval my-eval
(code:comment @#,t{A div with string, entity, and inline-txexpr elements})
(define tx `(div "Axl" amp (span "Slash")))
(define doubler (λ(x) (list x x)))
(code:comment @#,t{The doubler function is selectively applied to each type of element})
(print (decode tx #:string-proc doubler))
(print (decode tx #:entity-proc doubler))
(print (decode tx #:inline-txexpr-proc doubler))
]
Caution: when returning list values, it's possible to trip over the unavoidable ambiguity between a @racket[txexpr?] and a list of @racket[xexpr?]s that happens to begin with a symbolic entity:
@examples[#:eval my-eval
(code:comment @#,t{An ambiguous expression})
(define amb '(guitar "player-name"))
(and (txexpr-elements? amb) (txexpr? amb))
(code:comment @#,t{Ambiguity in context})
(define x '(gnr "Izzy" "Slash"))
(define rockit (λ(str) (list 'guitar str)))
(code:comment @#,t{Expecting '(gnr guitar "Izzy" guitar "Slash") from next line,
but return value will be treated as tagged X-expression})
(decode x #:string-proc rockit)
(code:comment @#,t{Changing the order makes it unambiguous})
(define rockit2 (λ(str) (list str 'guitar)))
(decode x #:string-proc rockit2)
]
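One way to sidestep the ambiguity is to make sure a returned list never begins with a symbol. As a sketch, this hypothetical @racket[_string-proc] splits a string at each newline and puts a @racket['(br)] between the pieces. Because the result is built from the split strings, it never starts with a symbol, so @racket[decode] should splice it into the parent rather than treat it as a tagged X-expression.

@codeblock|{
#lang racket/base
;; Hypothetical string-proc: "a\nb" -> '("a" (br) "b").
;; The list is made of the split strings with '(br) added between them,
;; so its first element is never a symbol.
(require racket/string racket/list)
(provide newlines->br)

(define (newlines->br str)
  ;; #:trim? #f keeps empty pieces, so leading/trailing "\n" still yield a br
  (add-between (string-split str "\n" #:trim? #f) '(br)))
}|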

The @racket[_tags-to-exclude] argument is a list of tags that will be exempted from decoding. Though you could get the same result by testing the input within the individual decoding functions, that's tedious and potentially slower.
@examples[#:eval my-eval
(define tx '(p "I really think" (em "italics") "should be lowercase."))
(decode tx #:string-proc string-upcase)
(decode tx #:string-proc string-upcase #:exclude-tags '(em))
]
The @racket[_tags-to-exclude] argument is useful if you're decoding source that's destined to become HTML. According to the HTML spec, material within a @racket[<style>] or @racket[<script>] block needs to be preserved literally. In this example, if the CSS and JavaScript blocks are capitalized, they won't work. So exclude @racket['(style script)], and problem solved.
@examples[#:eval my-eval
(define tx '(body (h1 [[class "Red"]] "Let's visit Planet Telex.")
  (style [[type "text/css"]] ".Red {color: green;}")
  (script [[type "text/javascript"]] "var area = h * w;")))
(decode tx #:string-proc string-upcase)
(decode tx #:string-proc string-upcase #:exclude-tags '(style script))
]
Finally, the @racket[_attrs-to-exclude] argument works the same way as @racket[_tags-to-exclude], but instead of excluding an element based on its tag, it excludes based on whether the element has a matching attribute/value pair.
@examples[#:eval my-eval
(define tx '(p (span "No attrs") (span ((id "foo")) "One attr")))
(decode tx #:string-proc string-upcase)
(decode tx #:string-proc string-upcase #:exclude-attrs '((id "foo")))
]
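The two exclusion arguments can be combined. Here's a sketch of a decoder that applies @racket[smart-quotes] everywhere except inside @racket[code], @racket[pre], @racket[style], and @racket[script] elements, and except inside any element carrying the attribute pair @racket[(class "verbatim")]. The @racket[typeset] name and the @racket["verbatim"] class are hypothetical conventions, not part of Pollen.

@codeblock|{
#lang racket/base
;; Sketch: curly quotes everywhere, except in literal-text tags and in
;; anything marked with the hypothetical (class "verbatim") attribute.
(require pollen/decode)
(provide typeset)

(define (typeset tx)
  (decode tx
          #:string-proc smart-quotes
          #:exclude-tags '(code pre style script)
          #:exclude-attrs '((class "verbatim"))))
}|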
@defproc[
(decode-elements
[elements txexpr-elements?]
[#:txexpr-tag-proc txexpr-tag-proc (txexpr-tag? . -> . txexpr-tag?) (λ(tag) tag)]
[#:txexpr-attrs-proc txexpr-attrs-proc (txexpr-attrs? . -> . txexpr-attrs?) (λ(attrs) attrs)]
[#:txexpr-elements-proc txexpr-elements-proc (txexpr-elements? . -> . txexpr-elements?) (λ(elements) elements)]
[#:block-txexpr-proc block-txexpr-proc (block-txexpr? . -> . (or/c xexpr? (listof xexpr?))) (λ(tx) tx)]
[#:inline-txexpr-proc inline-txexpr-proc (txexpr? . -> . (or/c xexpr? (listof xexpr?))) (λ(tx) tx)]
[#:string-proc string-proc (string? . -> . (or/c xexpr? (listof xexpr?))) (λ(str) str)]
[#:entity-proc entity-proc ((or/c symbol? valid-char?) . -> . (or/c xexpr? (listof xexpr?))) (λ(ent) ent)]
[#:cdata-proc cdata-proc (cdata? . -> . (or/c xexpr? (listof xexpr?))) (λ(cdata) cdata)]
[#:exclude-tags tags-to-exclude (listof txexpr-tag?) null]
[#:exclude-attrs attrs-to-exclude txexpr-attrs? null]
)
(or/c xexpr/c (listof xexpr/c))]
Identical to @racket[decode], but takes @racket[txexpr-elements?] as input rather than a whole tagged X-expression, and likewise returns @racket[txexpr-elements?] rather than a tagged X-expression. A convenience variant for use inside tag functions.
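
As a sketch of that use, here's a hypothetical @racket[aside] tag function. It hands its own content to @racket[decode-elements], so paragraph detection and dash conversion apply only within the aside, and then wraps the decoded elements in an @racket[aside] tag. It assumes, as described above, that the return value is a list of elements that can be spliced.

@codeblock|{
#lang racket/base
;; Hypothetical tag function: decode only this tag's content,
;; then wrap the resulting elements.
(require pollen/decode)
(provide aside)

(define (aside . elements)
  `(aside ,@(decode-elements elements
                             #:txexpr-elements-proc detect-paragraphs
                             #:string-proc smart-dashes)))
}|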
@section{Block}

Because it's convenient, Pollen puts tagged X-expressions into two categories: @italic{block} and @italic{inline}. Why is it convenient? When using @racket[decode], you often want to treat the two categories differently. Not that you have to. But this is how you can.

@defparam[project-block-tags block-tags (listof txexpr-tag?)]{
A parameter that defines the set of tags that @racket[decode] will treat as blocks. This parameter is initialized with the HTML block tags, namely:
@code[(format "~a" html:block-tags)]}

@defproc[
(register-block-tag
[tag txexpr-tag?])
void?]
Adds a tag to @racket[project-block-tags] so that @racket[block-txexpr?] will report it as a block, and @racket[decode] will process it with @racket[_block-txexpr-proc] rather than @racket[_inline-txexpr-proc].
Pollen tries to do the right thing without being told. But this is the rare case where you have to be explicit. If you introduce a tag into your markup that you want treated as a block, you @bold{must} use this function to identify it, or you will get spooky behavior later on.
For instance, @racket[detect-paragraphs] knows that block elements in the markup shouldn't be wrapped in a @racket[p] tag. So if you introduce a new block element called @racket[bloq] without registering it as a block, misbehavior will follow:
@examples[#:eval my-eval
(define (paras tx) (decode tx #:txexpr-elements-proc detect-paragraphs))
(paras '(body "I want to be a paragraph." "\n\n" (bloq "But not me.")))
(code:comment @#,t{Wrong: bloq should not be wrapped})
]
But once you register @racket[bloq] as a block, order is restored:
@examples[#:eval my-eval
(define (paras tx) (decode tx #:txexpr-elements-proc detect-paragraphs))
(register-block-tag 'bloq)
(paras '(body "I want to be a paragraph." "\n\n" (bloq "But not me.")))
(code:comment @#,t{Right: bloq is treated as a block})
]
If you find the idea of registering block tags unbearable, good news. The @racket[project-block-tags] include the standard HTML block tags by default. So if you just want to use things like @racket[div] and @racket[p] and @racket[h1–h6], you'll get the right behavior for free.
@examples[#:eval my-eval
(define (paras tx) (decode tx #:txexpr-elements-proc detect-paragraphs))
(paras '(body "I want to be a paragraph." "\n\n" (div "But not me.")))
]
@defproc[
(block-txexpr?
[v any/c])
boolean?]
Predicate that tests whether @racket[_v] is a tagged X-expression, and if so, whether the tag is among the @racket[project-block-tags]. If not, it is treated as inline. To adjust how this test works, use @racket[register-block-tag].
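
Because @racket[project-block-tags] is a parameter, you can also extend it for a limited dynamic extent with @racket[parameterize] rather than registering a tag globally. The sketch below assumes, as the descriptions above imply, that @racket[block-txexpr?] consults the parameter each time it's called; @racket[block-with-bloq?] is a hypothetical name.

@codeblock|{
#lang racket/base
;; Sketch: treat 'bloq as a block only inside this dynamic extent,
;; leaving the global set of block tags unchanged.
(require pollen/decode)
(provide block-with-bloq?)

(define (block-with-bloq? v)
  (parameterize ([project-block-tags (cons 'bloq (project-block-tags))])
    (block-txexpr? v)))
}|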

@section{Typography}
An assortment of typography & layout functions, designed to be used with @racket[decode]. These aren't hard to write. So if you like these, use them. If not, make your own.
@defproc[
(whitespace?
[v any/c])
boolean?]
A predicate that returns @racket[#t] for any stringlike @racket[_v] that's entirely whitespace, including the empty string, as well as for lists and vectors made only of @racket[whitespace?] members. Following the @racket[regexp-match] convention, @racket[whitespace?] does not return @racket[#t] for a nonbreaking space. If you prefer that behavior, use @racket[whitespace/nbsp?].
@examples[#:eval my-eval
(whitespace? "\n\n   ")
(whitespace? (string->symbol "\n\n   "))
(whitespace? "")
(whitespace? '("" "  " "\n\n\n" " \n"))
(define nonbreaking-space (format "~a" #\u00A0))
(whitespace? nonbreaking-space)
]
@defproc[
(whitespace/nbsp?
[v any/c])
boolean?]
Like @racket[whitespace?], but also returns @racket[#t] for nonbreaking spaces.
@examples[#:eval my-eval
(whitespace/nbsp? "\n\n   ")
(whitespace/nbsp? (string->symbol "\n\n   "))
(whitespace/nbsp? "")
(whitespace/nbsp? '("" "  " "\n\n\n" " \n"))
(define nonbreaking-space (format "~a" #\u00A0))
(whitespace/nbsp? nonbreaking-space)
]
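These predicates pair naturally with @racket[_txexpr-elements-proc]. For example, here's a hypothetical elements procedure that drops strings consisting entirely of whitespace. Because it's built on @racket[whitespace?], a string holding a nonbreaking space survives. Note that it also removes the @racket["\n\n"] separators that @racket[detect-paragraphs] looks for, so don't run it ahead of paragraph detection.

@codeblock|{
#lang racket/base
;; Hypothetical txexpr-elements-proc: drop whitespace-only strings.
;; Non-string elements (tagged X-expressions, entities) are always kept.
(require pollen/decode)
(provide drop-blank-strings)

(define (drop-blank-strings elements)
  (filter (λ (e) (not (and (string? e) (whitespace? e)))) elements))
}|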
@defproc[
(smart-quotes
[str string?])
string?]
Convert straight quotes in @racket[_str] to curly according to American English conventions.
@examples[#:eval my-eval
(define tricky-string
"\"Why,\" she could've asked, \"are we in O‘ahu watching 'Mame'?\"")
(display tricky-string)
(display (smart-quotes tricky-string))
]
@defproc[
(smart-dashes
[str string?])
string?]
In @racket[_str], convert three hyphens to an em dash, and two hyphens to an en dash, and remove surrounding spaces.
@examples[#:eval my-eval
(define tricky-string "I had a few --- OK, like 6--8 --- thin mints.")
(display tricky-string)
(display (smart-dashes tricky-string))
(code:comment @#,t{Monospaced font not great for showing dashes, but you get the idea})
]
@defproc[
(merge-newlines
[elements (listof xexpr?)])
(listof xexpr?)]
Within @racket[_elements], merge sequential newline characters (@racket["\n"]) into a single whitespace element. Helper function used by @racket[detect-paragraphs].
@examples[#:eval my-eval
(merge-newlines '(p "\n" "\n" "foo" "\n" "\n\n" "bar"
  (em "\n" "\n" "\n")))]
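For a flat list, the core idea can be sketched in a few lines of @racketmodname[racket/base]. This simplified version is hypothetical, not Pollen's implementation: unlike @racket[merge-newlines], it doesn't descend into nested X-expressions, and it only joins strings made entirely of @racket["\n"] characters.

@codeblock|{
#lang racket/base
;; Simplified, flat-only sketch of the merge-newlines idea:
;; consecutive strings made only of "\n" are joined into one string.
(provide merge-newlines/flat)

(define (newlines? x)
  (and (string? x) (regexp-match? #px"^\n+$" x)))

(define (merge-newlines/flat elements)
  (reverse
   (for/fold ([acc '()]) ([e (in-list elements)])
     (if (and (newlines? e) (pair? acc) (newlines? (car acc)))
         ;; extend the newline run already at the front of acc
         (cons (string-append (car acc) e) (cdr acc))
         (cons e acc)))))
}|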
@defproc[
(detect-linebreaks
[tagged-xexpr-elements (listof xexpr?)]
[#:separator linebreak-sep string? (world:current-linebreak-separator)]
[#:insert linebreak xexpr? '(br)])
(listof xexpr?)]
Within @racket[_tagged-xexpr-elements], convert occurrences of @racket[_linebreak-sep] (@racket["\n"] by default) to @racket[_linebreak], but only if @racket[_linebreak-sep] does not occur between blocks (see @racket[block-txexpr?]). Why? Because block-level elements automatically display on a new line, so adding @racket[_linebreak] would be superfluous. In that case, @racket[_linebreak-sep] just disappears.
@examples[#:eval my-eval
(detect-linebreaks '(div "Two items:" "\n" (em "Eggs") "\n" (em "Bacon")))
(detect-linebreaks '(div "Two items:" "\n" (div "Eggs") "\n" (div "Bacon")))
]
@defproc[
(detect-paragraphs
[elements (listof xexpr?)]
[#:separator paragraph-sep string? (world:current-paragraph-separator)]
[#:tag paragraph-tag symbol? 'p]
[#:linebreak-proc linebreak-proc ((listof xexpr?) . -> . (listof xexpr?)) detect-linebreaks]
[#:force? force-paragraph? boolean? #f])
(listof xexpr?)]
Find paragraphs within @racket[_elements] and wrap them with @racket[_paragraph-tag]. Also handle linebreaks using @racket[_linebreak-proc], which defaults to @racket[detect-linebreaks].

What counts as a paragraph? Any @racket[_elements] that are either a) explicitly set apart with @racket[_paragraph-sep], or b) adjacent to a @racket[block-txexpr?] (in which case the paragraph-ness is implied).

@examples[#:eval my-eval
(detect-paragraphs '("Explicit para" "\n\n" "Explicit para"))
(detect-paragraphs '("Explicit para" "\n\n" "Explicit para" "\n" "Explicit line"))
(detect-paragraphs '("Implied para" (div "Block") "Implied para"))
]
-												updates

											
										
										
											11 years ago
-												make `detect-paragraphs` work more intuitively

											
										
										
											10 years ago
+								If @racket[_element] is already a block, it will not be wrapped as a paragraph (because in that case, the wrapping would be superfluous). Thus, as a consequence, if @racket[_paragraph-sep] occurs between two blocks, it will be ignored (as in the example below using two sequential @racket[div] blocks.) Likewise, @racket[_paragraph-sep] will also be ignored if it occurs between a block and a non-block (because a paragraph break is already implied).
-												add #:force? option to `detect-paragraphs`

											
										
										
											10 years ago
-												updates

											
										
										
											11 years ago
+								@examples[#:eval my-eval
-												make `detect-paragraphs` work more intuitively

											
										
										
											10 years ago
+								(code:comment @#,t{The explicit "\n\n" makes no difference in these cases})
-												updates

											
										
										
											11 years ago
+								(detect-paragraphs '((div "First block") "\n\n" (div "Second block")))
-												make `detect-paragraphs` work more intuitively

											
										
										
											10 years ago
+								(detect-paragraphs '((div "First block") (div "Second block")))
 								(detect-paragraphs '("Para" "\n\n" (div "Block")))
 								(detect-paragraphs '("Para" (div "Block")))
 								]
 								The @racket[_paragraph-tag] argument sets the tag used to wrap paragraphs.
 								@examples[#:eval my-eval
-												updates

											
										
										
											11 years ago
+								(detect-paragraphs '("First para" "\n\n" "Second para") #:tag 'ns:p)
								]
								The @racket[_linebreak-proc] argument lets you use a different linebreaking procedure instead of the usual @racket[detect-linebreaks].
 								@examples[#:eval my-eval
								(detect-paragraphs '("First para" "\n\n" "Second para" "\n" "Second line")
 								#:linebreak-proc (λ(x) (detect-linebreaks x #:insert '(newline))))
								]
								The @racket[#:force?] option wraps @racket[_elements] in a paragraph tag even if no explicit or implicit paragraph breaks are found. This is useful when you want to guarantee that you always get a list of blocks.
 								@examples[#:eval my-eval
 								(detect-paragraphs '("This" (span "will not be") "a paragraph"))
 								(detect-paragraphs '("But this" (span "will be") "a paragraph") #:force? #t)
								]
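								One way to use @racket[detect-paragraphs] is to attach it to the @racket[root] node through @racket[decode], so that paragraphs are detected automatically whenever a markup file is compiled. The following is a minimal sketch, not a required setup: it assumes a hypothetical @racket[root] tag function defined in a module that your markup files can access, and relies only on @racket[decode]'s @racket[#:txexpr-elements-proc] option and @racket[make-txexpr] from the @racket[txexpr] package.
								@racketblock[
								(code:comment "Hypothetical module providing a root tag function")
								(require pollen/decode txexpr)
								(provide root)
								(define (root . elements)
								  (code:comment "Build a root txexpr from the elements, then have decode")
								  (code:comment "pass elements through detect-paragraphs")
								  (decode (make-txexpr 'root '() elements)
								          #:txexpr-elements-proc detect-paragraphs))
								]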
 								@defproc[
 								(wrap-hanging-quotes
 								[tx txexpr?]
 								[#:single-preprend single-preprender txexpr-tag? 'squo]
 								[#:double-preprend double-preprender txexpr-tag? 'dquo]
 								)
 								txexpr?]
 								Find single or double quote marks at the beginning of @racket[_tx] and wrap them in an X-expression with the tag @racket[_single-preprender] or @racket[_double-preprender], respectively. The default values are @racket['squo] and @racket['dquo].
 								@examples[#:eval my-eval
 								(wrap-hanging-quotes '(p "No quote to hang."))
 								(wrap-hanging-quotes '(p "“What? We need to hang quotes?”"))
 								]
								In professional typography, quotation marks at the beginning of a line or paragraph are often shifted slightly into the margin so that they look optically aligned with the left edge of the text. With a reflowable layout model like HTML, you don't know where your line breaks will be, so this function deals only with quote marks at the beginning of @racket[_tx].
								It simply inserts the @racket['squo] and @racket['dquo] tags, which serve as hooks for doing the actual hanging in CSS, like so (refine the measurements to taste):
								@verbatim{squo {margin-left: -0.25em;}
 								dquo {margin-left: -0.50em;}
								}
 								Be warned: there are many edge cases this function does not handle well.
 								@examples[#:eval my-eval
 								(code:comment @#,t{Argh: this edge case is not handled properly})
 								(wrap-hanging-quotes '(p "“" (em "What?") "We need to hang quotes?”"))
 								]