txexpr: Tagged X-expressions

6.0.1.6

Matthew Butterick <mb@mbtype.com>

(require txexpr)	package: txexpr
(require (submod txexpr safe))

A set of small but handy functions for improving the readability and reliability of programs that operate on tagged X-expressions (for short, txexprs).

1 Installation

At the command line:

raco pkg install txexpr

After that, you can update the package from the command line:

raco pkg update txexpr

2 Importing the module

The module operates in two modes: fast and safe. Fast mode is the default, which you get by importing the module in the usual way: (require txexpr).

Safe mode enables the function contracts documented below. Use safe mode by importing the module as (require (submod txexpr safe)).

3 What’s a txexpr?

It’s an X-expression with the following grammar:

txexpr	=	(list tag (list attr ...) element ...)
	|	(cons tag (list element ...))

tag	=	symbol?

attr	=	(list key value)

key	=	symbol?

value	=	string?

element	=	xexpr?

A txexpr is a list with a symbol in the first position — the tag — followed by a series of elements, which are other X-expressions. Optionally, a txexpr can have a list of attributes in the second position.

Examples:

> (txexpr? '(span "Brennan" "Dale"))
#t
> (txexpr? '(span "Brennan" (em "Richard") "Dale"))
#t
> (txexpr? '(span [[class "hidden"][id "names"]] "Brennan" "Dale"))
#t
> (txexpr? '(span lt gt amp))
#t
> (txexpr? '("We really" "should have" "a tag"))
#f
> (txexpr? '(span [[class not-quoted]] "Brennan"))
#f
> (txexpr? '(span [class "hidden"] "Brennan" "Dale"))
#t

The last one is a common mistake. Because the key–value pair is not enclosed in a list, it’s interpreted as a nested txexpr within the first txexpr, which you may not discover until you try to read its attributes:

Examples:

> (get-attrs '(span [class "hidden"] "Brennan" "Dale"))
'()
> (get-elements '(span [class "hidden"] "Brennan" "Dale"))
'((class "hidden") "Brennan" "Dale")

There’s no way of eliminating this ambiguity, short of always requiring an attribute list (empty if necessary) in your txexpr. See also xexpr-drop-empty-attributes.

Tagged X-expressions are most commonly found in HTML & XML documents. Though the notation is different in Racket, the data structure is identical:

Examples:

> (xexpr->string '(span [[id "names"]] "Brennan" (em "Richard") "Dale"))
"<span id=\"names\">Brennan<em>Richard</em>Dale</span>"
> (string->xexpr "<span id=\"names\">Brennan<em>Richard</em>Dale</span>")
'(span ((id "names")) "Brennan" (em () "Richard") "Dale")

After converting to and from HTML, we get back the original X-expression. Well, almost. The brackets turned into parentheses — no big deal, since they mean the same thing in Racket. Also, per its usual practice, string->xexpr added an empty attribute list after em. This is also benign.
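If the extra empty attribute lists ever get in the way, the xml library's xexpr-drop-empty-attributes parameter can suppress them. The following is a minimal sketch that uses only Racket's bundled xml library; the names html, with-empties, and without-empties are made up for this illustration.

```racket
#lang racket/base
(require xml)

;; The same markup as in the round-trip example.
(define html "<span id=\"names\">Brennan<em>Richard</em>Dale</span>")

;; Default behavior: an element without attributes gets an empty attribute list.
(define with-empties (string->xexpr html))

;; With the parameter set to #t, empty attribute lists are dropped.
(define without-empties
  (parameterize ([xexpr-drop-empty-attributes #t])
    (string->xexpr html)))

with-empties    ; '(span ((id "names")) "Brennan" (em () "Richard") "Dale")
without-empties ; '(span ((id "names")) "Brennan" (em "Richard") "Dale")
```

Non-empty attribute lists are kept either way, so only em changes shape here.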

4 Why not just use match, quasiquote, and so on?

If you prefer those, please do. But I’ve found two benefits to using module functions:

Readability. In code that already has a lot of matching and quasiquoting going on, these functions make it easy to see where & how txexprs are being used.

Reliability. Because txexprs come in two close but not quite equal forms, careful coders will always have to take both cases into account.

The programming is trivial, but the annoyance is real.
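To make the annoyance concrete, here is a rough sketch of reading attributes with match alone, using nothing from this module. The names attr-list? and my-get-attrs are hypothetical helpers written for this illustration. They show the two-case handling described above and are not how the module itself is implemented.

```racket
#lang racket/base
(require racket/match)

;; Hypothetical helper: is v a list of (symbol "string") pairs?
(define (attr-list? v)
  (and (list? v)
       (for/and ([a (in-list v)])
         (match a
           [(list (? symbol?) (? string?)) #t]
           [_ #f]))))

;; Hypothetical helper: both forms of a txexpr need their own clause.
(define (my-get-attrs tx)
  (match tx
    ;; Form 1: tag, attribute list, then elements.
    [(list (? symbol?) (? attr-list? attrs) _ ...) attrs]
    ;; Form 2: tag followed directly by elements.
    [(list (? symbol?) _ ...) '()]))

(my-get-attrs '(div ((id "top")) "Hello")) ; '((id "top"))
(my-get-attrs '(div "Hello"))              ; '()
```

Every function that touches attributes or elements would need the same pair of clauses, which is the repetition the module functions are meant to hide.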

5 Interface

procedure
(txexpr? v) → boolean?
  v : any/c
procedure
(txexpr-tag? v) → boolean?
  v : any/c
procedure
(txexpr-attr? v) → boolean?
  v : any/c
procedure
(txexpr-attr-key? v) → boolean?
  v : any/c
procedure
(txexpr-attr-value? v) → boolean?
  v : any/c
procedure
(txexpr-element? v) → boolean?
  v : any/c

Predicates for txexprs that implement this grammar:

txexpr	=	(list tag (list attr ...) element ...)
	|	(cons tag (list element ...))

tag	=	symbol?

attr	=	(list key value)

key	=	symbol?

value	=	string?

element	=	xexpr?

procedure
(txexpr-attrs? v) → boolean?
v : any/c
procedure
(txexpr-elements? v) → boolean?
v : any/c

Shorthand for (listof txexpr-attr?) and (listof txexpr-element?).

procedure
(validate-txexpr possible-txexpr) → txexpr?
possible-txexpr : any/c

Like txexpr?, but raises a descriptive error if possible-txexpr is invalid, and otherwise returns possible-txexpr itself.

Examples:

> (validate-txexpr 'root)
validate-txexpr: 'root is not a list starting with a symbol
> (validate-txexpr '(root))
'(root)
> (validate-txexpr '(root ((id "top")(class 42))))
validate-txexpr-attrs: in '(root ((id "top") (class 42))),
'((id "top") (class 42)) is not a valid list of attributes
because '(class 42) is not in the form '(symbol "string")
> (validate-txexpr '(root ((id "top")(class "42"))))
'(root ((id "top") (class "42")))
> (validate-txexpr '(root ((id "top")(class "42")) ("hi")))
validate-txexpr-element: in '(root ((id "top") (class "42"))
("hi")), '("hi") is not a valid element (must be txexpr,
string, symbol, XML char, or cdata)
> (validate-txexpr '(root ((id "top")(class "42")) "hi"))
'(root ((id "top") (class "42")) "hi")

procedure
(can-be-txexpr-attr-key? v) → boolean?
v : any/c
procedure
(can-be-txexpr-attr-value? v) → boolean?
v : any/c

Predicates for input arguments that are trivially converted to an attribute key or value…

procedure
(->txexpr-attr-key v) → txexpr-attr-key?
v : can-be-txexpr-attr-key?
procedure
(->txexpr-attr-value v) → txexpr-attr-value?
v : can-be-txexpr-attr-value?

… with these conversion functions.

procedure
(txexpr->values tx) → txexpr-tag? txexpr-attrs? txexpr-elements?
  tx : txexpr?

Dissolves a txexpr into its components and returns all three.

Examples:

> (txexpr->values '(div))
'div
'()
'()
> (txexpr->values '(div "Hello" (p "World")))
'div
'()
'("Hello" (p "World"))
> (txexpr->values '(div [[id "top"]] "Hello" (p "World")))
'div
'((id "top"))
'("Hello" (p "World"))

procedure
(txexpr->list tx) → (list txexpr-tag? txexpr-attrs? txexpr-elements?)
  tx : txexpr?

Like txexpr->values, but returns the three components in a list.

Examples:

> (txexpr->list '(div))
'(div () ())
> (txexpr->list '(div "Hello" (p "World")))
'(div () ("Hello" (p "World")))
> (txexpr->list '(div [[id "top"]] "Hello" (p "World")))
'(div ((id "top")) ("Hello" (p "World")))

procedure
(xexpr->html x) → string?
x : xexpr?

Convert x to an HTML string. Better than xexpr->string because, consistent with the HTML spec, it does not escape text that appears within script or style blocks. For convenience, this function accepts any X-expression, not just tagged X-expressions.

Examples:

> (define tx '(root (script "3 > 2") "Why is 3 > 2?"))
> (xexpr->string tx)
"<root><script>3 &gt; 2</script>Why is 3 &gt; 2?</root>"
> (xexpr->html tx)
"<root><script>3 > 2</script>Why is 3 &gt; 2?</root>"
> (map xexpr->html (list "string" 'entity 65))
'("string" "&entity;" "&#65;")

procedure
(get-tag tx) → txexpr-tag?
  tx : txexpr?
procedure
(get-attrs tx) → txexpr-attrs?
  tx : txexpr?
procedure
(get-elements tx) → (listof txexpr-element?)
  tx : txexpr?

Accessor functions for the individual pieces of a txexpr.

Examples:

> (get-tag '(div [[id "top"]] "Hello" (p "World")))
'div
> (get-attrs '(div [[id "top"]] "Hello" (p "World")))
'((id "top"))
> (get-elements '(div [[id "top"]] "Hello" (p "World")))
'("Hello" (p "World"))

procedure
(make-txexpr tag [attrs elements]) → txexpr?
  tag : txexpr-tag?
  attrs : txexpr-attrs? = empty
  elements : txexpr-elements? = empty

Assemble a txexpr from its parts. If you don’t have attributes, but you do have elements, you’ll need to pass empty as the second argument. Note that unlike xml->xexpr, if the attribute list is empty, it’s not included in the resulting expression.

Examples:

> (make-txexpr 'div)
'(div)
> (make-txexpr 'div '() '("Hello" (p "World")))
'(div "Hello" (p "World"))
> (make-txexpr 'div '[[id "top"]])
'(div ((id "top")))
> (make-txexpr 'div '[[id "top"]] '("Hello" (p "World")))
'(div ((id "top")) "Hello" (p "World"))
> (define tx '(div [[id "top"]] "Hello" (p "World")))
> (make-txexpr (get-tag tx) (get-attrs tx) (get-elements tx))
'(div ((id "top")) "Hello" (p "World"))

procedure
(can-be-txexpr-attrs? v) → boolean?
v : any/c

Predicate for functions that handle txexpr-attrs. Covers values that are easily converted into pairs of attr-key and attr-value. Namely: a single txexpr-attr, a list of txexpr-attrs (i.e., what you get from get-attrs), or interleaved symbols and strings (each key–value pair is combined into a single txexpr-attr).

procedure
(attrs->hash x ...) → hash?
x : can-be-txexpr-attrs?
procedure
(hash->attrs h) → txexpr-attrs?
h : hash?

Convert attrs to an immutable hash, and back again.

Examples:

> (define tx '(div [[id "top"][class "red"]] "Hello" (p "World")))
> (attrs->hash (get-attrs tx))
'#hash((class . "red") (id . "top"))
> (hash->attrs '#hash((class . "red") (id . "top")))
'((class "red") (id "top"))

procedure
(attrs-have-key? attrs key) → boolean?
attrs : (or/c txexpr-attrs? txexpr?)
key : can-be-txexpr-attr-key?

Returns #t if the attrs contain a value for the given key, #f otherwise.

Examples:

> (define tx '(div [[id "top"][class "red"]] "Hello" (p "World")))
> (attrs-have-key? tx 'id)
#t
> (attrs-have-key? tx 'grackle)
#f

procedure
(attr-ref tx key) → txexpr-attr-value?
tx : txexpr?
key : can-be-txexpr-attr-key?

Given a key, look up the corresponding value in the attributes of a txexpr. Asking for a nonexistent key produces an error.

Examples:

> (define tx '(div [[id "top"][class "red"]] "Hello" (p "World")))
> (attr-ref tx 'class)
"red"
> (attr-ref tx 'id)
"top"
> (attr-ref tx 'nonexistent-key)
attr-ref: no value found for key 'nonexistent-key
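When a missing key is expected rather than exceptional, one option is to guard the lookup with attrs-have-key?. The helper attr-ref/default below is a hypothetical name invented for this sketch, not a function provided by the module, and the sketch assumes the txexpr package is installed.

```racket
#lang racket/base
(require txexpr)

;; Hypothetical helper (not part of txexpr): return the value for key,
;; or default when the key is absent, instead of raising an error.
(define (attr-ref/default tx key default)
  (if (attrs-have-key? tx key)
      (attr-ref tx key)
      default))

(define tx '(div [[id "top"][class "red"]] "Hello" (p "World")))

(attr-ref/default tx 'id "none") ; "top"
(attr-ref/default tx 'lang "en") ; "en"
```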

procedure
(attr-set tx key value) → txexpr?
  tx : txexpr?
  key : can-be-txexpr-attr-key?
  value : txexpr-attr-value?

Given a txexpr, set the value of attribute key to value. The function returns the updated txexpr.

Examples:

> (define tx '(div [[class "red"][id "top"]] "Hello" (p "World")))
> (attr-set tx 'id "bottom")
'(div ((class "red") (id "bottom")) "Hello" (p "World"))
> (attr-set tx 'class "blue")
'(div ((class "blue") (id "top")) "Hello" (p "World"))
> (attr-set (attr-set tx 'id "bottom") 'class "blue")
'(div ((class "blue") (id "bottom")) "Hello" (p "World"))

procedure
(merge-attrs attrs ...) → txexpr-attrs?
attrs : (listof can-be-txexpr-attrs?)

Combine a series of attributes into a single txexpr-attrs item. This function addresses three annoyances that surface in working with txexpr attributes.

You can pass the attributes in multiple forms. See can-be-txexpr-attrs? for further details.
Attributes with the same name are merged, with the later value taking precedence (i.e., hash behavior).
Attributes are sorted in alphabetical order.

Examples:

> (define tx '(div [[id "top"][class "red"]] "Hello" (p "World")))
> (define tx-attrs (get-attrs tx))
> tx-attrs
'((id "top") (class "red"))
> (merge-attrs tx-attrs 'editable "true")
'((class "red") (editable "true") (id "top"))
> (merge-attrs tx-attrs 'id "override-value")
'((class "red") (id "override-value"))
> (define my-attr '(id "another-override"))
> (merge-attrs tx-attrs my-attr)
'((class "red") (id "another-override"))
> (merge-attrs my-attr tx-attrs)
'((class "red") (id "top"))

procedure
(remove-attrs tx) → txexpr?
tx : txexpr?

Recursively remove all attributes.

Examples:

> (define tx '(div [[id "top"]] "Hello" (p [[id "lower"]] "World")))
> (remove-attrs tx)
'(div "Hello" (p "World"))

procedure
(map-elements proc tx) → txexpr?
proc : procedure?
tx : txexpr?

Recursively apply proc to all elements, leaving tags and attributes alone. Using plain map will only process elements at the top level of the current txexpr. Usually that’s not what you want.

Examples:

> (define tx '(div "Hello!" (p "Welcome to" (strong "Mars"))))
> (define upcaser (λ(x) (if (string? x) (string-upcase x) x)))
> (map upcaser tx)
'(div "HELLO!" (p "Welcome to" (strong "Mars")))
> (map-elements upcaser tx)
'(div "HELLO!" (p "WELCOME TO" (strong "MARS")))

In practice, most xexpr-elements are strings. But woe befalls those who pass string procedures to map-elements, because an xexpr-element can be any kind of xexpr?, and an xexpr? is not necessarily a string.

Examples:

> (define tx '(p "Welcome to" (strong "Mars" amp "Sons")))
> (map-elements string-upcase tx)
string-upcase: contract violation
expected: string?
given: 'amp
> (define upcaser (λ(x) (if (string? x) (string-upcase x) x)))
> (map-elements upcaser tx)
'(p "WELCOME TO" (strong "MARS" amp "SONS"))

procedure
(map-elements/exclude proc tx exclude-test) → txexpr?
  proc : procedure?
  tx : txexpr?
  exclude-test : (txexpr? . -> . boolean?)

Like map-elements, but skips any txexprs that evaluate to #t under exclude-test. The exclude-test gets a whole txexpr as input, so it can test any of its parts.

Examples:

> (define tx '(div "Hello!" (p "Welcome to" (strong "Mars"))))
> (define upcaser (λ(x) (if (string? x) (string-upcase x) x)))
> (map-elements upcaser tx)
'(div "HELLO!" (p "WELCOME TO" (strong "MARS")))
> (map-elements/exclude upcaser tx (λ(x) (equal? (get-tag x) 'strong)))
'(div "HELLO!" (p "WELCOME TO" (strong "Mars")))

Be careful with the wider consequences of exclusion tests. When exclude-test is true, the txexpr is excluded, but so is everything underneath that txexpr. In other words, there is no way to re-include (un-exclude?) elements nested under an excluded element.

Examples:

> (define tx '(div "Hello!" (p "Welcome to" (strong "Mars"))))
> (define upcaser (λ(x) (if (string? x) (string-upcase x) x)))
> (map-elements upcaser tx)
'(div "HELLO!" (p "WELCOME TO" (strong "MARS")))
> (map-elements/exclude upcaser tx (λ(x) (equal? (get-tag x) 'p)))
'(div "HELLO!" (p "Welcome to" (strong "Mars")))
> (map-elements/exclude upcaser tx (λ(x) (equal? (get-tag x) 'div)))
'(div "Hello!" (p "Welcome to" (strong "Mars")))

procedure
(splitf-txexpr tx pred) → txexpr? (listof txexpr-element?)
  tx : txexpr?
  pred : procedure?

Recursively descend through txexpr and extract all elements that match pred. Returns two values: a txexpr with the matching elements removed, and the list of matching elements. Sort of esoteric, but I’ve needed it more than once, so here it is.

Examples:

> (define tx '(div "Wonderful day" (meta "weather" "good") "for a walk"))
> (define remove? (λ(x) (and (txexpr? x) (equal? 'meta (get-tag x)))))
> (splitf-txexpr tx remove?)
'(div "Wonderful day" "for a walk")
'((meta "weather" "good"))
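Because splitf-txexpr returns two values, the results have to be bound with a multiple-value form such as define-values. A small sketch, assuming the txexpr package is installed; the names meta?, body, and metas are chosen for this illustration only.

```racket
#lang racket/base
(require txexpr)

(define tx '(div "Wonderful day" (meta "weather" "good") "for a walk"))

;; Same test as in the example: match txexprs whose tag is meta.
(define (meta? x)
  (and (txexpr? x) (equal? 'meta (get-tag x))))

;; Bind both return values at once.
(define-values (body metas) (splitf-txexpr tx meta?))

body  ; '(div "Wonderful day" "for a walk")
metas ; '((meta "weather" "good"))
```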

6 License & source code

This module is licensed under the LGPL.

Source repository at http://github.com/mbutterick/txexpr. Suggestions & corrections welcome.
