Hyphenate

Matthew Butterick (mb@mbtype.com)

 (require hyphenate) package: hyphenate

A simple hyphenation module that uses the Knuth–Liang hyphenation algorithm and patterns originally developed for TeX. This implementation was ported from Ned Batchelder’s Python version.

I originally developed this module to handle hyphenation for my web-based book Butterick’s Practical Typography. Even though support for CSS-based hyphenation is still iffy among web browsers, soft hyphens work reliably.

1 How to use it

2 Interface

procedure

(hyphenate text
  [joiner
  #:exceptions exceptions
  #:min-length length]) → string?
  text : string?
  joiner : (or/c char? string?) = (integer->char 173)
  exceptions : (listof string?) = empty
  length : (or/c integer? false?) = 5
Hyphenate text by calculating hyphenation points and inserting joiner at those points. By default, joiner is the soft hyphen. Words shorter than length will not be hyphenated. To hyphenate words of any length, use #:min-length #f.

The REPL will display a soft hyphen as #\u00AD. But in ordinary use, you only see a soft hyphen when it appears at the end of a line or page as part of a hyphenated word. Otherwise it’s invisible.
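The invisibility of the soft hyphen matters when you compare or measure hyphenated strings. The sketch below builds such a string by hand, so it runs without the package installed; the name marked and its break point are illustrative assumptions, not output computed by hyphenate.

```racket
#lang racket/base

;; The default joiner: U+00AD SOFT HYPHEN (decimal 173).
(define soft-hyphen (integer->char 173))

;; A hand-built stand-in for hyphenated output. The break point is
;; illustrative only; it was not computed by the hyphenate package.
(define marked (string-append "hy" (string soft-hyphen) "phen"))

(char=? soft-hyphen #\u00AD)   ; #t
(string-length marked)         ; 7: six letters plus one soft hyphen
(equal? marked "hyphen")       ; #f, although both usually render alike
```

So string comparisons and length checks on hyphenated text see the joiner even when the reader does not.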

Using the #:exceptions keyword, you can pass hyphenation exceptions as a list of words with regular hyphen characters ("-") marking the permissible hyphenation points. If an exception word contains no hyphens, that word will never be hyphenated.

Examples:

> (hyphenate "polymorphism" #\-)

"poly-mor-phism"

> (hyphenate "polymorphism" #\- #:exceptions '("polymo-rphism"))

"polymo-rphism"

> (hyphenate "polymorphism" #\- #:exceptions '("polymorphism"))

"polymorphism"

Knuth & Liang were sufficiently confident about their algorithm that they originally released it with only 14 exceptions: associate[s], declination, obligatory, philanthropic, present[s], project[s], reciprocity, recognizance, reformation, retribution, and table. While their bravado is admirable, it’s easy to discover words they missed.

Don’t send raw HTML through hyphenate. It can’t distinguish HTML tags and attributes from textual content, but it will hyphenate them anyhow, which will break the markup. Run your textual content through hyphenate before you put it into your page template. Or convert your HTML to an X-expression and process it selectively.
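One way to process an X-expression selectively is to walk it and transform only the strings. The sketch below is a hypothetical helper, not part of the hyphenate package: the name map-xexpr-text, the proc argument, and the default list of skipped tags are all assumptions made for illustration. It takes the string transformer as an argument, so it runs on its own and works with any string-to-string procedure.

```racket
#lang racket/base

;; Does v look like an X-expression attribute list, e.g. ((class "note"))?
(define (attrs? v)
  (and (list? v)
       (andmap (lambda (a) (and (pair? a) (symbol? (car a)))) v)))

;; Hypothetical helper: apply proc to every string in the X-expression x.
;; Tag names and attributes are left alone, and elements whose tag is in
;; skip are returned unchanged so their contents stay verbatim.
(define (map-xexpr-text proc x #:skip [skip '(script style pre code)])
  (cond
    [(string? x) (proc x)]
    [(and (pair? x) (symbol? (car x)))
     (define tag (car x))
     (define tail (cdr x))
     (define has-attrs? (and (pair? tail) (attrs? (car tail))))
     (define attrs (if has-attrs? (list (car tail)) '()))
     (define kids (if has-attrs? (cdr tail) tail))
     (if (memq tag skip)
         x
         `(,tag ,@attrs
                ,@(map (lambda (k) (map-xexpr-text proc k #:skip skip)) kids)))]
    ;; Symbols and numbers (entities) and anything else pass through as-is.
    [else x]))
```

With the package installed, a call like (map-xexpr-text hyphenate (string->xexpr html-string)) would hyphenate only the textual content, where html-string is your own markup and string->xexpr comes from the xml library. The result can then be turned back into markup with xexpr->string.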

procedure

(hyphenatef text
  pred
  [joiner
  #:exceptions exceptions
  #:min-length length]) → string?
  text : string?
  pred : procedure?
  joiner : (or/c char? string?) = (integer->char 173)
  exceptions : (listof string?) = empty
  length : (or/c integer? false?) = 5
Like hyphenate, but only words matching pred are hyphenated. Convenient if you want to filter out, say, capitalized words.
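A predicate for the capitalized-word case might look like the sketch below. The name uncapitalized? is hypothetical, and the sketch assumes pred is called with one word at a time, passed as a string.

```racket
#lang racket/base

;; Hypothetical predicate, not provided by the hyphenate package.
;; Assumes it receives a single word as a string.
;; Returns #t when the word does not begin with an uppercase letter.
(define (uncapitalized? word)
  (or (zero? (string-length word))
      (not (char-upper-case? (string-ref word 0)))))
```

Passed as the second argument, as in (hyphenatef "Brennan Huff likes fancy salads" uncapitalized? #\-), it should leave "Brennan" and "Huff" untouched while the lowercase words remain eligible for hyphenation.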

procedure

(unhyphenate text [joiner]) → string?
  text : string?
  joiner : (or/c char? string?) = (integer->char 173)
Remove joiner from text. Essentially equivalent to (string-replace text joiner "").

A side effect of using hyphenate is that soft hyphens (or whatever joiner you chose) are embedded in the output text. If you’re building an application that needs to support, for instance, copying text from a graphical interface, you probably want to strip out the hyphenation before the copied text reaches the clipboard.

Keep in mind, however, that unhyphenate removes every occurrence of joiner, so it won’t reproduce the input originally passed to hyphenate if that input already contained the joiner.
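The caveat is easy to see with a stand-in built on string-replace. This is a minimal sketch, not the package's implementation: strip-joiner is a hypothetical name, and the hyphenated inputs are written by hand so the example runs without the package.

```racket
#lang racket/base
(require racket/string)

;; Hypothetical stand-in for unhyphenate, built on string-replace.
;; string-replace needs a string pattern, so a char joiner is converted first.
(define (strip-joiner text [joiner (integer->char 173)])
  (string-replace text (if (char? joiner) (string joiner) joiner) ""))

;; Soft hyphens are removed cleanly.
(strip-joiner "poly\u00ADmor\u00ADphism")          ; "polymorphism"

;; With a visible joiner, hyphens that were already in the text go too.
(strip-joiner "a well-known poly-mor-phism" #\-)   ; "a wellknown polymorphism"
```

The second call loses the hyphen in "well-known", which is why an unusual joiner such as the soft hyphen is the safer default for round trips.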