#lang scribble/manual
@(require scribble/eval "../main.rkt" (for-label racket "../main.rkt"))
@defmodule[hyphenate]
@title{Hyphenate}
@author{Matthew Butterick (mb@"@"mbtype.com)}
A simple hyphenation module that uses the Knuth–Liang hyphenation algorithm and patterns originally developed for TeX. This implementation was ported from Ned Batchelder's @link["http://nedbatchelder.com/code/modules/hyphenate.html"]{Python version}.
I originally developed this module to handle hyphenation for my web-based book @link["http://practicaltypography.com"]{Butterick's Practical Typography}. Even though support for CSS-based hyphenation is still iffy among web browsers, soft hyphens work reliably.
@section{How to use it}
@section{Interface}
@defproc[
(hyphenate
[text string?]
[joiner (or/c char? string?) @(integer->char #x00AD)]
[#:exceptions exceptions (listof string?) empty]
[#:min-length length (or/c integer? false?) 5])
string?]
Hyphenate @racket[_text] by calculating hyphenation points and inserting @racket[_joiner] at those points. By default, @racket[_joiner] is the soft hyphen. Words shorter than @racket[_length] will not be hyphenated. To hyphenate words of any length, use @racket[#:min-length] @racket[#f].
@margin-note{The REPL will display a soft hyphen as #\u00AD. But in ordinary use, you only see a soft hyphen when it appears at the end of a line or page as part of a hyphenated word. Otherwise it's invisible.}
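Invisible does not mean absent: the soft hyphen is an ordinary character in the string. The following minimal sketch builds a string by hand, as a stand-in for hyphenated output, to show that it affects length and equality. The names @racket[soft-hyphen] and @racket[with-soft-hyphen] are illustrative and not part of this module.

@codeblock|{
#lang racket

;; The default joiner, U+00AD SOFT HYPHEN, written as in the signature above.
(define soft-hyphen (integer->char #x00AD))

;; A hand-built string with one soft hyphen, standing in for hyphenated output.
(define with-soft-hyphen
  (string-append "poly" (string soft-hyphen) "morphism"))

(string-length "polymorphism")             ; 12
(string-length with-soft-hyphen)           ; 13
(string=? with-soft-hyphen "polymorphism") ; #f
}|
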
Using the @racket[#:exceptions] keyword, you can pass hyphenation exceptions as a list of words with regular hyphen characters (@racket["-"]) marking the permissible hyphenation points. If an exception word contains no hyphens, that word will never be hyphenated.
@examples[
(hyphenate "polymorphism" #\-)
(hyphenate "polymorphism" #\- #:exceptions '("polymo-rphism"))
(hyphenate "polymorphism" #\- #:exceptions '("polymorphism"))
]
Knuth & Liang were sufficiently confident about their algorithm that they originally released it with only 14 exceptions: @italic{associate[s], declination, obligatory, philanthropic, present[s], project[s], reciprocity, recognizance, reformation, retribution}, and @italic{table}. While their bravado is admirable, it's easy to discover words they missed.
Don't send raw HTML through @racket[hyphenate]. It can't distinguish HTML tags and attributes from textual content, but it will hyphenate them anyhow, which will break the markup. Run your textual content through @racket[hyphenate] before you put it into your page template. Or convert your HTML to an X-expression and process it selectively.
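One way to process an X-expression selectively is to walk it and transform only the strings in element content, leaving tag names and attribute lists alone. The sketch below is an assumption about how that might look, not code from this module: @racket[map-xexpr-strings] and @racket[attr-list?] are hypothetical helpers, and @racket[string-upcase] stands in for @racket[hyphenate] so the effect is visible.

@codeblock|{
#lang racket

;; Hypothetical helpers, not part of the hyphenate module.

;; An attribute list is a (possibly empty) list of (symbol string) pairs.
(define (attr-list? v)
  (and (list? v)
       (andmap (λ (a)
                 (and (list? a)
                      (= 2 (length a))
                      (symbol? (first a))
                      (string? (second a))))
               v)))

;; Apply proc to every string in the content of X-expression x.
(define (map-xexpr-strings proc x)
  (cond
    [(string? x) (proc x)]
    [(and (pair? x) (list? x) (symbol? (first x)))
     (define tag (first x))
     (define tail (rest x))
     (define has-attrs? (and (pair? tail) (attr-list? (first tail))))
     (define attrs (if has-attrs? (list (first tail)) '()))
     (define body (if has-attrs? (rest tail) tail))
     `(,tag ,@attrs ,@(map (λ (e) (map-xexpr-strings proc e)) body))]
    ;; Symbols (entities), numbers, and anything else pass through unchanged.
    [else x]))

;; string-upcase stands in for hyphenate here.
(map-xexpr-strings string-upcase
                   '(p ((class "intro")) "Hello " (em "world") nbsp))
}|

The last expression produces @racket['(p ((class "intro")) "HELLO " (em "WORLD") nbsp)]: the attribute value and the entity are untouched. With this module loaded, passing @racket[hyphenate] in place of @racket[string-upcase] should hyphenate only the text. A fuller version would probably also skip elements such as @tt{script} and @tt{style}.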
@defproc[
(hyphenatef
[text string?]
[pred procedure?]
[joiner (or/c char? string?) @(integer->char #x00AD)]
[#:exceptions exceptions (listof string?) empty]
[#:min-length length (or/c integer? false?) 5])
string?]
Like @racket[hyphenate], but only words matching @racket[_pred] are hyphenated. Convenient if you want to filter out, say, capitalized words.
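As an illustration of the capitalized-word case, here is a minimal sketch of a suitable predicate. @racket[uncapitalized?] is a hypothetical name, not part of this module, and the sketch assumes the predicate is called with each candidate word as a string.

@codeblock|{
#lang racket

;; Hypothetical predicate, not part of the hyphenate module.
;; Assumes it receives each candidate word as a string.
;; True when the word does not begin with an uppercase letter.
(define (uncapitalized? word)
  (or (string=? word "")
      (not (char-upper-case? (string-ref word 0)))))

(uncapitalized? "polymorphism") ; #t
(uncapitalized? "Racket")       ; #f
}|

Under that assumption, a call such as @racket[(hyphenatef "Racket supports polymorphism" uncapitalized? #\-)] should leave "Racket" alone while still considering the other words.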
@defproc[
(unhyphenate
[text string?]
[joiner (or/c char? string?) @(integer->char #x00AD)])
string?]
Remove @racket[_joiner] from @racket[_text]. Essentially equivalent to @racket[(string-replace _text _joiner "")].
A side effect of using @racket[hyphenate] is that soft hyphens (or whatever the @racket[_joiner] is) are embedded in the output @racket[_text]. If you're building an application that needs to support, for instance, copying text in a graphical interface, you probably want to strip out the hyphenation before the copied text is moved to the clipboard.
Keep in mind, however, that @racket[unhyphenate] won't produce the input originally passed to @racket[hyphenate] if the @racket[_joiner] was part of the original input @racket[_text].
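The caveat can be seen without running the hyphenator at all. In this sketch, @racket[strip-joiner] is a hypothetical stand-in for @racket[unhyphenate], modeled on the @racket[string-replace] description above, and the input is a hand-written string: the hyphen in "well-known" is assumed to come from the original text, the others from hyphenation.

@codeblock|{
#lang racket

;; Hypothetical stand-in for unhyphenate, based on the string-replace
;; description above. Accepts the joiner as a character or a string.
(define (strip-joiner text joiner)
  (string-replace text (if (char? joiner) (string joiner) joiner) ""))

;; Hand-written input standing in for hyphenated output.
;; Every hyphen is removed, including the one that belonged to the text.
(strip-joiner "a well-known poly-mor-phism" #\-) ; "a wellknown polymorphism"
}|

Using a joiner that does not otherwise occur in the text, such as the default soft hyphen, avoids the problem.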