more

9 years ago · dc37388072
parent 7faad4f96a
commit dc37388072
2 changed files with 34 additions and 3 deletions
--- a/quad/quad/scribblings/quad.html
+++ b/quad/quad/scribblings/quad.html
--- a/quad/quad/scribblings/quad.scrbl
+++ b/quad/quad/scribblings/quad.scrbl
@ -41,7 +41,7 @@ Instead, let's take its good ideas — there are many — and terraform a new p

 In principle, it's possible to generate PDF documents from a web browser. Support for paper-based layouts has been part of the CSS concept @link["https://www.w3.org/People/howcome/p/cascade.html"]{since the beginning} (though it's been lightly used).

-But web browsers have a few limitations. First, web browsers only render HTML, and many typesetting concepts (e.g., footnotes) don't correspond to any HTML entity. So there is a narrowing of possiblities. Second, browsers are built for speed, so high-quality typesetting (e.g., the Knuth–Plass linebreaking algorithm) is off the table. Third, browsers are  inconsistent in how they render pages. Fourth — taking off my typography-snob tiara here — browsers are unstable. What seems well supported today can be broken or removed tomorrow. So browsers can't be a part of a dependable publishing workflow that produces reproducible results.
+But web browsers have a few limitations. First, web browsers only render HTML, and many typesetting concepts (e.g., footnotes) don't correspond to any HTML entity. So there is a narrowing of possiblities. Second, browsers are built for speed, so high-quality typesetting (e.g., the Knuth–Plass linebreaking algorithm) is off the table. Third, browsers are  inconsistent in how they render pages. Fourth — taking off my typography-snob tiara here — browsers are unstable. What seems well supported today can be broken or removed tomorrow. So browsers can't be a part of a dependable publishing workflow that yields reproducible results.


@section{What does Quad do?}
@ -58,6 +58,35 @@ Quad produces finished document layouts using three ingredients:

 While there's no reason Quad couldn't produce an HTML layout, that's an easier problem, because most of the document-layout chores can (and should) be delegated to the web browser. For now, most of Quad's apparatus is devoted to its layout engine so it can produce layouts for PDF.

+@section{Theory of operation}
+
+A document processor starts with input that we can think of as one giant line of text. It breaks this into smaller lines, and then distributes these lines across pages. Conceptually, it's a bin-packing problem.
+
+@itemlist[#:style 'ordered
+  @item{Quad starts with an input file written in the @code{#lang quad} markup language. For the most part, it's text with markup codes (though it may also include things like diagrams and images).}
+
+  @item{Each markup entity is called a @defterm{quad}. A quad roughly corresponds to a box. ``Roughly'' because quads can have zero or negative dimension. Also, at the input stage, the contents of some quads may end up being spread across multiple non-overlapping boxes (e.g., a quad containing a word might be hyphenated to appear on two lines). The more precise description of a quad is therefore ``contiguous markup region.'' Quads can be recursively nested inside other quads, thus the input file is tree-shaped.}
+
+  @item{This tree-shaped input file is flattened into a list of atomic  quads. ``Atomic'' because these are the smallest items the typesetter can manipulate. (For instance, the word @italic{bar} would become three one-character quads. An image or other indivisible box would remain as is.) During the flattening, tags from higher in the tree are propagated downward by copying them into the atomic quads. The result is a ``stateless'' representation of the input, in the sense that all the information needed to typeset an atomic quad is contained within the quad itself.}
+
+  @item{Atomic quads are composed into lines using one of three algorithms. (A line is just a quad of a certain width.) The first-fit algorithm puts as many quads onto a line as it can before moving on to the next. The best-fit algorithm minimizes the total looseness of all the lines in a paragraph (aka the Knuth–Plass linebreaking algorithm developed for TeX). Because best-fit is more expensive, Quad also has an adaptive-fit algorithm that uses a statistical heuristic to guess whether the paragraph will benefit from best-fit; if not, it uses first-fit.}
+
+  @item{If a typeset paragraph still exceeds certain looseness tolerances, it is hyphenated and the lines recalculated.}
+
+  @item{Once the lines are broken, extra space is distributed within each line according to whether the line should appear centered, left-aligned, justified, etc. The result is a list of quads that fills the full column width.}
+
+  @item{Lines are composed into columns. (A column is just a quad of a certain height.) To support things like footnotes, columns are composed using a backtracking constraint-satisfaction algorithm.}
+
+  @item{Columns are composed into pages.}
+
+  @item{This completes the typesetting phase. Note that at every step in the process, the document is represented in the Quad markup language. There isn't a distinction between the public and private markup interface.}
+
+  @item{Before the typeset markup is passed to the renderer, it goes through a simplification phase — a lot of adjacent quads will have the same formatting characteristics, and these can be consolidated into runs of text.}
+
+  @item{The renderer walks through the markup and draws each quad, using information in the markup attributes to determine position, color, font size & style, etc.}
+
+]
+

@section{The markup language}

@ -71,11 +100,13 @@ Quad programs can be written directly, or generated as the output of other progr



+
@section{The layout engine}

@section{The rendering engine}


+@section{Bottlenecks, roadblocks, irritations}

@section{Why is it called Quad?}