Provide make-replacer from decode.rkt / improve performance on heavily crosslinked pages #25

Closed
opened 10 years ago by chipotle · 12 comments
chipotle commented 10 years ago (Migrated from github.com)

It's possible people might want to write their own versions of smart-quotes or smart-dashes, or write similar simple regex-replacing functions in directory-require.rkt. (For instance, I have a long-standing habit of typing just two dashes to mean an em dash.) It'll be easier to do this if make-replacer is defined with define+provide.

mbutterick commented 10 years ago (Migrated from github.com)

If you’re asking for smart-quotes and smart-dashes to be customizable, that could be done.

I wrote make-replacer as a private helper function, so it’s neither general enough nor simple enough to be useful as part of the public interface. I’m not averse to making a better version of it. I’d just need someone who wants it to explain how it should work.

chipotle commented 10 years ago (Migrated from github.com)

Well, what I was thinking of is essentially "run this regex style transform on the contents of the tag it's associated with," like smart-quotes on root. A while ago I was noodling around with an idea for an extensible markup thingy (so far only in notes rather than code, and likely to remain that way), and one of its notions was a "transform" command that basically did a regex search/replace on the document forward from the point the command occurred. I suppose in Racket this would be something similar to, for example,

(regexp-replace* #px"(\\*|_)(.*?)\\1" str "<em>\\2</em>")

To do a (probably dangerously naive) Markdown-ish italics conversion. It's possible that the best way to do this is for users to simply wrap regexp-replace* in something that #:string-proc handles, but you can probably think of a better interface. (I really know very little Racket or Lisp, and I'm sort of poking things with sticks until they seem to be the right shape.)
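
One way to picture the "wrap `regexp-replace*`" idea is the minimal sketch below. It assumes a hypothetical helper named `chain-replacer`; this is not Pollen's private `make-replacer`, whose definition isn't shown in this thread. It builds a string-to-string function from pattern/replacement pairs, which is the kind of function that could then be handed to `#:string-proc`.

```racket
#lang racket/base

;; Hypothetical helper, not part of Pollen: returns a string -> string
;; function that applies each (pattern . replacement) rule in order.
(define (chain-replacer . rules)
  (lambda (str)
    (for/fold ([s str])
              ([rule (in-list rules)])
      (regexp-replace* (car rule) s (cdr rule)))))

;; Two hyphens become an em dash (U+2014); three dots become an ellipsis (U+2026).
(define my-dashes
  (chain-replacer
   (cons #rx"--" "\u2014")
   (cons #rx"\\.\\.\\." "\u2026")))

(my-dashes "wait--what...")
```

Because the rules run in order on plain strings, a sketch like this only sees text one string at a time and knows nothing about the surrounding tags, so it is suited to simple substitutions only.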

mbutterick commented 10 years ago (Migrated from github.com)

make-replacer simply chains together calls to regexp-replace*, so it’s not doing any heavy lifting.

Moreover, the simplicity can be deceiving — smart-quotes has edge cases that make-replacer can’t reach. Encouraging shortcuts is bad policy. ;)

BTW, if you’re asking a broader question about whether markup can be parsed with regular expressions — it’s tempting, but it leads to anguish. Even though markup looks like a string, it’s really a tree encoded in a string. Regexps aren’t designed to process trees, and fall apart when presented with anything but simple cases. See also this Stack Overflow answer (http://stackoverflow.com/a/1732454/1486915).

chipotle commented 10 years ago (Migrated from github.com)

No, it wasn't that question, really. I've done conversions between markup styles with regexes before, but there was nothing I'd describe as "parsing" involved -- when I've actually had to write code that parses HTML, I've used a parsing library. :)

Really, these are just things that are coming up as I'm poking around with a project I'm trying with Pollen -- whether there are simple (or not-so-simple) tasks that could either be made easier, or maybe just put on a list of things to add to a "cookbook" section of the docs eventually. (The language I used over the last few days trying to figure out how to create a table of contents from a list was, ah, colorful.)

mbutterick commented 10 years ago (Migrated from github.com)

If you post a sample of the task, either a) I’ll have a better idea for how to do it or b) it will suggest something that should be added to the core Pollen functions to make life easier.

Either way, you win.

chipotle commented 10 years ago (Migrated from github.com)

Well, while this is rather changing the subject from the initial issue title, the table of contents is actually the bigger stumbling block so far. I suspect I don't really understand how to use pagetrees particularly effectively yet -- the examples don't include making a table of contents or really using hierarchies (which I suspect is really what I want here). My attempts to create TOC-generating functions from pagetrees have so far ended in dismal failure. I have one that works from a list that you pass it:

(define (link-chapter mydoc)
  (make-txexpr 'li '()
               `((a [[href ,mydoc]] ,@(select-from-doc 'h1 mydoc)))))

(define (make-toc pt)
  `(ul [[class "toc"]]
       ,@(map
          ;; You could write the line below as simply `link-chapter` — MB
          (lambda (mydoc) (link-chapter mydoc)) 
          pt)))

But I suspect there's a better way of doing it. I don't want everything in the page tree, though (the table of contents page itself doesn't need to be listed, for instance, but it needs to be in the navigation on subsequent pages, ideally as its own "go up" link). Also, the TOC page renders really slowly with that.

This doesn't strike me as a core Pollen thing, but possibly something that -- down the road -- could be part of an optional library, sort of as LaTeX is to TeX, if that makes any sense.

mbutterick commented 10 years ago (Migrated from github.com)

You're on the right track. The problem with a TOC — which I have also come up against, and have not adequately solved yet — is when it relies on something like (select-from-doc 'h1 mydoc), you have to load and compile every page that's referenced. That's why it's slow. OTOH, the overall structure of a document doesn't tend to change as often as, say, the content of any individual source file, so an always-dynamic TOC is usually superfluous.

On practicaltypography.com, I've worked around this problem by using a static TOC and updating by hand. This is terrible, but not slow.

I think the solution might be to have a "caching pagetree" that is generated dynamically but not in real time. So it would store not just pagenodes but also the values from, say, (select-from-doc 'h1 mydoc). Generating a TOC from this file would be fast because you wouldn't need to load the source files. But you could refresh it anytime from the source files (still slowly, but you control the timing).

As to your issues:

Generating the TOC from a subset of the main pagetree. Two choices.

  1. You can split your main.ptree into toc.ptree and navigation.ptree, and have the second one incorporate toc.ptree by reference.
  2. You can select a subset of nodes within main.ptree by using (select* 'top-node-name main.ptree), and pass these to your TOC generator.

TOC function. Your code has the right idea. What you're missing is recursion (which is what you need to handle hierarchical data). I would probably do it like this:

(define (link href target)
  `(a [[href ,(~a href)]] ,(~a target)))

(define (make-toc x)
  (cond
    [(list? x) `(ul ,@(map make-toc x))]
    [(pagenode? x) `(li ,(link x (select-from-doc 'h1 x)))]))

This use of cond with type-detecting branches is the idiomatic way of doing what the pros would call recursive descent. (The key innovation is calling make-toc again from within make-toc.) This will work on a pagetree with any number of hierarchical levels.
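
To see the recursion in isolation, here is a minimal sketch that runs without Pollen. It assumes a pagetree can be modeled as a nested list of symbols, and it swaps `(select-from-doc 'h1 x)` for a hypothetical in-memory table named `titles`, so no source files are loaded. The names `titles`, `title-of` and `toc` are illustrative only.

```racket
#lang racket/base

;; Hypothetical stand-in for looking up each page's h1:
;; a fixed table from page name to title.
(define titles
  (hash 'intro.html "Introduction"
        'part-one.html "Part One"))

;; Fall back to the page name when no title is recorded.
(define (title-of node)
  (hash-ref titles node (symbol->string node)))

;; Same shape as make-toc above: a list becomes a ul whose items are
;; built by calling toc again; a single node becomes a linked li.
(define (toc x)
  (cond
    [(list? x) `(ul ,@(map toc x))]
    [(symbol? x)
     `(li (a ((href ,(symbol->string x))) ,(title-of x)))]))

(toc '(intro.html (part-one.html chapter-1.html)))
```

Each nested list yields a nested `ul`, so the depth of the output follows the depth of the input without any level-specific code.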

mbutterick commented 10 years ago (Migrated from github.com)

BTW Pollen does cache source files during a project-server session or batch rendering command. So if you were rendering every page anyhow, rendering the TOC after that would be fast. The slowness is evident when the TOC is rendered first, or alone, because it triggers the compilation of the other files.

mbutterick commented 10 years ago (Migrated from github.com)

Instead of introducing a new concept like a caching pagetree, perhaps I could just change the cache so it writes its data to disk when it changes. Then, when you start up, it could load this file, and you can continue from where you left off.
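
The general idea of a cache that persists between sessions can be sketched as follows. This is not Pollen's actual cache code, which isn't shown in this thread; `load-cache`, `save-cache!` and `cache-ref!` are hypothetical names, and the sketch assumes keys and values that survive a round trip through `write` and `read` (symbols, strings, numbers, lists).

```racket
#lang racket/base
(require racket/file)

;; Load a previously saved cache, or start empty if none exists yet.
(define (load-cache path)
  (if (file-exists? path)
      (file->value path)
      (hash)))

;; Write the whole cache to disk, replacing any earlier copy.
(define (save-cache! cache path)
  (write-to-file cache path #:exists 'replace))

;; Return the cached value for key; on a miss, compute it,
;; save the updated cache to disk, and return the new value.
(define (cache-ref! path key compute)
  (define cache (load-cache path))
  (cond
    [(hash-has-key? cache key) (hash-ref cache key)]
    [else
     (define v (compute key))
     (save-cache! (hash-set cache key v) path)
     v]))
```

A call such as `(cache-ref! "toc.cache" 'intro.html slow-title-lookup)` would run the slow lookup once and read from disk afterwards (both the file name and `slow-title-lookup` are made up here). The sketch never invalidates an entry, so a real version would also need some way to notice that a source file has changed.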

mbutterick commented 10 years ago (Migrated from github.com)

OK, the file-based cache is added. It works as expected. I'm seeing a big speedup (10–20x) on pages with a lot of links to other pages, like this one (http://practicaltypography.com/type-composition.html). Time to convert my TOC to work dynamically …

chipotle commented 10 years ago (Migrated from github.com)

Cool. I'd actually switched my project to a manual TOC but will play around with this. (My current fights are mostly relating to CSS, and aren't the sorts of things that Pollen can help with, unless there is a "make browser manufacturers implement CSS3 faster" feature you're working on.)

mbutterick commented 10 years ago (Migrated from github.com)

Using the Pollen preprocessor on your CSS files can save a lot of browser-related headaches. When you have a specific example in mind, post another issue.

Reference: mbutterick/pollen#25