cache-watchlist generates wrong keys, effectively turning caching off #192

Closed
opened 5 years ago by sorawee · 11 comments
sorawee commented 5 years ago (Migrated from github.com)

In my project I have:

root/
├─ pollen.rkt
├─ rkt/
|  └─ tags.rkt
├─ common
|  └─ common.html.pm
├─ subA
|  └─ a.html.pm
└─ subB
   └─ b.html.pm

and the following setup:

  (define watchlist '("rkt/tags.rkt"))
  (define cache-watchlist (map resolve-module-path watchlist))

When I render page subA/a.html.pm which calls get-doc of common/common.html.pm, the key that is used for caching (as in cache-file in cache-utils.rkt) contains (subA/rkt/tags.rkt . 0).

This is bad because:

  1. It's wrong: tags.rkt is in rkt, not subA/rkt. The modified time is 0 because subA/rkt/tags.rkt doesn't exist.
  2. Say that after this I render subB/b.html.pm which similarly calls get-doc of common/common.html.pm. Even though common.html is already cached, the cache is not getting hit because the keys are different. E.g., (subA/rkt/tags.rkt . 0) vs (subB/rkt/tags.rkt . 0).

Empirically, in my project after I take cache-watchlist out, raco pollen reset; raco pollen render speeds up by at least 20 seconds.

In my project I have: ``` root/ ├─ pollen.rkt ├─ rkt/ | └─ tags.rkt ├─ common | └─ common.html.pm ├─ subA | └─ a.html.pm └─ subB └─ b.html.pm ``` and the following setup: ``` (define watchlist '("rkt/tags.rkt")) (define cache-watchlist (map resolve-module-path watchlist)) ``` When I render page `subA/a.html.pm` which calls `get-doc` of `common/common.html.pm`, the key that is used for caching (as in `cache-file` in `cache-utils.rkt`) contains `(subA/rkt/tags.rkt . 0)`. This is bad because: 1. It's wrong: `tags.rkt` is in `rkt`, not `subA/rkt`. The modified time is 0 because `subA/rkt/tags.rkt` doesn't exist. 2. Say that after this I render `subB/b.html.pm` which similarly calls `get-doc` of `common/common.html.pm`. Even though `common.html` is already cached, the cache is not getting hit because the keys are different. E.g., `(subA/rkt/tags.rkt . 0)` vs `(subB/rkt/tags.rkt . 0)`. Empirically, in my project after I take `cache-watchlist` out, `raco pollen reset; raco pollen render` speeds up by at least 20 seconds.
mbutterick commented 5 years ago (Migrated from github.com)

I know I recommended the resolve-module-path technique in the Pollen docs. On reflection there seems to be a wrinkle: I see now that it resolves path strings like "rkt/tags.rkt" relative to the current directory (which moves around during a pollen render). So maybe it’s not a reliable technique. Does this version of the setup module fix your problem?

#lang racket/base

(module setup racket/base
  (provide (all-defined-out))
  (require racket/runtime-path)
  (define-runtime-path tags "rkt/tags.rkt")
  (define cache-watchlist (list tags)))
I know I recommended the `resolve-module-path` technique in the Pollen docs. On reflection there seems to be a wrinkle: I see now that it resolves path strings like `"rkt/tags.rkt"` relative to the current directory (which moves around during a pollen render). So maybe it’s not a reliable technique. Does this version of the `setup` module fix your problem? ```racket #lang racket/base (module setup racket/base (provide (all-defined-out)) (require racket/runtime-path) (define-runtime-path tags "rkt/tags.rkt") (define cache-watchlist (list tags))) ```
sorawee commented 5 years ago (Migrated from github.com)

That works, thanks! However, I also discover another bug.

Consider the above situation except that I rename subB/b.html.pm to subB/b.xml.pm. When rendering subB/b.xml.pm, the cache key for common/common.html seems to be:

(#f xml #f (common/common.html.pm . 1545602934) #f (pollen.rkt . 1545603962))

But when rendering subA/a.html.pm, the cache key for common/common.html is:

(#f html #f (common/common.html.pm . 1545602934) #f (pollen.rkt . 1545603962))

preventing the cache to get hit.

Is there any reason that the extension is a part of the cache key?

That works, thanks! However, I also discover another bug. Consider the above situation except that I rename `subB/b.html.pm` to `subB/b.xml.pm`. When rendering `subB/b.xml.pm`, the cache key for `common/common.html` seems to be: ``` (#f xml #f (common/common.html.pm . 1545602934) #f (pollen.rkt . 1545603962)) ``` But when rendering `subA/a.html.pm`, the cache key for `common/common.html` is: ``` (#f html #f (common/common.html.pm . 1545602934) #f (pollen.rkt . 1545603962)) ``` preventing the cache to get hit. Is there any reason that the extension is a part of the cache key?
mbutterick commented 5 years ago (Migrated from github.com)

Right — for poly files the cache key includes the current-poly-target because it might affect how the doc or metas is rendered.

Right — for `poly` files the cache key includes the `current-poly-target` because it might affect how the `doc` or `metas` is rendered.
mbutterick commented 5 years ago (Migrated from github.com)

Oh I see, it’s not a poly source. Well let me check.

Oh I see, it’s not a `poly` source. Well let me check.
mbutterick commented 5 years ago (Migrated from github.com)

I’m not seeing the file extension in my cache key for common.html (assuming I’ve simulated your situation accurately)

#hash(((#f #f #f ("/Users/···/root/subA/../common/common.html.pm" . 1545611617) #f ("/Users/···/root/pollen.rkt" . 1545611425) ("/Users/···/root/rkt/tags.rkt" . 1545611464)) . "cached15456116361545611636181"))

BTW #lang multi-file is a nice way of making test cases that need to generate multiple files & directories.

I’m not seeing the file extension in my cache key for `common.html` (assuming I’ve simulated your situation accurately) ``` #hash(((#f #f #f ("/Users/···/root/subA/../common/common.html.pm" . 1545611617) #f ("/Users/···/root/pollen.rkt" . 1545611425) ("/Users/···/root/rkt/tags.rkt" . 1545611464)) . "cached15456116361545611636181")) ``` BTW [`#lang multi-file`](http://docs.racket-lang.org/multi-file-lang/index.html) is a nice way of making test cases that need to generate multiple files & directories.
sorawee commented 5 years ago (Migrated from github.com)

It's actually a poly file. I simplified it to .html.pm when I wrote the comment. So you are totally right. I guess it makes sense...

Thanks for pointing out to #lang multi-file. This looks really useful.

Would you mind updating the documentation regarding resolve-module-path? After that I would considered this issue resolved.

A warning regarding the interaction between poly and caching in the documentation would also be nice.

It's actually a `poly` file. I simplified it to `.html.pm` when I wrote the comment. So you are totally right. I guess it makes sense... Thanks for pointing out to `#lang multi-file`. This looks really useful. Would you mind updating the documentation regarding `resolve-module-path`? After that I would considered this issue resolved. A warning regarding the interaction between `poly` and caching in the documentation would also be nice.
sorawee commented 5 years ago (Migrated from github.com)

I ended up writing this macro:

  (define-syntax (define-cache-watchlist stx)
    (syntax-case stx ()
      [(_ files ...)
       (with-syntax ([(ids ...) (generate-temporaries #'(files ...))]
                     [cache-watchlist (datum->syntax stx 'cache-watchlist)])
         #'(begin
             (define-runtime-path ids files) ...
             (define cache-watchlist (list ids ...))))]))

so that I don't need to define-runtime-path over and over and need to come up with id names. Do you think that the library should provide a macro like this? It's also more safe (if you mistype define-cache-watchlist, it would result in an unbound id error, rather than a silent "configuration doesn't work"... speaking of it, can't we convert all setups to use parameters to make them safer?)

I ended up writing this macro: ```racket (define-syntax (define-cache-watchlist stx) (syntax-case stx () [(_ files ...) (with-syntax ([(ids ...) (generate-temporaries #'(files ...))] [cache-watchlist (datum->syntax stx 'cache-watchlist)]) #'(begin (define-runtime-path ids files) ... (define cache-watchlist (list ids ...))))])) ``` so that I don't need to `define-runtime-path` over and over and need to come up with id names. Do you think that the library should provide a macro like this? It's also more safe (if you mistype `define-cache-watchlist`, it would result in an unbound id error, rather than a silent "configuration doesn't work"... speaking of it, can't we convert all setups to use parameters to make them safer?)
mbutterick commented 5 years ago (Migrated from github.com)

No, because the define-runtime-path technique is one possibility of many. The cache-watchlist doesn’t care how the complete path is generated. For instance, resolve-module-path may still be the better choice, if you want to depend on an installed module.

(PS this macro won’t work as written.)

No, because the `define-runtime-path` technique is one possibility of many. The `cache-watchlist` doesn’t care how the complete path is generated. For instance, `resolve-module-path` may still be the better choice, if you want to depend on an installed module. (PS this macro won’t work as written.)
sorawee commented 5 years ago (Migrated from github.com)

(PS this macro won’t work as written.)

How so? It seems to work for me.

> (PS this macro won’t work as written.) How so? It seems to work for me.
mbutterick commented 5 years ago (Migrated from github.com)

Somewhere (either in the macro definition environment or the macro itself) there needs to be a (require racket/runtime-path).

Somewhere (either in the macro definition environment or the macro itself) there needs to be a `(require racket/runtime-path)`.
sorawee commented 5 years ago (Migrated from github.com)

Right, I did require racket/runtime-path outside of the macro.

Thanks again for all your help!

Right, I did `require racket/runtime-path` outside of the macro. Thanks again for all your help!
Sign in to join this conversation.
No Label
No Milestone
No project
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: mbutterick/pollen#192
Loading…
There is no content yet.