Debugging performance #36

At some point in one of my projects in the past month or so, the total time to build the site increased by about 30 seconds. Specifically, when doing a parallel render (and this is after preheating the cache) the first batch of pages render in ≈3sec each, and the rest render in ≈1.5–3 sec each. I can’t be sure if it was a change in my code, or the new Racket version, or (less likely) a change in Pollen. I upgraded to Racket 7.6 but have stuck with the non-CS variant.

This has brought me back to the question: what is a good way to investigate performance in a Pollen project? I am trying to learn how to use DrRacket’s profiling tools, but not getting very far; there isn’t much documentation about them, and the most “interesting” functions listed in the profiler are designated << unknown >>. I’ve so far tried using the profiler on a .poly.pm source file, and on a separate .rkt that calls render-to-file on that source file.

Additionally, it would be nice to see messages when there is a caching miss, so I can rule that out. Maybe another command line switch?

At some point in one of my projects in the past month or so, the total time to build the site increased by about 30 seconds. Specifically, when doing a parallel render (and this is after preheating the cache) the first batch of pages render in ≈3sec each, and the rest render in ≈1.5–3 sec each. I can’t be sure if it was a change in my code, or the new Racket version, or (less likely) a change in Pollen. I upgraded to Racket 7.6 but have stuck with the non-CS variant. This has brought me back to the question: what is a good way to investigate performance in a Pollen project? I am trying to learn how to use DrRacket’s profiling tools, but not getting very far; there isn’t much documentation about them, and the most “interesting” functions listed in the profiler are designated `<< unknown >>`. I’ve so far tried using the profiler on a `.poly.pm` source file, and on a separate `.rkt` that calls `render-to-file` on that source file. Additionally, it would be nice to see messages when there is a caching miss, so I can rule that out. Maybe another command line switch?

I’ve never used the DrRacket profiler. If I notice a large drop in performance, I usually just reset the branch in git (either of the project, or Pollen itself) until I find the commit that caused the slowdown, and then investigate further.

Sometimes there is a bug. Sometimes a performance drop is a consequence of fixing another bug. For the parallel-rendering system generally, there’s definitely been a tradeoff between getting it to work reliably, and going fast (obviously, when it skips work it ought to be doing, it’s faster).

I’m sure there are plenty of dumb ideas still lingering in the code. It’s harder to test parallel processing that other parts of the system, because it introduces a new class of erratic / occasional errors. The biggest annoyances come from managing access to shared resources (e.g., a file cache) by multiple parallel operations.

I’ve never used the DrRacket profiler. If I notice a large drop in performance, I usually just reset the branch in git (either of the project, or Pollen itself) until I find the commit that caused the slowdown, and then investigate further. Sometimes there is a bug. Sometimes a performance drop is a consequence of fixing another bug. For the parallel-rendering system generally, there’s definitely been a tradeoff between getting it to work reliably, and going fast (obviously, when it skips work it ought to be doing, it’s faster). I’m sure there are plenty of dumb ideas still lingering in the code. It’s harder to test parallel processing that other parts of the system, because it introduces a new class of erratic / occasional errors. The biggest annoyances come from managing access to shared resources (e.g., a file cache) by multiple parallel operations.

Additionally, it would be nice to see messages when there is a caching miss, so I can rule that out. Maybe another command line switch?

I think the better way to do this is to rely on the existing logging infrastructure. For instance, standard Pollen messages are logged at level info. We could start emitting other messages at debug level (like cache miss). Then you could start your session like so:

PLTSTDERR=debug@pollen raco pollen start

And the additional debug messages would be shown.

> Additionally, it would be nice to see messages when there is a caching miss, so I can rule that out. Maybe another command line switch? I think the better way to do this is to rely on the existing logging infrastructure. For instance, standard Pollen messages are logged at level `info`. We could start emitting other messages at `debug` level (like cache miss). Then you could start your session like so: `PLTSTDERR=debug@pollen raco pollen start` And the additional `debug` messages would be shown.

I’m sure there are plenty of dumb ideas still lingering in the code. It’s harder to test parallel processing that other parts of the system, because it introduces a new class of erratic / occasional errors.

I am pretty sure the cause of the behaviour I’m seeing is in my code and not in Pollen. It's a complicated project and I’ve made a lot of changes in the last several weeks.

Doing Advent of Code these last couple of years has been very helpful in getting a sense for where there can be big performance hits and gains in Racket overall. But I'm still hazy on where these optimization soft spots are in a Pollen project, particularly where templates are involved. For example, when a module is only needed within templates, I have a notion (which I have yet to test) that it's faster to require and re-provide a module from within pollen.rkt than to local-require it from within a template.

We could start emitting other messages at debug level (like cache miss).

I like that idea as well!

> I’m sure there are plenty of dumb ideas still lingering in the code. It’s harder to test parallel processing that other parts of the system, because it introduces a new class of erratic / occasional errors. I am pretty sure the cause of the behaviour I’m seeing is in my code and not in Pollen. It's a complicated project and I’ve made a lot of changes in the last several weeks. Doing Advent of Code these last couple of years has been very helpful in getting a sense for where there can be big performance hits and gains in Racket overall. But I'm still hazy on where these optimization soft spots are in a Pollen project, particularly where templates are involved. For example, when a module is only needed within templates, I have a notion (which I have yet to test) that it's faster to `require` and re-`provide` a module from within `pollen.rkt` than to `local-require` it from within a template. > We could start emitting other messages at debug level (like cache miss). I like that idea as well!

I’ve pushed an update that allows you to start Pollen commands in debug logging mode using the PLTSTDERR environment variable, for instance:

PLTSTDERR=debug@pollen raco pollen start
PLTSTDERR=debug@pollen raco pollen render

For now the only function that emits debug-level log messages is the file-cache manager (which will report every attempt to cache a file, and whether there is a cache miss). If there are other Pollen events you’re interested in seeing logged under debug, let me know.

More broadly, the message-debug function in pollen/private/log is what publishes the debug messages. I suppose it would be possible to use this within your own project code to emit debug messages. Though I think you’d probably want to create a separately named logger, so you could set individual log levels, e.g. —

PLTSTDERR="debug@joel-project info@pollen" raco pollen start

I’ve pushed an update that allows you to start Pollen commands in `debug` logging mode using the `PLTSTDERR` environment variable, for instance: `PLTSTDERR=debug@pollen raco pollen start` `PLTSTDERR=debug@pollen raco pollen render` For now the only function that emits debug-level log messages is the file-cache manager (which will report every attempt to cache a file, and whether there is a cache miss). If there are other Pollen events you’re interested in seeing logged under `debug`, let me know. More broadly, the `message-debug` function in `pollen/private/log` is what publishes the debug messages. I suppose it would be possible to use this within your own project code to emit debug messages. Though I think you’d probably want to create a separately named logger, so you could set individual log levels, e.g. — `PLTSTDERR="debug@joel-project info@pollen" raco pollen start`

Coda to my original post: I found that adding raco make pollen.rkt as a step prior to raco pollen setup -p and raco pollen render -p reduced the build time back down to where it was, erasing the 30-second slowdown I’d seen.

Note that the scenarios I’m comparing all involve essentially raco pollen reset and rm *.html as preparatory steps. So no cache or pre-existing output files.

I suspect that, when starting with no cache, raco pollen setup -p starts up parallel processes that are each attempting to compile pollen.rkt and its dependencies. And in my project at least, this causes a lot of infighting between places.

Also for what it’s worth, the from-scratch build times on this project when using Racket 7.6 (non-CS) were 97 seconds without the raco make step and 67 seconds with it. When using the recently-released Racket 7.7 (CS variant) the difference was 157 seconds without and 97 seconds with (yikes).

Coda to my original post: I found that adding `raco make pollen.rkt` as a step prior to `raco pollen setup -p` and `raco pollen render -p` reduced the build time back down to where it was, erasing the 30-second slowdown I’d seen. Note that the scenarios I’m comparing all involve essentially `raco pollen reset` and `rm *.html` as preparatory steps. So no cache or pre-existing output files. I suspect that, when starting with no cache, `raco pollen setup -p` starts up parallel processes that are each attempting to compile `pollen.rkt` and its dependencies. And in my project at least, this causes a lot of infighting between places. Also for what it’s worth, the from-scratch build times on this project when using Racket 7.6 (non-CS) were 97 seconds without the `raco make` step and 67 seconds with it. When using the recently-released Racket 7.7 (CS variant) the difference was 157 seconds without and 97 seconds with (yikes).

Thanks for that clue — though the Pollen caching system will convert necessary rkt files to bytecode (same as raco make), it looks like I had set it up with the wrong compiler function so it was recompiling those files more often than it needed to. I have pushed a fix for that.

Thanks for that clue — though the Pollen caching system will convert necessary `rkt` files to bytecode (same as `raco make`), it looks like I had set it up with the wrong compiler function so it was recompiling those files more often than it needed to. I have pushed a fix for that.

As for the difference in Racket BC vs CS, CS is definitely slower (for now, though we can expect it to improve). One area of especially degenerate performance however is parallel processing — I found that for now, CS performance degrades on more than four cores. So I would recommend limiting raco pollen setup and render to four cores with the -j 4 switch.

As for the difference in Racket BC vs CS, CS is definitely slower (for now, though we can expect it to improve). One area of especially degenerate performance however is parallel processing — I found that for now, CS performance degrades on more than four cores. So I would recommend limiting `raco pollen setup` and `render` to four cores with the `-j 4` switch.

After updating Pollen, a fresh build of my project is a few seconds faster with my manual raco make step (64 seconds) but significantly slower without it (151 seconds)! ¯_(ツ)_/¯

So I would recommend limiting raco pollen setup and render to four cores with the -j 4 switch.

My 2015 MacBook has only four cores, so for me -p ought to be equivalent to -j 4. But I do notice that on the latest Racket CS, telling it to use 4 cores still doesn’t actually max out all four cores like it does on Racket BC. I’ll definitely be sticking with BC for now.

After updating Pollen, a fresh build of my project is a few seconds faster _with_ my manual `raco make` step (64 seconds) but significantly slower without it (151 seconds)! ¯\_(ツ)_/¯ > So I would recommend limiting raco pollen setup and render to four cores with the -j 4 switch. My 2015 MacBook has only four cores, so for me `-p` ought to be equivalent to `-j 4`. But I do notice that on the latest Racket CS, telling it to use 4 cores still doesn’t actually max out all four cores like it does on Racket BC. I’ll definitely be sticking with BC for now.

Strange. I’ll look into this more. The only file you’re passing to raco make is pollen.rkt?

Strange. I’ll look into this more. The only file you’re passing to `raco make` is `pollen.rkt`?

Yes…if you’re curious, here is the line in my makefile. The process I’m testing is a make zap followed by make web. If I manually do just raco make pollen.rkt, I get .dep and .zo files inside of compiled/ for pollen.rkt and all the files it requires.

Yes…if you’re curious, [here is the line in my `makefile`](https://thelocalyarn.com/code/artifact?udc=1&ln=35&name=3781701581f3d51b). The process I’m testing is a `make zap` followed by `make web`. If I manually do just `raco make pollen.rkt`, I get `.dep` and `.zo` files inside of `compiled/` for `pollen.rkt` and all the files it requires.

Labels Milestones

Debugging performance #36