Rendering poly sources in directory mode ignores --target #248

In a project containing only the following:

* A `pollen.rkt` containing only a `setup` module that defines/provides a value for `poly-targets` of `'(html txt)`
* A subfolder `posts` containing a simple `test.poly.pm` file (`#lang pollen`)

Rendering everything in the subfolder `posts` and specifying `--target txt` should result in only `.txt` files getting rendered; instead the sources get rendered to all formats:

```
> raco pollen render --target txt posts
pollen: rendering generated pagetree for directory /Users/joel/Documents/code/sandbox/posts
pollen: rendering /test.poly.pm as html
pollen: rendered /test.html (522 ms)
pollen: rendering /test.poly.pm as txt
pollen: rendered /test.txt (567 ms))
```

It works as expected if you add a file glob:

```
> raco pollen render --target txt posts/*.poly.pm
pollen: rendering posts/test.poly.pm
pollen: rendering /posts/test.poly.pm as txt
pollen: rendered /posts/test.txt (524 ms)
```
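For anyone reproducing this, the two-file project described above can be scaffolded with a short Python script. The `setup` module follows Pollen's usual `pollen.rkt` convention; the body text of `test.poly.pm` is an assumption, since the report only says it is a simple `#lang pollen` file, and `make_repro` is a hypothetical helper name.

```python
import tempfile
from pathlib import Path

# pollen.rkt: only a setup module providing poly-targets, as in the report
POLLEN_RKT = """#lang racket/base
(module setup racket/base
  (provide (all-defined-out))
  (define poly-targets '(html txt)))
"""

# Assumed content: the report only says "a simple test.poly.pm file"
TEST_POLY = """#lang pollen

Hello, world.
"""


def make_repro(root):
    """Write pollen.rkt and posts/test.poly.pm under `root`; return it as a Path."""
    root = Path(root)
    (root / "posts").mkdir(parents=True, exist_ok=True)
    (root / "pollen.rkt").write_text(POLLEN_RKT)
    (root / "posts" / "test.poly.pm").write_text(TEST_POLY)
    return root


if __name__ == "__main__":
    project = make_repro(tempfile.mkdtemp(prefix="pollen-248-"))
    print(project)  # cd here, then: raco pollen render --target txt posts
```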


I’m not getting the error on the attached test case. Am I missing something?

```
> raco pollen render --target txt posts
pollen: rendering generated pagetree for directory /Users/mb/Desktop/248-test-case/posts
pollen: rendering /test.poly.pm as txt
pollen: rendered /test.txt (354 ms)
```

[248-test-case.zip](https://github.com/mbutterick/pollen/files/6043038/248-test-case.zip)


Hmm, seems like my steps to reproduce were incomplete.

On a fresh unzip of that test case I get the same. Call this (A):

```
❯ raco pollen render -t txt posts
pollen: rendering generated pagetree for directory /Users/joel/Downloads/248-test-case/posts
pollen: rendering /test.poly.pm as txt
pollen: rendered /test.txt (544 ms)
```

If I then try it with HTML the problem surfaces. Call this (B):

```
❯ raco pollen render -t html posts
pollen: rendering generated pagetree for directory /Users/joel/Downloads/248-test-case/posts
pollen: rendering /test.poly.pm as html
pollen: rendered /test.html (572 ms)
pollen: rendering /test.poly.pm as txt
pollen: rendered /test.txt (404 ms)
```

If after doing a reset (`raco pollen reset; rm posts/*.html; rm posts/*.txt`) I do (A), then just `raco pollen reset` before doing (B), same behavior.

If after doing the reset above I do (A) (with html), and then delete the just-generated file with `rm posts/test.html` before doing (B) (with txt), the problem *does not* occur.

It looks to me like the presence of an output file for a particular format somehow causes that output file to be re-rendered in addition to the `--target`-specified format when rendering poly sources in directory mode.


Simpler way to get there: in your test case, do `touch posts/test.html` before doing `raco pollen render -t txt posts`.


I think what’s happening is that the generated pagetree includes `test.html`, so the inferred source is `test.poly.pm`, and thus `test.poly.pm` is rendered to `test.html`. But the generated pagetree also includes `test.poly.pm`, and the target is set to `txt`, so `test.poly.pm` is rendered to `test.txt`. This also explains why your variant with `posts/*.poly.pm` works correctly (in that case, there is a list of files rather than a generated pagetree, and the list does not include `test.html`).
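This explanation can be checked against a toy model. The Python below is a sketch written for this thread, not Pollen's implementation: the set of source extensions, the lookup order for inferring a source from an output name, and the one-render-per-output deduplication are all assumptions, and `renders_for` is a hypothetical name.

```python
"""Toy model (not Pollen's code) of how a generated pagetree turns into renders."""

POLY_TARGETS = ["html", "txt"]     # mirrors setup:poly-targets from the report
SOURCE_EXTS = ["pm", "pp", "pmd"]  # assumed subset of Pollen source extensions


def renders_for(listing, target=None):
    """Return the set of (source, output) render jobs for a directory listing.

    `listing` stands in for the generated pagetree: every file in the
    directory, sources and outputs alike. Names are assumed to have the
    `base.fmt` (output) or `base.fmt.ext` (source) shape used in this thread.
    """
    target = target or POLY_TARGETS[0]  # no --target: first poly target wins
    names = set(listing)
    jobs = set()                        # a set, so each output is rendered once
    for name in listing:
        stem, _, ext = name.rpartition(".")
        if ext in SOURCE_EXTS:
            base, _, fmt = stem.rpartition(".")
            # A poly source goes to --target; any other source to its own format.
            jobs.add((name, f"{base}.{target if fmt == 'poly' else fmt}"))
            continue
        # An output name implies a source: an exact source first, else a poly one.
        exact = [f"{name}.{e}" for e in SOURCE_EXTS if f"{name}.{e}" in names]
        poly = f"{stem}.poly.pm"
        if exact:
            jobs.add((exact[0], name))
        elif poly in names and ext in POLY_TARGETS:
            jobs.add((poly, name))      # the format here comes from `ext`, not --target
    return jobs


print(sorted(renders_for(["test.poly.pm"], target="txt")))
# [('test.poly.pm', 'test.txt')]
print(sorted(renders_for(["test.html", "test.poly.pm"], target="txt")))
# [('test.poly.pm', 'test.html'), ('test.poly.pm', 'test.txt')]
```

With only `test.poly.pm` in the listing, `--target txt` yields a single job. Once a stale `test.html` is present (as with `touch posts/test.html`), the output name alone implies a second, html job. A glob such as `posts/*.poly.pm` never lists `test.html`, so that job never appears.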

So I don’t think this is a bug so much as an unexpected result of current rendering policies that apply to `poly` sources.

What result would you rather see here?
What policy would have to be adopted to accomplish that?


OK, that makes sense. I’ll think about it some more, but in general when you specify a target with `--target` it seems reasonable to expect that those are the only kinds of files that will be rendered.

One thought: the [docs say](https://docs.racket-lang.org/pollen/raco-pollen.html?q=pollen#%28part._raco_pollen_render%29):

> **Directory mode:** `raco pollen render directory` renders all preprocessor source files and then all pagetree files found in the specified directory. If none of these files are found, a pagetree will be generated for the directory (which will include all source files) and then rendered.

From what you are saying, it seems like the parenthetical would be more accurate if it said “which will include all source files and any existing output files”. But if the generated pagetree actually _should_ contain _only_ source files, then fixing it so that is indeed the case (at least when doing a render in directory mode) seems like it would fix the problem.

If the contents of the generated pagetree should not change, maybe it’s worth thinking about the general case of what happens whenever `render` is told to `--target txt` and then handed a pagetree containing either source files that can’t be rendered into the target format, or output files from some other format.


> But if the generated pagetree actually **should** contain **only** source files

I see your point, but the CLI for Pollen has always allowed you to invoke `raco pollen render` on source files or output files (on the idea that each of these names uniquely implies a source file).

It would be weird to disable this behavior in the context of relying on a generated pagetree, or make the behavior contingent on whether the source is a regular source file or `poly` file.

> If the contents of the generated pagetree should not change …

Well, it has to change — that’s what it means to be generated. I’m open to refining what happens in this situation. But historically the generated pagetree has only been meant to provide some dumb-but-unsurprising default behavior for prototyping or simple projects.


BTW consider also the docs for the [generated (aka automatic) pagetree](https://docs.racket-lang.org/pollen/Pagetree.html?q=automatic%20pagetree#%28part._.The_automatic_pagetree%29):

> In situations where Pollen needs a pagetree but can’t find one, it will automatically synthesize a pagetree from a **listing of files in the directory.** … As usual, convenience has a cost. Pollen doesn’t know anything about which files in your directory are relevant to the project, so it **includes all of them.**

(Emphasis added.)

IOW the generated pagetree basically promises a directory listing. Not because that’s always the most useful thing, but because it’s easy to reason about.

The other way to address this issue would be to trigger the sought-after semantics not by adjusting the default behavior of the generated pagetree, but by adding some kind of flag or `setup` value (though I’m not clear yet what that new value ought to denote).


> IOW the generated pagetree basically promises a directory listing.

OK yes, I’m convinced it makes more sense for the generated pagetree to continue to work this way, regardless of what it’s being used for (project server, `raco pollen render`, etc.).

The question I’m left with is, should `--target` be honored when a pagetree is involved? What if, when rendering with `--target txt` and using a generated pagetree that includes `.html` files, Pollen simply skipped anything it couldn’t render as `txt` (perhaps with a warning)?

(By the way, didn’t it used to be possible to specify a pagetree file explicitly, like `raco pollen render posts.ptree`? Maybe I’m hallucinating but I thought I had done this before. Now when I try this I get `[no paths to render]`.)


> The question I’m left with is, should --target be honored when a pagetree is involved?

What are you trying to accomplish, in the end? You want to be able to render in two passes (say, `html` then `txt`) without re-rendering everything that was done during the first pass?

> By the way, didn’t it used to be possible to specify a pagetree file explicitly, like `raco pollen render posts.ptree`

Yes, that should work. I will look into that.


> What are you trying to accomplish, in the end?

Personally: one of my output targets is PDF, which is much more expensive to render than HTML. So a thing I do is render only one output format at a time (HTML much more frequently than PDF). Again, I’m fully aware that there’s already a way to accomplish this (globbing), so this part is not so important.

The other thing I’m trying to accomplish is being able to have (or point to) clear explanations of Pollen’s behavior when teaching people about it, which is happening more often these days. In that position I’m selfishly interested in eliminating non-obvious gotchas. The existence of cases in which `--target fmt` can sometimes (depending on a combo of how the command line path is specified and what output files happen to exist) result in files of other formats being rendered feels like a gotcha right now, mostly because it seems likely that anyone I teach is going to think of `--target` as a kind of limit or filter.

Adding a couple of margin notes to the docs (one for the section on `--target` and probably another to the docs for the automatic pagetree) would be one solution to this second thing. Another good solution would be the skip-with-warning thing I described above. That would give users the behavior they’d ordinarily expect from `--target` but alert them that there is some underlying complexity that they might need to dig into. I guess it might break existing projects that have come to depend on the current behavior, although that would not be unprecedented.


> it seems likely that anyone I teach is going to think of --target as a kind of limit or filter.

OK, I can see the rationale for filter-like behavior. But suppose you use `--target txt`. What happens to:

* `foo.html`, which is generated from `foo.poly.pm`. I guess you’re saying it would just be ignored, instead of regenerated.
* But what about `bar.html`, generated from `bar.html.pm`? `--target txt` would suppress rendering of this file, even though it doesn’t involve a `poly` source?

It feels like what you’re describing is a somewhat heavier hammer than just `--target` (maybe it should be called `--filter`) that has the semantics of `--target` and then some.


> what happens to `foo.html`, which is generated from `foo.poly.pm`

You’d see a warning like `foo.html cannot target txt, skipping`.

> But what about `bar.html`, generated from `bar.html.pm`?

Same as above.

So if the generated pagetree for `posts/` looked like `'(pagetree-root foo.poly.pm foo.html bar.html.pm bar.html)` you’d get (roughly):

```bash
❯ raco pollen render --target txt posts
pollen: rendering generated pagetree for directory /Users/joel/Documents/code/sandbox/posts
pollen: rendering /foo.poly.pm as txt
pollen: rendering /foo.poly.pm as txt
pollen: rendered /foo.txt (524 ms)
pollen: foo.html cannot target txt, skipping...
pollen: bar.html.pm cannot target txt, skipping...
pollen: bar.html cannot target txt, skipping...
```

And the way I imagine it, it would work the same way for rendering normal (non-automatic) pagetrees as well.

If you wanted to put this behavior in a separate flag, maybe you could call it `--target-only`?
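The skip-with-warning proposal can be sketched as a planning step that runs over the pagetree before anything is rendered. This is hypothetical: Pollen has no `--target-only` flag, `plan_target_only` is an invented name, and the source-extension set is an assumption. The warning text copies the wording proposed above.

```python
"""Sketch of the proposed `--target-only` pass (hypothetical; not in Pollen)."""

POLY_TARGETS = ["html", "txt"]     # mirrors setup:poly-targets from the report
SOURCE_EXTS = {"pm", "pp", "pmd"}  # assumed subset of Pollen source extensions


def plan_target_only(pagetree, target):
    """Return (outputs, warnings): what would render, and what would be skipped.

    Names are assumed to look like `base.fmt` (output) or `base.fmt.ext`
    (source), as in the example pagetree in this thread.
    """
    outputs, warnings = [], []
    for name in pagetree:
        parts = name.split(".")
        is_source = parts[-1] in SOURCE_EXTS
        fmt = parts[-2] if is_source else parts[-1]
        base = ".".join(parts[:-2] if is_source else parts[:-1])
        if is_source and fmt == "poly" and target in POLY_TARGETS:
            fmt = target                 # a poly source can produce any poly target
        if fmt != target:
            warnings.append(f"{name} cannot target {target}, skipping...")
        elif f"{base}.{target}" not in outputs:
            outputs.append(f"{base}.{target}")  # render each output only once
    return outputs, warnings


tree = ["foo.poly.pm", "foo.html", "bar.html.pm", "bar.html"]
outputs, warnings = plan_target_only(tree, "txt")
print(outputs)                           # ['foo.txt']
for w in warnings:
    print("pollen:", w)
```

Note that under this sketch `bar.html.pm` and `bar.html` are skipped even though no `poly` source is involved, which is exactly the case raised above. Run with `"html"` instead, the same pagetree plans `foo.html` and `bar.html` with no warnings.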


Again, if you prefer, making some additions to the docs explaining the existing behavior, and leaving it at that, would be a resolution too (since I can get the behavior of `--target-only` simply by not relying on the automatic pagetree). Then at least it's cleared up that this is the intended behavior. An example edit (additions in **bold**):

> The optional --target or -t switch specifies the **default** render target for multi-output source files. If the target is omitted, the renderer will use whatever target appears first in (setup:poly-targets) **as the default**. **Note that if `source` is a pagetree, the renderer will attempt to render all files in that pagetree regardless of their output format.**


I don’t use poly files at all so I don’t think my preference should be determinative in this case 😉

1. If you want a flag to limit output, I think it should be called `--filter`, and all it should do is limit output files to a certain extension. It would have no bearing on the meaning of `--target` and no special meaning for `poly` files. So in a way it would be a cousin of `--null`.
2. If not, you’re welcome to make whatever clarification to the docs about the current behavior, like what you suggest above.
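Option 1 can be pictured as a pass over the list of render jobs the normal rules would already produce, rather than a change to how targets are chosen. A minimal sketch, assuming that ordering; `--filter` is only a suggested name, `apply_filter` is hypothetical, and the job list is an assumed example for a directory holding `test.poly.pm`, a stale `test.html`, and `bar.html.pm`, rendered with `--target txt`.

```python
"""Sketch of the suggested `--filter` (hypothetical; not a Pollen flag)."""


def apply_filter(jobs, ext):
    """Keep only render jobs whose output file ends in `.ext`.

    `jobs` is the (source, output) list the normal render pass would
    produce, with --target already applied. The filter runs afterward, so
    it never changes what --target means and treats poly sources like any
    other source.
    """
    return [(src, out) for src, out in jobs if out.rpartition(".")[2] == ext]


# Assumed example jobs for test.poly.pm + stale test.html + bar.html.pm, --target txt
jobs = [
    ("test.poly.pm", "test.html"),
    ("test.poly.pm", "test.txt"),
    ("bar.html.pm", "bar.html"),
]
print(apply_filter(jobs, "txt"))   # [('test.poly.pm', 'test.txt')]
print(apply_filter(jobs, "html"))  # [('test.poly.pm', 'test.html'), ('bar.html.pm', 'bar.html')]
```

Because the filter only looks at output extensions, `--target` keeps its current meaning (the default format for `poly` sources), and the two options compose independently.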


