Wanted: simpler support for URLs without extensions #50

Let me preface everything I say here and for the foreseeable future (I anticipate submitting copious issues over time) by saying that I am very attracted to Pollen's design aesthetic, and that I am hoping it will morph into a more general web publishing tool. To accomplish that, I think it could use direct support for some publishing features that it currently lacks.

And with that, let me get to the first issue I am eyeing: I want to publish a website with extensionless URLs for its pages. Instead of http://example.com/page.html, I want http://example.com/page. Now, this can be accomplished by a variety of server-side configuration options which allow /page.html to be accessed at /page, but the more server-agnostic way of doing this is to generate page/index.html from page.html.pp.

Right now, the only way to accomplish this is to have /page/index.html.pp in my source. I would prefer to have /page.html.pp in my source tree, from which /page/index.html is generated, but this is impossible because pagetree rendering (by means of ->output-path and ->stem-source-path) assumes a particular relationship between source and output paths, and this relationship is not configurable.

So, in terms of end result, what I would like to see is support for pretty URLs.

To get there, I would like to see pagetrees extended to (optionally) specify a (source-path output-path) pair, instead of just a source path or just an output path. This would be useful in other ways. For example, if I want to keep various auxiliary files (like sitemaps and robots files) squirreled away somewhere to avoid clutter in the top level of my source tree, I could use a pagetree like `'((robots.txt etc/robots.txt.pp) (sitemap.xml etc/sitemap.xml.pp))`.


An interesting idea, but there are some thorny side effects to consider. For instance,

1. If a source file corresponds to an output file at a different directory level, how should relative URLs within the page be resolved: relative to the source directory, or the destination directory? And which `directory-require.rkt` file should the source file rely on?
2. As for the pagetree specifying new output paths, what happens if you have two source files mapped to the same output path?


1a. `directory-require.rkt` files seem to me like a source-side abstraction. That is, they participate in transforming an input into an X-expression, and therefore I would expect the source-location one to be used.

1b. Contrariwise, relative URLs within a page already refer to output files, by virtue of using `…/page.html`, not `…page.html.pp`. So, I see those as an output abstraction, and therefore would expect relative URLs to be output-relative.

2. Don't do that. Less facetiously, `.ptree` files are already validated for uniqueness of output paths, and this would just be another validation error.
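To make point 2 concrete, here is a minimal sketch of what such a check could look like. It is not Pollen's actual validation code: the entry format is the hypothetical pair syntax proposed above (assuming the output path comes first, as in the `robots.txt` example), and `duplicate-output` is a made-up helper name.

```racket
#lang racket/base
(require racket/list) ; for `first` and `check-duplicates`

;; Hypothetical pagetree entries, each written as (output-path source-path).
(define entries
  '((robots.txt etc/robots.txt.pp)
    (sitemap.xml etc/sitemap.xml.pp)))

;; Returns the first entry whose output path repeats that of an
;; earlier entry, or #f when every output path is unique.
(define (duplicate-output entries)
  (check-duplicates entries #:key first))

(duplicate-output entries)
(duplicate-output (append entries '((robots.txt other/robots.txt.pp))))
```

The second call reports the conflicting entry, which is what a validation error could then name.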

Thoughts?


Using different conventions for source and output abstractions goes against the grain of the design thus far, which is to unify the two views.

Further on that point, even if you could remap URLs as you suggest in the pagetree files, the project server either a) wouldn’t know about these remappings (making the project server useless) or b) would require some dynamic remapping of URLs (complicated and fiddly).

All of this seems like a lot of trouble simply for the pleasure of having `/page/index.html.pp`, which will work fine, live one directory higher in the source tree as `/page.html.pp`.


That is a fair point, but since on the authoring side one spends a lot of time staring at one's source, the pleasure is not insignificant. For example, if many of my source files are named `index.html.pp` then all my editor tabs have identical titles; search results give me a long list of identically named files; etc. Sure, in every one of those contexts there is some way that I can get more information about each file / tab / search hit / etc, but doing so adds friction to the authoring workflow.

So, even if you want to avoid the whole business of arbitrary source-to-output mappings (and I trust your assessment that that would go against the grain of the design), do you have any other proposals for how I could keep reasonable file names (like `page.html.pp`) while producing pretty URLs (like `…/page`) without resorting to server-config gimmicks?


Here’s one way to approach the problem.

1. Consider the file naming you need on the build side. Extension-free URLs are atypical and thus inevitably require some kind of “server-config gimmick.” If you’re using Apache for your production server, then that gimmick is an `.htaccess` file that rewrites certain URLs of the form `domain.com/page` to `domain.com/page.html`. So first, I would set up the production server with an `.htaccess` file, and some simple test files, to verify that works as you expect. (Racket’s web server does not support `.htaccess` files.)
2. Then you can consider the file naming you need on the dev side. If you’re mapping `domain.com/page` to `domain.com/page.html`, then it seems fair that your source file would be called `domain.com/page.html.pmd` (or `.pp` or `.pm`). During development, this will let you prototype and preview your source files with the project server normally (using the fully-qualified link names like `domain.com/page.html`).
3. Then you can think about how to convert the dev names to build names. Rather than relying on `raco pollen render ...` to build your site, you can make a tiny build script that puts your rendering into “production mode” and renders your pages so that all internal URLs of the form `domain.com/page.html` come out as `domain.com/page`.
4. How do you do that? The Rackety way is to use [`parameterize`](http://docs.racket-lang.org/guide/parameterize.html?q=parameter), which is almost like setting a global variable. So in your `directory-require.rkt` you might do this:

(define use-extension-free-urls? (make-parameter #f))

(define (make-internal-url url-with-extension)
  (if (use-extension-free-urls?)
    ;; `remove-extension-from-url` is a helper you would write yourself
    (remove-extension-from-url url-with-extension)
    url-with-extension)) ; in dev mode, you'll always end up here

Then in your build script, you can change the behavior of `make-internal-url` by overriding the parameter for the duration of the render:

(parameterize ([use-extension-free-urls? #t])
  (render-pages-to-build-directory)) ; now `make-internal-url` will remove the extension

In short, use your dev and build environments in their most idiomatic manner, and then patch over the differences.
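For completeness, here is a self-contained sketch of the two snippets above that can be run on its own. The helper `remove-extension-from-url` is not part of Pollen; the version below is an assumed implementation that simply strips a trailing `.html`, and the two calls at the end stand in for the project server and the build script.

```racket
#lang racket/base
(require racket/string) ; for `string-suffix?`

;; Defaults to #f, so the project server keeps full file names.
(define use-extension-free-urls? (make-parameter #f))

;; Assumed helper: drop a trailing ".html"; leave other URLs untouched.
(define (remove-extension-from-url url)
  (if (string-suffix? url ".html")
      (substring url 0 (- (string-length url) (string-length ".html")))
      url))

(define (make-internal-url url-with-extension)
  (if (use-extension-free-urls?)
      (remove-extension-from-url url-with-extension)
      url-with-extension))

;; Dev mode: the parameter is #f, so the extension is kept.
(make-internal-url "page.html")

;; Build mode: the override lasts only for the body of `parameterize`.
(parameterize ([use-extension-free-urls? #t])
  (make-internal-url "page.html"))
```

Because `parameterize` restores the old value when its body exits, nothing outside the build script ever sees the production setting.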


PS. I don’t recommend pursuing your original suggestion of mapping `/page.html.pp` to `/page/index.html`. I think that kind of directory-hopping will lead to madness.

