default view in browser uses Western instead of Unicode encoding
#44
Closed
opened 10 years ago by gour
·
17 comments
Hello,
I'm taking my first steps with Pollen and testing with text containing native Croatian characters (č, ć, đ, š, ž). Although my source file is using UTF-8:
still, when I render it in the browser (Firefox), it is rendered using Western encoding, and the text is not correct until I select Unicode encoding.
Strangely enough, rendering in the terminal generates a correct file.
What's the matter?
Under the HTTP 1.1 specification, browsers use ISO-8859-1 (Western) text encoding unless told otherwise (e.g. by an HTTP header set by the server, or an encoding header in the document).
PS. The `file` command in the terminal doesn’t know the encoding either. But it uses a more elaborate set of tests to make an educated guess. As a rule, web browsers never do this: they only decode content according to explicit instructions.
What about browser's settings to use Unicode encoding?
That’s really more of a question about browser UI. I have no special expertise. As I understand Firefox, you can set a fallback encoding in the preferences that’s used when a file doesn’t declare an encoding. As for the “Character Encoding” option in the main menu, it’s unclear to me whether this merely displays the current page encoding, or allows you to override the encoding of all pages during a browser session (in my fiddling around with it, I haven’t detected a consistent pattern).
But the general point remains: a web browser expects a file to declare its own encoding.
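To illustrate why the Croatian characters come out wrong when no encoding is declared, here is a minimal Python sketch (not Pollen code). It assumes the browser falls back to ISO-8859-1, as described above; the sample string is only an illustration.

```python
# Croatian sample text, similar to the characters in the report above.
text = "č ć đ š ž"

# The bytes that a UTF-8 source file actually contains on disk.
raw = text.encode("utf-8")

# With no declared encoding, the bytes are read as a Western encoding.
# ISO-8859-1 is assumed here as the fallback.
wrong = raw.decode("iso-8859-1")

# Decoding with the encoding the file was written in restores the text.
right = raw.decode("utf-8")

print(repr(wrong))
print(right)
```

Each Croatian letter occupies two bytes in UTF-8, so the Western decoding shows two unrelated characters per letter instead of one.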
OK. Thank you for your input.
I think you can declare the character encoding from the HTTP headers, even if your file is plaintext and doesn't have a charset HTML attribute: https://www.w3.org/International/articles/definitions-characters/#httpheader
I guess that would have to be set in the server. Is there a way for us to modify the headers from the Pollen server?
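As a rough sketch of what declaring the charset in the HTTP header looks like, the following uses Python's standard `http.server` as a stand-in for a preview server. It is not the Pollen project server; `with_utf8_charset`, `Utf8Handler`, `serve`, and the port number are hypothetical names chosen for illustration.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler


def with_utf8_charset(mime_type: str) -> str:
    """Append a UTF-8 charset parameter to text/* MIME types that lack one."""
    if mime_type.startswith("text/") and "charset=" not in mime_type.lower():
        return mime_type + "; charset=utf-8"
    return mime_type


class Utf8Handler(SimpleHTTPRequestHandler):
    """Static file handler that labels text responses as UTF-8."""

    def guess_type(self, path):
        # The value returned here is sent as the Content-Type header.
        return with_utf8_charset(super().guess_type(path))


def serve(port: int = 8000) -> None:
    """Serve the current directory locally (illustrative port number)."""
    HTTPServer(("127.0.0.1", port), Utf8Handler).serve_forever()
```

Calling `serve()` in a directory of rendered files would send plain-text files with `Content-Type: text/plain; charset=utf-8`, so the browser no longer has to guess.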
No, because the files aren’t going to be served dynamically from the Pollen server.
Who's doing the serving then? Whether or not the files are generated dynamically shouldn't affect the http headers on the response from the server.
The Pollen project server is just a convenience for previewing files during development.
The idea is that when you’re done, you move your rendered files over to your production server (for instance, I use Apache).
IOW, though it would be possible to modify the Pollen project server to do what you suggest, it doesn’t solve the problem in a portable way.
Thus, the best practice is for each file to declare its own encoding.
Text files can't declare their own encodings though, and it seems like at least Firefox has trouble detecting the correct encoding automatically. Therefore for text files you have to rely on the HTTP header.
Also, considering that it seems like a goal of Pollen to have good support for Unicode, and that the Pollen project server is supposed to be used as a convenience for previewing files, it would be a good feature to automatically set the charset in the MIME type to `utf-8`.
I've created a pull request to make the necessary changes in #165. Most of the work involves switching from the wrapper `serve/servlet` to the underlying `dispatcher-sequence` and `serve/launch/wait` so that we can manually specify a function for computing the MIME type. That's the only semantic change I meant to make, but it's a big diff because it's a little bit of a copy-paste of the implementation of `serve/servlet`, albeit simplified.
Let me know if you disagree and don't think this is a worthwhile feature.
After a little more research (of course after I already did the work and submitted the pull request!) I realized I was wrong that text files can't declare their own encodings. You can use a byte order mark, which means inserting a special character at the beginning of the file. The `BOM` character in UTF-8 has the encoding `0xEF 0xBB 0xBF`. I've tested this in Firefox myself, and Pollen passes it through no problem. I'll close my pull request also.
True, though a higher goal is minimizing magic behavior.
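For anyone who wants to check the byte order mark bytes mentioned above, here is a small Python sketch (not Pollen code). The helpers `add_bom` and `strip_bom` are hypothetical names; they rely on Python's built-in `utf-8-sig` codec, which writes the BOM when encoding and drops it when decoding.

```python
import codecs

# The UTF-8 byte order mark is the three bytes EF BB BF.
BOM = codecs.BOM_UTF8


def add_bom(text: str) -> bytes:
    """Encode text as UTF-8 with a leading BOM (hypothetical helper)."""
    return text.encode("utf-8-sig")


def strip_bom(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading BOM if one is present."""
    return data.decode("utf-8-sig")


data = add_bom("č ć đ š ž")
print(data[:3].hex(" "))  # the first three bytes are the BOM
print(strip_bom(data))
```

A plain-text file that starts with these three bytes identifies itself as UTF-8 without any HTTP header or markup.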
A nice way to add the byte order mark to text files automatically if you want it is to use the following `template.txt.p`:
True, but still magical. I’m ready to formulate some kind of Murphy’s Law of default software behavior: as soon as you impose a change like this, the next bug filed will be “Pollen is erroneously putting a BOM at the front of my text output”.
I agree for sure! I was just putting it here to share the solution in case somebody else finds this looking for a solution to the same problem. Would definitely be weird to automatically stick in the BOM without telling anyone.
Maybe you could find an existing place (or propose a new place) in the Pollen documentation for this information? I’m fully in favor of preserving discoveries. But anything in a GH issue is unlikely to be found again.
Or, depending on how strongly you feel about text handling, it could become a `->text` function in a new `pollen/template/text` module that would parallel `->html`. One of the options could be encoding, which would handle prepending the BOM.