Wed 22 Jul 2009

Text encoding strikes again

I'm doing some work with Clojure and Compojure. I use VimClojure for editing, which uses Nailgun to evaluate code snippets. My files are all encoded as UTF-8.

I was quite surprised to see my non-ASCII characters appearing as garbage in the pages served by Jetty. My files were UTF-8, Clojure reads source files as UTF-8 by default, and I was setting the Content-Type correctly… so what was going on?

The answer is that the Nailgun process was using the default Java character set on Mac OS X: MacRoman. Everything looked fine from Vim, but apparently the strings were getting chewed up as they passed into the Nailgun server. Loading the .clj files from disk made the problem go away, as did following these instructions on changing the character set when launching ng-server.
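The same failure mode is easy to reproduce in plain Java, without Nailgun. This is a minimal sketch, not Nailgun's actual code path. It assumes the text leaves Vim as UTF-8 bytes and is decoded on the other side with MacRoman. `CharsetMismatch` and `roundTrip` are hypothetical names. `x-MacRoman` is the JDK's name for that charset; it lives in the extended charsets, so a stripped-down runtime might not include it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatch {
    // Hypothetical helper: encode text as UTF-8 bytes, then decode those bytes
    // with another charset, the way a process with the wrong default would.
    static String roundTrip(String text, Charset decodeAs) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, decodeAs);
    }

    public static void main(String[] args) {
        // Whatever default this JVM picked up from the OS or from -Dfile.encoding.
        System.out.println("default charset: " + Charset.defaultCharset());

        // Written as a \u escape so this example doesn't itself depend on
        // the encoding the compiler uses to read the source file.
        String original = "caf\u00e9";

        Charset macRoman = Charset.forName("x-MacRoman");
        System.out.println("decoded as MacRoman: " + roundTrip(original, macRoman));
        System.out.println("decoded as UTF-8:    " + roundTrip(original, StandardCharsets.UTF_8));
    }
}
```

The two bytes UTF-8 uses for "é" are each read as a separate MacRoman character, so the string grows by one character and comes out garbled. Passing `-Dfile.encoding=UTF-8` when launching a JVM is the usual way to change its default charset. JDK 18 and later default to UTF-8 regardless of platform (JEP 400).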

