Some weeks ago I had a problem where someone complained about a 404 error which turned out to be an encoding issue. They were making a request for a file called ‘testäöü’ but the umlauts on the server were not the unicode codepoints for ä, ö, ü respectively but for an ‘a’ + combining diaresis, and so on. Back then I found out, that according to someone on Stackoverflow, Mac OS’ HFS+ was normalizing umlauts into this form.
Today I remembered this issue and tried to reproduce it unsuccessfully:
% echo ö | hexdump 0000000 c3 b6 0a 0000003 % echo o | hexdump 0000000 6f 0a 0000002 % touch ö % ls ö ö % ls ö | hexdump 0000000 c3 b6 0a 0000003
Same on a Linux machine:
-- Matthias Schütz 23 Apr 2012
% echo ö | hexdump 0000000 b6c3 000a 0000003 % touch ö && ls ö | hexdump 0000000 b6c3 000a 0000003 % echo o | hexdump 0000000 0a6f 0000002