Some weeks ago I had a problem where someone complained about a 404 error, which turned out to be an encoding issue. They were requesting a file called ‘testäöü’, but the umlauts in the file name on the server were not the precomposed Unicode code points for ä, ö and ü. Instead, each was stored as a base letter followed by a combining diaeresis: an ‘a’ + combining diaeresis, and so on. Back then I found out that, according to someone on Stack Overflow, Mac OS’ HFS+ normalizes umlauts into this decomposed form.
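To make the two spellings concrete, here is a minimal Python sketch (not part of the original investigation). It uses the standard `unicodedata` module; the helper `utf8_hex` and the variable names are my own, purely for illustration.

```python
import unicodedata

precomposed = "\u00f6"   # 'ö' as a single code point (the NFC spelling)
decomposed = "o\u0308"   # 'o' followed by COMBINING DIAERESIS (the NFD spelling)

def utf8_hex(text):
    """Return the UTF-8 bytes of `text` as space-separated hex (hypothetical helper)."""
    return " ".join(f"{byte:02x}" for byte in text.encode("utf-8"))

print(utf8_hex(precomposed))      # c3 b6
print(utf8_hex(decomposed))       # 6f cc 88
print(precomposed == decomposed)  # False: the raw strings differ

# Normalizing either spelling to a common form makes them compare equal.
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

Both strings look identical on screen, but a byte-wise file name lookup treats them as different names, which would explain a 404 like the one above.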
Today I remembered this issue and tried to reproduce it, without success:
% echo ö | hexdump
0000000 c3 b6 0a
0000003
% echo o | hexdump
0000000 6f 0a
0000002
% touch ö
% ls ö
ö
% ls ö | hexdump
0000000 c3 b6 0a
0000003
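Same on a Linux machine (hexdump groups the output into 16-bit little-endian words here, so ‘b6c3’ is the same byte sequence c3 b6):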
% echo ö | hexdump
0000000 b6c3 000a
0000003
% touch ö && ls ö | hexdump
0000000 b6c3 000a
0000003
% echo o | hexdump
0000000 0a6f
0000002
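One possible reason the reproduction comes up empty: as far as I can tell, `ls ö` prints the operand exactly as it was typed, so the hexdump shows the bytes from the command line rather than the name the file system stored. A rough Python sketch of a check that reads the name back from a directory listing instead could look like this. The functions `stored_name` and `form_of` are hypothetical helpers, and the result depends on the file system holding the temporary directory.

```python
import os
import tempfile
import unicodedata

def stored_name(requested):
    """Create an empty file called `requested` in a fresh temporary
    directory and return the name the directory listing reports."""
    with tempfile.TemporaryDirectory() as tmp:
        open(os.path.join(tmp, requested), "w").close()
        (listed,) = os.listdir(tmp)  # exactly one entry is expected
        return listed

def form_of(name):
    """Classify `name` as 'NFC', 'NFD', 'both' or 'mixed'."""
    is_nfc = name == unicodedata.normalize("NFC", name)
    is_nfd = name == unicodedata.normalize("NFD", name)
    if is_nfc and is_nfd:
        return "both"  # e.g. plain ASCII, where the two forms coincide
    if is_nfc:
        return "NFC"
    if is_nfd:
        return "NFD"
    return "mixed"

requested = "\u00f6"  # precomposed 'ö', as typed on most keyboards
listed = stored_name(requested)
print("requested:", form_of(requested), requested.encode("utf-8").hex())
print("listed:   ", form_of(listed), listed.encode("utf-8").hex())
```

If the two lines differ, the file system rewrote the name on the way in; if they match, it stored the bytes unchanged.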
-- Matthias Schütz 23 Apr 2012