Umlaut Normalization in HFS+

A few weeks ago someone complained about a 404 error that turned out to be an encoding issue. They were requesting a file called ‘testäöü’, but the umlauts in the file name on the server were not the precomposed Unicode code points for ä, ö and ü. Instead, each one was stored as the base letter followed by a combining diaeresis (‘a’ + U+0308, and so on). Back then I read that, according to someone on Stack Overflow, Mac OS’s HFS+ normalizes umlauts into this decomposed form.
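To make the difference between the two forms concrete, here is a minimal Python sketch using the standard `unicodedata` module. The helper `utf8_hex` is a hypothetical name introduced only for this illustration; the sketch only shows how the two spellings of ö differ at the byte level, not what any particular file system does.

```python
import unicodedata

composed = "\u00f6"      # ö as one precomposed code point (U+00F6)
decomposed = "o\u0308"   # 'o' followed by COMBINING DIAERESIS (U+0308)

def utf8_hex(s):
    """Return the UTF-8 bytes of s as space-separated hex (illustrative helper)."""
    return " ".join(f"{b:02x}" for b in s.encode("utf-8"))

print(utf8_hex(composed))      # c3 b6
print(utf8_hex(decomposed))    # 6f cc 88
print(composed == decomposed)  # False: different code point sequences

# Normalizing both sides to the same form makes them comparable.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
```

A server that compares request paths byte by byte would treat these two names as different files, which would fit the 404 described above.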

Today I remembered this issue and tried to reproduce it, without success:

% echo ö | hexdump
0000000 c3 b6 0a    
0000003
% echo o | hexdump
0000000 6f 0a    
0000002
% touch ö
% ls ö
ö
% ls ö | hexdump
0000000 c3 b6 0a    
0000003

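Same on a Linux machine (hexdump groups the output into 16-bit little-endian words here, so b6c3 stands for the bytes c3 b6):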

% echo ö | hexdump
0000000 b6c3 000a                              
0000003
% touch ö && ls ö | hexdump
0000000 b6c3 000a                              
0000003
% echo o | hexdump
0000000 0a6f                                   
0000002
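One thing the checks above may not exercise: `ls ö` is handed the name on the command line and, as far as I can tell, prints it back as typed, so it might be worth looking at what a directory listing itself reports. The following is a rough sketch of such a check in Python, not a definitive test. The function names `stored_names` and `describe` are made up for this example, and the result depends on the file system the temporary directory lives on.

```python
import os
import tempfile
import unicodedata

def stored_names(name):
    """Create an empty file called `name` in a fresh temporary directory
    and return the names that the directory listing reports."""
    with tempfile.TemporaryDirectory() as d:
        open(os.path.join(d, name), "w").close()
        return os.listdir(d)

def describe(name):
    """Return the UTF-8 bytes of `name` as hex plus its normalization form."""
    hex_bytes = " ".join(f"{b:02x}" for b in name.encode("utf-8"))
    if unicodedata.normalize("NFC", name) == name:
        form = "NFC (precomposed)"
    elif unicodedata.normalize("NFD", name) == name:
        form = "NFD (decomposed)"
    else:
        form = "neither NFC nor NFD"
    return f"{hex_bytes}  {form}"

if __name__ == "__main__":
    # The file is created with the precomposed ö (U+00F6).
    for stored in stored_names("\u00f6"):
        print(describe(stored))
```

If the file system rewrites the name on creation, the listing should show the bytes 6f cc 88 instead of c3 b6.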
-- Matthias Schütz 23 Apr 2012
