Thursday, March 24, 2005

random information about ISO-8859-1 useful to webmasters

Far too many web pages specify an incorrect "charset" and then use decimal numeric character references. (I suspect that most of the time the HTML editor is to blame, not the web page author: it looks fine on his machine ...) As a result, decimal references between &#128; and &#255; inclusive display completely different symbols on different machines, because those characters differ between Macintosh and Windows, and vary even between different versions of Windows.
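The mismatch is easiest to see at the byte level. The following is a minimal Python sketch (Python 3.9+; the language and the helper name `decode_under_each` are my own choices, not anything from this page) that decodes one 8-bit value under ISO-8859-1, Windows-1252 and classic Macintosh Roman, using the `latin-1`, `cp1252` and `mac_roman` codecs that ship with Python.

```python
# Decode a single 8-bit value under several legacy character sets to show
# that the same number means different characters on different platforms.
CHARSETS = ["latin-1", "cp1252", "mac_roman"]  # ISO-8859-1, Windows, classic Mac OS


def decode_under_each(byte_value: int) -> dict[str, str]:
    """Return {charset name: decoded character} for one byte (hypothetical helper)."""
    raw = bytes([byte_value])
    result = {}
    for name in CHARSETS:
        try:
            result[name] = raw.decode(name)
        except UnicodeDecodeError:
            # Some code points (e.g. 0x81 in Windows-1252) are simply undefined.
            result[name] = "<undefined>"
    return result


if __name__ == "__main__":
    for value in (128, 150, 233):
        print(value, decode_under_each(value))
```

For 128, ISO-8859-1 gives an invisible control character, Windows-1252 gives the euro sign and Mac Roman gives "Ä"; even 233, which is "é" in both ISO-8859-1 and Windows-1252, comes out as "È" on a classic Mac.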

When I originally wrote this page (back in 1995), the default character set for pages on the web was ISO-8859-1.

Now it seems everyone's moving to Unicode, making this section more and more obsolete.

( "HTML uses the much more complete character set called the Universal Character Set (UCS), defined in [ISO10646]. This standard defines a repertoire of thousands of characters used by communities all over the world. The character set defined in [ISO10646] is character-by-character equivalent to Unicode 2.0 ([UNICODE])." -- )
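Because the document character set is UCS, a decimal reference such as &#233; names a UCS/Unicode code point, not a byte in whatever charset the page declares. The sketch below (Python, with hypothetical helper names) shows this with the standard `xmlcharrefreplace` error handler and `html.unescape`, and checks the convenient fact that ISO-8859-1 byte values 0 to 255 coincide with the first 256 UCS code points. The `&#128;` line reflects how HTML5-era parsers, including Python's `html` module, treat that range as Windows-1252, which is later behaviour than this page describes.

```python
import html


def to_numeric_refs(text: str) -> str:
    """Replace every non-ASCII character with a decimal reference &#N; (hypothetical helper)."""
    # N is the character's UCS/Unicode code point, independent of any page charset.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def latin1_matches_ucs() -> bool:
    """True if ISO-8859-1 byte b always decodes to code point U+00bb (b = 0..255)."""
    return all(ord(bytes([b]).decode("latin-1")) == b for b in range(256))


if __name__ == "__main__":
    print(to_numeric_refs("café"))     # caf&#233;
    print(html.unescape("caf&#233;"))  # café
    print(html.unescape("&#128;"))     # € (HTML5 error recovery maps 128-159 via Windows-1252)
    print(latin1_matches_ucs())        # True
```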

What is Unicode?: "Unicode provides a unique number for every character,
no matter what the platform,
no matter what the program,
no matter what the language.
Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number for each one. Before Unicode was invented, there were hundreds of different encoding systems for assigning these numbers. No single encoding could contain enough characters: for example, the European Union alone requires several different encodings to cover all its languages. Even for a single language like English no single encoding was adequate for all the letters, punctuation, and technical symbols in common use."
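The claim that no single legacy encoding covers even the European Union's languages can be checked directly. This Python sketch (the sample words and the helper `encodable_in` are illustrative assumptions) tries ISO-8859-1 (Western European), ISO-8859-2 (Central European) and ISO-8859-7 (Greek) on a few words, then on all of them together.

```python
# Which single-byte ISO 8859 parts can encode a given text?
SAMPLES = {
    "French": "Ça coûte cher",
    "Polish": "Łódź",
    "Greek": "Αθήνα",
}
LEGACY = ["iso-8859-1", "iso-8859-2", "iso-8859-7"]


def encodable_in(text: str, charsets: list[str]) -> list[str]:
    """Return the charsets in which every character of text can be encoded."""
    ok = []
    for name in charsets:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # at least one character is missing from this charset
        ok.append(name)
    return ok


if __name__ == "__main__":
    for language, word in SAMPLES.items():
        print(language, encodable_in(word, LEGACY))
    combined = " ".join(SAMPLES.values())
    print("all together:", encodable_in(combined, LEGACY))            # []
    print("all together:", encodable_in(combined, LEGACY + ["utf-8"]))  # ['utf-8']
```

Each word fits in some ISO 8859 part, but the three together fit in none of them; a Unicode encoding such as UTF-8 handles the mixed text without choosing a charset per language.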


