The following are notes I took during a Unicode tech talk by Dave Cross.
What’s the problem?
ASCII has 128 characters. Extended ASCII character sets, e.g. ISO-8859-1, can have up to 256 characters, because that is how many distinct values a single byte can hold.
Unicode has more than 110,000 characters, far too many to fit in one byte each; we need more bytes!
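To make the one-byte limit concrete, here is a minimal sketch in Python (my choice of language for illustration, not something from the talk). It assumes the euro sign as an example of a character that ISO-8859-1 does not cover.

```python
# ISO-8859-1 maps every character it supports to exactly one byte.
e_acute = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
latin1_bytes = e_acute.encode("iso-8859-1")
print(len(latin1_bytes))

# The euro sign (U+20AC) has no slot in ISO-8859-1, so encoding it fails.
euro = "\u20ac"
try:
    euro.encode("iso-8859-1")
    euro_fits = True
except UnicodeEncodeError:
    euro_fits = False
print(euro_fits)
```

Any character outside the 256 slots of a single-byte character set simply cannot be written in it, which is the problem a multi-byte encoding solves.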
UTF-8 stands for UCS (Universal Character Set) Transformation Format – 8 bit. It represents each Unicode character as 1 to 4 bytes and is the de facto standard encoding on the web. It also has excellent support in Perl (as of 5.14).
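The variable width is easy to see by encoding a few characters. The sketch below uses Python purely for illustration (it was not part of the talk), and the four sample characters are my own picks, one for each possible byte length.

```python
# One sample character per UTF-8 length (the samples are illustrative choices).
samples = [
    "A",           # U+0041, in the ASCII range
    "\u00e9",      # U+00E9, e with acute accent
    "\u20ac",      # U+20AC, euro sign
    "\U0001F600",  # U+1F600, an emoji outside the Basic Multilingual Plane
]

byte_lengths = []
for ch in samples:
    encoded = ch.encode("utf-8")
    byte_lengths.append(len(encoded))
    # Show the code point, the raw bytes in hex, and the byte count.
    print(f"U+{ord(ch):04X} -> {encoded.hex()} ({len(encoded)} bytes)")
```

Note that the ASCII character still takes a single byte, which is why plain ASCII text is already valid UTF-8.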