Unicode Perl Best Practices

The following are notes I took during a Unicode tech talk by Dave Cross.

What’s the problem?

ASCII has 128 characters. Extended ASCII character sets, e.g. ISO-8859-1, can have up to 256 characters, which is the most that a single byte (8 bits) can distinguish.

Unicode has more than 110,000 characters, far more than one byte can distinguish; we need more bytes!


UTF-8 (UCS Transformation Format, 8-bit, where UCS is the Universal Character Set) represents each Unicode character as 1 to 4 bytes and is the de facto standard encoding on the web. It also has excellent support in Perl (as of 5.14).
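
To make the 1 to 4 byte range concrete, here is a minimal sketch (not from the talk) that encodes one sample character from each length class with the core Encode module and counts the resulting bytes. The helper name utf8_byte_length and the choice of sample characters are my own, for illustration only.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: number of bytes a character string
# occupies once it has been encoded as UTF-8.
sub utf8_byte_length {
    my ($chars) = @_;
    return length(encode('UTF-8', $chars));
}

# One sample character per UTF-8 length class:
# U+0041 (A), U+00E9 (e with acute), U+20AC (euro sign), U+1F600 (emoji).
my @samples = ("A", "\x{E9}", "\x{20AC}", "\x{1F600}");

for my $char (@samples) {
    printf "U+%04X takes %d byte(s) in UTF-8\n",
        ord($char), utf8_byte_length($char);
}
```

Note that length on the original string counts characters, while length on the encoded result counts bytes; keeping those two apart is most of what the rest of these notes are about.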

