During the 1980s, since most machines had standardized on storing information in 8-bit bytes (which can hold a value from 0 to 255), manufacturers started using the extra range of values between 128 and 255 for their own encodings of fancy characters that had not been included in the ASCII standard. This began with IBM PCs, which used them for so-called graphics characters that could be used to build simple line drawings and tables, and continued with Apple and Adobe using them to encode fancy characters (like ligatures, and accented characters for foreign languages) that were useful in high-quality typesetting.

In the late 1980s a new standard, called Unicode, was proposed in order to tame the situation once again. It was decided, however, to go much further than just standardizing the upper 128 values. Instead, Unicode uses 2-byte (16-bit) values, which gives it a range of 0 to 65,535. With this many character codes, Unicode has enough range to specify standard encodings for all the characters in every known human writing system. In fact, it was originally developed by researchers at Xerox who were working on multi-lingual word processing.
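In Java this 16-bit design shows up directly: the char type is an unsigned 16-bit quantity, so it can hold exactly the Unicode range 0 to 65,535. A minimal sketch (the class name here is my own):

```java
// Demonstrates that a Java char spans the full 16-bit Unicode range.
public class CharRange {
    public static void main(String[] args) {
        // Character.MIN_VALUE is '\u0000', Character.MAX_VALUE is '\uFFFF'.
        System.out.println((int) Character.MIN_VALUE);  // prints 0
        System.out.println((int) Character.MAX_VALUE);  // prints 65535
    }
}
```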
While Unicode has not been widely implemented yet, many believe it will become
an important standard as the world-wide-web forces us to create documents
viewable in browsers all over the world (where many different encoding standards
are in use). Because of its connection to the web,
Java is the first language to include support
for Unicode as part of the core language specification. To specify a Unicode character you give its
code as a 4-digit hexadecimal
value following a backslash-u escape, as in \u0041. (It turns out that while Unicode is part of the Java specification, some current implementations only allow you to use the first 256 Unicode characters.)
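A short sketch of these escapes in use (class and variable names are my own):

```java
// Unicode escapes in Java: \u followed by a 4-digit hex character code.
public class UnicodeDemo {
    public static void main(String[] args) {
        char capitalA = '\u0041';  // hex 41 = decimal 65, the code for 'A'
        char eAcute   = '\u00E9';  // hex E9 = decimal 233, e with acute accent
        System.out.println(capitalA == 'A');  // prints true
        System.out.println((int) eAcute);     // prints 233
    }
}
```

Note that the compiler translates these escapes anywhere in the source text, not just inside character and string literals.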
This page copyright ©1998 by Joshua S. Hodas. It was built with Frontier on a Macintosh. Last rebuilt on Sat, Sep 5, 1998 at 1:20:22 AM.
http://www.cs.hmc.edu/~hodas/courses/cs5/week_02/lecture/unicode.html