Unicode is a specific implementation of something called the Universal Character Set (UCS).
Characters (00)00-7F (hexadecimal) correspond to ASCII; (00)00-FF correspond to Latin-1 (ISO 8859-1). Characters E000-F8FF are set aside for private character sets.
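The ranges above can be checked with a short Python sketch. The helper name classify is made up for this illustration and is not part of any library; only the three ranges named in the text are distinguished, and everything else is lumped together as "other".

```python
import unicodedata

def classify(ch):
    """Hypothetical helper: name the range from the text that a character falls in."""
    cp = ord(ch)
    if cp <= 0x7F:
        return "ASCII"
    if cp <= 0xFF:
        return "Latin-1"
    if 0xE000 <= cp <= 0xF8FF:
        return "private use"
    return "other"

# Every Latin-1 byte decodes to the code point with the same numeric value.
latin1_matches = all(ord(bytes([b]).decode("latin-1")) == b for b in range(256))

# Python's Unicode database reports the general category of a character;
# private-use characters carry the category "Co".
private_category = unicodedata.category("\ue000")

print(classify("A"), classify("\u00e9"), classify("\uf8ff"), classify("\u4e2d"))
print(latin1_matches, private_category)
```

Decoding with the "latin-1" codec is used here because it maps each byte value directly onto the code point of the same number, which is what the 00-FF correspondence amounts to.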
Although many letters that commonly appear in accented form have their own precomposed characters, there are provisions to add accents and similar markings to any character by following it with combining characters. Chinese characters are not built this way; each one is encoded as a character of its own. The Universal Character Set can be implemented without allowing combining characters. Unicode, however, requires that the UCS be implemented with support for combining characters.
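A minimal Python sketch of the two ways to write an accented letter follows. It uses the standard unicodedata module; the normalization forms NFC and NFD are not discussed in the text above and are brought in only as a convenient way to show that the two spellings are treated as equivalent.

```python
import unicodedata

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, a single character
combined = "e\u0301"     # plain "e" followed by COMBINING ACUTE ACCENT

# The two strings are different sequences of code points...
same_raw = (precomposed == combined)

# ...but normalization converts one spelling into the other.
composed = unicodedata.normalize("NFC", combined)       # compose where possible
decomposed = unicodedata.normalize("NFD", precomposed)  # split into base + mark

# A combining mark can also follow a letter that has no precomposed form.
# "q" with an acute accent stays as two code points even after composition.
accented_q = unicodedata.normalize("NFC", "q\u0301")

# A nonzero combining class marks a character as a combining mark.
acute_class = unicodedata.combining("\u0301")

print(same_raw, composed == precomposed, decomposed == combined)
print(len(accented_q), acute_class)
```

Software that compares or searches text usually normalizes both sides first, since the precomposed and combined spellings look identical on screen.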
Refer to: UTF-8 and Unicode FAQ for Unix/Linux by Markus Kuhn, http://www.cl.cam.ac.uk/~mgk25/unicode.html#ucs
Copyright 1994-2008 by Donald Kenney.