Unicode

[ed: This is a summary of the Unicode info I've gleaned from the net recently. The whole Unicode issue needs to be addressed better by the FAQ. Someday I'll get to reorganize the whole thing.]

What Is Unicode?

Charles A. Bigelow notes:

The authors of the Unicode standard emphasize the fact that Unicode is a character encoding, not a glyph encoding. This might seem like a metaphysical distinction, in which characters have some ``semantic'' content (that is, they signify something to literates) and glyphs are particular instantiations or renderings of characters---Plato talked about this kind of stuff---but in practice it means that most ligatures are not represented in Unicode, nor swash variants, nor figure variants (except for superior and inferior, which are semantically distinct from baseline figures), and so on.

For further information, consult The Unicode Standard: Worldwide Character Encoding Version 1.0, Vol. 1 (alphabets & symbols) and Vol. 2 (Chinese, Japanese, Korean characters), by The Unicode Consortium, Addison-Wesley Publishing Co., 1991, ISBN 0-201-56788-1, 0-201-60845-6.

What is the Unicode Consortium?

The Unicode Consortium is an international body responsible for maintaining the Unicode standard. Their email address is <unicode-inc@unicode.org>.

To obtain more information on Unicode, or to order their printed material and/or diskettes, contact:

Steven A. Greenfield
Unicode Office Manager
1965 Charleston Road
Mountain View, CA 94043
Tel. 415-966-4189
Fax. 415-966-1637

Unicode Editing

James Matthew Farrow contributes:

I use `sam' for all my text editing. It is an X editor based on an editor for the blit called jim. Papers describing sam, as well as a distribution of sam itself, are available for ftp from research.att.com. The sam there is a Unix port of the Plan 9 version. Plan 9 is a full Unicode operating system, and it was around even before NT! The libraries sam is built upon therefore support 16 bit wide characters. The graphics library supplied with it at present does not. However, they may be planning to distribute a new version soon which does. That library just plugs in, replacing the library that comes with sam. No modification is necessary. Characters are stored using the utf-2 encoding.

All of the files I had before I started working with sam were 7 bit ASCII, so no conversion was needed. Now I have ditched xterm in favour of 9term: a terminal emulator in the style of 8½ (the Plan 9 interface). This lets me type Unicode characters on the command line, as part of filenames, in mail, wherever, and most Unix utilities cope without modification. This is about to be released. I'm looking for beta testers. ;-)

Is a special keyboard required?

No. ASCII characters are typed as normal. Common characters above 0x7f are typed using two letter abbreviations. The table is similar to the troff special character codes, e.g., Alt-12 gives you a 1/2, Alt-'e gives you e acute, Alt-bu a bullet and so on. This table is hardwired into the library at present but is trivial to change. Other codes are accessed by typing their hex value; for instance, the smiley is Alt-X263a (0x263a being a smiley character in the Unicode character set).
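
The input scheme described above can be sketched as a small lookup table plus a hex fallback. This is only an illustration in Python, not the code in the library: the names `ABBREVIATIONS` and `compose` are hypothetical, the table holds just the three abbreviations mentioned above, and the sketch assumes that a hex code is always an X followed by four hex digits.

```python
# Hypothetical sketch of the Alt-key input scheme described above.
# Only the three abbreviations named in the text are included.
ABBREVIATIONS = {
    "12": "\u00bd",  # vulgar fraction one half
    "'e": "\u00e9",  # e with acute accent
    "bu": "\u2022",  # bullet
}


def compose(keys):
    """Return the character selected by the keys typed after Alt."""
    # Assumption: X followed by four hex digits selects a character by value.
    if len(keys) == 5 and keys[0] == "X":
        return chr(int(keys[1:], 16))
    # Otherwise look the two-letter abbreviation up in the table.
    return ABBREVIATIONS[keys]


if __name__ == "__main__":
    for keys in ("12", "'e", "bu", "X263a"):
        ch = compose(keys)
        print("Alt-%s selects U+%04X" % (keys, ord(ch)))
```

Keeping the abbreviations in one table is what would make them "trivial to change": adding an entry needs no change to the lookup logic.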

Is roman-to-Unicode conversion available?

All normal 7 bit ASCII characters are encoded as themselves, so no translation is needed. There are conversion routines in the library (runetochar and chartorune) which will do the conversion between wide characters and their encoded bytes, so it should be pretty simple to convert files already in another format. You would have to write something to do the transliteration yourself. A small patch to the system would let you enter different language `modes' for text entry.
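
As a rough illustration of why ASCII text needs no conversion, here is a Python sketch of a byte encoding for 16 bit characters. It assumes that the utf-2 encoding mentioned earlier behaves like what is now called UTF-8; that is an assumption, not something stated in this FAQ. The function names `rune_to_bytes` and `bytes_to_rune` are hypothetical stand-ins and are not the library's runetochar and chartorune.

```python
def rune_to_bytes(rune):
    """Encode one 16 bit character value as one to three bytes."""
    if not 0 <= rune <= 0xFFFF:
        raise ValueError("only 16 bit values are handled in this sketch")
    if rune < 0x80:
        # 7 bit ASCII is stored unchanged as a single byte.
        return bytes([rune])
    if rune < 0x800:
        # Two bytes: 5 payload bits in the lead byte, 6 in the trailer.
        return bytes([0xC0 | (rune >> 6), 0x80 | (rune & 0x3F)])
    # Three bytes: 4 payload bits, then 6, then 6.
    return bytes([
        0xE0 | (rune >> 12),
        0x80 | ((rune >> 6) & 0x3F),
        0x80 | (rune & 0x3F),
    ])


def bytes_to_rune(data):
    """Decode the first character in data; return (rune, bytes consumed).

    Trailing bytes are not validated, to keep the sketch short.
    """
    first = data[0]
    if first < 0x80:
        return first, 1
    if (first & 0xE0) == 0xC0:
        return ((first & 0x1F) << 6) | (data[1] & 0x3F), 2
    if (first & 0xF0) == 0xE0:
        rune = ((first & 0x0F) << 12) | ((data[1] & 0x3F) << 6) | (data[2] & 0x3F)
        return rune, 3
    raise ValueError("not the start of a one to three byte sequence")


if __name__ == "__main__":
    for value in (0x41, 0xE9, 0x263A):
        encoded = rune_to_bytes(value)
        print("U+%04X -> %s" % (value, encoded.hex()))
```

Under this assumption, a file containing only 7 bit ASCII is byte for byte identical before and after encoding, which is why no conversion step is needed for such files.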

Are there PostScript or TrueType fonts available?

Apparently there is a version of the Lucida fonts by Bigelow and Holmes which supports Unicode. This is the information I have on it.

[ed: quoting another source]

[Windows NT] will ship with a Unicode TrueType font containing approximately 1,500 characters. The font is called "Lucida Sans Unicode" and was specifically designed by Bigelow and Holmes for Microsoft to contain the following Unicode sets:

ASCII
Latin 1
European Latin
Extended Latin
Standard Phonetic
Modifier Letters
Generic Diacritical
Greek 
Cyrillic 
Extended Cyrillic
Hebrew
Currency Symbols
Letterlike Symbols
Arrows
Mathematical Operators
Super & Subscript
Form & Chart Components
Blocks
Geometric Shapes
Miscellaneous Technical
Miscellaneous Dingbats

The bitmap fonts which come with the utf version of the libXg graphics library (the library upon which sam is built) support a sparse subset of the full character set. That is, only a few of the characters have glyphs at present. A font editor such as xfedor would let you add more. The list of those currently available is pretty much the same as the list above.

I use 9term and sam as a matter of course now and have for several months. I enjoy the convenience of putting special characters and accented characters in my mail as well as being able to do some phonetic work all in the one terminal/editor suite.


Excerpted from The comp.fonts FAQ, Copyright © 1992-96 by Norman Walsh