Xfsft and font encodings

This document describes the new handling of font encodings introduced with xfsft-1.0.3.

In many cases, installing fonts with xfsft only requires creating an entry in the `fonts.scale' file (either by hand or with the `ttfmkfdir' utility) with the required encoding and running `mkfontdir'. In some cases, however, when using incorrect homebrew fonts, this will lead to wrong results. Typical symptoms are missing glyphs or glyphs at unexpected positions. The `xfd' utility (distributed with X11) can be used to examine X fonts.

This document tries to give enough information for users to fix such problems when they occur.

Background

Glyphs and characters

A character is an abstract description of a unit of a writing system. Examples of characters include the Latin capital letter A, the Arabic letter jim, and the dingbat black scissors.

A glyph is a picture that might be included in a font, and can be displayed by a window system or printed by a printer.

While glyphs roughly correspond to characters, this correspondence is not, in general, one to one. For example, a font may have many variant forms of the capital letter A; a single fi ligature may correspond to the letters f and i.

As X11 does not currently distinguish between glyphs and characters, the rest of this document uses the two terms (more or less) interchangeably.

Character sets

A coded character set is a set of characters together with a function mapping integer codes -- known as codepoints -- to characters. Examples of coded character sets include US-ASCII, ISO 8859-1, KOI8-R, and JIS X 0208-1990. More information on a number of common coded character sets can be found on the Character Soup page.

A coded character set need not use 8-bit integers to index characters. Many early mainframes used 6-bit character sets, and 16-bit character sets are necessary for ideographic writing systems.

The coded character set known as ISO 10646 or Unicode is a 32-bit character set that has the ambition of being the universal character set for all known writing systems, living and dead.

Font encodings

A font is a collection of glyphs. Those glyphs need to be identified in some way; the way this is done depends on the type of font.

Type 1 fonts

Glyphs in a Type 1 font are identified by glyph name. Adobe maintains a standard collection of glyph names to be used in Type 1 fonts.

A Type 1 font also contains a default encoding vector, which maps 256 codepoints to glyph names. The default encoding vector is a suggestion from the font designer as to how the font should be encoded by default; however, it is only a suggestion, and users of the font may re-encode the font to fit their needs.
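
To make the idea of re-encoding concrete, here is a minimal sketch in Python. It treats an encoding vector as a plain dictionary from codepoints to glyph names; the function name `reencode', the sample font contents and the three-entry slice of ISO 8859-2 are illustrative assumptions, not part of xfsft.

```python
# Sketch only: an encoding vector modelled as {codepoint: glyph name}.
NOTDEF = ".notdef"  # conventional name of the "missing glyph" in Type 1 fonts

def reencode(glyph_names_in_font, vector):
    """Apply an encoding vector to a font.

    Returns {codepoint: glyph name}; codepoints whose glyph name is
    absent from the font are mapped to .notdef.
    """
    return {
        code: (name if name in glyph_names_in_font else NOTDEF)
        for code, name in vector.items()
    }

# Hypothetical font that contains only these three glyphs.
font_glyphs = {"A", "a", "aogonek"}

# A tiny slice of ISO 8859-2 expressed with Adobe glyph names.
iso8859_2_slice = {0x41: "A", 0xA3: "Lslash", 0xB1: "aogonek"}

encoded = reencode(font_glyphs, iso8859_2_slice)
for code in sorted(encoded):
    print(hex(code), encoded[code])
```

The font has no `Lslash' glyph, so codepoint 0xA3 ends up as `.notdef'; this is the kind of missing glyph that `xfd' makes visible.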

Note that many fonts, especially those designed for languages used outside of Western Europe, do not follow the Adobe guidelines with respect to glyph naming.

A Type 1 font that uses incorrect glyph names is said to be incorrectly encoded.

Font designers should ensure that they use proper Adobe glyph names for all the alphabetic fonts they create.

TrueType fonts

Glyphs in a TrueType font are indexed by a number of distinct mappings, known as cmaps. cmaps are identified by two small integers, known as the platform id (`pid') and encoding id (`eid'). A font must contain at least one cmap; which cmaps are used is platform dependent.

On Microsoft platforms, text fonts contain a `Microsoft Unicode' cmap (pid=3, eid=1), while symbol fonts contain a `Microsoft Symbol' cmap (pid=3, eid=0); on the Apple platform, fonts contain an `Apple Unicode' cmap (pid=0, eid irrelevant) and/or a number of language-specific cmaps.
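
The following Python sketch illustrates how cmaps can be looked up by their (pid, eid) pair. The dictionary layout, the function names and the glyph indices are assumptions made for illustration; they do not reflect how xfsft or any TrueType library stores cmaps internally.

```python
from typing import Dict, Optional, Tuple

CmapKey = Tuple[int, int]   # (platform id, encoding id)
Cmap = Dict[int, int]       # codepoint -> glyph index

def describe(pid: int, eid: int) -> str:
    """Name the cmap kinds mentioned in the text above."""
    if pid == 3 and eid == 1:
        return "Microsoft Unicode"
    if pid == 3 and eid == 0:
        return "Microsoft Symbol"
    if pid == 0:
        return "Apple Unicode"  # eid is irrelevant here
    return "other"

def find_cmap(cmaps: Dict[CmapKey, Cmap], pid: int,
              eid: Optional[int] = None) -> Optional[Cmap]:
    """Return the first cmap matching pid (and eid, when one is given)."""
    for (p, e), table in cmaps.items():
        if p == pid and (eid is None or e == eid):
            return table
    return None

# Hypothetical text font: a Microsoft Unicode cmap plus one other cmap.
font_cmaps = {(3, 1): {0x41: 36}, (1, 0): {0x41: 36}}

for pid, eid in font_cmaps:
    print((pid, eid), describe(pid, eid))
print(find_cmap(font_cmaps, 3, 1))
```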

Due to a shortcoming of the TrueType specification, and to flaws in several systems that support TrueType fonts, some TrueType fonts contain cmaps that misassign codepoints to glyphs. A TrueType font that lies about the identity of the glyphs that it contains is said to be incorrectly encoded.

Furthermore, a TrueType font may optionally contain a table mapping Adobe glyph names to glyphs in the font. This table is typically only used when converting to a PostScript font format (such as Type 1 or Type 42), usually for printing to a PostScript printer. Xfsft does not use this table, and we shall not speak about it any more.

Xfsft and encodings

Under X, fonts are named by their X logical font description (XLFD). An XLFD consists of a dash (`-') followed by 14 fields separated by dashes. A typical XLFD for a scalable font would be

-adobe-times-medium-r-normal--0-0-0-0-p-0-iso8859-1

Only the last two fields `iso8859-1' are of interest here; they specify the font's desired encoding. Xfsft goes out of its way to reencode the font to this encoding. This section describes how this is done.
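
As an illustration, the encoding part of an XLFD can be extracted with a few lines of Python. This is a sketch based only on the field layout described above; the function name `xlfd_encoding' is made up for the example.

```python
def xlfd_encoding(xlfd: str) -> str:
    """Return the last two XLFD fields joined by a dash, e.g. 'iso8859-1'."""
    if not xlfd.startswith("-"):
        raise ValueError("an XLFD must start with a dash")
    fields = xlfd[1:].split("-")
    if len(fields) != 14:
        raise ValueError(f"expected 14 fields, got {len(fields)}")
    # Fields 13 and 14 (indices 12 and 13) specify the desired encoding.
    return f"{fields[12]}-{fields[13]}"

name = "-adobe-times-medium-r-normal--0-0-0-0-p-0-iso8859-1"
print(xlfd_encoding(name))
```

Note that the empty field between `normal' and the first `0' counts as one of the 14 fields.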

What xfsft knows about encodings

The view that xfsft takes of an encoding consists of the following data:

the name of the encoding, together with a number of aliases (alternate names);
the size of the encoding;
an ordered collection of mappings; a mapping maps codepoints to glyph names, Unicode codepoints, or codepoints in a given cmap.

Xfsft contains a table that maps a large subset of Unicode to Adobe glyph names and Speedo indices; this allows Type 1 and Speedo fonts to be used with encodings which only contain a mapping to Unicode codepoints.
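
The data listed above, together with the Unicode-to-glyph-name fallback, might be modelled as follows. This Python sketch is purely illustrative: the class and function names, the alias `latin2', the reading of `size' as a number of codepoints, and the three-entry name table are assumptions, not xfsft's actual data structures.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union

# Tiny excerpt of a Unicode -> Adobe glyph name table (illustrative).
UNICODE_TO_GLYPH_NAME = {0x0041: "A", 0x0105: "aogonek", 0x0141: "Lslash"}

@dataclass
class Mapping:
    kind: str                           # "glyph-name", "unicode" or "cmap"
    table: Dict[int, Union[int, str]]   # codepoint -> target

@dataclass
class Encoding:
    name: str
    aliases: List[str]
    size: int                           # assumed: number of codepoints
    mappings: List[Mapping] = field(default_factory=list)

def glyph_name(enc: Encoding, code: int) -> Optional[str]:
    """Try each mapping in order; fall back from Unicode to a glyph name."""
    for m in enc.mappings:
        if code not in m.table:
            continue
        target = m.table[code]
        if m.kind == "glyph-name":
            return target
        if m.kind == "unicode" and target in UNICODE_TO_GLYPH_NAME:
            return UNICODE_TO_GLYPH_NAME[target]
    return None

# An encoding that only carries a mapping to Unicode codepoints.
latin2 = Encoding("iso8859-2", ["latin2"], 256,
                  [Mapping("unicode", {0x41: 0x0041, 0xA3: 0x0141, 0xB1: 0x0105})])
print(glyph_name(latin2, 0xB1))
```

Because the lookup goes through the Unicode table, a Type 1 font can be used with `latin2' even though the encoding itself never mentions glyph names.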

Where does encoding information come from?

Out of the box, xfsft knows about the following encodings:

ISO 10646 (Unicode);
ISO 8859-1 to 9 and 15;
KOI8-R, U, E and UNI;
microsoft-symbol and apple-roman; these are only expected to be useful with symbol fonts.

Furthermore, in the case of Type 1 fonts, the font's default encoding can be used by specifying an XLFD name with the encoding `adobe-fontspecific'. This is most useful with symbol fonts.

Additional encodings can be provided by using encoding files; a number of such files are provided in the `encodings' directory of the xfsft source distribution. These are plain text files, suitable for editing with any text editor (but why settle for anything but the best?). Xfsft will read them even if they are compressed or gzipped. The format of the encoding files is documented in the `encodings/README' file in the source distribution of xfsft.

Installing fonts in xfsft

Properly encoded alphabetic fonts

If the font contains proper encoding information, and xfsft already knows about the encoding you wish to use, it should be enough to add an entry for the font to the `fonts.scale' file, and run `mkfontdir'. Typical entries will look like

alamakota.pfa -ogonki-alamakota-medium-r-normal--0-0-0-0-p-0-iso8859-2
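
When many fonts need entries, the file can be generated rather than typed. The Python sketch below assumes that `fonts.scale' uses the same layout as `fonts.dir': a first line giving the number of entries, followed by one `filename XLFD' line per font. The function name is made up for the example.

```python
def render_fonts_scale(entries):
    """Render (filename, xlfd) pairs as the text of a fonts.scale file.

    Assumed layout: a count line, then one 'filename xlfd' line per entry.
    """
    lines = [str(len(entries))]
    lines += [f"{filename} {xlfd}" for filename, xlfd in entries]
    return "\n".join(lines) + "\n"

entries = [
    ("alamakota.pfa",
     "-ogonki-alamakota-medium-r-normal--0-0-0-0-p-0-iso8859-2"),
]
print(render_fonts_scale(entries), end="")
```

After writing the result to `fonts.scale' in the font directory, run `mkfontdir' there as described above.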

Symbol fonts

Type 1 symbol fonts should be installed using the `adobe-fontspecific' encoding.

In an ideal world, all TrueType symbol fonts would be installed using one of the `microsoft-symbol' and `apple-roman' encodings. A number of symbol fonts, however, are not marked as such; such fonts should be installed using `microsoft-cp1252', or, for older fonts, `microsoft-win3.1'.

In order to guarantee consistent results (especially between Type 1 and TrueType versions of the same font), it is possible to define a special encoding for a given font. This has already been done for the ZapfDingbats font; see the file `encodings/adobe-dingbats.enc' in the xfsft source distribution.

Incorrectly encoded text fonts

A number of text fonts are incorrectly encoded. Incorrect encoding is sometimes done by design, in order to make a font for an exotic script appear like an ordinary Western text font. It is often due to the font designer's laziness or incompetence; in particular, most people seem to find it easier to invent idiosyncratic glyph names rather than follow the Adobe glyph list.

There are two ways of dealing with such fonts: using them with the encoding they were designed for, and creating an ad hoc encoding file. These methods are analogous to the methods described above for dealing with symbol fonts.

Of course, the proper fix would be to hit the font designer very hard on the head with the PLRM (preferably the first edition, as it was published as a hardcover).

Using fonts with the designer's encoding

In the case of Type 1 fonts, the font designer can specify a default encoding; this encoding is requested in xfsft by using the `adobe-fontspecific' encoding in the XLFD name. Sometimes the font designer did not specify a reasonable default encoding; in this case, you should experiment with `iso8859-1', `microsoft-cp1252', `microsoft-win3.1', and `apple-roman' (`microsoft-symbol' doesn't make sense for Type 1 fonts).

TrueType fonts do not have a default encoding. However, most TrueType fonts are designed with either Microsoft or Apple platforms in mind, so one of `microsoft-cp1252', `microsoft-win3.1', or `apple-roman' should yield reasonable results.

Specifying an ad hoc encoding file

It is always possible to define an encoding file to reorder the glyphs in a font in any desired order. Again, see the `encodings/adobe-dingbats.enc' file to see how this is done.

Specifying font aliases

By following the directions above, you will find yourself with a number of fonts with unusual names -- specifying encodings such as `adobe-fontspecific', `microsoft-win3.1' etc. In order to use these fonts with standard applications, it may be useful to remap them to their proper names.

This is done by writing a `fonts.alias' file. The format of this file is similar to the format of the `fonts.dir' file, except that it maps XLFD names to XLFD names and does not start with a line giving the number of entries. A `fonts.alias' file might look as follows:

-ogonki-alamakota-medium-r-normal--0-0-0-0-p-0-iso8859-2 "-ogonki-alamakota-medium-r-normal--0-0-0-0-p-0-adobe-fontspecific"

The `fonts.alias' file is described in the mkfontdir(1) manual page.
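
Alias lines can also be generated. The Python sketch below assumes the two-column layout described in mkfontdir(1): the alias first, the target font name second, separated by white space, with no count line. The function names are invented for the example, and quoting is applied only to names that contain white space.

```python
def quote(name):
    """Wrap a font name in double quotes if it contains white space."""
    return f'"{name}"' if any(c.isspace() for c in name) else name

def render_fonts_alias(aliases):
    """Render (alias, target) pairs as the text of a fonts.alias file."""
    return "".join(f"{quote(alias)} {quote(target)}\n"
                   for alias, target in aliases)

aliases = [
    ("-ogonki-alamakota-medium-r-normal--0-0-0-0-p-0-iso8859-2",
     "-ogonki-alamakota-medium-r-normal--0-0-0-0-p-0-adobe-fontspecific"),
]
print(render_fonts_alias(aliases), end="")
```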

Adobe, PostScript, TrueType, Apple, Microsoft, and possibly others are registered trademarks of their respective owners.

Juliusz Chroboczek.

Permission is hereby granted, free of charge, to any person obtaining a copy of this material, to deal in it without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of this material, and to permit persons to whom this material is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of this material.

THIS MATERIAL IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THIS MATERIAL OR THE USE OR OTHER DEALINGS IN THIS MATERIAL.