Character Mapping
Synopsis
This section holds functions and structures that are related to mapping character input codes to glyph indices.
Note that for many scripts, FreeType's simplistic approach of mapping a single character to a single glyph is not valid or possible! In general, a higher-level library like HarfBuzz or ICU should be used for handling text strings.
FT_CharMap
Defined in FT_FREETYPE_H (freetype/freetype.h).
  typedef struct FT_CharMapRec_*  FT_CharMap;
A handle to a character map (usually abbreviated to ‘charmap’). A charmap is used to translate character codes in a given encoding into glyph indices for its parent face. Some font formats may provide several charmaps per font.
Each face object owns zero or more charmaps, but only one of them can be ‘active’, providing the data used by FT_Get_Char_Index or FT_Load_Char.
The charmaps of a face are listed in the face->num_charmaps and face->charmaps fields of FT_FaceRec.
The currently active charmap is available as face->charmap. You should call FT_Set_Charmap to change it.
note
When a new face is created (either through FT_New_Face or FT_Open_Face), the library looks for a Unicode charmap within the list and automatically activates it. If there is no Unicode charmap, FreeType doesn't set an ‘active’ charmap.
also
See FT_CharMapRec for the publicly accessible fields of a given character map.
FT_CharMapRec
Defined in FT_FREETYPE_H (freetype/freetype.h).
  typedef struct  FT_CharMapRec_
  {
    FT_Face      face;
    FT_Encoding  encoding;
    FT_UShort    platform_id;
    FT_UShort    encoding_id;
  } FT_CharMapRec;
The base charmap structure.
fields
| face | A handle to the parent face object. | 
| encoding | An FT_Encoding tag identifying the charmap. | 
| platform_id | An ID number describing the platform for the following encoding ID. This comes directly from the TrueType specification and gets emulated for other formats. | 
| encoding_id | A platform-specific encoding number. This also comes from the TrueType specification and gets emulated similarly. | 
FT_Encoding
Defined in FT_FREETYPE_H (freetype/freetype.h).
  typedef enum  FT_Encoding_
  {
    FT_ENC_TAG( FT_ENCODING_NONE, 0, 0, 0, 0 ),
    FT_ENC_TAG( FT_ENCODING_MS_SYMBOL, 's', 'y', 'm', 'b' ),
    FT_ENC_TAG( FT_ENCODING_UNICODE,   'u', 'n', 'i', 'c' ),
    FT_ENC_TAG( FT_ENCODING_SJIS,    's', 'j', 'i', 's' ),
    FT_ENC_TAG( FT_ENCODING_PRC,     'g', 'b', ' ', ' ' ),
    FT_ENC_TAG( FT_ENCODING_BIG5,    'b', 'i', 'g', '5' ),
    FT_ENC_TAG( FT_ENCODING_WANSUNG, 'w', 'a', 'n', 's' ),
    FT_ENC_TAG( FT_ENCODING_JOHAB,   'j', 'o', 'h', 'a' ),
    /* for backward compatibility */
    FT_ENCODING_GB2312     = FT_ENCODING_PRC,
    FT_ENCODING_MS_SJIS    = FT_ENCODING_SJIS,
    FT_ENCODING_MS_GB2312  = FT_ENCODING_PRC,
    FT_ENCODING_MS_BIG5    = FT_ENCODING_BIG5,
    FT_ENCODING_MS_WANSUNG = FT_ENCODING_WANSUNG,
    FT_ENCODING_MS_JOHAB   = FT_ENCODING_JOHAB,
    FT_ENC_TAG( FT_ENCODING_ADOBE_STANDARD, 'A', 'D', 'O', 'B' ),
    FT_ENC_TAG( FT_ENCODING_ADOBE_EXPERT,   'A', 'D', 'B', 'E' ),
    FT_ENC_TAG( FT_ENCODING_ADOBE_CUSTOM,   'A', 'D', 'B', 'C' ),
    FT_ENC_TAG( FT_ENCODING_ADOBE_LATIN_1,  'l', 'a', 't', '1' ),
    FT_ENC_TAG( FT_ENCODING_OLD_LATIN_2, 'l', 'a', 't', '2' ),
    FT_ENC_TAG( FT_ENCODING_APPLE_ROMAN, 'a', 'r', 'm', 'n' )
  } FT_Encoding;
  /* these constants are deprecated; use the corresponding `FT_Encoding` */
  /* values instead                                                      */
#define ft_encoding_none            FT_ENCODING_NONE
#define ft_encoding_unicode         FT_ENCODING_UNICODE
#define ft_encoding_symbol          FT_ENCODING_MS_SYMBOL
#define ft_encoding_latin_1         FT_ENCODING_ADOBE_LATIN_1
#define ft_encoding_latin_2         FT_ENCODING_OLD_LATIN_2
#define ft_encoding_sjis            FT_ENCODING_SJIS
#define ft_encoding_gb2312          FT_ENCODING_PRC
#define ft_encoding_big5            FT_ENCODING_BIG5
#define ft_encoding_wansung         FT_ENCODING_WANSUNG
#define ft_encoding_johab           FT_ENCODING_JOHAB
#define ft_encoding_adobe_standard  FT_ENCODING_ADOBE_STANDARD
#define ft_encoding_adobe_expert    FT_ENCODING_ADOBE_EXPERT
#define ft_encoding_adobe_custom    FT_ENCODING_ADOBE_CUSTOM
#define ft_encoding_apple_roman     FT_ENCODING_APPLE_ROMAN
An enumeration to specify character sets supported by charmaps. Used in the FT_Select_Charmap API function.
note
Despite the name, this enumeration lists specific character repertoires (i.e., charsets), and not text encoding methods (e.g., UTF-8, UTF-16, etc.).
Other encodings might be defined in the future.
values
| FT_ENCODING_NONE | The encoding value 0 is reserved for all formats except BDF, PCF, and Windows FNT; see below for more information. | 
| FT_ENCODING_UNICODE | The Unicode character set. This value covers all versions of the Unicode repertoire, including ASCII and Latin-1. Most fonts include a Unicode charmap, but not all of them. For example, if you want to access Unicode value U+1F028 (and the font contains it), use value 0x1F028 as the input value for FT_Get_Char_Index. | 
| FT_ENCODING_MS_SYMBOL | Microsoft Symbol encoding, used to encode mathematical symbols and Wingdings. For more information, see ‘https://www.microsoft.com/typography/otspec/recom.htm#non-standard-symbol-fonts’, ‘http://www.kostis.net/charsets/symbol.htm’, and ‘http://www.kostis.net/charsets/wingding.htm’. This encoding uses character codes from the PUA (Private Use Area) in the range U+F020-U+F0FF. | 
| FT_ENCODING_SJIS | Shift JIS encoding for Japanese. More info at ‘https://en.wikipedia.org/wiki/Shift_JIS’. See note on multi-byte encodings below. | 
| FT_ENCODING_PRC | Corresponds to encoding systems mainly for Simplified Chinese as used in the People's Republic of China (PRC). The encoding layout is based on GB 2312 and its supersets GBK and GB 18030. | 
| FT_ENCODING_BIG5 | Corresponds to an encoding system for Traditional Chinese as used in Taiwan and Hong Kong. | 
| FT_ENCODING_WANSUNG | Corresponds to the Korean encoding system known as Extended Wansung (MS Windows code page 949). For more information see ‘https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit949.txt’. | 
| FT_ENCODING_JOHAB | The Korean standard character set (KS C 5601-1992), which corresponds to MS Windows code page 1361. This character set includes all possible Hangul character combinations. | 
| FT_ENCODING_ADOBE_LATIN_1 | Corresponds to a Latin-1 encoding as defined in a Type 1 PostScript font. It is limited to 256 character codes. | 
| FT_ENCODING_ADOBE_STANDARD | Adobe Standard encoding, as found in Type 1, CFF, and OpenType/CFF fonts. It is limited to 256 character codes. | 
| FT_ENCODING_ADOBE_EXPERT | Adobe Expert encoding, as found in Type 1, CFF, and OpenType/CFF fonts. It is limited to 256 character codes. | 
| FT_ENCODING_ADOBE_CUSTOM | Corresponds to a custom encoding, as found in Type 1, CFF, and OpenType/CFF fonts. It is limited to 256 character codes. | 
| FT_ENCODING_APPLE_ROMAN | Apple Roman encoding. Many TrueType and OpenType fonts contain a charmap for this 8-bit encoding, since older versions of Mac OS were able to use it. | 
| FT_ENCODING_OLD_LATIN_2 | This value is deprecated and was neither used nor reported by FreeType. Don't use or test for it. | 
| FT_ENCODING_MS_SJIS | Same as FT_ENCODING_SJIS. Deprecated. | 
| FT_ENCODING_MS_GB2312 | Same as FT_ENCODING_PRC. Deprecated. | 
| FT_ENCODING_MS_BIG5 | Same as FT_ENCODING_BIG5. Deprecated. | 
| FT_ENCODING_MS_WANSUNG | Same as FT_ENCODING_WANSUNG. Deprecated. | 
| FT_ENCODING_MS_JOHAB | Same as FT_ENCODING_JOHAB. Deprecated. | 
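The PUA range given for FT_ENCODING_MS_SYMBOL means that an 8-bit symbol code c is looked up as 0xF000 + c. Below is a small sketch of that conversion. The helper `symbol_to_pua` is hypothetical. Rejecting codes outside 0x20-0xFF is an assumption based only on the range stated in the table.

```c
#include <stdint.h>

/* Hypothetical helper: convert an 8-bit symbol-font code (0x20-0xFF)
   into the PUA value U+F020-U+F0FF used by an FT_ENCODING_MS_SYMBOL
   charmap.  Returns 0 for codes outside that range (an assumption). */
uint32_t
symbol_to_pua( uint32_t  code )
{
  if ( code < 0x20 || code > 0xFF )
    return 0;
  return 0xF000u | code;
}
```

Once the symbol charmap is active (for example via FT_Select_Charmap( face, FT_ENCODING_MS_SYMBOL )), the returned value would be passed to FT_Get_Char_Index.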
note
When loading a font, FreeType makes a Unicode charmap active if possible (either if the font provides such a charmap, or if FreeType can synthesize one from PostScript glyph name dictionaries; in either case, the charmap is tagged with FT_ENCODING_UNICODE). If such a charmap is synthesized, it is placed at the first position of the charmap array.
All other encodings are considered legacy and tagged only if explicitly defined in the font file. Otherwise, FT_ENCODING_NONE is used.
FT_ENCODING_NONE is set by the BDF and PCF drivers if the charmap is neither Unicode nor ISO-8859-1 (otherwise it is set to FT_ENCODING_UNICODE). Use FT_Get_BDF_Charset_ID to find out which encoding is really present. If, for example, the cs_registry field is ‘KOI8’ and the cs_encoding field is ‘R’, the font is encoded in KOI8-R.
FT_ENCODING_NONE is always set (with a single exception) by the winfonts driver. Use FT_Get_WinFNT_Header and examine the charset field of the FT_WinFNT_HeaderRec structure to find out which encoding is really present. For example, FT_WinFNT_ID_CP1251 (204) means Windows code page 1251 (for Russian).
FT_ENCODING_NONE is set if platform_id is TT_PLATFORM_MACINTOSH and encoding_id is not TT_MAC_ID_ROMAN (otherwise it is set to FT_ENCODING_APPLE_ROMAN).
If platform_id is TT_PLATFORM_MACINTOSH, use the function FT_Get_CMap_Language_ID to query the Mac language ID, which may be needed to distinguish Apple encoding variants. See https://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/Readme.txt to get an idea how to do that. Basically, if the language ID is 0, don't use it; otherwise, subtract 1 from the language ID. Then examine encoding_id. If, for example, encoding_id is TT_MAC_ID_ROMAN and the language ID (minus 1) is TT_MAC_LANGID_GREEK, it is the Greek encoding, not Roman. TT_MAC_ID_ARABIC with TT_MAC_LANGID_FARSI means the Farsi variant of the Arabic encoding.
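The language-ID adjustment described above is plain arithmetic. The sketch below assumes the input value comes from FT_Get_CMap_Language_ID on a Macintosh-platform charmap. The helper name is hypothetical.

```c
/* Hypothetical helper: turn the value returned by
   FT_Get_CMap_Language_ID for a Macintosh cmap into a Mac language
   code.  Returns -1 when the cmap is language-independent (value 0). */
long
mac_language_from_cmap( unsigned long  cmap_language_id )
{
  if ( cmap_language_id == 0 )
    return -1;                       /* language ID 0: don't use it */
  return (long)( cmap_language_id - 1 );
}
```

For example, a TT_MAC_ID_ROMAN charmap whose language ID is 15 yields 14, which is the value of TT_MAC_LANGID_GREEK. That charmap therefore uses the Mac Greek encoding.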
FT_ENC_TAG
Defined in FT_FREETYPE_H (freetype/freetype.h).
#ifndef FT_ENC_TAG
#define FT_ENC_TAG( value, a, b, c, d )                             \
          value = ( ( FT_STATIC_BYTE_CAST( FT_UInt32, a ) << 24 ) | \
                    ( FT_STATIC_BYTE_CAST( FT_UInt32, b ) << 16 ) | \
                    ( FT_STATIC_BYTE_CAST( FT_UInt32, c ) <<  8 ) | \
                      FT_STATIC_BYTE_CAST( FT_UInt32, d )         )
#endif /* FT_ENC_TAG */
This macro converts four-letter tags into an unsigned long. It is used to define ‘encoding’ identifiers (see FT_Encoding).
note
Since many 16-bit compilers have trouble with 32-bit enumerations, you can redefine this macro, if problems arise, to something like this:
  #define FT_ENC_TAG( value, a, b, c, d )  value
to get a simple enumeration without assigning special numbers.
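The sketch below re-creates the same byte layout as FT_ENC_TAG in standalone C, with the first letter in the most significant byte, and adds the reverse conversion. `MAKE_ENC_TAG` and `enc_tag_to_string` are hypothetical names used only for illustration. Real code should use FT_ENC_TAG and the FT_Encoding values directly.

```c
#include <stdint.h>

/* Hypothetical macro with the same bit layout as FT_ENC_TAG:
   letter a ends up in bits 31-24, letter d in bits 7-0. */
#define MAKE_ENC_TAG( a, b, c, d )                     \
          ( ( (uint32_t)(unsigned char)(a) << 24 ) |   \
            ( (uint32_t)(unsigned char)(b) << 16 ) |   \
            ( (uint32_t)(unsigned char)(c) <<  8 ) |   \
              (uint32_t)(unsigned char)(d)         )

/* Hypothetical helper: unpack a tag into a NUL-terminated string. */
void
enc_tag_to_string( uint32_t  tag,
                   char      out[5] )
{
  out[0] = (char)( ( tag >> 24 ) & 0xFF );
  out[1] = (char)( ( tag >> 16 ) & 0xFF );
  out[2] = (char)( ( tag >>  8 ) & 0xFF );
  out[3] = (char)(   tag         & 0xFF );
  out[4] = '\0';
}
```

For instance, MAKE_ENC_TAG( 'u', 'n', 'i', 'c' ) evaluates to 0x756E6963. This is the value behind FT_ENCODING_UNICODE when FT_ENC_TAG is not redefined.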
FT_Select_Charmap
Defined in FT_FREETYPE_H (freetype/freetype.h).
  FT_EXPORT( FT_Error )
  FT_Select_Charmap( FT_Face      face,
                     FT_Encoding  encoding );
Select a given charmap by its encoding tag (as listed in freetype.h).
inout
| face | A handle to the source face object. | 
input
| encoding | The FT_Encoding tag of the charmap to select. | 
return
FreeType error code. 0 means success.
note
This function returns an error if no charmap in the face corresponds to the encoding queried here.
Because many fonts contain more than a single cmap for Unicode encoding, this function has some special code to select the one that covers Unicode best (‘best’ in the sense that a UCS-4 cmap is preferred to a UCS-2 cmap). It is thus preferable to FT_Set_Charmap in this case.
FT_Set_Charmap
Defined in FT_FREETYPE_H (freetype/freetype.h).
  FT_EXPORT( FT_Error )
  FT_Set_Charmap( FT_Face     face,
                  FT_CharMap  charmap );
Select a given charmap for character code to glyph index mapping.
inout
| face | A handle to the source face object. | 
input
| charmap | A handle to the selected charmap. | 
return
FreeType error code. 0 means success.
note
This function returns an error if the charmap is not part of the face (i.e., if it is not listed in the face->charmaps table).
It also fails if an OpenType type 14 charmap is selected (which doesn't map character codes to glyph indices at all).
FT_Get_Charmap_Index
Defined in FT_FREETYPE_H (freetype/freetype.h).
  FT_EXPORT( FT_Int )
  FT_Get_Charmap_Index( FT_CharMap  charmap );
Retrieve the index of a given charmap.
input
| charmap | A handle to a charmap. | 
return
The index into the array of character maps within the face to which charmap belongs. If an error occurs, -1 is returned.
FT_Get_Char_Index
Defined in FT_FREETYPE_H (freetype/freetype.h).
  FT_EXPORT( FT_UInt )
  FT_Get_Char_Index( FT_Face   face,
                     FT_ULong  charcode );
Return the glyph index of a given character code. This function uses the currently selected charmap to do the mapping.
input
| face | A handle to the source face object. | 
| charcode | The character code. | 
return
The glyph index. 0 means ‘undefined character code’.
note
If you use FreeType to manipulate the contents of font files directly, be aware that the glyph index returned by this function doesn't always correspond to the internal indices used within the file. This is done to ensure that value 0 always corresponds to the ‘missing glyph’. If the first glyph is not named ‘.notdef’, then for Type 1 and Type 42 fonts, ‘.notdef’ will be moved into the glyph ID 0 position, and whatever was there will be moved to the position ‘.notdef’ had. For Type 1 fonts, if there is no ‘.notdef’ glyph at all, then one will be created at index 0 and whatever was there will be moved to the last index – Type 42 fonts are considered invalid under this condition.
FT_Get_First_Char
Defined in FT_FREETYPE_H (freetype/freetype.h).
  FT_EXPORT( FT_ULong )
  FT_Get_First_Char( FT_Face   face,
                     FT_UInt  *agindex );
Return the first character code in the current charmap of a given face, together with its corresponding glyph index.
input
| face | A handle to the source face object. | 
output
| agindex | Glyph index of the first character code. 0 if the charmap is empty. | 
return
The charmap's first character code.
note
You should use this function together with FT_Get_Next_Char to parse all character codes available in a given charmap. The code should look like this:
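  FT_ULong  charcode;
  FT_UInt   gindex;

  /* face: an FT_Face whose active charmap is walked */
  charcode = FT_Get_First_Char( face, &gindex );
  while ( gindex != 0 )
  {
    /* replace with real processing of the pair; printf needs <stdio.h> */
    printf( "char code 0x%lX -> glyph index %u\n", charcode, gindex );
    charcode = FT_Get_Next_Char( face, charcode, &gindex );
  }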
Be aware that character codes can have values up to 0xFFFFFFFF; this might happen for non-Unicode or malformed cmaps. However, even with regular Unicode encoding, so-called ‘last resort fonts’ (using SFNT cmap format 13, see function FT_Get_CMap_Format) normally have entries for all Unicode characters up to 0x1FFFFF, which can cause a lot of iterations.
Note that *agindex is set to 0 if the charmap is empty. The result itself can be 0 in two cases: if the charmap is empty or if the value 0 is the first valid character code.
FT_Get_Next_Char
Defined in FT_FREETYPE_H (freetype/freetype.h).
  FT_EXPORT( FT_ULong )
  FT_Get_Next_Char( FT_Face   face,
                    FT_ULong  char_code,
                    FT_UInt  *agindex );
Return the next character code in the current charmap of a given face following the value char_code, as well as the corresponding glyph index.
input
| face | A handle to the source face object. | 
| char_code | The starting character code. | 
output
| agindex | Glyph index of the next character code. 0 if there are no more codes in the charmap. | 
return
The charmap's next character code.
note
You should use this function with FT_Get_First_Char to walk over all character codes available in a given charmap. See the note for that function for a simple code example.
Note that *agindex is set to 0 when there are no more codes in the charmap.
FT_Load_Char
Defined in FT_FREETYPE_H (freetype/freetype.h).
  FT_EXPORT( FT_Error )
  FT_Load_Char( FT_Face   face,
                FT_ULong  char_code,
                FT_Int32  load_flags );
Load a glyph into the glyph slot of a face object, accessed by its character code.
inout
| face | A handle to a target face object where the glyph is loaded. | 
input
| char_code | The glyph's character code, according to the current charmap used in the face. | 
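| load_flags | A flag indicating what to load for this glyph. The FT_LOAD_XXX constants can be used to control the glyph loading process (e.g., whether the outline should be scaled, whether to load bitmaps or not, whether to hint the outline). | 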
return
FreeType error code. 0 means success.
note
This function simply calls FT_Get_Char_Index and FT_Load_Glyph.
Many fonts contain glyphs that can't be loaded by this function, since their glyph indices are not listed in any of the font's charmaps.
If no active charmap is set up (i.e., face->charmap is NULL), the call to FT_Get_Char_Index is omitted, and the function behaves identically to FT_Load_Glyph.