How to get the Character Map of a TTF Font?

TTF fonts are available with different character maps.

Tamil has tons of TTF fonts with different maps.

For a project, we are receiving ebooks in Tamil with various fonts and various encodings.

Converting them to Unicode is a tough task.

There is no open source font converter for Tamil that works well.

Though there are a few online converters available, they lose all the formatting of the content.

We have to create a Tamil Font converter.

As the first step, I am exploring the open-tamil library, which provides TSCII to Unicode conversion utilities.

Exploring the fonts now.

The first step with a font is to generate its character map.

The utility named “ttx”, which comes with fonttools, helps with this.

Install it on Ubuntu.

sudo apt-get install fonttools

To get the character map of a TTF font, use the following command.

ttx -t cmap myfont.ttf

This will produce an XML file named myfont.ttx.

By reading this file, we can get the complete character map of the font.


<?xml version="1.0" encoding="ISO-8859-1"?>
<ttFont sfntVersion="\x00\x01\x00\x00" ttLibVersion="2.4">

  <cmap>
    <tableVersion version="0"/>
    <cmap_format_4 platformID="0" platEncID="3" language="0">
      <map code="0x0" name=".null"/><!-- &lt;control> -->
      <map code="0xc" name="nonmarkingreturn"/><!-- &lt;control> -->
      <map code="0x20" name="space"/><!-- SPACE -->
      <map code="0x21" name="exclam"/><!-- EXCLAMATION MARK -->
      <map code="0x22" name="quotedbl"/><!-- QUOTATION MARK -->
      <map code="0x23" name="numbersign"/><!-- NUMBER SIGN -->
      <map code="0x24" name="dollar"/><!-- DOLLAR SIGN -->
      <map code="0x25" name="percent"/><!-- PERCENT SIGN -->
      <map code="0x26" name="ampersand"/><!-- AMPERSAND -->
      <map code="0x27" name="quotesingle"/><!-- APOSTROPHE -->
      <map code="0x28" name="parenleft"/><!-- LEFT PARENTHESIS -->
      <map code="0x29" name="parenright"/><!-- RIGHT PARENTHESIS -->
      <map code="0x2a" name="asterisk"/><!-- ASTERISK -->
      <map code="0x2b" name="plus"/><!-- PLUS SIGN -->
      <map code="0x2c" name="comma"/><!-- COMMA -->
      <map code="0x2d" name="hyphen"/><!-- HYPHEN-MINUS -->
      <map code="0x2e" name="period"/><!-- FULL STOP -->
      <map code="0x2f" name="slash"/><!-- SOLIDUS -->
      <map code="0x30" name="zero"/><!-- DIGIT ZERO -->
      <map code="0x31" name="one"/><!-- DIGIT ONE -->
      <map code="0x32" name="two"/><!-- DIGIT TWO -->
      <map code="0x33" name="three"/><!-- DIGIT THREE -->
      <map code="0x34" name="four"/><!-- DIGIT FOUR -->
      <map code="0x35" name="five"/><!-- DIGIT FIVE -->
      <map code="0x36" name="six"/><!-- DIGIT SIX -->
      <map code="0x37" name="seven"/><!-- DIGIT SEVEN -->
      <map code="0x38" name="eight"/><!-- DIGIT EIGHT -->
      <map code="0x39" name="nine"/><!-- DIGIT NINE -->
      <map code="0x3a" name="colon"/><!-- COLON -->
      <map code="0x3b" name="semicolon"/><!-- SEMICOLON -->
      <map code="0x3c" name="less"/><!-- LESS-THAN SIGN -->
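
A dump like this can also be read from a script instead of by eye. The following is only a minimal sketch using Python's standard xml.etree module; the function name read_cmap is my own choice for illustration, and it assumes the complete .ttx file as written by ttx (with its closing tags), not the truncated excerpt shown here.

```python
import xml.etree.ElementTree as ET


def read_cmap(ttx_source):
    """Return {code point: glyph name} read from a ttx cmap dump.

    ttx_source is a path to the .ttx file, or an open binary file object.
    """
    root = ET.parse(ttx_source).getroot()
    cmap = root.find("cmap")
    if cmap is None:
        return {}

    mapping = {}
    # A font can carry several subtables (cmap_format_4, cmap_format_0, ...).
    # This sketch merges them and keeps the first name seen for each code.
    for subtable in cmap:
        if not subtable.tag.startswith("cmap_format_"):
            continue  # skips <tableVersion/>
        for entry in subtable.iter("map"):
            code = int(entry.get("code"), 16)  # "0x20" -> 32
            mapping.setdefault(code, entry.get("name"))
    return mapping
```

Calling read_cmap("myfont.ttx") on the full dump should give a dictionary in which, for example, code 0x20 maps to the glyph name "space".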



With this, we can explore further how to remap these codes to the Unicode character list.
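
To give a rough idea of what such a remapping could look like, here is a small sketch. The table LEGACY_TO_UNICODE and the function to_unicode are hypothetical names, and the two entries in the table are invented for illustration only; a real table has to be built for each font from its own character map.

```python
# Hypothetical table, for illustration only: legacy code -> Unicode text.
# The real values depend on the font and must come from its cmap dump.
LEGACY_TO_UNICODE = {
    0x61: "\u0b85",  # assumed: this font draws TAMIL LETTER A at code 0x61
    0x62: "\u0b86",  # assumed: this font draws TAMIL LETTER AA at code 0x62
}


def to_unicode(text, table):
    """Replace every character whose code is in table; keep all others."""
    return "".join(table.get(ord(ch), ch) for ch in text)
```

This only does a one-to-one replacement. A real converter will probably need more than that, for example reordering vowel signs that legacy fonts store before the consonant, so treat it as a starting point only.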



Some links for exploring further.




