Notes on Tamil Internet Conference 2016

I attended the Tamil Internet Conference 2016 at Gandhigram Rural Institute, Dindigul, on September 9, 10 and 11, 2016.

This time, I attended the conference with my family; Nithya and Viyan accompanied me. Nithya and I conducted a pre-conference workshop on Python programming for the students. I was happy to see that Nithya’s training method for Python is simple and easy for beginners. She is against presentations and slides, and jumps directly into hands-on work. Once students get a taste of how easy Python programs are, they become much more interested in following the rest of the session.

It was a paid workshop. Still, there were around 100 registrations. We deliberately turned away many students, as we wanted a hands-on workshop with one computer per person. It is good news that many people in rural areas know about Python and are even ready to pay for a workshop. We are thinking of conducting more workshops there in the coming days.

On the first day of the conference, I presented the project “Open-Tamil”. It is a Python library for processing Tamil text. Mr. Muthu from Boston is a key developer. My brother Arulalan contributed font conversion features to Open-Tamil. I am trying to contribute a few features. We can create word games in Tamil using Open-Tamil. The audience appreciated this feature.

Then I attended other sessions related to language technology. There were many talks on OCR, TTS, spell checkers, ontology dictionaries and mobile apps. Learnt that with Hidden Markov Models, we can do text to speech and speech to text. Have to explore more on this.

Like previous INFITT conferences, most of the papers only demonstrated products. Not many internals or algorithms were discussed. None of the products are open source, so there is no way to learn from, contribute to or use them. This is a very sad state for Tamil development. We can see all the important needs of Tamil computing being addressed, but all of these works sit on hidden racks. If this situation continues, the same topics will be discussed at the 100th conference too. I request academicians and researchers to release their work as open source software, so that many people can contribute and create wonderful tools for public use.

The third day had long demonstrations of machine translation and text to speech. The machine translation worked a bit. But the TTS by Prof. T. Nagarajan, from SSN Engineering College, is a great tool. It sounds almost like a native Tamil speaker. But again, it was just a demo. As a developer or a user, I cannot use or contribute to the TTS.

All the TTS and other research is funded by the government with tax money from the public. But these academicians prevent public access to these tools. I don’t know whom to contact about releasing all government-funded development work as free/open source software. Reply here if you know how to proceed on this.

Gandhigram Rural University agreed to host a Chair for INFITT on their premises.

It is a good initiative. Hope we can have continuous events, trainings, workshops and research with the university.

More than the conference papers, the pre-conference workshops and the half-day tutorials are very useful, as they cover more of the internals of their subjects. We have to include these events in future conferences too.

Met many friends there. Udhayan, Badri Shesadri, Durai Manikandan, SelvaMurali Elantamil, Mugilan Murugan and Dhanesh, to name a few. Discussions with these people always inspire me to do more on Tamil computing.

Started to read the conference book. It is around 500 pages. Planning with INFITT to release this book and the old conference books in EPUB, MOBI and HTML formats.

Thanks to the INFITT team for the conference. Special thanks to Selvamurali for including me in the organizing tasks. Got a lot of experience in handling people, managing tasks at the eleventh hour, and planning and executing events.

Thanks to my team at my company for managing critical tasks and issues while I was at the conference.

Special thanks to Nithya and Viyan for accompanying me throughout.








Home Sweet Home, Indian Linux Users Group, Chennai

After several months, I attended an ILUGC meeting today.

Felt like being in my hometown. Yes, ILUGC is where I was born and grew up in the Free Software world.

At today’s meet, ShanthaKumar explained Haskell and its testing methods. Haskell is a functional language. He is using genetic algorithms and AI to parse English text with a POS tagger.

Saai Akash from Jaya Engineering College explained the Elasticsearch search engine.

Both ShanthaKumar and Saai are final-year engineering students. It is good to see students giving talks in tech communities.

Then Shakthi Kannan explored Paredit, an Emacs minor mode for editing S-expressions. He lives in Emacs. It was his regular talks and writings that inspired me to start using Emacs and enjoy its benefits.

Three elderly people attended the meet and asked good questions.

After a long time, met Mohan, Stylesen and Joe Steve, my long-time ILUGC friends.
As usual, the stand-up meeting after the event was very informative. We went to a nearby canteen and had fun over food.

ILUGC is one of the oldest GNU/Linux users groups in the world. Yes, the Indian Linux Users Group, Chennai [ILUGC] has been spreading awareness of Free/Open Source Software (F/OSS) in Chennai since January 1998. We usually meet on the second Saturday of every month at the Aerospace Engineering building, IIT Madras.

If you are in Chennai on a second Saturday, don’t miss these meetings. You will learn tons of things and meet great people.

Thanks to all ILUGC friends for moving forward on building a great community for FOSS.






Need help – HTML to PDF with Custom Fonts

We are looking for a solution to convert HTML pages to A4 and B7 PDFs for our project.

We are training authors to create ebooks themselves.

They can export EPUB, MOBI and XHTML from Pressbooks.

Now, a few volunteers convert the XHTML to PDF by printing from Firefox, after changing the margin and printer settings in Firefox.

Many authors find this difficult.

We are looking for a way to automate the conversion of XHTML to A4 and B7 size PDFs, so that we can add a web interface, host it on a server, and let authors upload an EPUB or XHTML file and get PDF files as output.

We want to use a custom TTF font for Tamil.
Ila Sundaram-10.TTF is the font we want to use.
Get this font from
I tried to set this font via CSS using @font-face, but the generated PDFs do not use it.
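For anyone who wants to pick this up, here is a minimal sketch of the kind of print stylesheet we are trying to get a converter to honour. It only builds the CSS text; it does not produce a PDF. The family name "BookTamil", the 10 mm margin and the font location are assumptions made for the example, and whether a given tool respects @page and @font-face is exactly the open question.

```python
from pathlib import Path

# ISO 216 page sizes in millimetres (width, height).
PAGE_SIZES_MM = {"A4": (210, 297), "B7": (88, 125)}


def build_print_css(page, font_file, family="BookTamil", margin_mm=10):
    """Return CSS that sets the page size and loads a local TTF font.

    `family` and `margin_mm` are illustrative defaults, not project settings.
    """
    width, height = PAGE_SIZES_MM[page]
    # Converters usually need an absolute file:// URL for local fonts.
    font_url = Path(font_file).resolve().as_uri()
    return "\n".join([
        "@font-face {",
        f'  font-family: "{family}";',
        f'  src: url("{font_url}") format("truetype");',
        "}",
        "@page {",
        f"  size: {width}mm {height}mm;",
        f"  margin: {margin_mm}mm;",
        "}",
        "body {",
        f'  font-family: "{family}", serif;',
        "}",
    ])


if __name__ == "__main__":
    # The font file is assumed to sit in the current directory.
    print(build_print_css("B7", "Ila Sundaram-10.TTF"))
```

The same function can be called with "A4" to get the second stylesheet, so one XHTML file can be paired with either page size.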

I explored wkhtmltopdf.

It does not render the B7 size properly, and I could not set a custom font with it.

Looking for volunteers to explore PhantomJS or wkhtmltopdf for generating PDF files from HTML with a custom font.

Reply here or contact me if you are interested in volunteering.


A few issues and solutions when installing AtoM

AtoM stands for Access to Memory. It is a web-based, open source application for standards-based archival description and access in a multilingual, multi-repository environment. See the AtoM homepage for more information.

I am installing it along with Archivematica, an open source digital preservation system.

I followed the instructions here to install atom.

I have already installed ‘archivematica’ from
It was running on port 80.

As AtoM uses nginx, which also listens on port 80 by default, I changed the AtoM port to 8080.

File : /etc/nginx/sites-enabled/atom

original :   listen 80;
change :   listen 8080;

Then executed
sudo service nginx restart
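If you prefer to script the port change instead of editing the file by hand, here is a minimal sketch. It only rewrites plain "listen 80" lines in a config text that you pass in. The sample server block is made up for the example, and forms such as "listen [::]:80" are deliberately left alone.

```python
import re


def change_listen_port(config_text, old_port=80, new_port=8080):
    """Rewrite 'listen <old_port>' directives to use <new_port>."""
    # \b keeps 'listen 8080' from matching when old_port is 80.
    pattern = re.compile(rf"^(\s*listen\s+){old_port}\b", re.MULTILINE)
    return pattern.sub(rf"\g<1>{new_port}", config_text)


if __name__ == "__main__":
    # Hypothetical excerpt, not the real AtoM site file.
    sample = "server {\n  listen 80;\n  root /usr/share/nginx/atom;\n}\n"
    print(change_listen_port(sample))
```

Running it a second time on its own output changes nothing, so it is safe to repeat. nginx still has to be restarted afterwards.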

Now, I accessed http://<ip-address>:8080

But it threw a 500 Internal Server Error. I checked /var/log/nginx/error.log

It said:
*8 FastCGI sent in stderr: "PHP message: Unable to open PDO connection [wrapped: SQLSTATE[28000] [1045] Access denied for user 'root'@'localhost' (using password: NO)]" while reading response header from upstream, client:, server: _, request: "GET / HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.atom.sock:", host: ""

Solution: delete the file /usr/share/nginx/atom/config/config.php

Now, the web interface to configure atom is displayed.

When giving the username and password for the database, it gave the following error.

The following errors must be resolved before you can continue the installation process:

Unable to open PDO connection [wrapped: SQLSTATE[28000] [1045] Access denied for user ‘root’@’localhost’ (using password: NO)]

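Solution: give the web server user ownership of the AtoM directory, so that the installer can save its settings, then restart PHP-FPM.

sudo chown -R www-data:www-data /usr/share/nginx/atom
sudo service php5-fpm restart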

Now, the data is saved and the AtoM installation is complete.

Thanks to the AtoM mailing list for the answers: !msg/ica-atom-users/L3jB7FQMaN8/z9zoV0GhefEJ


Run many versions of ubuntu with lxc

I am working on a connector between Google Drive OCR and WikiSource projects.

When I develop on my Ubuntu 15.04 laptop, everything works fine. But users reported many issues with the tools mutool and pdfunite.

For a long time, I could not find the reason. Finally found that the users were running Ubuntu 12.04.

On Ubuntu 12.04, mutool is not available and pdfunite is an older version, which works differently than on Ubuntu 15.04.

I wanted to try Ubuntu 12.04 and searched for a free VPS, but there is no free VPS for trying something quickly.

But LXC containers helped here.

We can install any Ubuntu version as a mini VPS inside our own Ubuntu.

sudo lxc-create -t download -n ubuntu1204 -- --dist ubuntu --release precise --arch amd64

sudo lxc-start -n ubuntu1204

sudo lxc-attach -n ubuntu1204

With the --release option, we can give any older version of Ubuntu. It downloads that version and installs a minimal system.
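To test against several releases, the create command can be generated per release. This is a minimal sketch that only builds and prints the commands; it does not run them. The container names and the second release (trusty, Ubuntu 14.04) are examples I picked, not something the project requires.

```python
import shlex


def lxc_create_command(name, release, dist="ubuntu", arch="amd64"):
    """Build the lxc-create command for a container from the download template."""
    return [
        "sudo", "lxc-create", "-t", "download", "-n", name,
        "--",  # options after this separator go to the download template
        "--dist", dist, "--release", release, "--arch", arch,
    ]


if __name__ == "__main__":
    # Example container names mapped to Ubuntu release codenames.
    containers = {"ubuntu1204": "precise", "ubuntu1404": "trusty"}
    for name, release in containers.items():
        command = lxc_create_command(name, release)
        print(" ".join(shlex.quote(part) for part in command))
```

Each printed line can be pasted into a shell, or the list can be passed to subprocess.run once you are happy with it.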

Using this, I checked and found the issue with pdfunite, and changed the program to work with Ubuntu 12.04.

Users are happy now 🙂

Thanks to Ravi, jayantanth, Sibi and Omshivaprakash for continuous testing and ideas for enhancements. I realised the importance of testing and tasted the true spirit of collaborative contribution.

See here to learn more about LXC containers.






Announcing OCR4wikisource

There are many PDF and DjVu files on Wikisource in various languages. In many Wikisource projects, those files are split into individual pages as images, using the Proofread Page extension.

Contributors see those images and type them manually.

This project helps the Wikisource team to OCR an entire PDF or DjVu file using Google Drive OCR. It then updates the relevant pages in Wikisource with the text.
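To make "the relevant page" concrete: with the Proofread Page extension, each scanned page of a file has its own wiki page titled Page:<file name>/<page number>. The sketch below shows only that mapping from OCR output to page titles. It is not the project's actual code, and the file name and OCR text are made up for the example.

```python
def page_title(index_file, page_number):
    """Wiki title of the Page: namespace entry for one scanned page."""
    return f"Page:{index_file}/{page_number}"


def plan_updates(index_file, ocr_text_by_page):
    """Map each wiki page title to the OCR text that should be saved there."""
    return {
        page_title(index_file, number): text.strip()
        for number, text in sorted(ocr_text_by_page.items())
    }


if __name__ == "__main__":
    # Hypothetical OCR output keyed by page number.
    ocr_output = {2: " second page text \n", 1: "first page text"}
    for title, text in plan_updates("Sample Book.pdf", ocr_output).items():
        print(title, "->", text)
```

The actual upload to Google Drive and the saving of wiki pages are left out here, as they need credentials and network access.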

Grab the Python code from here and run it on your GNU/Linux machine.

It is based on

Reply here with your suggestions and improvements.

Solution for "too long for Unix domain socket" with Ansible and Amazon EC2

fatal: [] => SSH Error: unix_listener: "/home/shrinivasan/.ansible/cp/" too long for Unix domain socket
while connecting to x.x.x.x:22
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.

I got the above error from Ansible when I used long hostnames (Amazon EC2 names) instead of IP addresses in the Ansible hosts file.

Ansible cannot log in to the machines via SSH.

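To solve this, enable (uncomment) the following line in the [ssh_connection] section of the /etc/ansible/ansible.cfg file. It shortens the path of the SSH control socket.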
control_path = %(directory)s/%%h-%%r
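A rough sketch of why the shorter template helps. A Unix domain socket path on Linux must fit in 108 bytes, and a long EC2 hostname inside the control path eats most of that. The hostname and remote user below are made up. The longer template is the default of Ansible versions from that period as far as I recall, and the 17 bytes reserved for a temporary suffix added by ssh is my assumption about OpenSSH behaviour.

```python
SUN_PATH_MAX = 108    # size of sun_path on Linux, including the trailing NUL
SSH_TEMP_SUFFIX = 17  # assumption: ssh appends "." plus 16 random characters

# Assumed old default template, and the shorter one from ansible.cfg.
DEFAULT_TEMPLATE = "%(directory)s/ansible-ssh-%%h-%%p-%%r"
SHORT_TEMPLATE = "%(directory)s/%%h-%%r"


def control_path(template, directory, host, port, user):
    """Expand an ansible.cfg control_path template the way ssh would see it."""
    return (template.replace("%(directory)s", directory)
                    .replace("%%h", host)
                    .replace("%%p", str(port))
                    .replace("%%r", user))


def fits(path):
    """True if the socket path, plus the temporary suffix, fits in sun_path."""
    return len(path.encode()) + SSH_TEMP_SUFFIX < SUN_PATH_MAX


if __name__ == "__main__":
    directory = "/home/shrinivasan/.ansible/cp"
    host = "ec2-203-0-113-25.compute-1.amazonaws.com"  # hypothetical EC2 name
    for template in (DEFAULT_TEMPLATE, SHORT_TEMPLATE):
        path = control_path(template, directory, host, 22, "ubuntu")
        print(len(path), fits(path), path)
```

With a short IP address in place of the hostname, both templates fit, which matches the error showing up only after switching to EC2 names.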

After this, Ansible can log in to the remote servers and run the scripts.