Looking for Commercial Tamil Translators


I got a request from a college professor to translate slideshows into Tamil for the following subjects.

  1. Embedded Systems
  2. Discrete Structures

The Embedded Systems subject covers electronics and programming concepts. Discrete Structures is full of high-level mathematics. Each book has around 500 pages.

The style can be a mix of Tamil and English (for technical terms). It is a paid job. If we complete this, we may get more content on various subjects to translate.

If you are interested in this translation work, comment here or send an email to tshrinivasan@gmail.com with your profile and translation experience information.

Share this info with the Tamil translators you know.

 

Minutes – ILUGC Feb 2017 meet


The Indian Linux Users Group, Chennai (ILUGC) community meets on the second Saturday of every month at Aerospace Engineering, IIT Madras.

Yesterday, we had our Feb 2017 meeting.

Ajay started with the various open source licenses available and explored their pros and cons. He explained how the Open Core business model helps many companies do business by open sourcing the core of their software while releasing the other components as proprietary software.

See the slides here

http://slides.com/danatic/licensing#/

 

Then, Viswaprasath from the Mozilla Tamilnadu community explained Firefox’s new WebExtensions API. Now we can build cross-browser extensions using a simple HTML/JavaScript/CSS stack, with no need to play around with XUL. He explained the architecture of a simple extension he developed.

A few links to explore on this:

http://thehackernews.com/2015/08/mozilla-firefox-web-extensions.html

https://developer.mozilla.org/en-US/Add-ons/WebExtensions

https://wiki.mozilla.org/WebExtensions

https://hacks.mozilla.org/2015/09/lets_write_a_webextension/

https://developer.mozilla.org/en-US/Add-ons/WebExtensions/Your_first_WebExtension

 

Then, Karthik from Mozilla Tamilnadu explored WebVR: virtual reality in the browser. Now, with Three.js, we can create 3D worlds that can be viewed in the browser itself.

A-Frame is a JavaScript framework built on top of Three.js.

A few links:

https://aframe.io/

https://aframe.io/docs/0.5.0/introduction/

https://aframe.io/aframe-presentation-kit/

Then, he explained how the Mozilla Tamilnadu community is working to spread Free/Open Source Software in colleges and organizations.

Join this awesome community to learn about and contribute to free software.

https://mozillatn.github.io/

https://www.facebook.com/MozillaTN

https://web.telegram.org/#/im?p=@mozillatnc

 

Then, I gave a lightning talk on a few project ideas: a Firefox extension to help proofread Tamil Wikisource, a Flipboard alternative in Tamil, epub cleaning for FreeTamilEbooks.com, a download report for Wikisource ebooks, a web application for OCR4WikiSource, and translating city/street names into Tamil for building maps in Tamil. I will write a new post with all the details of these project ideas.

I asked for contributors and ideas. Students from S.Joseph Institute of Tech agreed to help with these projects. We can have a hackathon to work on them.

Reply here if you know of any place to conduct a one-day hackathon.

Finally, I asked everyone to join the ILUGC mailing list at https://www.freelists.org/list/ilugc

Our meetings end at the cafe nearby. Old Mohan, new Mohan, Yogesh, myself and one other person (sorry dude, still trying to get your name) had great discussions at the cafe on building a Tamil Text to Speech engine, the advantages of Go over Python, and a lot more.

I returned home by bus with Mohan, discussing various tech, social, academic and industry trends. Interacting with energetic young people always encourages me to learn new things.

Thanks to all the speakers and participants for building a wonderful community for GNU/Linux. Let us hope to have more events like hackathons, FossConf etc. this year.

A few photos

https://goo.gl/photos/T3TSFw6vfcMFfwyN8

Notes on Tamil Internet Conference 2016


I attended the Tamil Internet Conference 2016 at Gandhigram Rural Institute, Dindigul on Sep 9, 10 and 11, 2016.

This time, I attended the conference with my family. Nithya and Viyan accompanied me. Nithya and I conducted a workshop on Python programming for the students as a pre-conference workshop. I was happy to see that Nithya’s way of teaching Python is simple and easy for beginners. She is against presentations and slides, and jumps directly into hands-on work. Once students get a taste of how easy Python programs are, they become much more interested in following the rest of the session.

It was a paid workshop. Still, there were around 100 registrations. We deliberately turned away many students, as we wanted a hands-on workshop with one computer per person. It is good news that many people in rural areas know about Python and are even ready to pay for a workshop. I am thinking of conducting more workshops there in the coming days.

On the first day of the conference, I presented the project “Open-Tamil”. It is a Python library to process Tamil text. Mr. Muthu from Boston is a key developer. My brother Arulalan contributed font conversion features to open-tamil. I am trying to contribute a few features. We can create word games in Tamil using open-tamil. The audience appreciated this feature.

Then, I attended other sessions related to language technology. There were many talks on OCR, TTS, spell checkers, ontology dictionaries and mobile apps. I learnt that with Hidden Markov Models we can do text to speech and speech to text. I have to explore more on this.

Like previous INFITT conferences, most of the papers were there to demonstrate products. Internals and algorithms were not discussed much. None of the products are open source, so there is no way to learn from, contribute to, or use them. This is a very sad part of Tamil development. We can see all the important needs of Tamil computing, but the solutions are all kept on hidden racks. If this situation continues, the same topics will be discussed at the 100th conference too. I request the academicians and researchers to release their work as open source software, so that many people can contribute and create wonderful tools for public use.

The third day had long demonstrations of machine translation and text to speech. The machine translation worked a bit. But the TTS by Prof. T.Nagarajan, from SSN Engg college, is a great tool. It sounds almost like a native Tamil speaker. But again, it was just a demo. As a developer and user, I cannot use or contribute to the TTS.

All the TTS and other research is funded by the government with public tax money. But these academicians prevent public access to these tools. I don’t know whom to contact about releasing all government-funded development work as free/open source software. Reply here if you know how to proceed on this.

Gandhigram Rural University agreed to host a Chair for INFITT on its premises.

It is a good initiative. I hope we can have continuous events, trainings, workshops and research with the university.

More than the conference papers, the pre-conference workshops and the half-day tutorials are much more useful, as they cover more of the internals of the subjects. We have to add such events to future conferences too.

Met many friends there: Udhayan, Badri Shesadri, Durai Manikandan, SelavaMurali Elantamil, Mugilan Murugan and Dhanesh, to name a few. Discussions with these people always inspire me to do more on Tamil computing.

I have started reading the conference book. It is around 500 pages. I am planning with INFITT to release this book and the old conference books in epub, mobi and HTML formats.

Thanks to the INFITT team for the conference. Special thanks to Selvamurali for including me in the organizing tasks. I gained a lot of experience in handling people, managing tasks at the eleventh hour, and planning and executing events.

Thanks to my team at my company https://tvfplay.com for managing critical tasks and issues while I was at the conference.

Special thanks to Nithya and Viyan for accompanying me throughout.

How to use ibus in KDE5?


I have used the KDE desktop environment for a long time.

The recent KDE5 is good, sleek and beautiful.

To type in Tamil, I use ibus.
I installed it and configured the Tamil99 keyboard layout.

sudo apt-get install ibus-m17n ibus m17n-db m17n-contrib ibus-gtk ibus-qt4

These links helped to set up ibus.
https://abstract2paradox.wordpress.com/2011/06/14/typing-tamil-on-linux/
http://askubuntu.com/questions/129407/how-do-i-turn-on-phonetic-typing-for-tamil
https://www.youtube.com/watch?v=Q6fYn3OvfUE

Still, in KDE5, ibus did not work in KDE applications, though it worked well in GTK-based applications like Firefox.

To resolve this problem, add

export QT_IM_MODULE=ibus

to ~/.xprofile and restart your X user session, i.e. log out and log in.

After doing this, I can type in Tamil using ibus in all KDE applications.
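
If you set up several machines, the same edit can be scripted. The following is only a minimal sketch: the helper name ensure_line is made up for this example, and it simply appends the line when it is not already in the file, so running it twice does not duplicate the export.

```python
from pathlib import Path


def ensure_line(path, line):
    """Append `line` to the file at `path` unless it is already there.

    Returns True if the file was changed, False otherwise.
    (Hypothetical helper for illustration; not part of ibus or KDE.)
    """
    path = Path(path)
    text = path.read_text() if path.exists() else ""
    if line in (existing.strip() for existing in text.splitlines()):
        return False
    # Start on a fresh line if the file does not end with a newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with path.open("a") as handle:
        handle.write(prefix + line + "\n")
    return True


# Example call (left commented out so nothing is modified by accident):
# ensure_line(Path.home() / ".xprofile", "export QT_IM_MODULE=ibus")
```

You still have to log out and log in again for the change to take effect.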

Thanks to https://wiki.archlinux.org/index.php/IBus#Troubleshooting

Few Ebooks on Free Software, GNU/Linux, MySQL, HTML5 in Tamil


“Nithya, please stop watching TV serials. I cannot tolerate them.”

I requested my wife.

“Hmm. OK. I will stop. But I have to do something useful with the time saved,” Nithya said.

“Good. Can you share your knowledge in Tamil? Can you write a few articles for the e-magazine Kaniyam?”

“Will give it a try.”

This is how she started to write in Tamil about the technologies she is strong in and the new things she is learning. Her articles were published on the Kaniyam site over 3 years. We compiled them and released them as free ebooks.

Here is the list of Ebooks written by Nithya.

Nithya

MySQL – Part 1

MySQL – part 2

GNU/Linux Part 1

GNU/Linux Part 2

HTML

My friend Amachu a.k.a. Ramadhas wrote a book on Free Software. It came out as a print edition 7 years ago.
We released it as a free ebook.

Ramadhas

 

Free Software

All these books are released under a Creative Commons license.
You can download, read and share them with all.

Nithya has completed a book on GNU/Linux administration and is now writing on CSS.
My friend Priya wrote on Ruby.
Mr. Kuppan wrote on OpenOffice/LibreOffice.

These books are undergoing spell checking and proofreading.

Kathirvel is writing on PHP
Tamil on WordPress.
I am writing on Python

We will release them soon.

Thanks to the Kaniyam.com team and the FreeTamilEbooks.com team for bringing out these ebooks and for their awesome service to Tamil readers.

New Open Source Text to Speech system for Tamil


Prof. Vasu Renganathan of the University of Pennsylvania, Philadelphia, USA, has released his Text to Speech system for the Tamil language as open source.

Get the source at:

https://github.com/vasurenganathan/tamil-tts

See it in action:

http://www.thetamillanguage.com/tamilnlp/speak/

http://www.thetamillanguage.com/tamilnlp/speak/listentome.html

http://www.thetamillanguage.com/tamilnlp/speak/url_talk.php?url=

It is written in PHP.

There are many open source TTS systems available, such as espeak, Festival, CMU Sphinx etc.

But they work well only for English. A new system is needed for Tamil.

My brother Arulalan and I are trying to build a TTS system using Python.

He wrote a script to convert Tamil text to IPA.

http://tuxcoder.wordpress.com/2014/08/02/release-txt2ipa-converter-v0-1/

https://github.com/arulalant/txt2ipa

The next step is to record audio for each symbol and play it back with Python.

In the meantime, the TTS by Vasu sheds a lot of light on text and sound processing, as it has all the sound files and the code to process text, map it to sound files, stitch them together as a word, etc.

We will port it to Python soon.
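
To give an idea of the "stitch" step in Python, here is a rough sketch using only the standard wave module. It is not Vasu's code or a port of it. The function name stitch_wavs and the gap_ms parameter are made up for this example, and it assumes every recording is an uncompressed PCM WAV file with the same channel count, sample width and sample rate.

```python
import wave


def stitch_wavs(paths, out_path, gap_ms=0):
    """Join several WAV files into one, with optional silence between them.

    Hypothetical helper for illustration. All inputs must share the same
    audio format (channels, sample width, frame rate).
    """
    audio_format = None
    chunks = []
    for path in paths:
        with wave.open(str(path), "rb") as source:
            fmt = (source.getnchannels(), source.getsampwidth(), source.getframerate())
            if audio_format is None:
                audio_format = fmt
            elif fmt != audio_format:
                raise ValueError("audio format differs in " + str(path))
            chunks.append(source.readframes(source.getnframes()))
    if audio_format is None:
        raise ValueError("no input files given")

    channels, sample_width, frame_rate = audio_format
    gap_frames = int(frame_rate * gap_ms / 1000)
    # 8-bit WAV samples are unsigned (silence is 0x80); wider samples are signed.
    silent_byte = b"\x80" if sample_width == 1 else b"\x00"
    silence = silent_byte * (gap_frames * channels * sample_width)

    with wave.open(str(out_path), "wb") as target:
        target.setnchannels(channels)
        target.setsampwidth(sample_width)
        target.setframerate(frame_rate)
        target.writeframes(silence.join(chunks))
```

With one recording per letter, a word would be stitched with gap_ms=0 and a sentence from word files with a larger gap_ms. Joining raw recordings like this will not sound natural; it only shows the mechanics.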

This is not a perfect TTS.
Many things have to be improved.

  • There is a small gap between letters.
  • A slightly longer gap is needed between sentences.
  • More voices are needed.

We can add all these features as we have the source now.

Please check the code and explore how a TTS works.

Reply here if you are interested in improving the Tamil TTS system.

Thanks to Prof. Vasu for open sourcing his nice work.

INFITT 2014 – International Conference for Tamil Internet


INFITT is an international organization that connects Tamil scholars, government, IT professionals and the public.

Every year it conducts the “Tamil Internet Conference”, alternating between India and another country. This year, “Tamil Internet Conference 2014” was held in Pondicherry on Sep 19, 20 and 21, 2014.

This was my first time participating in an INFITT conference.

100 papers were presented by scholars from 9 countries.

It was a great place to meet most of the Tamil scholars.

Around 50 scholars came from Malaysia for this conference.

I was so happy to meet my Malaysian friends after a year.

I presented a paper on “Open-Tamil”, a Python library for processing Tamil text.

Here is the paper

https://docs.google.com/document/d/16PGCQxO-yx8h1JGqOo-YY7Sb2sz3D5YyV_PbaYPlwYU/edit?usp=sharing

Here is the presentation

http://www.slideshare.net/tshrinivasan/open-tamilpresentationta

Sibi from fsftn gave a talk on “Introduction to OCR using Tesseract”

My friends BalaVignesh and Arthi BalaVignesh are researching on OCR using Tesseract.

They are building a web application for training Tesseract for Tamil Text. They gave a talk on their research.

There were many talks on various topics like font conversion, text to speech, mobile application development, spell checkers and more.

ElanTamil from Malaysia explained their work on a Tamil spell checker using hunspell and a grammar checker using LanguageTool.

Most of the talks were purely academic, and there was not much demonstration of practical implementations.

There is a ton of research happening in Tamil computing and linguistics. But the sad part is that no one is ready to share their work with the public.

Many universities run funded research on various topics, but they are not ready to share their work.

OCR, text to speech, annotated corpora, speech to text, spell checkers and grammar checkers are the most needed software. People have been asking for them for more than 10 years.

Many academicians did university-funded research in these areas and created some working products with the help of their research students. After they retire, they package their products and sell them.

As they see that not many people are interested in buying their products, they expect the government to buy their software and distribute it to the public for free.

I had a discussion with the participants, asking them to release their software as Free/Open Source Software.

But most of them are not ready for this. They have huge fears about it. If they open source their work, they fear that some big company will take it, sell it and make huge gains.

They really did a huge amount of research and created a few working pieces of software. If I had to create similar software, I would have to invest more than 10 years of research, which is impossible.

If they opened up their research results and their working software, many people could jump into the Tamil linguistics area and improve their software.

Many open source developers are ready to contribute to Tamil. But as we don’t know where to start, we stand still at the starting point itself.

The existing software sellers, the ex-professors, are not ready to share their work.

They keep saying, “I have spent 20 years of research on this. Why should I give it away for free? Why should I open source it? I have to earn back huge revenue for my work.”

They all forget that they were paid for their research by universities, i.e. by the public. It is their duty to release their work to the public.

I agree that if a company invests huge money and creates some software for Tamil, it can sell it and expect a return on investment. It can even sell it as closed source software. If the software is really useful and works perfectly, people will surely buy it.

But these ex-professors built their products with their universities’ funds. The universities should own this software and release it to the public as Free/Open Source Software. But these universities are not aware of this, and the professors sell their work.

This is a great loss for Tamil computing and the Tamil people.

English and other languages have great software because most of the linguistic research by their universities is released as open source.

That’s why English has so much software available.

I don’t know how many decades it may take for universities to release their Tamil research work as open source.

Till then, let us leave these ex-professors worrying and wondering why their software is not selling.

I don’t know what will happen to their hard work and software after their lifetime.

It is happy news that a few young open source enthusiasts have started working on Tamil software.

There is the open-tamil Python library for processing Tamil text. It can convert 25 types of Tamil encoding to Unicode. It has Tamil to IPA conversion, which is a base for text to speech conversion.

Tesseract is being used for Tamil OCR development. LibreOffice has got a spell checker and a grammar checker.

I hope we can get more contributors for these projects. If they grow well, Tamil will get great open source software.

Apart from these thoughts,

Good stuff about this conference:

  • Met many good contributors for Tamil Computing.
  • Many papers gave new ideas for new open source tamil software development.
  • Co-ordination was good for the talks.
  • Food was nice.
  • The Dinner Treat given by CM was awesome.

Things to improve:

  • Make the conference free for the audience, so that interested people around the city can participate. The current model allows only paid members to give and hear the talks.
  • When there are three tracks, place notice boards and banners showing the track, talk and time details.
  • Add a table of contents to the conference book.
  • Release the conference book under a Creative Commons license.
  • Do something more than a yearly conference.
  • To increase membership, explain the benefits of membership on the website.

I received a Rs 5000 cash prize from Prof. C.R.Selvakumar, Waterloo University, Canada, for my work on Tamil computing projects like www.kaniyam.com and www.FreeTamilEbooks.com.

Thanks sir for the recognition. This reminds me that I have to do more and continue these projects. These projects are being driven by great volunteers around the globe. I dedicate all the praise and prize to all the volunteers.

The next conference will be in Singapore.

Hope we can create more open source software for Tamil to talk about at the next conference.

Looking for sound files for Tamil TTS


I am trying to create an open source Tamil Text to Speech engine.

https://github.com/arcturusannamalai/open-tamil/

Using the above open-tamil library, I can split Tamil words into letters.

========

cat letter.py

# Split a Tamil sentence into letters using open-tamil
import tamil.utf8 as utf8

letters = utf8.get_letters(u"கூவிளம் என்பது என்ன சீர்")

# Print one letter per line
for letter in letters:
    print(letter)

==========

python letter.py
கூ
வி
ள
ம்

எ
ன்
ப
து

எ
ன்
ன

சீ
ர்

========

So splitting Tamil words into letters works with open-tamil.

Now I am looking for sound files with the sounds of all Tamil letters, so that we can map the text to sound files.

Is such a collection available online?
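
To find out which recordings are still needed for a given text, something like the sketch below may help. Everything in it is an assumption for illustration: the file naming scheme (one `<letter>.wav` per letter in a folder), the function names, and the split_letters function, which is only a rough stand-in for open-tamil's get_letters that attaches Unicode combining marks (vowel signs, pulli) to the character before them. open-tamil handles more cases than this.

```python
import unicodedata
from pathlib import Path


def split_letters(text):
    """Group each base character with the combining marks that follow it.

    Rough stand-in for open-tamil's get_letters, for illustration only.
    """
    letters = []
    for char in text:
        if letters and unicodedata.category(char).startswith("M"):
            letters[-1] += char  # vowel sign or pulli: attach to previous letter
        else:
            letters.append(char)
    return letters


def missing_sounds(text, sound_dir):
    """Return the letters in `text` that have no `<letter>.wav` in `sound_dir`."""
    sound_dir = Path(sound_dir)
    wanted = {letter for letter in split_letters(text) if not letter.isspace()}
    return sorted(
        letter for letter in wanted
        if not (sound_dir / (letter + ".wav")).exists()
    )
```

For example, missing_sounds(u"கூவிளம் என்பது என்ன சீர்", "sounds") would list every letter of that line that still needs a recording in a folder named sounds.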

Thanks.

Strip unwanted HTML tags from an HTML file using Python


Recently, I got a Word document to convert into an ebook for the site http://FreeTamilEbooks.com

I use http://pressbooks.com to convert HTML content into ebooks.

I saved the Word document as an HTML page.
It has a lot of justify-formatted text.

When I copied the text from the HTML file, it caused a lot of formatting issues in the WYSIWYG editor.

To solve this, we have to strip the unwanted HTML tags like p, font, span etc.

I wrote a small Python script, which gave a clean HTML file.

Here is the code.

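import lxml.html.clean as clean
# bs4 is the current BeautifulSoup package (the old "BeautifulSoup" module was version 3)
from bs4 import BeautifulSoup

# Read the HTML page saved from the Word document
with open('t.html', 'r') as orig_file:
    orig_content = orig_file.read()

# Let BeautifulSoup repair the markup before cleaning
soup = BeautifulSoup(orig_content, 'html.parser')

result = str(soup)

# remove_tags drops only the tags themselves; the text inside them is kept
strip = clean.Cleaner(meta=True, style=True, page_structure=True,
                      remove_tags=['FONT', 'font', 'span', 'h1', 'p'])
content = strip.clean_html(result)

with open('tw.html', 'w') as new_content:
    new_content.write(content)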
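
If lxml and BeautifulSoup are not installed, the same idea (drop the tags, keep the text inside them) can be sketched with Python's standard library alone. This is not the script I used above, only an illustration; the names TagStripper and strip_tags are made up here. It also drops comments and the doctype, and it is not meant for pages that contain scripts or style blocks.

```python
from html import escape
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Re-emit HTML while dropping selected tags but keeping their inner text."""

    def __init__(self, unwanted):
        super().__init__(convert_charrefs=True)
        self.unwanted = {name.lower() for name in unwanted}
        self.out = []

    def _open_tag(self, tag, attrs, closing=">"):
        # Attribute values arrive unescaped, so escape them again on output.
        parts = [tag]
        for key, value in attrs:
            parts.append(key if value is None else '{}="{}"'.format(key, escape(value)))
        return "<" + " ".join(parts) + closing

    def handle_starttag(self, tag, attrs):
        if tag not in self.unwanted:
            self.out.append(self._open_tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        if tag not in self.unwanted:
            self.out.append(self._open_tag(tag, attrs, closing="/>"))

    def handle_endtag(self, tag):
        if tag not in self.unwanted:
            self.out.append("</{}>".format(tag))

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def strip_tags(html_text, unwanted=("font", "span", "h1", "p")):
    """Return `html_text` without the unwanted tags (their text is kept)."""
    parser = TagStripper(unwanted)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Tag names are compared in lower case, so FONT and font are treated the same.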