ILUG-C monthly meet – Saturday, 14 Oct 2017


Hi,

Indian Linux Users Group, Chennai [ ILUGC ] has been spreading awareness of
Free/Open Source Software (F/OSS) in Chennai since January 1998.

We usually meet on the second Saturday of every month, and for the
month of October we shall meet on Saturday, October 14, 2017 at 1500
IST.

Venue: Classroom No 1,
Aerospace Engineering,
Near Gajendra Circle,
IIT Madras.
Link for the Map: http://bit.ly/iitm-aero

Time  : Oct 14, 2017  3.00 – 6.00 PM

Talk Details:

Talk – 1

Topic – Install/demo Tamil TTS

Description :
Recently, we found a way to install the Tamil TTS provided by IITM donlab
and SSN College of Engineering, Chennai.

Here is the install script  – https://github.com/tshrinivasan/tamil-tts-install

I will demonstrate the install process.

Duration : 30 min

About speaker – T Shrinivasan tshrinivasan@gmail.com , Ebooks
publisher at FreeTamilEbooks.com

Talk – 2

Mini workshop using a TelegramBOT to translate strings for OpenStreetMaps.org

We have long dreamed of maps in Tamil.

Imagine your mobile phone or GPS device showing maps in Tamil,
displaying roads and interesting places in Tamil, and showing routes and
speaking street names and directions in Tamil while you drive.

The dream can become real, as we have most of the required technologies:
OpenStreetMap to provide maps; apps like streetcomplete and
osmcontribute to add street names and interesting places; and Tamil TTS to
say everything in Tamil.

The main thing we still need is all the strings in Tamil.
OSM supports language tags, so any string can be given in any language
alongside its translations in other languages.
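As a rough illustration of language tags: OSM keeps a feature's default name under the `name` key and per-language names under keys of the form `name:<language code>`, so a Tamil name goes under `name:ta`. The sketch below only manipulates a plain dictionary. The helper name `add_translation` and the sample road are made up for illustration and are not part of the bot.

```python
def add_translation(tags, lang, text):
    """Return a copy of an OSM tag dictionary with a name in `lang` added."""
    updated = dict(tags)
    updated["name:" + lang] = text
    return updated


# Hypothetical example feature; real data would come from OSM.
road = {"highway": "primary", "name": "Anna Salai"}
road_ta = add_translation(road, "ta", "அண்ணா சாலை")
print(road_ta)
```

The original dictionary is left untouched, so a translation can be reviewed before anything is uploaded.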

To enable translation of the existing strings in OSM, we are
working on a Telegram bot. With it, contributing translations to OSM
becomes easy, from a mobile phone or a web browser.

The bot will be released to the public tomorrow, along with its source code.

It will ask for your OSM username, and then whether you want to translate or verify.
As a first step, the strings will be translated by Google Translate.
That is not fully accurate, so we need people to verify the results.

You will see a string with its translation, and then mark it right or wrong.
Once three people have confirmed a string as right, it will be
marked as confirmed. Strings marked incorrect will be shown for translation.

Once the strings are completed, they will be uploaded to OSM using a
bot account.
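The verification rule described above can be sketched in a few lines of Python. This is not the bot's actual code, which is yet to be released. The class name `ReviewQueue`, the status strings, and the choice that a single "wrong" vote sends a string back for translation are all assumptions made for this illustration.

```python
CONFIRMATIONS_NEEDED = 3  # from the description: three people must agree


class ReviewQueue:
    """Tracks right/wrong votes for machine-translated strings."""

    def __init__(self):
        self.approvals = {}             # string id -> usernames who said "right"
        self.needs_translation = set()  # string ids marked wrong

    def vote(self, string_id, username, is_right):
        """Record one vote and return the string's status."""
        if not is_right:
            # Assumption: one "wrong" vote sends the string back for translation.
            self.approvals.pop(string_id, None)
            self.needs_translation.add(string_id)
            return "needs_translation"
        if string_id in self.needs_translation:
            return "needs_translation"
        voters = self.approvals.setdefault(string_id, set())
        voters.add(username)  # a set ignores repeat votes by the same user
        if len(voters) >= CONFIRMATIONS_NEEDED:
            return "confirmed"
        return "pending"


queue = ReviewQueue()
for user in ("anbu", "bala", "chitra"):  # hypothetical usernames
    status = queue.vote("node/1:name", user, True)
print(status)
```

Keeping the voters in a set means the same person cannot confirm a string three times on their own.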

Will release the bot tomorrow.

Come with your smartphone.
Install Indic Keyboard or Sellinam for Tamil Typing.
Register at openstreetmaps.org

Let us have a translation workshop for openstreetmaps.org

Thanks to the team.

Dinesh Karthik – dineshkarthik.r@gmail.com
Srikanth Logic – srik.lak@gmail.com
Syed Khaleel Jageer – jskcse4@gmail.com
Shrinivasan – tshrinivasan@gmail.com

Duration – one hour

Topic – 3:

Any lightning talks, Q&A session, etc.

Entry Free. All are welcome.


Installation script for Tamil Text to speech System


The Tamil TTS system provided by IITM and SSN College of Engineering has a lengthy installation process.

I have written up the steps here: https://goinggnu.wordpress.com/2017/09/20/how-to-compile-tamil-tts-engine-from-source/

They may not be easy to follow, and you may run into issues. To make life easier, I have created a shell script that automates the entire process.

Here it is – https://github.com/tshrinivasan/tamil-tts-install

System requirements:

Ubuntu 16.04

How to execute:

git clone https://github.com/tshrinivasan/tamil-tts-install.git

cd tamil-tts-install

Edit the file install-tamil-tts.sh

Fill in the following details.

DOWNLOAD_PATH=/home/ubuntu/tts/packages #to download the required packages

COMPILE_PATH=/home/ubuntu/tts/compiled # to place the compiled files and folders

Register here http://htk.eng.cam.ac.uk/download.shtml and get a username and password

HTKUSER=htkuserchennai

HTKPASSWORD=sgqY=t=M

Then, execute the file as

bash install-tamil-tts.sh

How to convert a text to audio?

export FESTDIR=/usr

cd /home/ubuntu/tts/compiled # the COMPILE_PATH you set above

ssn_hts_demo/scripts/complete "தமிழ் வாழ்க" linux

This will convert the text and store it as a WAV file in

ssn_hts_demo/wav/1.wav

You can play it with any audio player.
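If you want to call the engine from a program instead of typing the commands, the same steps can be wrapped in Python. This is only a sketch: it assumes the `ssn_hts_demo` folder sits directly under your COMPILE_PATH, as in the commands above, and the function names `build_tts_command` and `synthesize` are mine, not part of the install script.

```python
import os
import subprocess


def build_tts_command(compile_path, text):
    """Return (argv, cwd, env) mirroring the manual steps above."""
    script = os.path.join(compile_path, "ssn_hts_demo", "scripts", "complete")
    argv = [script, text, "linux"]
    env = dict(os.environ, FESTDIR="/usr")  # same as `export FESTDIR=/usr`
    return argv, compile_path, env


def synthesize(compile_path, text):
    """Run the TTS script and return the path where the WAV is expected."""
    argv, cwd, env = build_tts_command(compile_path, text)
    subprocess.run(argv, cwd=cwd, env=env, check=True)
    return os.path.join(compile_path, "ssn_hts_demo", "wav", "1.wav")
```

For example, `synthesize("/home/ubuntu/tts/compiled", "தமிழ் வாழ்க")` would run the script and hand back the WAV path, raising an error if the script exits with a failure.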

The full details of the compile process are explained here. https://goinggnu.wordpress.com/2017/09/20/how-to-compile-tamil-tts-engine-from-source/

To hear a demo on how the tamil TTS system sounds, click here

Thanks to the IITM team, Prof Hema and Anju, for their great support in helping us install the Tamil TTS system.

 

Project Ideas – Part 2 – Looking for contributors


Here are a few more project ideas.

1. mobile/web app to record voice for wikisource – Show a word, record it, upload to commons, link back to wiktionary.

2. mobile/web app to record audio books  – FreeTamilEbooks needs audio books too

3. wordpress to android app converter – Why can't we convert a wordpress site into an android app using its RSS feeds?

4. epub to apk converter – Let us publish ebooks as mobile apps too.

5. blog to epub converter – fix, add images
https://github.com/sathia27/blog2ebook
Add a feature to download images and add them to ebooks.

6. Daily mobi files for tamil newspapers
Crawl newspapers daily, make mobi, send them to kindle in email daily.

7. Send to kindle – feature for FTE
Add Send to kindle feature to FreeTamilEbooks.com site

8. Lime survey – SAAS – alternative to google forms
Explore limesurvey and make it an alternative to google forms.

9. Collect politicians info and release as app, site

How can we collect details of all politicians, such as education and assets, and publish them for the public?

http://tshrinivasan.blogspot.in/2015/12/how-to-collect-details-of-TN-politicians.html

10. setup ELK for tamil literature search, build a search engine on top of it

Explore using ElasticSearch and Kibana for Tamil Text analysis.

11. fix android app to record audio for wiktionary –

https://github.com/Atul22/wikiAudio
done at https://meta.wikimedia.org/wiki/WikiConference_India_2016/Chandigarh_Hackathon

12. Analyse tamil tv/radio show audio, find how many english words are used/hour
This paper may help
https://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf
https://mail.python.org/pipermail/chennaipy/2017-March/001429.html
Contact Ganesh for python implementation of this algo

13. gui for voice record/upload – wiktionary

https://github.com/tshrinivasan/voice-recorder-for-tawictionary

This needs a GUI version for windows users

14. gui for csv uploader

https://github.com/tshrinivasan/tools-for-wiki/tree/master/csv-uploader-wiktionary

This needs a GUI for windows users

15. gui for open-tamil font converter

https://github.com/Ezhil-Language-Foundation/open-tamil

Need a web application or GUI for all features of open-tamil

16. mobile app to teach tamil – pollachi nasan

http://tshrinivasan.blogspot.in/2015/03/blog-post_9.html

17. wiki massuser create

Sometimes we need to create hundreds of users on wikipedia for a training or event. Currently, only 6 users can be created. Admins can create multiple users, but only one by one. Automate this process using mechanize and beautifulsoup.

18. OCR4wikisource web version using google vision api

Rewrite https://github.com/tshrinivasan/OCR4wikisource with google vision api and give a web interface.

19. create a command line TTS from the source of a mobile TTS app.

Here is an open source TTS mobile app for tamil.

http://www.iitm.ac.in/donlab/tts/androidapp.php

Register and download the source and apk.
The voice named “Naveen” is good.

There are many c files in the folder
SSNFlitehtsTamil/app/src/main/jni

Can you compile those files and give a binary file as a command line tool?

Explore this code and share your thoughts on how to convert it into a
desktop/command line application so that we can use it on our
computers.

20. Create a GUI app for bulk photo uploader for http://commons.wikimedia.org

https://github.com/tshrinivasan/mediawiki-uploader

Project Ideas – Part 1 – Looking for contributors



Here I am listing a few project ideas and requirements. If you are interested in contributing to an open source project, consider these as a starting point.

I will give an intro to each of them in this series of blog posts.

Add a comment here if you pick any of the projects, so that others can join you.

1. Clean up Epub files.

We create epub files for FreeTamilEbooks.com using Calibre. It creates epub files with a lot of extra span and other tags. We need to remove all the unwanted tags from those epub files.

Create a command line or web application to clean up the given epub files.

If you are writing in python, plan to create a calibre plugin to clean the epub files.
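As a starting point, here is a minimal sketch of the clean-up in Python using only the standard library. It makes a strong simplifying assumption: every span tag is unwanted, so all of them are removed and their text is kept. A real tool would need to keep the spans that carry meaningful styling. The function names are mine, and the regular expression is a shortcut, not a full HTML parser.

```python
import re
import zipfile

SPAN_TAG = re.compile(r"</?span\b[^>]*>", re.IGNORECASE)


def unwrap_spans(markup):
    """Delete <span ...> and </span> tags but keep the text inside them."""
    return SPAN_TAG.sub("", markup)


def clean_epub(src_path, dst_path):
    """Copy an EPUB, unwrapping spans in every HTML/XHTML file."""
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(dst_path, "w") as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename.lower().endswith((".xhtml", ".html", ".htm")):
                # Assumption: content documents are UTF-8 encoded.
                data = unwrap_spans(data.decode("utf-8")).encode("utf-8")
            # The EPUB container expects "mimetype" to be stored uncompressed.
            if info.filename == "mimetype":
                method = zipfile.ZIP_STORED
            else:
                method = zipfile.ZIP_DEFLATED
            dst.writestr(info.filename, data, compress_type=method)
```

Files are written in their original order, so the `mimetype` entry stays first if it was first in the source file.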

2.  Download reports for Tamil Wikisource Ebooks

http://ta.wikisource provides ebook downloads.

Ebook downloads for the wikisource of every language are stored in this database.

http://tools.wmflabs.org/wsexport/logs.sqlite

Create a web application or command line application to get the details of the tamil books and create a download count report for each book.

Create a report similar to http://freetamilebooks.com/htmlbooks/download-report.html
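The counting part could look roughly like the sketch below. Please note that the schema here is an assumption: I have used a table named `creation` with `lang` and `title` columns, and you should check the real layout of logs.sqlite (for example with `.schema` in the sqlite3 shell) and adjust the query. The demo rows are made up and live in an in-memory database.

```python
import sqlite3


def download_counts(conn, lang="ta"):
    """Return (title, downloads) rows for one language, most downloaded first."""
    query = (
        "SELECT title, COUNT(*) AS downloads "
        "FROM creation "     # assumed table name
        "WHERE lang = ? "    # assumed column holding the language code
        "GROUP BY title "
        "ORDER BY downloads DESC, title"
    )
    return conn.execute(query, (lang,)).fetchall()


# Made-up sample data, only to show the shape of the report.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE creation (lang TEXT, title TEXT, format TEXT, time TEXT)")
conn.executemany(
    "INSERT INTO creation VALUES (?, ?, ?, ?)",
    [
        ("ta", "Book A", "epub", "2017-01-01"),
        ("ta", "Book A", "mobi", "2017-01-02"),
        ("ta", "Book B", "epub", "2017-01-03"),
        ("en", "Book C", "epub", "2017-01-04"),
    ],
)
report = download_counts(conn)
for title, downloads in report:
    print(title, downloads)
```

To run it against the real log, replace the in-memory connection with `sqlite3.connect("logs.sqlite")` after downloading the file.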

 

3. Improve FreeTamilEbooks android app

The android app for FreeTamilEbooks has some bugs.
https://github.com/jskcse4/FreeTamilEBooks/issues

Use the App and read the issues.
Fix them.

 

4. OCR4WikiSource – Create a web application

OCR4WikiSource is a command line application that connects google ocr and wikisource.
It sends pdf files to google drive, runs OCR on them, gets the text, and sends it to wikisource.

Create a web application to upload any pdf file, send it to google via the google vision api, get the text, and send it to wikisource.

Links:
Here is the requirement.
https://github.com/tshrinivasan/OCR4wikisource/issues/89

A few links about it.
https://goinggnu.wordpress.com/2015/12/28/announcing-ocr4wikisource/

https://goinggnu.wordpress.com/2015/09/30/automating-google-ocr-with-python/

https://meta.wikimedia.org/wiki/WikiConference_India_2016/Submissions/Introduction_to_OCR4WikiSource

Discussion with wikipedia developers on this.
https://phabricator.wikimedia.org/T120788

Google Vision API
https://cloud.google.com/vision

Explore the links

https://github.com/GoogleCloudPlatform/cloud-vision

http://terrenceryan.com/blog/index.php/working-with-cloud-vision-api-from-php/

https://github.com/thangman22/google-cloud-vision-php

http://blog.aimanbaharum.com/2016/04/21/ocr-with-google-cloud-vision-api/

 

5. FlipBoard like application for Tamil

Flipboard is a web and mobile app that delivers the latest content on user-selected topics. Create such an application to provide tamil content from the web on various topics. Content contributors should submit links to good articles with relevant categories and tags. Users should be able to subscribe to categories and read the latest content.

 

6. Firefox plugin for tamil wikisource proofreading

 

Tamil wikisource has around 2000 public domain ebooks, OCRed by google OCR. We have to proofread those books manually.
QuickWikiEditor is a Firefox plugin that enables on-page editing of wiki content.
https://addons.mozilla.org/en-US/firefox/addon/quickwikieditor/

We need to extend this plugin to send the error words and the corrected words to a remote web application. From there, we can get the list of error words, search for them across the entire ta.wikisource.org, and replace them with the corrected words automatically using bots.

Extend the plugin and create a web application to collect the words sent by the plugin.

 

7. Fix the Tamil TTS by IITM

IIT Madras and SSN college released a Text to speech application for Tamil as an android application. You can get the source at
https://www.iitm.ac.in/donlab/tts/

It is a very early version, not as good as the latest web version available at http://speech.ssn.edu.in/

Still, we can learn from the early version and extend it.

Explore the android app, extract the C code from it, and create a command line app or web app with the C code as the backend.

 

8. Web application to add details about ebooks to an xml file on github.
We release Tamil ebooks at FreeTamilEbooks.com

We store all the details about the books in an XML file.

This file is the source for the Android and iOS apps for FreeTamilEbooks.

Once an ebook is released, we have to update the xml file manually, which is tough for non-tech contributors.

We need a web application that takes the ebook details in a form, adds those details to the XML file, and commits to the repo automatically.
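The XML update itself is a small job with Python's standard library. In this sketch the element names (`books`, `book`, `title`, `author`) are placeholders, since the real file's structure is not shown here, and the commit step is left out. The web form would simply pass its fields to a function like this one.

```python
import xml.etree.ElementTree as ET


def add_book(xml_text, details):
    """Append a <book> element built from `details` and return the new XML."""
    root = ET.fromstring(xml_text)
    book = ET.SubElement(root, "book")  # assumed element name
    for field, value in details.items():
        ET.SubElement(book, field).text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical catalogue and form data.
catalog = "<books><book><title>Old title</title></book></books>"
updated = add_book(catalog, {"title": "New title", "author": "Some author"})
print(updated)
```

Because the function works on strings, it can be tried out without touching the repository at all.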

 

9. Add ebooks automatically in GoodReads.com

We can add the details of the ebooks on FreeTamilEbooks.com to GoodReads.com

Currently we have to fill in a long form manually.
We need a command line or web application to simplify or automate this process of adding info about the books on FreeTamilEbooks.com

10. Build a SAAS version of planet kind of RSS aggregation software.

 

Most tech communities need planet-style RSS aggregation software. They have to buy a VPS, install the planet software and add the RSS feeds.

It would be good to build a SAAS version of planet or similar software, so that they can simply sign in, add RSS feeds and start using it.

There are more ideas, written down somewhere in my notebooks. I will collect them and share soon.

All the projects should be released as Free/Open Source software only.

If you are interested in doing any of the things said above, comment here.

Email me at tshrinivasan AT gmail DOT com for more details on any of the projects.

New Open Source Text to Speech system for Tamil


Prof. Vasu Renganathan, University of Pennsylvania, Philadelphia, USA, has released his Text to Speech system for the Tamil language as open source.

Get the source at :

https://github.com/vasurenganathan/tamil-tts

See in action:

http://www.thetamillanguage.com/tamilnlp/speak/

http://www.thetamillanguage.com/tamilnlp/speak/listentome.html

http://www.thetamillanguage.com/tamilnlp/speak/url_talk.php?url=

It is written in PHP.

There are many open source TTS systems available, such as espeak, Festival, CMU Flite etc.

But they work fine for English only. A new system is needed for Tamil.

My brother Arulalan and I are trying to build a TTS system using python.

He wrote a script to convert tamil text to IPA.

http://tuxcoder.wordpress.com/2014/08/02/release-txt2ipa-converter-v0-1/

https://github.com/arulalant/txt2ipa

The next step is to record audio for each symbol and play it back with python.

In the meantime, the TTS by Vasu is a great lesson in text and sound processing, as it has all the sound files and the code to process text, map it to sound files, stitch them into words etc.

We will port it to python soon.

This is not a perfect TTS yet.
Many things have to be improved.

  • There is a slight gap between letters.
  • A longer gap is needed between sentences.
  • More voices are needed.

We can add all these features as we have the source now.
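To give an idea of how the gaps could be controlled once the stitching is done in python, here is a small sketch using the standard `wave` module. It is not a port of Vasu's PHP code. It simply joins clips and inserts a chosen amount of silence between them, assuming all clips are 16-bit PCM WAV files with the same sample rate and channel count. The function name `concat_wavs` is mine.

```python
import wave


def concat_wavs(paths, out_path, gap_ms=0):
    """Join WAV clips into one file with gap_ms of silence between clips."""
    if not paths:
        raise ValueError("need at least one clip")
    out = None
    fmt = None
    try:
        for index, path in enumerate(paths):
            with wave.open(path, "rb") as clip:
                clip_fmt = (clip.getnchannels(), clip.getsampwidth(),
                            clip.getframerate())
                if out is None:
                    # The first clip decides the output format.
                    fmt = clip_fmt
                    out = wave.open(out_path, "wb")
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif clip_fmt != fmt:
                    raise ValueError("all clips must share one audio format")
                if index > 0 and gap_ms > 0:
                    gap_frames = fmt[2] * gap_ms // 1000
                    # Zero bytes are silence for 16-bit PCM (assumed here).
                    out.writeframes(b"\x00" * (gap_frames * fmt[0] * fmt[1]))
                out.writeframes(clip.readframes(clip.getnframes()))
    finally:
        if out is not None:
            out.close()
```

With this, letters inside a word could be joined with `gap_ms=0` and the resulting word or sentence files joined again with a larger gap.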

Please check the code and explore how a TTS works.

Reply here if you are interested in improving Tamil TTS System.

Thanks.

Thanks to Prof. Vasu for open sourcing his nice work.

Looking for sound files for Tamil TTS


I am trying to create an open source tamil Text to Speech Engine.

https://github.com/arcturusannamalai/open-tamil/

Using the above open-tamil library, I can split tamil words into letters.

========

cat letter.py

import tamil.utf8 as utf8

letters = utf8.get_letters(u"கூவிளம் என்பது என்ன சீர்")

for letter in letters:
    print(letter)

==========

python letter.py
கூ
வி
ள
ம்

எ
ன்
ப
து

எ
ன்
ன

சீ
ர்

========

I can split the tamil words into letters using open-tamil.

Now I am looking for sound files covering the sounds of all tamil letters, so that we can map the text to sound files.

Do we have such a collection online?
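To know how complete any such collection is, we can generate the full set of 247 tamil letters (12 vowels, 18 consonants, 216 vowel-consonant combinations and the aytham) and check which ones have no sound file. The sketch below assumes a naming scheme of one file per letter, such as `க.wav`. That scheme and the function names are only my assumptions for illustration.

```python
VOWELS = ["அ", "ஆ", "இ", "ஈ", "உ", "ஊ", "எ", "ஏ", "ஐ", "ஒ", "ஓ", "ஔ"]
CONSONANTS = ["க", "ங", "ச", "ஞ", "ட", "ண", "த", "ந", "ப",
              "ம", "ய", "ர", "ல", "வ", "ழ", "ள", "ற", "ன"]
# Vowel signs for ஆ through ஔ; the bare consonant already carries the vowel அ.
VOWEL_SIGNS = ["", "\u0bbe", "\u0bbf", "\u0bc0", "\u0bc1", "\u0bc2",
               "\u0bc6", "\u0bc7", "\u0bc8", "\u0bca", "\u0bcb", "\u0bcc"]
PULLI = "\u0bcd"  # the dot that turns க into க்
AYTHAM = "ஃ"


def all_tamil_letters():
    """Return the 247 letters of the tamil alphabet."""
    letters = list(VOWELS) + [AYTHAM]
    letters += [c + PULLI for c in CONSONANTS]
    letters += [c + sign for c in CONSONANTS for sign in VOWEL_SIGNS]
    return letters


def missing_letters(filenames, extension=".wav"):
    """Return the letters that have no '<letter><extension>' file."""
    available = {name[:-len(extension)]
                 for name in filenames if name.endswith(extension)}
    return [letter for letter in all_tamil_letters()
            if letter not in available]
```

For a folder of recordings, `missing_letters(os.listdir("sounds"))` would list the letters still to be recorded, where `sounds` is a hypothetical folder name.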

Thanks.