Building Open Source Tamil Spellchecker – Day 9 – Ported from C# to Python

The Tamil Nadu government has released a spellchecker as open source here – https://github.com/Tamil-Virtual-Academy/Tamilinaiya-Spellchecker

I joined some friends in porting it to Python. It is a desktop application written in C#. Since Linux has a C# environment called Mono, some people recommended porting it to Mono first. But I am completely new to both C# and Mono.

So I decided to read the code line by line and port it directly to Python.

One fine day, Manik joined me on a Jitsi meeting. He helped me run the C# code on a Windows VM. With debugging enabled, we could understand the code a little better. We did pair programming and ported most of it.

Manik R

Our Tamil NLP rockstar, Asokan sir, contributed to the code by improving the logic and fixing errors. He has built nice TamilNLP tools and released them as open source here – https://github.com/AshokR/TamilNLP

He writes regularly for Kaniyam.com on various Tamil tech topics.

Don't miss this wonderful free ebook on Tamil computing – https://freetamilebooks.com/ebooks/future_of_tamil_and_information_technology/

Check out all his ebooks here – https://freetamilebooks.com/authors/asokan/
More are yet to be published.

Asokan R

We hit a big showstopper: a confusing C# function. Rajesh explored it and explained it to me on a Jitsi call.

Rajesh

Last Sunday, while I was exploring how to port that function, I got a mail from Asokan sir saying he had completed all the porting and sent a pull request. He added a few tests too.

I checked them.

Wow. It works well. We now have a working version of a Tamil spellchecker on the command line.

In the meantime, Arunmozhi forked the original code and started porting it to JavaScript. Here is his work – https://github.com/tecoholic/Tamil-Inaiya-Sol-Thiruthi

Hopefully he can port it faster from the Python version.


Arunmozhi

Muthu Annamalai of the Ezhil language asked us to add more test cases.

Here is the Python port of the Tamil spellchecker – https://github.com/tshrinivasan/Tamilinaiya-Spellchecker/blob/master/PythonPort/from_Csharp.py

This is the true power of open source. We get helping hands from all over the world for good causes. Tons of thanks to all the contributors.

What to do next?

  • We need a lot of test cases.
  • We may have to add more custom rules.
  • Make it a web application.
  • Figure out how to make it work with other applications like Emacs, Vim, gedit, Kate, LibreOffice, Notepad++, etc.

If you are interested in contributing, reply here or write to tshrinivasan@gmail.com

The big dream of building an open source Tamil spellchecker is coming true. Happy to be a small part of this.

Tons of thanks to all my friends for their helping hands and good hearts.

Read the notes from every day of building the Tamil spellchecker:

  1. Study notes on open-tamil spellchecker – day 1
  2. Building Tamil Spellchecker – Day 2 – Bloom Filter to quick query on dataset
  3. Building Tamil Spellchecker – Day 3 – Collecting all Tamil Nouns
  4. Building Tamil Spellchecker – Day 4 – Shall we collect ALL Tamil Words?
  5. Building Tamil Spellchecker – Day 5 – started collecting ALL Tamil Words
  6. Building Open Source Tamil Spellchecker – Day 6 – How fast is bloom filter for 24 lakh words?
  7. Building Open Source Tamil Spellchecker – Day 7 – Scraping websites to get more words
  8. Building Open Source Tamil Spellchecker – Day 8 – Porting from C# to Python
  9. Building Open Source Tamil Spellchecker – Day 9 – Ported from C# to Python
