The Tamil Nadu government released a spellchecker as open source here – https://github.com/Tamil-Virtual-Academy/Tamilinaiya-Spellchecker
I have joined friends in porting it to Python. It is a desktop application written in C#. Since Linux has a C# environment called Mono, I was advised to port it to Mono first. But I am completely new to C# and Mono.
So I decided to read the code line by line and port it to Python directly.
One fine day, Manik joined me on a Jitsi meeting. He helped me run the C# code on a Windows VM. With debugging enabled, we could understand the code a little. We did pair programming and ported most of it.
Our Tamil NLP rockstar Asokan sir contributed to the code by improving the logic and fixing errors. He has built nice Tamil NLP tools and released them as open source here – https://github.com/AshokR/TamilNLP
He is a regular writer for Kaniyam.com on various Tamil tech topics.
Don't miss this wonderful free ebook on Tamil Computing – https://freetamilebooks.com/ebooks/future_of_tamil_and_information_technology/
Check all his ebooks here – https://freetamilebooks.com/authors/asokan/
More to be published.
Muthu Annamalai of Ezhil Language asked us to add more test cases for this.
Here is the Python port of the Tamil spellchecker.
This is the true power of open source. We get helping hands from all over the world for good causes. Tons of thanks to all contributors.
What to do next?
- We need a lot of test cases to test this.
- We may have to add more custom rules.
- Make it a web application.
- Check how to make it work with other applications such as Emacs, Vim, gedit, Kate, LibreOffice, Notepad++, etc.
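For anyone who wants to help with test cases, here is a rough sketch of how they could be collected as a simple table of words and expected verdicts. The `is_correct` function and the `KNOWN_WORDS` set are only stand-ins I made up for illustration; the real ported checker will have its own entry point, so swap that in.

```python
# Hypothetical stand-in for the ported checker; the real project's API may differ.
KNOWN_WORDS = {"தமிழ்", "அம்மா", "வணக்கம்"}


def is_correct(word):
    """Stand-in check: a word counts as correct if it is in the small known-word set."""
    return word.strip() in KNOWN_WORDS


# Each test case is (input word, expected verdict).
TEST_CASES = [
    ("தமிழ்", True),
    ("வணக்கம்", True),
    ("வணககம்", False),  # pulli dropped from the middle letter
]


def run_cases(check, cases):
    """Return the cases whose actual verdict differs from the expected one."""
    failures = []
    for word, expected in cases:
        actual = check(word)
        if actual != expected:
            failures.append((word, expected, actual))
    return failures


if __name__ == "__main__":
    failed = run_cases(is_correct, TEST_CASES)
    print(f"{len(TEST_CASES) - len(failed)} passed, {len(failed)} failed")
```

Keeping the cases in a plain list means contributors can add words without touching the checking code.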
If you are interested in contributing, reply here or write to email@example.com
The big dream of bringing out an open source Tamil spellchecker is happening. Happy to be a small part of this.
Tons of thanks to all friends for their helping hands and good hearts.
Read all the daily notes on building the Tamil spellchecker.
- Study notes on open-tamil spellchecker – day 1
- Building Tamil Spellchecker – Day 2 – Bloom Filter to quick query on dataset
- Building Tamil Spellchecker – Day 3 – Collecting all Tamil Nouns
- Building Tamil Spellchecker – Day 4 – Shall we collect ALL Tamil Words?
- Building Tamil Spellchecker – Day 5 – started collecting ALL Tamil Words
- Building Open Source Tamil Spellchecker – Day 6 – How fast is bloom filter for 24 lakh words?
- Building Open Source Tamil Spellchecker – Day 7 – Scrapping websites to get more words
- Building Open Source Tamil Spellchecker – Day 8 – Porting from C# to Python
- Building Open Source Tamil Spellchecker – Day 9 – Ported from C# to Python