Project Idea – Call for Contributors – Web Scraping the West Bengal Public Library Network


Hello all,

Our Bengali Wikisource friends have asked for help scraping PDF files from a DSpace-based library:

http://dspace.wbpublibnet.gov.in:8080/jspui/

The site is sometimes down, but it comes back up within a few hours.
Can anyone contribute to this project?
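As a possible starting point, here is a minimal sketch using only the Python standard library. It collects PDF links from a DSpace page and fetches URLs with retries, because the server is sometimes down. The function names are hypothetical. The sketch assumes that PDF links on the item pages end in ".pdf", possibly followed by a query string. DSpace JSPUI usually serves files under a /bitstream/ path, but check this against the live site.

```python
import time
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "http://dspace.wbpublibnet.gov.in:8080/jspui/"


class PdfLinkParser(HTMLParser):
    """Collects absolute URLs of <a href> targets that end in .pdf."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore any query string (e.g. "?sequence=1") when checking the extension.
        if href and href.split("?")[0].lower().endswith(".pdf"):
            absolute = urljoin(self.page_url, href)
            if absolute not in self.pdf_links:  # skip duplicate links
                self.pdf_links.append(absolute)


def extract_pdf_links(html, page_url):
    """Return the unique PDF links found in one HTML page."""
    parser = PdfLinkParser(page_url)
    parser.feed(html)
    return parser.pdf_links


def fetch(url, retries=3, wait_seconds=60):
    """Fetch a URL, waiting and retrying because the server is sometimes down."""
    for attempt in range(1, retries + 1):
        try:
            with urllib.request.urlopen(url, timeout=30) as response:
                return response.read()
        except OSError:  # URLError and timeouts are subclasses of OSError
            if attempt == retries:
                raise
            time.sleep(wait_seconds)
```

A crawler would call fetch() on the collection and item pages, pass the decoded HTML to extract_pdf_links(), and then fetch each PDF link in turn.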

If you are interested, reply here or mail me at tshrinivasan@gmail.com

Thanks.

30 Project Ideas for contributing to Indic Wikipedia Projects


Last week, I had an interesting meeting with the Panjabi Wikimedian community and the CIS-A2K team.

The Panjabi Wikimedia community is small, but each member contributes their best. Many of them take part in 100-days-of-wiki, a personal wiki edit-a-thon lasting 100 days. A few of them do it on multiple sites, several times a year.

Their commitment to contributing and their passion for their language are awesome.

We discussed Wikisource, Wiktionary and Wikipedia, and shared many ideas to improve their workflow. They are looking for tools to automate their tasks, and such tools would be useful for all wiki communities.

Then I had some great discussions with the CIS-A2K team about many interesting project ideas.
Here are all the ideas.

1. List the top 10 tricks/hacks/must-knows for any Wikisource project.

2. Make simple tutorials on how to start contributing to wikis, in as many languages as possible. We still don't have an ebook or easy starter guide in Tamil. Video tutorials may already exist, so curate them and present them in a way that makes them easy to find.

3. Telegram bot to proofread Wikisource content. Get a page from Wikisource and split it into lines, then words. Show a word and its OCRed content in the Telegram bot. The user verifies it or types the correct spelling in Telegram itself. Submit the changes to Wikisource. This makes collaborative proofreading easy.

4. Explore how Flickr can help photographers donate their photos to Commons. Flickr makes it easy for them to upload and showcase photos, and from there we can move the photos to Commons. A few tools are already available. Explore them and train photographers to use them.

5. We should celebrate the volunteers who contribute to wikis through events, news announcements, interviews, etc. CIS may explore this.

6. Web application for OCR4WikiSource

7. Make a web application to record audio, upload it to Commons and add it to Wiktionary words. Explore Lingua-Libre for the web app.

8. Make a mobile application to record audio, upload it to Commons and add it to Wiktionary words.

9. CIS may ask language-based organizations to release their works/tools under public licenses.

10. A one- or two-day meeting/conference to connect various language technology teams. Each team can demonstrate the tools they are working on, and others can learn from them and use them for their languages. CIS may organize this soon.

11. Build spell checkers for Tamil. Learn how other languages are doing it. Odia seems to have a good spell checker, so explore that.

12. There is no Commons app for iOS to upload photos. There used to be one, so fix the iOS Commons app and re-release it.

13. Build maps in local languages with OSM.

14. One- or two-day training on wiki tech, like gadgets, tools, Toolserver, API, etc.

15. Promote the ebooks released by Wikisource projects on Twitter, and measure the downloads.

16. CIS may talk with Amazon about making Wikisource ebooks permanently free on Amazon.

17. Explore the Valmigi project for Malayalam and chikubuku for Kannada, for their ebooks.

18. Download Bengali ebooks from the DSpace site of the West Bengal Public Library Network: http://dspace.wbpublibnet.gov.in:8080/jspui/

19. Explore paid work for Wikisource proofreading.

20. Blog about how Tamil Wikisource obtained 2000 ebooks from the TN government under a public domain license. Send the post to CIS, who may try to do the same for other languages.

21. The ASI website has information about all monuments. Scrape it all and add it to the wiki.

22. Scrape details from tourism sites and add them to the wiki.

23. The Kannada archaeology site has tons of images, but with 3 seals added to every image. Scrape them, remove the seals and add them to Commons.

24. A tool to audit wiki sites: new users, edits, measurements, KPIs, reports, etc.

25. Talk with wiki writers and help them automate their tasks. Build new tools to help them, and train them on existing tools.

26. Get existing photos from many photographers. Get the license documents and add them to OTRS. Have a team upload the photos to Commons.

27. Find pages that don’t have images. Search Commons and add one image automatically.

28. The infobox in a wiki page may have an image. Check the same page in other languages, get the image from its infobox and use it in pages that are missing one.

29. Tito showed a broken JS script. Explore it and fix it.

30. Discuss with Victor and the Google team how to improve the OCR feature and its integration with Wikisource. Explore existing tools like http://tools.wmflabs.org/ws-google-ocr/ and https://wikisource.org/wiki/Wikisource:Google_OCR
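Idea 3 comes down to splitting a page into reviewable words and then putting the volunteers' corrections back. Below is a minimal sketch of only that step. It leaves out fetching the page from Wikisource and the Telegram side, and the function names are hypothetical. It also assumes words are separated by whitespace. Note that rebuilding a line collapses its original spacing to single spaces.

```python
def split_for_review(page_text):
    """Yield (line_index, word_index, word) for every word on the page."""
    for line_index, line in enumerate(page_text.splitlines()):
        for word_index, word in enumerate(line.split()):
            yield line_index, word_index, word


def apply_corrections(page_text, corrections):
    """Rebuild the page, replacing words that a volunteer corrected.

    corrections maps (line_index, word_index) -> corrected word.
    Each line's original spacing is collapsed to single spaces.
    """
    lines = []
    for line_index, line in enumerate(page_text.splitlines()):
        words = line.split()
        for word_index in range(len(words)):
            key = (line_index, word_index)
            if key in corrections:
                words[word_index] = corrections[key]
        lines.append(" ".join(words))
    return "\n".join(lines)
```

The bot would send each (line, word) item to a user and store the replies as corrections. The rebuilt text would then be submitted back to the page.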

Thanks to Ravi, Tito, Tanveer, Dan, Charan Singh, Manavpreet, Rupika, Gurlaal and Stain for the interesting meeting and great ideas.

We can work on these ideas and implement them soon.

If you are interested in working on any of these ideas, reply here or mail me at tshrinivasan@gmail.com

Project Idea – Need a web interface for Tamil TTS


Hello all,

The Tamil TTS system provided by IITM and SSN College of Engineering had one issue: it could convert only one Tamil string to audio at a time.

https://github.com/tshrinivasan/tamil-tts-install

Because of this, we could not run conversions in parallel. Some full-text books took 4-5 hours to convert, so we could not offer it as a web application for public use.

Mohan helped make the Tamil TTS script simpler and able to process multiple conversions simultaneously.

Here is the super script that does the magic.

https://github.com/mohan43u/tamil-tts-install

Thanks, Mohan, for your great work.

Now we need to turn this into a web application, so that anyone can use it easily.

The requirements are:
1. User registration with Gmail
2. The user uploads a Tamil text file
3. Once it is converted, the user receives an email with a link to the audio file
4. Audio files are kept for 1 week
5. REST API support with authentication
6. A queue system
All of this will be released under the GPL.

If you are interested in doing this, reply here or write to me.

Project Idea – Mobile application to explain the place I am in


A few months ago, I was talking to Siva, who is part of Tamil Heritage Activities. They often take people to Mahabalipuram and explain its great history.

They are looking for a mobile app that automatically explains nearby places, using the person's geolocation.

If you go near the Tiger Cave at Mahabalipuram, the app should explain all about the Tiger Cave. The data can come from a Wikipedia page or a custom web service.

We can extend this to any place, for example to all the temples in Kanchipuram, or even all over the world.

Wikipedia and Wikidata can be great starting points for the required information.
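The core of such an app is finding which known place is closest to the user's position. Here is a minimal sketch in Python that uses the haversine great-circle distance. The place list format and the 200 m trigger radius are assumptions. Real coordinates could come from Wikidata, and the coordinates in the usage test are hypothetical. A mobile app would port the same logic to Kotlin, Swift or JavaScript.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_place(lat, lon, places, max_distance_m=200):
    """Return (name, distance_m) of the closest place within max_distance_m, else None.

    places: list of (name, lat, lon) tuples, e.g. built from Wikidata coordinates.
    """
    best = None
    for name, plat, plon in places:
        d = haversine_m(lat, lon, plat, plon)
        if d <= max_distance_m and (best is None or d < best[1]):
            best = (name, d)
    return best
```

When nearest_place() returns a name, the app can show or read out the matching Wikipedia summary. When it returns None, the app stays quiet.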

Now we are looking for Android/iOS developers to develop this as an open source mobile application.

If you are interested in this, mail me at tshrinivasan@gmail.com or reply here.

Thanks.

Project Idea – Mobile app to add POI using openstreetmaps


My friend David Rajamani is looking for an Android app, developed as a module that can be used within another application that is being developed.

The user should log in with OAuth via another app.

Then the user should point to a place on the displayed map and add details about that place.

The data should be sent to openstreetmap.org and to their own database.

The Maps.me Android app can be taken as an example and converted into a module or plugin for another app.
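To send a POI to openstreetmap.org, the OSM API v0.6 expects an XML body for each step. First a changeset is opened (PUT /api/0.6/changeset/create). Then the node is created with its tags (PUT /api/0.6/node/create), and finally the changeset is closed. The Python sketch below builds only those two XML bodies, to show the payload shape. Authentication (OSM uses OAuth), the HTTP calls and saving to the app's own database are left out. The created_by value is a hypothetical placeholder, and an Android module would build the same XML in Kotlin or Java.

```python
import xml.etree.ElementTree as ET


def changeset_xml(comment, created_by="poi-module-sketch"):
    """Body for opening a changeset; created_by is a hypothetical app name."""
    osm = ET.Element("osm")
    changeset = ET.SubElement(osm, "changeset")
    for key, value in (("created_by", created_by), ("comment", comment)):
        ET.SubElement(changeset, "tag", k=key, v=value)
    return ET.tostring(osm, encoding="unicode")


def node_create_xml(changeset_id, lat, lon, tags):
    """Body for creating one node (the POI) inside an open changeset."""
    osm = ET.Element("osm")
    node = ET.SubElement(
        osm, "node",
        changeset=str(changeset_id), lat=f"{lat:.7f}", lon=f"{lon:.7f}",
    )
    for key, value in tags.items():  # OSM tags such as amenity=cafe, name=...
        ET.SubElement(node, "tag", k=key, v=value)
    return ET.tostring(osm, encoding="unicode")
```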

If you are interested in doing this project, mail me at tshrinivasan@gmail.com

Thanks.

Project Idea – Location aware mobile apps to explain historical places


Last week, I met my old friend Siva of the Tamil Heritage Trust after a few years. Their group serves the community by explaining the history of the great places of Tamil Nadu.

He told me about the requirements for a new mobile app.

A location-aware app to explain historic places.

Imagine you go to the Tiger Cave near Mahabalipuram. While you wonder what it is and how it came to be, an app displays the full history of the place when you go near it or search for it.

In the same way, we can give details about all the monuments at Mahabalipuram.

We can extend that to all the temples in Kanchipuram or all around Tamil Nadu.

The app should be released as free/open source software.

It can be a native, React Native or hybrid app.

If you are interested in contributing to this, reply or comment here, or mail me at tshrinivasan@gmail.com

Thanks.

Project Idea – Telegram bot to translate strings for Open Source Projects


At the Wikimedia Hackathon, I saw a demo of a Telegram bot used to translate strings from translatewiki.net.

Here are my notes about it.

============

Telegram Translation Bot: https://phabricator.wikimedia.org/T131664 DONE

Translate on translatewiki.net without leaving your Telegram app

Code: https://github.com/amire80/mediawiki-telegram-bot/

mediawiki.org page: https://www.mediawiki.org/wiki/User:Amire80/chat_bot_draft

Phabricator: amire80 * Wikipedia: Amire80 * Twitter: @aharoni

Amir E. Aharoni and Taras Bunyk presenting

Justin Du (MtDu), Taras Bunyk, and help from Brian Wolff, Madhvuvishy, bd808, Niklas Laxström, Jon Robson, and more people!

“Most people don’t speak English”

Translatewiki.net – thousands of messages to translate

can now translate through this simple mobile app instead of needing to load the full site in a browser

selects untranslated strings, in your preferred languages, sends them to you, and you translate, and it submits them to translatewiki

Long messages are automatically skipped to suit use on mobile.

============
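The selection behaviour described in these notes is: untranslated strings only, with long messages skipped for mobile. A minimal Python sketch of that behaviour follows. It is not the bot's actual code, which lives in the amire80/mediawiki-telegram-bot repository. The 100-character cut-off and the (key, source, translation) tuple format are assumptions.

```python
MAX_LENGTH = 100  # assumed cut-off for "long" messages; the real bot's limit is not stated here


def pick_messages(messages, max_length=MAX_LENGTH, limit=5):
    """Return up to `limit` untranslated (key, source) pairs short enough for a phone.

    messages: iterable of (key, source_text, translation), where translation is ""
    when the string has not been translated yet.
    """
    picked = []
    for key, source, translation in messages:
        if translation:
            continue  # already translated
        if len(source) > max_length:
            continue  # long messages are skipped on mobile
        picked.append((key, source))
        if len(picked) == limit:
            break
    return picked
```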

I think we can build a bot to translate the strings for Mozilla and OpenStreetMap.

I would like your inputs/thoughts/ideas on this.

These links may help in building a Telegram bot for translations.

https://github.com/zanata/zanata-python-client

https://translate.zanata.org

https://translate.zanata.org/iteration/view/TamilMap/1/settings?dswid=1182

Use this command to get the PO file in /tmp/ta.po:

zanata po pull --url https://translate.zanata.org/ --project-id TamilMap --project-version 1 --transdir /tmp

We can process the PO file using polib:

http://polib.readthedocs.io/en/latest/quickstart.html

There are many Python libraries to create a Telegram bot.

http://telepot.readthedocs.io/en/latest/

https://khashtamov.com/en/how-to-create-a-telegram-bot-using-python/

https://blog.pythonanywhere.com/148/

https://www.codementor.io/garethdwyer/building-a-telegram-bot-using-python-part-1-goi5fncay
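Instead of one of those libraries, the bot can also talk to the Telegram Bot API directly over HTTPS with the Python standard library. Below is a minimal sketch that builds a sendMessage call and reads (chat_id, text) replies from a getUpdates response. The token is a placeholder, and the function names are hypothetical. In the planned flow, the bot would send a source string from the PO file and read the user's reply. The translation would then be written back with polib before pushing to Zanata.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.telegram.org/bot{token}/{method}"


def build_request(token, method, **params):
    """Build (but do not send) a Bot API call as a form-encoded POST."""
    url = API_ROOT.format(token=token, method=method)
    data = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(url, data=data)  # data makes this a POST


def send_message(token, chat_id, text):
    """Send one message, e.g. a source string to translate, and return the JSON reply."""
    req = build_request(token, "sendMessage", chat_id=chat_id, text=text)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def extract_replies(updates_json):
    """Pull (chat_id, text) pairs out of a decoded getUpdates response."""
    replies = []
    for update in updates_json.get("result", []):
        message = update.get("message")
        if message and "text" in message:
            replies.append((message["chat"]["id"], message["text"]))
    return replies
```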
With these tools to create a bot, polib to process PO files and Zanata to host the translations, we can connect everything together.
If anyone is interested in programming this, reply here.

Thanks.

Image sources:

http://www.asktrustdee.com/2016/03/my-top-5-telegram-bot.html
https://commons.wikimedia.org/wiki/File:Translate_en-ta.png | CC-By-SA

Project Idea – Teach Tamil to Apertium – Open Source Machine translation tool


Apertium is an open source machine translation system. It supports many languages, like English, Hindi, Urdu, etc.

Explore its features and teach it Tamil.

The following links will help to explore further.

https://www.apertium.org

http://wiki.apertium.org/wiki/Apertium_New_Language_Pair_HOWTO

http://wiki.apertium.org/wiki/Dravidian_languages

https://sourceforge.net/p/apertium/mailman/message/23318108/

If you are interested in exploring this, comment here. Let us work together and make some progress with Apertium for Tamil.

Project Ideas – Part 2 – Looking for contributors


Here are a few more project ideas.

1. Mobile/web app to record voice for Wikisource – show a word, record it, upload it to Commons, link back to Wiktionary.

2. Mobile/web app to record audio books – FreeTamilEbooks needs audio books too.

3. WordPress to Android app converter – why can't we convert a WordPress site into an Android app using its RSS feeds?

4. Epub to apk converter – let us publish ebooks as mobile apps too.

5. Blog to epub converter – fix it and add images
https://github.com/sathia27/blog2ebook
Add a feature to download images and include them in the ebooks.

6. Daily mobi files for Tamil newspapers
Crawl newspapers daily, make mobi files and email them to Kindle every day.

7. Send to Kindle – feature for FTE
Add a Send to Kindle feature to the FreeTamilEbooks.com site.

8. LimeSurvey – SaaS – alternative to Google Forms
Explore LimeSurvey and make it an alternative to Google Forms.

9. Collect politicians' info and release it as an app/site

How can we collect all politicians' details, such as education, assets, etc., and publish them for the public?

http://tshrinivasan.blogspot.in/2015/12/how-to-collect-details-of-TN-politicians.html

10. Set up ELK for Tamil literature search, and build a search engine on top of it

Explore using Elasticsearch and Kibana for Tamil text analysis.

11. Fix the Android app to record audio for Wiktionary

https://github.com/Atul22/wikiAudio
It was built at https://meta.wikimedia.org/wiki/WikiConference_India_2016/Chandigarh_Hackathon

12. Analyse Tamil TV/radio show audio and find how many English words are used per hour
This paper may help:
https://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf
https://mail.python.org/pipermail/chennaipy/2017-March/001429.html
Contact Ganesh for a Python implementation of this algorithm.

13. GUI for voice record/upload – Wiktionary

https://github.com/tshrinivasan/voice-recorder-for-tawictionary

This needs a GUI version for Windows users.

14. GUI for CSV uploader

https://github.com/tshrinivasan/tools-for-wiki/tree/master/csv-uploader-wiktionary

This needs a GUI for Windows users.

15. GUI for open-tamil font converter

https://github.com/Ezhil-Language-Foundation/open-tamil

We need a web application or GUI for all the features of open-tamil.

16. Mobile app to teach Tamil – Pollachi Nasan

http://tshrinivasan.blogspot.in/2015/03/blog-post_9.html

17. Wiki mass user creation

Sometimes we need to create hundreds of users on Wikipedia for a training session or event. Currently, only 6 users can be created. Admins can create more, but only one by one. Automate this process using mechanize and BeautifulSoup.

18. OCR4wikisource web version using the Google Vision API

Rewrite https://github.com/tshrinivasan/OCR4wikisource using the Google Vision API, and give it a web interface.

19. Create a command-line TTS from the source of a mobile TTS app

Here is an open source TTS mobile app for Tamil:

http://www.iitm.ac.in/donlab/tts/androidapp.php

Register and download the source and apk.
The voice named “Naveen” is good.

There are many C files in the folder
SSNFlitehtsTamil/app/src/main/jni

Can you compile those files and give a binary file as a command line tool?

Explore this code and share your thoughts on how to convert it into a desktop/command-line application that we can use on our computers.

20. Create a GUI app for bulk photo uploading to http://commons.wikimedia.org

https://github.com/tshrinivasan/mediawiki-uploader
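For ideas 6 and 7, the "send to Kindle" step amounts to emailing the ebook as an attachment to the reader's Kindle address. Below is a minimal sketch with Python's standard email and smtplib modules. The addresses, the SMTP host and the use of port 587 with STARTTLS are assumptions. Amazon only accepts mail from senders approved in the reader's Kindle settings, and the formats it accepts by email should be checked before relying on this.

```python
import mimetypes
import smtplib
from email.message import EmailMessage


def build_kindle_email(sender, kindle_address, filename, ebook_bytes):
    """Build an email with one ebook attached, addressed to a Kindle address."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = kindle_address
    msg["Subject"] = filename
    msg.set_content("Ebook from FreeTamilEbooks.com")
    ctype, _ = mimetypes.guess_type(filename)
    if ctype is None:
        ctype = "application/octet-stream"  # fallback for unknown extensions
    maintype, subtype = ctype.split("/", 1)
    msg.add_attachment(ebook_bytes, maintype=maintype, subtype=subtype, filename=filename)
    return msg


def send(msg, smtp_host, user, password):
    """Send via an SMTP server (assumed to support STARTTLS on port 587)."""
    with smtplib.SMTP(smtp_host, 587) as smtp:
        smtp.starttls()
        smtp.login(user, password)
        smtp.send_message(msg)
```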

Project Idea – Automation script needed to download British Library books


The British Library has already digitized many Indian books (in Tamil, Bengali and other languages) and uploaded them to its website.[1] The books are split into separate pages in .tiff format, so we need a script to automate transferring them to Internet Archive/Commons as a single PDF/DjVu file that we can use in Wikisource.

I got this request from my Wikipedia friend Bodhisattwa Mandal.
I checked a few Tamil books.
Example:
http://eap.bl.uk/database/overview_item.a4d?catId=164997;r=18467

“Access for research purposes only” is the license for this file.

But these books seem to be very old and already in the public domain.
We have all the permissions to download them and publish them anywhere.
Now we need a program, in Python or any other language, to download all the books and magazines from the site http://eap.bl.uk and provide them as individual PDF files or a zip file of images.
Once we have the PDF or image files, we can OCR them using Google OCR to extract the text. Then we can publish both the images and the text to WikiSource sites for further proofreading and fixing, using OCR4WikiSource.
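A minimal sketch of the packaging step follows, for the "zip file of images" option. It assumes the page images of one book have already been downloaded as (file name, bytes) pairs. The download itself is not shown, because the URL structure of eap.bl.uk is not described here, and the file names in the test are hypothetical. Pages are sorted in natural order, so page_10 comes after page_2. Turning the pages into a single PDF instead would need an imaging library such as Pillow or img2pdf.

```python
import io
import re
import zipfile


def natural_key(name):
    """Sort key that orders 'page_2.tif' before 'page_10.tif'."""
    return [int(part) if part.isdigit() else part.lower() for part in re.split(r"(\d+)", name)]


def build_zip(pages):
    """Pack one book's pages into a zip, in reading order.

    pages: iterable of (file_name, image_bytes) tuples.
    Returns the zip archive as bytes.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in sorted(pages, key=lambda item: natural_key(item[0])):
            zf.writestr(name, data)
    return buffer.getvalue()
```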
If you are interested in contributing to this project, reply with your details in a comment or send mail to tshrinivasan@gmail.com
Thanks.