Project Idea – Call for Contributors – web scraping the West Bengal Public Library Network

Hello all,

Our friends at Bengali Wikisource have asked for help scraping PDF files from a DSpace-based library.

The site seems to be down sometimes, but it comes back up within a few hours.
Can anyone contribute to this project?

If you are interested, reply here or mail me.
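
Since the site goes down for a few hours at a time, any scraper for it will need to wait and retry. Here is a minimal sketch of that idea in Python. The names fetch_with_retry and download, the number of attempts and the one-hour wait are all assumptions for illustration, not part of any existing tool.

```python
import time
import urllib.request


def download(url):
    """Fetch one file and return its bytes (raises OSError if the site is down)."""
    with urllib.request.urlopen(url, timeout=60) as response:
        return response.read()


def fetch_with_retry(fetch, url, attempts=5, wait_seconds=3600, sleep=time.sleep):
    """Call fetch(url) until it succeeds or the attempts run out.

    fetch is any callable that returns the file content or raises OSError
    (urllib.error.URLError is a subclass of OSError).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except OSError as error:
            last_error = error
            # Wait before the next try, but not after the final failure.
            if attempt < attempts:
                sleep(wait_seconds)
    raise last_error
```

A call would look like fetch_with_retry(download, pdf_url), where pdf_url is whatever PDF link the scraper has found on the library site.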



Notes: MediaWiki Hackathon 2017 – Vienna, Austria – Day 1

Reached Vienna yesterday evening. It is a nice, clean, cold city without much of a crowd. Walked around the city till midnight.

Day 1 of the hackathon started with a welcome session. The organizers explained the event. Then the mentors introduced themselves, and many people pitched their projects to find collaborators.

We are 260 participants from 48 countries. This is the very first time I have worked with people from so many countries. Though we are from around the world, everyone followed the friendly space policy. It felt like being with my own friends.

After the intro session, I roamed around the halls and sessions to find a team. I did not even know what project to do. I was thinking about a few issues on the task board.

Met Abbas and Ibrahim in the cafe. Abbas is from Iran and lives in Austria. He is a journalist. He explained the issues faced by refugees. It takes around 5 years for them to be accepted here as refugees. Till then, they cannot work anywhere. They have to live politely here. He is thinking of getting them to contribute to Wikipedia and earn a good score from Wikimedia Austria, so that they can gain some goodwill and trust toward being accepted as refugees.

He spoke about how the Arabic wiki community is growing slowly and the issues they face. It is similar to the Tamil community’s issues and growth.

Met Praveen. He is an Indian student doing his master's here. He is an iOS developer. Gave him some project ideas, like an audio recording app for Wiktionary on iOS. He is interested in this. Routed him to the newcomers' intro session on wiki development.

Met Aaron Halfaker. He is the developer of ORES, an X-ray engine for Wikipedia. It can be used to determine whether edits are good or bad. It uses machine learning trained on manually scored edits. Asked him if we can use it for Tamil Wikipedia. He demonstrated how it works. Will attend his session tomorrow for a deeper demo. He asked me to collect a list of bad words in Tamil and the patterns of the pages we delete in Tamil Wikipedia. Will explore this further.

Met Magioladitis, the developer of AutoWikiBrowser. Happy to know that they are working on a web version of it, written in JavaScript. Wished them well and thanked the team on behalf of the GNU/Linux user community.

Had a discussion with Maxlath, founder of a book inventory application based on Wikidata. He uses Wikidata to get all the info about the books automatically. Explored the site and gave a few feature requests so that all of us can use the site.

Met Vivek Maskara, a guy from Bangalore. He used to be a Windows application developer and is now an Android developer. His team is adding more features to the Commons Android upload tool. Explained to him the need for a Windows tool to bulk upload images to Commons. We need a Windows application to add metadata like name and details to images offline. Then, with a single click, all the images should be uploaded to Commons.

This is the very first community event for him. He asked why we contribute to wikis and other open source projects.

Demonstrated my wiki tools.

  1. OCR4WikiSource
  2. Audio recorder for Wiktionary
  3. Photos uploader for Commons

Explained to him how these open source projects are helping Indian language communities grow and how programmers can help the core Wikipedia content contributors.

Dear photographers, wait a few more days. The mass upload tool for Windows is on the way. Vivek, spend some time on this in the upcoming days.

Found Hugo Lopez, from France, who is working on a massive open audio recording project, LinguaLibre. Got the source and tried to install it. Ran into some issues during installation. Will fix them soon.

He is a linguist, a teacher, and an OSM and Wikipedia contributor. He found that his ancestors' language, Gascon, was destroyed by the French just to promote French as the one language for France. In 1910, there were 6.7 million people who spoke Gascon. In 1990, it was 2.11 million, and in 2010 only 0.2 million people knew Gascon. The French government pushed French everywhere. School kids were beaten if Gascon or any language other than French was spoken. This is how a language dies. The same thing is happening to Tamil. To kill it and promote Hindi as the one national language, the Indian government is doing everything it can.

Hugo founded Lingua Libre as a tool to preserve languages. We can speak any words and it records all the sounds. This is the one web application I have been dreaming of for so many years. And it has been done already. Agreed to add a few features, like integration with Commons and Wiktionary. For this I have to learn PHP. But it will be a worthwhile effort to add more features to this.

Met Johan Uhle, who came with his cute little daughter. Asked him about the workflow and tools for translating OpenStreetMap into Tamil. He was curious whether this is possible with Mapbox, and said that localized maps are progressing slowly.

The evening was filled with fun events and karaoke. Enjoyed the songs sung by various people. Had dinner with fun discussions about marriages in India vs. Europe, education systems etc.

There were many sessions for newcomers, Wikidata, semantic wiki, and more. Planning to attend a few talks tomorrow. Wikidata is becoming something of a rockstar. It seems it can do tons of magic. Will explore it soon.

Day 1 is ending now. It's around 12.00 midnight. Hearing the “Wow.Awesome” sounds from the karaoke floor. They are still singing. Going to join them for a group song now. See you all tomorrow.

Here are a few snaps I took –






Attending MediaWiki Hackathon at Vienna, Austria

Hackathons are the events I love most. We sit together with fellow developers, pick any task, focus and work till we make a minimum viable product or complete the project.

I have attended MediaWiki hackathons, ChennaiGeeks hackathons, and Tamil Open Source hackathons in Chennai. They gave great results and good projects.

For the first time, I am going to attend an international hackathon, in Vienna, Austria, on 19-21 May 2017.



Get full details here –

Happy to know that my application got approved. Visa process went smoothly and I am all set to go.

Planning to work on these projects.

1. Upload/import wizard for Wikisource works –

2. Notification: Your file was used –

3. Statistics dashboard for data on Wikidata –

Will blog my experiences here.

Hoping to meet other MediaWiki contributors.

Thanks to the organizers for the opportunity.

Project Idea – Automation script needed to download British Library books

The British Library has already digitized many Indian books (including Tamil, Bengali and other languages) and uploaded them to its website.[1] The books are split into separate pages in .tiff format, so we need a script to automate transferring them to Internet Archive/Commons as a single PDF/DjVu file, so that we can use them in Wikisource.
Got this request from my Wikipedia friend Bodhisattwa Mandal.
I checked a few Tamil books.
Example :;r=18467

“Access for research purposes only” is the license for this file.

But, it seems that these books are very old and already in public domain.
We have all the permissions to download them and publish anywhere.
Now, we need a program in Python or any other language to download all the books and magazines from the site and provide them as individual PDF files or a zip file of images.
Once we have the PDF or image files, we can run them through Google OCR and get the text out of them. Then we can publish both images and text to Wikisource sites for further proofreading and fixing, using OCR4WikiSource.
If you are interested in contributing to this project, reply with your details in a comment or send mail to
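
As a starting point for the "zip file of images" option, here is a minimal sketch in Python. It assumes the page images of one book have already been downloaded into a folder with names like page_1.tiff, page_2.tiff and so on. The function names, the file naming and the *.tiff pattern are assumptions for illustration. The download step itself is not shown, since it depends on how the library site serves the pages.

```python
import re
import zipfile
from pathlib import Path


def page_sort_key(path):
    """Order pages numerically, so page_2.tiff comes before page_10.tiff."""
    # re.split with a capture group alternates text, digits, text, ...
    # so odd positions are always digit runs and can be compared as numbers.
    parts = re.split(r"(\d+)", path.name)
    return [int(part) if index % 2 else part.lower()
            for index, part in enumerate(parts)]


def bundle_pages(page_dir, zip_path, pattern="*.tiff"):
    """Write all page images of one book into a single zip, in page order."""
    pages = sorted(Path(page_dir).glob(pattern), key=page_sort_key)
    # ZIP_STORED: image files are usually compressed already.
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_STORED) as archive:
        for page in pages:
            archive.write(page, arcname=page.name)
    return [page.name for page in pages]
```

The numeric sort matters because a plain alphabetical sort would put page 10 before page 2 and scramble the book. Producing a single PDF or DjVu from the same ordered list would need an imaging library and is left out of this sketch.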

Need 50 ideas for a python hackathon

We are planning a Python hackathon at DGV Arts & Science College, Chennai.

The students have already learnt Python.

To get them contributing to Free Software, I had a discussion with the HOD about conducting a one-day hackathon.

There are 40 students, and we can offer 50 programs so that each student can pick one and complete it within the 6-hour span.

I have the following list.

1. Scrape a website and get the price of a given product

2. Resize some huge photos and add some text to all the images.

3. Get two dates and calculate the number of days between them.

4. Upload images to Flickr using the Flickr API

5. Automate blog posting using the WordPress API

6. Analyse an Apache log file and get statistics from it.

7. Create a solver for crossword puzzles using the given number of words and a few letters

8. Download a picture of the day and make it the wallpaper or a widget

9. Test a website for its availability. Send mail to some people if the site is down

10. Back up all files in /var/www/html and the databases, and store them in a remote place.

11. URL shortener app

12. Bookmark the URLs from the tweets of a given Twitter account

13. Search result position finder (for example, find the position of a given result in Yahoo search and Google search)

14. Captcha generator – generate a captcha image for a given word

15. Cost estimator for a building.

Write a programme to produce an estimate for the construction of a building

(The given data includes the dimensions of rooms in square feet, doors, windows, lofts, shelves and ventilators.)

(1) How many bricks it takes to construct the structure
(2) How much it costs to whitewash the structure
(3) How much it costs to lay tiles.
Expand the idea; leave it to the programmers to decide the tool.

16. Task report for office.

==  Objective

A Python script should ask for a progress report every day when the employee leaves the office after working hours.

It should also send the progress report as a mail to the project manager. Once the mail is delivered, the system should be shut down automatically.

== Expected Output

user@debian~$ ./

Hi user kindly give your progress for today




Sent mail…..

Thank you

System is about to shutdown Did you save all files ? yes / no


System shutting down …
17. Get the check-in data of any user from the Foursquare or Yelp API and show it on OpenStreetMap or Google Maps.
18. Check for duplicate file names in a directory tree.
19. Rename MP3 files based on their ID3 tags.
20. Get train status using a PNR number. Scrape any PNR checking website
21. Get all the links in a Twitter user's favourites and store them in a text file.
22. Countdown Timer – Create a program that allows the user to choose a time, and then prints out a message at given intervals (such as every second) that tells the user how much longer there is until the selected time.
23. What is the Weather? – Scrape any weather site or use any weather API service to find the weather of Chennai or any given place.
24. Alarm Clock – A simple clock that plays a sound after X number of minutes/seconds or at a particular time.
25. Credit Card Validator – Takes in a credit card number from a common credit card vendor (Visa, MasterCard, American Express, Discover) and validates it to make sure that it is a valid number (look into how credit cards use a checksum).
26. RSS Feed Reader – Given a link to an RSS/Atom feed, get all posts and display them.
27. Bandwidth Monitor – A small utility program that tracks how much data you have uploaded and downloaded from the net during the course of your current online session. See if you can find out what periods of the day you use more and less and generate a report or graph that shows it.
28. Address Book – Keep track of various contacts, their numbers, emails and little notes about them, like a Rolodex, in a database.
29. Import Picture and Save as Grayscale – A utility that sucks the color right out of an image and saves it. You could add more, including adjusting contrast, colorizing and more, for added complexity.
30. Web-based address book – Create a web-based address book using the Flask microframework
31. Write a script to add copyright text to all the files in a directory tree.
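
To show the size of program expected from students, here is a minimal sketch of the checksum part of idea 25. Credit card numbers use the Luhn checksum. The function name luhn_valid is made up for this example, and the vendor detection part (Visa, MasterCard and so on) is left out.

```python
def luhn_valid(number):
    """Return True if the card number passes the Luhn checksum."""
    # Allow the usual separators, then insist on plain ASCII digits.
    cleaned = number.replace(" ", "").replace("-", "")
    if len(cleaned) < 2 or not (cleaned.isascii() and cleaned.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for index, char in enumerate(reversed(cleaned)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0
```

For example, luhn_valid("4111 1111 1111 1111") is True for this widely used test number, and changing its last digit to 2 makes the check fail.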
Here are some good ideas –

Solution for uploading files to a MediaWiki site

I am working on a tool that automates uploading files to a MediaWiki site, such as your local MediaWiki-based wiki.

Wrote the Python script.
You can get it here.

But when I run it against my local wiki to upload ogg files, the wiki does not accept them.
It throws the following error.

Traceback (most recent call last):
  File "", line 96, in <module>
  File "", line 73, in upload_file
    picture.upload(fileobj=file_object,comment=caption, ignorewarnings=True)
  File "/usr/local/lib/python2.7/dist-packages/wikitools/", line 228, in upload
    res = req.query()
  File "/usr/local/lib/python2.7/dist-packages/wikitools/", line 143, in query
    raise APIError(data['error']['code'], data['error']['info'])
wikitools.api.APIError: (u'filetype-banned', u'This type of file is banned')

It means that the ogg file format is not accepted by the MediaWiki site.

How do we allow the upload of ogg files to the MediaWiki site?

Add the following two lines to the file LocalSettings.php inside your MediaWiki installation folder.

$wgStrictFileExtensions = false;

$wgFileExtensions = array('png', 'gif', 'jpg', 'jpeg', 'doc', 'xls', 'mpp', 'pdf', 'ppt', 'tiff', 'bmp', 'docx', 'xlsx', 'pptx', 'ps', 'odt', 'ods', 'odp', 'odg', 'ogg');

After adding these lines, I was able to upload the ogg files to my local MediaWiki site.

Thanks to Yuvaraj Pandian for finding the correct tokens to solve the issue.
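
To avoid hitting the filetype-banned error in the middle of a bulk upload, the script can check file extensions before calling the wiki at all. This is a minimal sketch and not part of the actual upload script. The set ALLOWED_EXTENSIONS is a hypothetical, hand-copied subset of the $wgFileExtensions list from LocalSettings.php, and split_by_extension is a made-up helper name.

```python
from pathlib import Path

# Hypothetical: copied by hand from $wgFileExtensions in LocalSettings.php.
ALLOWED_EXTENSIONS = {"png", "gif", "jpg", "jpeg", "pdf", "tiff", "ogg"}


def split_by_extension(paths, allowed=ALLOWED_EXTENSIONS):
    """Split file paths into (accepted, rejected) by their extension."""
    accepted, rejected = [], []
    for path in paths:
        # Compare in lower case so that SONG.OGG counts as ogg.
        extension = Path(path).suffix.lstrip(".").lower()
        if extension in allowed:
            accepted.append(path)
        else:
            rejected.append(path)
    return accepted, rejected
```

The upload loop would then go over the accepted list only and print the rejected list at the end, so that nothing is skipped silently.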

Hackathon on KDE, Wikipedia, Django – Exebit 2012 – IITM


It’s hacking time again! Hackathon 2012 brings you the unique opportunity to hack on open source projects and win big prizes. Just unleash the hacker inside you and the rest will fall into place.

This year, we bring you developers from three open source projects – KDE, MediaWiki and Django. They will help you get started with the projects.

Event format
The event will start on the morning of the first day (3rd March, 2012) with getting-started sessions by the three developers. You can participate in teams of up to 4 members. Each team has to choose only one project. Team registration will be done during the getting-started sessions.

After the getting-started sessions, a list of interesting bugs and features will be put up. You can choose from the given list or choose your own bugs to fix. You will have time till the next morning to work on it. The developers will be present throughout the day to help you hack on the code.

On the second day morning, the teams will be given a chance to present their hacks. The top three hacks will be chosen and the respective teams will be awarded prizes.

Prepare ahead! It’s open source
Since the projects are open source, you don’t need to wait till the event starts. You can prepare beforehand and come to the event. This will increase your chances of winning. Here are a few links which can help you get started with the projects:

IRC: #kde-in #kde-devel

IRC: #mediawiki

Django: The contribution wiki page covers the entire contribution system for Django. It explains the different ways in which you can contribute, in terms of code, documentation, etc. The ticket tracker contains the list of tickets which have been submitted. It can be queried for easy pickings, patches received, etc. The IRC channel for Django is #django. The other guidelines are given on the contribution wiki page itself.
What is an IRC? –

Note: “Participants have to bring their own laptops”

For queries contact :

Source :


ILUGC Monthly Meet (July 9th)

Time : Sat July 9, 2011 (3.00 – 5.30 PM)
Venue: Classroom No 8,
Aerospace Engineering,
Near Gajendra Circle,
IIT Madras.

Link for the Map:


Topic: Intro to Google App Engine with python sdk

Description :
An introduction on how to get started and build a basic web application
with Google App Engine and the Python SDK.

Duration : 50 min

Speaker : Gautam [ gautham5678 AT gmail DOT com ]

About the speaker :

The speaker is a 4th year Computer Engineering student from Vels University, Chennai,
with an interest in web application development and FOSS development.

Links :



Topic : Getting started: Android – Primer and discussion

Description :
Everything to get ourselves started in android. A bit about the Android SDK,
command line tools, etc. Very basic stuff. I’m a novice so let’s learn together.

Duration : 40 min



Lightning Talk :

Topic : Bitcoin – concepts and theory


What bitcoins are, why use them, features, etc.






Topic: Introduction to Wikipedia


Wikipedia is a free, web-based, collaborative, multilingual encyclopedia project
supported by the non-profit Wikimedia Foundation. Its 19 million articles (over 3.6 million in English)
have been written collaboratively by volunteers around the world, and almost all of its articles
can be edited by anyone with access to the site. As of May 2011, there were editions of Wikipedia in
281 languages. It has become the largest and most popular general reference work on the Internet,
ranking around seventh among all websites on Alexa and having 365 million readers.

Let us discuss Wikipedia and how to contribute to it.

Duration: 30 mins

Speaker : Surya Prakash [ suryasalem2010 AT gmail DOT com ]

About Speaker :

Surya Prakash is a second year Mechanical Engineering student at Anna University.
He contributes to Tamil Wikipedia and the Wikimedia community.


General group discussions on any queries, events etc.
CDs/DVDs can be shared on prior request.
Announce this to all your friends, social network sites etc.
All are welcome. Entry Free