The British Library has already digitized many Indian books (in Tamil, Bengali and other languages) and uploaded them to its website. Each book is split into separate pages in .tiff format, so we need a script to automate transferring them to Internet Archive/Commons as a single PDF/DjVu file that we can use in Wikisource.
I got this request from my Wikipedia friend Bodhisattwa Mandal.
I checked a few Tamil books.
“Access for research purposes only” is the license for this file.
But it seems that these books are very old and already in the public domain.
We have all the permissions to download them and publish them anywhere.
Now we need a program, in Python or any other language, to download all the books and magazines from the site http://eap.bl.uk and provide each of them as an individual PDF file or as a zip file of images.
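As a starting point, here is a minimal sketch of the "zip file of images" part in Python, using only the standard library. It assumes you already have the list of page image URLs for one book; how to get that list from the site is not covered here, and the names pack_pages, fetch_url and the page_0001.tif naming scheme are my own choices for illustration, not anything defined by the British Library site.

```python
import urllib.request
import zipfile
from typing import Callable, Iterable


def fetch_url(url: str) -> bytes:
    """Download one page image and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=60) as response:
        return response.read()


def pack_pages(
    page_urls: Iterable[str],
    zip_target,
    fetch: Callable[[str], bytes] = fetch_url,
) -> int:
    """Fetch every page and store it in one zip, named by page order.

    page_urls  -- page image URLs of ONE book, already in reading order
                  (assumption: the caller collects these somehow)
    zip_target -- output file name or a writable binary file object
    fetch      -- function that turns a URL into bytes; replaceable for testing
    Returns the number of pages written.
    """
    count = 0
    with zipfile.ZipFile(zip_target, "w", zipfile.ZIP_DEFLATED) as archive:
        for count, url in enumerate(page_urls, start=1):
            # Zero-padded names keep the pages sorted in any zip viewer.
            archive.writestr(f"page_{count:04d}.tif", fetch(url))
    return count
```

Calling pack_pages(urls, "book.zip") would then write one zip per book. The download function is passed in as a parameter so the packing logic can be tried without touching the network; turning the pages into a single PDF or DjVu file would be a separate step.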
Once we have the PDF or image files, we can run them through Google OCR to extract the text. Then we can publish both the images and the text to Wikisource sites, using OCR4WikiSource, for further proofreading and correction.
If you are interested in contributing to this project, reply with your details in a comment or send mail to email@example.com