There are many PDF and DjVu files on Wikisource in various languages. In many Wikisource projects, those files are split into individual pages, each shown as an image, using the ProofreadPage extension.
Contributors look at those images and type the text manually.
This project helps the Wikisource team OCR an entire PDF or DjVu file using Google Drive OCR. It then updates the relevant pages on Wikisource with the recognized text.
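To illustrate the last step, here is a minimal sketch of how OCR output for each page could be matched to its Wikisource page title. It is not the project's actual code: the function names `page_titles` and `plan_updates` are hypothetical, and it assumes the usual ProofreadPage title pattern `Page:<file name>/<page number>` (the namespace name may be localized on some wikis, hence the parameter). It only builds the mapping; uploading and fetching the OCR text are left out.

```python
def page_titles(filename, page_count, namespace="Page"):
    """Return the assumed wiki page title for each page of the file, starting at 1."""
    return [f"{namespace}:{filename}/{n}" for n in range(1, page_count + 1)]


def plan_updates(filename, ocr_texts, namespace="Page"):
    """Pair each page title with its OCR text, skipping pages whose text is empty."""
    titles = page_titles(filename, len(ocr_texts), namespace)
    return {
        title: text.strip()
        for title, text in zip(titles, ocr_texts)
        if text.strip()
    }


if __name__ == "__main__":
    # Hypothetical OCR results for a three-page file; the second page is blank.
    texts = ["First page text\n", "   ", "Third page text"]
    for title, text in plan_updates("Example.pdf", texts).items():
        print(title, "->", text)
```

Keeping the mapping separate from the network calls makes it easy to review what would be written before any page is changed.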
Grab the Python code from here and run it on your GNU/Linux machine.
https://github.com/tshrinivasan/OCR4wikisource
It is based on
https://github.com/tshrinivasan/google-ocr-python
Reply here with your suggestions and improvements.