http://ProjectMadurai.com has tons of old Tamil literature as HTML and A4 PDF files, all in the public domain.
Many ebook readers and tablets don't support Tamil.
To read on these devices, we can create 6-inch PDF files with the Tamil content.
Let us see how to convert all the Project Madurai ebooks into 6-inch PDF files
using the utilities available in GNU/Linux.
1. Get the filenames.
This page has all the ebooks.
Copy the page content and paste it into a LibreOffice spreadsheet.
Copy the column named "unicode".
Save only the filenames as a separate text file, pm.txt, with one filename per line.
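The pasted column may carry blank cells, stray spaces, a header cell, or repeated entries. A minimal Python sketch for tidying it before saving pm.txt could look like this. It assumes that every real filename ends in .html; the function name clean_filenames and the sample filenames are made up for illustration.

```python
def clean_filenames(lines):
    """Return unique .html filenames from pasted spreadsheet cells, in order."""
    seen = set()
    cleaned = []
    for line in lines:
        name = line.strip()
        # Assumption: real entries end in .html; skip blanks and header cells.
        if not name.lower().endswith(".html"):
            continue
        if name not in seen:
            seen.add(name)
            cleaned.append(name)
    return cleaned


# Hypothetical pasted cells: a header, padded text, a blank and a duplicate.
pasted = ["unicode", " pm0001.html ", "", "pm0002.html", "pm0001.html"]
names = clean_filenames(pasted)
print("\n".join(names))
```

Writing the joined names to pm.txt gives the one-filename-per-line file that the download step reads.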
2. Download these files using wget and a Python script.
import os

# pm.txt holds one HTML filename per line, saved in step 1
book = open("pm.txt").readlines()
for filename in book:
    filename = filename.strip()
    print("Downloading " + filename)
    bookurl = "http://www.projectmadurai.org/pm_etexts/utf8/" + filename
    command = "wget -E -H -k -K -p --max-redirect 0 --domains www.projectmadurai.org -e robots=off " + bookurl
    # run wget for this book
    os.system(command)
Running this script downloads all the HTML files, along with the relevant images, to the current folder.
In total, 593 HTML files are downloaded.
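For readers who want to see what each wget option does, here is a sketch that builds the same command as an argument list instead of a single string, which sidesteps shell quoting problems with unusual filenames. The helper names build_wget_command and download_all are illustrative and not part of the original script.

```python
import subprocess

BASE_URL = "http://www.projectmadurai.org/pm_etexts/utf8/"


def build_wget_command(filename):
    """Return the wget argument list for one Project Madurai HTML file."""
    return [
        "wget",
        "-E",                    # save HTML pages with an .html extension
        "-H",                    # allow fetching from other hosts ...
        "--domains", "www.projectmadurai.org",  # ... limited to this domain
        "-k",                    # convert links for local viewing
        "-K",                    # keep a .orig backup of converted files
        "-p",                    # also fetch images and other page requisites
        "--max-redirect", "0",   # do not follow redirects
        "-e", "robots=off",      # ignore robots.txt
        BASE_URL + filename,
    ]


def download_all(list_path="pm.txt"):
    """Run wget once per filename listed in list_path."""
    with open(list_path) as listing:
        for line in listing:
            filename = line.strip()
            if filename:
                subprocess.call(build_wget_command(filename))
```

Calling download_all() from the folder that holds pm.txt would fetch the books one by one; nothing is downloaded until that call is made.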
3. Convert to PDF.
The wkhtmltopdf utility converts any given HTML file to a PDF file.
To produce a 6-inch PDF, the following command helps.
wkhtmltopdf -s A6 --minimum-font-size 40 -B 5 -L 5 -R 5 -T 5 source.html destination.pdf
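A quick arithmetic check of the page geometry may help when adapting the command to another device. A6 is 105 mm by 148 mm under ISO 216. The sketch below works out its diagonal and the text area left inside the margins, assuming wkhtmltopdf reads the bare margin values (5) as millimetres. The function names are illustrative.

```python
import math

MM_PER_INCH = 25.4
A6_WIDTH_MM, A6_HEIGHT_MM = 105, 148   # ISO 216 size of A6


def diagonal_inches(width_mm, height_mm):
    """Diagonal of a page, converted from millimetres to inches."""
    return math.hypot(width_mm, height_mm) / MM_PER_INCH


def text_area_mm(width_mm, height_mm, margin_mm):
    """Width and height left after the same margin on all four sides."""
    return width_mm - 2 * margin_mm, height_mm - 2 * margin_mm


page_diagonal = diagonal_inches(A6_WIDTH_MM, A6_HEIGHT_MM)
# Assumption: -B 5 -L 5 -R 5 -T 5 means 5 mm on each side.
text_width, text_height = text_area_mm(A6_WIDTH_MM, A6_HEIGHT_MM, 5)
print("A6 diagonal: %.1f inches" % page_diagonal)
print("Text area: %d x %d mm" % (text_width, text_height))
```

The A6 page comes out a little over 7 inches diagonally, so a 6-inch screen would show each page slightly scaled down.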
Now, let us convert all the downloaded HTML files to 6-inch PDF files using a small shell script.
for i in *.html; do orig=$(basename "$i" .html); echo "Converting $orig"; wkhtmltopdf -s A6 --minimum-font-size 40 -B 5 -L 5 -R 5 -T 5 "$i" "$orig-6-inch.pdf"; done
Running this command converts all 593 HTML files into 6-inch PDF files.
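The same loop can be sketched in Python, which makes it easier to skip books that were already converted and to collect the ones that failed. This is only one possible arrangement: the names pdf_name, build_convert_command and convert_all are hypothetical, and treating a non-zero exit code as a failure is an assumption, since wkhtmltopdf may still have written a PDF in that case.

```python
import glob
import os
import subprocess


def pdf_name(html_path):
    """Map book.html to book-6-inch.pdf, as the shell loop does."""
    stem = os.path.splitext(os.path.basename(html_path))[0]
    return stem + "-6-inch.pdf"


def build_convert_command(html_path):
    """Return the wkhtmltopdf argument list for one HTML file."""
    return ["wkhtmltopdf", "-s", "A6", "--minimum-font-size", "40",
            "-B", "5", "-L", "5", "-R", "5", "-T", "5",
            html_path, pdf_name(html_path)]


def convert_all():
    """Convert every .html file in the current folder; return the failures."""
    failed = []
    for html_path in sorted(glob.glob("*.html")):
        if os.path.exists(pdf_name(html_path)):
            continue  # already converted in an earlier run
        if subprocess.call(build_convert_command(html_path)) != 0:
            failed.append(html_path)
    return failed
```

Calling convert_all() in the download folder returns the list of files whose conversion reported an error; those are worth checking by hand.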
Now, we can read these 6-inch PDF files on a Kindle, an Android phone, or a tablet.
4. Upload the 6-inch PDF files.
I have uploaded all the 6-inch PDF files here.
To find the real name of a book, compare its filename against the list here.
Download your favorite book and start reading.