Project Ideas – Part 2 – Looking for contributors


Here are a few more project ideas.

1. mobile/web app to record voice for Wikisource – Show a word, record it, upload to Commons, link back to Wiktionary.

2. mobile/web app to record audio books  – FreeTamilEbooks needs audio books too

3. WordPress to Android app converter – Why can't we convert a WordPress site into an Android app using its RSS feeds?

4. epub to apk converter – Let us publish ebooks as mobile apps too.

5. blog to epub converter – fix, add images
https://github.com/sathia27/blog2ebook
Add a feature to download images and add them to ebooks.

6. Daily mobi files for Tamil newspapers
Crawl newspapers daily, build mobi files, and email them to Kindle every day.

7. Send to Kindle – feature for FTE
Add a Send to Kindle feature to the FreeTamilEbooks.com site.

8. LimeSurvey – SaaS – alternative to Google Forms
Explore LimeSurvey and set it up as an alternative to Google Forms.

9. Collect politicians' info and release as app, site

How can we collect the details of all politicians, such as education and assets, and publish them for the public?

http://tshrinivasan.blogspot.in/2015/12/how-to-collect-details-of-TN-politicians.html

10. Set up ELK for Tamil literature search, build a search engine on top of it

Explore using Elasticsearch and Kibana for Tamil text analysis.

11. Fix the Android app to record audio for Wiktionary

https://github.com/Atul22/wikiAudio
Built at https://meta.wikimedia.org/wiki/WikiConference_India_2016/Chandigarh_Hackathon

12. Analyse Tamil TV/radio show audio, find how many English words are used per hour
This paper may help:
https://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf
https://mail.python.org/pipermail/chennaipy/2017-March/001429.html
Contact Ganesh for a Python implementation of this algorithm.

13. GUI for voice record/upload – Wiktionary

https://github.com/tshrinivasan/voice-recorder-for-tawictionary

This needs a GUI version for Windows users.

14. GUI for CSV uploader

https://github.com/tshrinivasan/tools-for-wiki/tree/master/csv-uploader-wiktionary

This needs a GUI for Windows users.

15. GUI for open-tamil font converter

https://github.com/Ezhil-Language-Foundation/open-tamil

We need a web application or GUI for all features of open-tamil.

16. mobile app to teach tamil – pollachi nasan

http://tshrinivasan.blogspot.in/2015/03/blog-post_9.html

17. Wiki mass user creation

Sometimes we need to create hundreds of users on Wikipedia for a training or event. Currently, only 6 users can be created. Admins can create multiple users, but only one by one. Automate this process using mechanize and BeautifulSoup.

18. OCR4wikisource web version using the Google Vision API

Rewrite https://github.com/tshrinivasan/OCR4wikisource with the Google Vision API and give it a web interface.

19. Create a command line TTS from the source of a mobile TTS app.

Here is an open source TTS mobile app for Tamil.

http://www.iitm.ac.in/donlab/tts/androidapp.php

Register and download the source and apk.
The voice named “Naveen” is good.

There are many C files in the folder
SSNFlitehtsTamil/app/src/main/jni

Can you compile those files and produce a binary that works as a command line tool?

Explore this code and share your thoughts on how to convert it into a desktop/command line application so that we can use it on our computers.

20. Create a GUI app for bulk photo upload to http://commons.wikimedia.org

https://github.com/tshrinivasan/mediawiki-uploader


Real Time Bigdata Analysis – Few Tools


https://media.licdn.com/mpr/mpr/AAEAAQAAAAAAAAZbAAAAJGE5Y2ZiNmU2LWRhNTgtNDhlYi05YTY0LTAwYWVmY2EyZGY5Yw.png

 

Big data analysis is becoming one of the hot terms in the IT industry. Everyone wants to analyse data, and they all want to use tools like Hadoop and Spark. These tools process huge amounts of stored data, often terabytes in size. This is called “Historical Data Analysis”.

In contrast, there is “Real Time Data Analysis”, which processes a constantly incoming stream of data as soon as it arrives.

The typical data pipeline for Real Time Big Data Analysis is as below.

App/Site -> API Server -> Message Queue (Kafka) -> Processor (Logstash) -> Storage (Elasticsearch, Redis, MongoDB) -> Visualization (Kibana)
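To make the stages concrete, here is a toy sketch of the same flow in plain Python. It does not use Kafka, Logstash, Elasticsearch or Kibana at all: a `queue.Queue` stands in for the message queue, a thread for the processor, a list for storage, and a small aggregation for the dashboard. The event fields (`city`, `amount`) and the `amount_is_large` enrichment are made up for illustration.

```python
import json
import queue
import threading


def api_server(events, message_queue):
    """Stand-in for the API server: publish every incoming event to the queue."""
    for event in events:
        message_queue.put(json.dumps(event))
    message_queue.put(None)  # sentinel: tells the processor there is no more data


def processor(message_queue, storage):
    """Stand-in for Logstash: read each message, enrich it, and store it."""
    while True:
        message = message_queue.get()
        if message is None:
            break
        doc = json.loads(message)
        # Hypothetical enrichment step, only for illustration.
        doc["amount_is_large"] = doc["amount"] >= 1000
        storage.append(doc)


def run_pipeline(events):
    """Wire the stages together and return the stored documents."""
    message_queue = queue.Queue()  # stands in for Kafka
    storage = []                   # stands in for Elasticsearch
    consumer = threading.Thread(target=processor, args=(message_queue, storage))
    consumer.start()
    api_server(events, message_queue)
    consumer.join()
    return storage


def summarize(storage):
    """Stand-in for a Kibana visualization: total amount per city."""
    totals = {}
    for doc in storage:
        totals[doc["city"]] = totals.get(doc["city"], 0) + doc["amount"]
    return totals
```

The point of the queue in the middle is that the producer and the processor run independently: the API side keeps accepting events even when the processing side is slow.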

A few years ago, we had to rely on Google Analytics and pay a huge amount of money to get real time data on our site visitors, credit card swipes, etc. Nowadays, we can build the entire pipeline with Free/Open Source Software alone.

https://i1.wp.com/blog.infochimps.com/wp-content/uploads/2012/05/realtime-analytics.png

With the following links, we can set up the data pipeline easily.

https://www.digitalocean.com/community/tutorials/how-to-install-apache-kafka-on-ubuntu-14-04

http://docs.confluent.io/3.2.0/kafka-rest/docs/index.html

https://www.digitalocean.com/community/tutorials/how-to-install-elasticsearch-logstash-and-kibana-elk-stack-on-ubuntu-14-04

Setting these up is easy. But once the real time flow starts in production, remember, you are always on fire. You will feel like you are flying an aeroplane, with so many buttons on the dashboard. You have to keep it running while solving real time issues as they appear.

Explore these tools and learn their basics. Learning the basics will surely pay off.

Tons of new tools are arriving in this arena. We cannot master them all. But exploring and learning one tool well makes it easier to pick up new ones.

I am exploring the following tools along with ELK.

  1. Presto
  2. Spark
  3. Secor
  4. Druid
  5. Hadoop
  6. Hive

I do most of my programming in Python, but it becomes very slow when dealing with GBs of data. Go seems faster for working with text files, so I have started exploring Go too.

What new tools and technologies are you learning?

 

Image source- https://www.linkedin.com/pulse/real-time-stream-processing-big-data-platform-birendra-kumar-sahu

http://gcastd.com/

 

 

What I learnt from teaching ELK stack in a Workshop?


Today, I trained a mixed group of students on real time big data analysis at 4ccon, Chennai.

 

https://pbs.twimg.com/media/C23UnKJUcAECtEb.jpg:large

As big data is one of the trending terms in the IT field, we got around 40 participants.

The participants were from the Electrical and CSE departments, along with a few working professionals.

Though we asked everyone to bring a laptop with the ELK stack preinstalled, many spot-registered participants either did not bring a laptop or had not installed anything.

That's fine. I had one full day, and I thought we could do the installations in one hour.

There were many unexpected issues.

1. Windows laptops

I never thought that people would come with Windows laptops. I did not know that the ELK stack could run on Windows until I saw them.

I left Windows some 10 years ago and don't know the basics of working on it. Fortunately, Mr. Sivarama selvan, from NIC, got the packages for Windows and demonstrated the following:

1. Installing Java
2. Setting the JAVA_HOME and path
3. Invoking logstash with a sample configuration file

Without him, I would have felt hopeless. Thanks a lot, sir.
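For future workshops, the Java part of that setup could be checked with a small script before starting Logstash. This is only a sketch: it checks that `JAVA_HOME` is set and points to an existing folder, and that a `java` executable can be found on `PATH`. The function name `check_java` is my own, not part of any ELK tooling.

```python
import os
import shutil


def check_java(environ=None):
    """Return a list of problems with the Java setup (empty list means OK)."""
    environ = os.environ if environ is None else environ
    problems = []

    java_home = environ.get("JAVA_HOME")
    if not java_home:
        problems.append("JAVA_HOME is not set")
    elif not os.path.isdir(java_home):
        problems.append(f"JAVA_HOME points to a missing folder: {java_home}")

    # shutil.which searches the given PATH string for the executable.
    if shutil.which("java", path=environ.get("PATH", "")) is None:
        problems.append("java was not found on PATH")

    return problems
```

Calling `check_java()` with no arguments inspects the current environment, and printing the returned list gives participants a short checklist of what is still missing.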

2. Poor Internet

Though the college provided WiFi in all the rooms, connectivity was very poor and the connection speed was too low. The Ubuntu and Red Hat users lost their patience trying to install the packages from the repositories.

After some time, I asked them to log in to my laptop to explore the commands, and to connect to my Elasticsearch and Kibana from Chrome plugins (Elasticsearch Toolbox, Postman). As the WiFi was poor, they had to wait a long time to check even small things.

3. Windows users' behaviour

Our mixed-skill participants found it very tough to work with the Command Prompt in Windows. Many saw it for the first time, so even moving between directories was hard. I had to teach very basic commands like cd and dir. I never thought that I would be teaching MS-DOS commands in an ELK workshop.

We provided Zip files for Logstash, Elasticsearch and Kibana, along with another Zip file containing sample configuration files for Logstash and Elasticsearch.

The icons for a Zip file and a folder look similar in Windows, and double-clicking a Zip file opens it just like a folder. People double-clicked the Zip files and edited the config files inside them. When they then tried to access those files from the Command Prompt, they could not reach them. It took me a long time to find the issue and teach them how to extract Zip files. 😦
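One way to avoid this confusion is to hand out a tiny script that extracts the archives into a real folder. Below is a minimal sketch using Python's standard `zipfile` module; the helper name `extract_zip` and the example paths are placeholders, not files from the workshop.

```python
import zipfile
from pathlib import Path


def extract_zip(zip_path, target_dir):
    """Extract every file from zip_path into target_dir and list what was extracted."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        return sorted(archive.namelist())
```

For example, `extract_zip("logstash.zip", "C:/elk")` (hypothetical names) would leave real files under `C:/elk` that the Command Prompt and editors can both reach.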

4. Editing Files

Some opened the sample Logstash config file in Notepad. It showed everything on a single line, so changing values was tough.
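My guess is that this was a line-ending problem: the sample files were written on GNU/Linux with LF line endings, and older versions of Notepad only break lines on CRLF. Assuming that is the cause, a small sketch like the following converts a file to Windows line endings before it is handed out. The function names are my own.

```python
def to_windows_newlines(data: bytes) -> bytes:
    """Convert LF (and any existing CRLF) line endings to CRLF."""
    unix = data.replace(b"\r\n", b"\n")  # normalise first to avoid doubling \r
    return unix.replace(b"\n", b"\r\n")


def convert_file(src, dst):
    """Read src as bytes and write a CRLF version to dst."""
    with open(src, "rb") as source:
        data = source.read()
    with open(dst, "wb") as target:
        target.write(to_windows_newlines(data))
```

Working on bytes keeps the file's encoding untouched, so only the line endings change.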

Some opened it in MS Word and saved it as a docx file.

Some had difficulty finding a file path to put in the sample config file.

5. curl for windows

As curl is the main tool for interacting with Elasticsearch, I don't know how people can practise on Windows without it. I found curl for Windows, but downloading it over the poor Internet connection and teaching how to install and use it from the Command Prompt was tough for me, so I skipped this part. I asked people to use Chrome plugins like Sense and Elasticsearch Toolbox instead. With these plugins, people can index only a few documents; they can't do a bulk import of data.
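If Python is available on the laptops, bulk import does not strictly need curl. Here is a rough sketch that builds a request body for the Elasticsearch `_bulk` endpoint (one action line followed by one document line, newline-delimited) and posts it with the standard library. The index name, the sample documents and the `localhost:9200` address are assumptions; versions of Elasticsearch before 7 also expect a `_type` in the action line, which is why `doc_type` is optional here.

```python
import json
import urllib.request


def build_bulk_body(index_name, documents, doc_type=None):
    """Build a newline-delimited body for the Elasticsearch _bulk endpoint."""
    lines = []
    for doc in documents:
        action = {"_index": index_name}
        if doc_type is not None:
            action["_type"] = doc_type  # needed on older (pre-7) versions
        lines.append(json.dumps({"index": action}))
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


def send_bulk(body, base_url="http://localhost:9200"):
    """POST the bulk body to Elasticsearch (base_url is an assumption)."""
    request = urllib.request.Request(
        base_url + "/_bulk",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the body separately from sending it means participants can print and inspect the exact text that would go to Elasticsearch before any request is made.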

6. ELK versions

Someone had installed mixed versions of the ELK stack, and it did not work the way it did on my laptop. After a deep troubleshooting session, we found the version issue and installed the latest version, the same as on my laptop.
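A quick check like the one below could catch this earlier. It is a sketch that assumes all components should share the same major and minor version, which is how the stack has been released since 5.0. The version strings are typed in by hand here and are made up; the second helper reads the running Elasticsearch version from its root endpoint, with `localhost:9200` again being an assumption.

```python
import json
import urllib.request


def major_minor(version):
    """Turn a version string such as '5.1.2' into the tuple (5, 1)."""
    parts = version.strip().split(".")
    return int(parts[0]), int(parts[1])


def mismatched_components(versions, reference="elasticsearch"):
    """Return the components whose major.minor differs from the reference."""
    expected = major_minor(versions[reference])
    return sorted(
        name for name, version in versions.items()
        if major_minor(version) != expected
    )


def elasticsearch_version(base_url="http://localhost:9200"):
    """Read the version number reported by a running Elasticsearch node."""
    with urllib.request.urlopen(base_url) as response:
        info = json.loads(response.read().decode("utf-8"))
    return info["version"]["number"]
```

For example, `mismatched_components({"elasticsearch": "5.1.2", "logstash": "2.4.0", "kibana": "5.1.1"})` reports Logstash as the odd one out.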

Finally, we got some hands-on learning.

With more than 50% of the time spent on fixing these issues, I still managed to explain the ELK stack. I demonstrated how to read a CSV file using Logstash, display the data on screen and send it to Elasticsearch. Then I explained Elasticsearch and demonstrated indexing data, importing bulk data, searching, and deleting. Then we explored Kibana. I asked them to create visualizations and dashboards, which they did with great interest.
Then I demonstrated how we can get data from the Twitter stream and analyse it in Kibana.

The participants were happy to get some hands-on experience with the ELK stack.

Used the following links.

Config files, sample data = https://github.com/tshrinivasan/elk-training

https://www.digitalocean.com/community/tutorials/how-to-install-elasticsearch-logstash-and-kibana-elk-stack-on-ubuntu-16-04

http://ikeptwalking.com/elasticsearch-sample-data/

http://www.generatedata.com/

Sample config files:
https://github.com/elastic/examples/tree/master/ElasticStack_twitter


Here are my learnings:

1. Never expect an internet connection.

Find a way to set up a quick local intranet. Always bring a WiFi router, so that VNC, SSH, web servers and file transfers are all easy and fast.

Get some portable packages for GNU/Linux too.

Always be prepared to run the workshop without internet.
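Once everyone is on the same router, sharing the installers can be as simple as running a small web server on the trainer's laptop. Here is a minimal sketch using only Python's standard library; the function name, the folder and the port number are placeholders.

```python
import functools
import http.server
import threading


def start_file_server(directory, port=8000):
    """Serve the files in directory over HTTP to everyone on the local network."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    # "0.0.0.0" listens on all network interfaces, not just localhost.
    server = http.server.ThreadingHTTPServer(("0.0.0.0", port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server  # call server.shutdown() to stop serving
```

Participants then open `http://<trainer-laptop-ip>:8000/` in a browser and download what they need. Running `python3 -m http.server 8000` inside the folder does the same thing from the command line.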

2. Learn a few Windows basics and have software for Windows too.

It is not good to ignore Windows users. When they come forward to learn something, we have to be prepared to teach them too.

Have a copy of the ELK Zip files, curl, PuTTY, VNC, Java setup files, the Notepad++ editor, Firefox/Chrome browsers, etc.

3. Prepare documentation and share it with participants

Prepare a how-to document covering installation, setup and examples, and share it with everyone. With this document, people can explore further once they go home. If possible, create video tutorials and share them online and offline.

4. Software versions

Make sure the software versions on your laptop and on the participants' laptops are the same. The ELK stack changes a lot with every release.

5. Know the audience

Mostly, we get an audience with mixed skills. I assumed that they had basic computer skills like extracting files, understanding file paths and using the command line. When they lack these, we have to start by training them on the basics.

This was my first public training on ELK. I learnt tons of stuff during my preparation and at the workshop. Thanks to the participants. With their patience and interest in learning, the day was successful. Thanks to the 4ccon volunteers for the wonderful event.