Big Data Analysis is becoming one of the hot topics in the IT industry. Everyone wants to analyse data, using tools like Hadoop, Spark, etc. These are used to process huge amounts of data, in the range of terabytes. This is called “Historical Data Analysis”.
In contrast to this, there is “Real Time Data Analysis”: processing a stream of constantly incoming data immediately, as it arrives.
The typical data pipeline for Real Time Big Data Analysis is as below.
App/Site -> API Server -> Message Queue (Kafka) -> Processor (Logstash) -> Storage (Elasticsearch, Redis, MongoDB) -> Visualization (Kibana)
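As a rough illustration of how data moves through these stages, here is a minimal Python sketch. It does not use the real Kafka, Logstash, or Elasticsearch APIs; an in-memory queue stands in for Kafka, a simple function stands in for Logstash, and a list stands in for the storage layer. All names here are made up for the example.

```python
import json
import queue

# In-memory queue standing in for Kafka.
message_queue = queue.Queue()

def api_server(event):
    """API server role: receive an event from the app/site and publish it."""
    message_queue.put(json.dumps(event))

def processor():
    """Processor (Logstash role): parse and enrich one raw message."""
    raw = message_queue.get()
    doc = json.loads(raw)
    doc["processed"] = True  # stand-in for filtering/enrichment
    return doc

# List standing in for Elasticsearch/Redis/MongoDB.
storage = []

def store(doc):
    storage.append(doc)

# One visitor event flows through the whole pipeline.
api_server({"page": "/home", "visitor": "u1"})
store(processor())
print(storage)  # [{'page': '/home', 'visitor': 'u1', 'processed': True}]
```

Each real component plays the same role as its stand-in here, just distributed, durable, and scaled out across machines.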
A few years ago, we had to rely on Google Analytics and pay a huge amount of money to get real time data about our site visitors, credit card swipes, etc. Nowadays, we can build the entire pipeline with Free/Open Source Software.
With the following links, we can set up the data pipeline easily.
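As one concrete piece of the setup, a minimal Logstash pipeline configuration that reads from a Kafka topic and writes to Elasticsearch might look like the sketch below. The topic name, hosts, and index pattern are placeholders; adjust them for your own setup.

```
input {
  kafka {
    bootstrap_servers => "localhost:9092"
    topics => ["site_events"]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "site-events-%{+YYYY.MM.dd}"
  }
}
```

The date pattern in the index name gives one Elasticsearch index per day, which makes it easy to expire old real-time data.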
Setting these things up is easy. But once the real time flow starts in production, remember: you are always on fire. You will feel like you are flying an aeroplane with a dashboard full of buttons. You have to keep it running while solving real time issues as they appear.
Explore these tools and learn their basics. Learning the basics will pay off for sure.
There are tons of new tools coming in this arena. We cannot master them all. But exploring and learning one tool deeply makes it easier to keep up with the new ones.
I am exploring the following tools along with the ELK stack.
I do most of my programming in Python, but it becomes very slow when dealing with GBs of data. The Go language seems faster for working with text files, so I have started exploring Go too.
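One way to keep Python workable on large files, whatever the language comparison, is to stream them line by line instead of loading everything into memory. A small sketch, using a made-up log format where the last field is an HTTP status code:

```python
import tempfile
from collections import Counter

def count_status_codes(path):
    """Stream a log file line by line and tally the last field of each line."""
    counts = Counter()
    with open(path) as f:
        for line in f:  # reads lazily, so memory stays flat even for GB-sized files
            parts = line.split()
            if parts:
                counts[parts[-1]] += 1
    return counts

# Demo with a tiny synthetic log file.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("GET /home 200\nGET /missing 404\nGET /about 200\n")
    log_path = tmp.name

print(count_status_codes(log_path))  # Counter({'200': 2, '404': 1})
```

Streaming like this will not make Python as fast as Go on raw text crunching, but it keeps memory use constant regardless of file size.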
What new tools and technologies are you learning?