Why do I love Apache Kafka?


From Wikipedia,

https://en.wikipedia.org/wiki/Apache_Kafka

<quote>

Apache Kafka is an open-source stream processing platform developed by the Apache Software Foundation written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. Its storage layer is essentially a “massively scalable pub/sub message queue architected as a distributed transaction log,” making it highly valuable for enterprise infrastructures to process streaming data. Additionally, Kafka connects to external systems (for data import/export) via Kafka Connect and provides Kafka Streams, a Java stream processing library.

</quote>

I am using Kafka as a message queue in one of my projects, which receives a huge amount of real-time data: around 6,000 to 100,000 events per minute. I first tried to read those events with a custom Python script, but it could not cope with that volume and missed a lot of data.

I was looking for a stable tool to ingest the data, found Kafka, and explored it. To my surprise, it worked well. I stress tested it with the tool “siege”, producing millions of test events. A single Kafka server received and stored all of the data.
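The stress test itself was driven with siege against an HTTP front end, but roughly, pushing a large burst of events into a topic from Python looks like the sketch below. It uses the kafka-python client; the broker address and topic name are illustrative assumptions, not the actual setup.

# Sketch: flood a Kafka topic with test events using the kafka-python client.
# Broker address and topic name are placeholders for illustration only.
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")

for i in range(1_000_000):
    # send() is asynchronous; messages are batched and pushed in the background
    producer.send("test-events", f"event-{i}".encode("utf-8"))

producer.flush()  # block until every buffered message has been delivered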


Kafka keeps all the data in its own compact internal log format (it can also compress messages) and, by default, retains everything for a week. Anyone can write to it and anyone can read from it, in a very stable way.
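Reading the data back is just as simple. Here is a minimal consumer sketch with the kafka-python client; the broker address, topic and group id are assumptions for illustration, not a recommended configuration.

# Sketch: read everything from a topic with the kafka-python client.
# Broker address, topic and consumer group are placeholders for illustration.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "test-events",
    bootstrap_servers="localhost:9092",
    group_id="my-reader",
    auto_offset_reset="earliest",  # start from the oldest retained message
)

for message in consumer:
    print(message.topic, message.partition, message.offset, message.value)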

Logstash is a perfect fit for reading from Kafka. From there it can write to S3, another Kafka cluster, or Elasticsearch.

Installation is very simple: just download, extract, and start it.

https://www.digitalocean.com/community/tutorials/how-to-install-apache-kafka-on-ubuntu-14-04

With the Confluent platform, it can read and write JSON documents easily.
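Even with the plain kafka-python client, JSON in and JSON out is just a serializer and deserializer away; the sketch below illustrates that idea (the topic, broker and sample document are made up), and the Confluent platform ships its own clients and serializers that build on the same concept.

# Sketch: produce and consume JSON documents with kafka-python.
# Topic name, broker address and the sample document are placeholders.
import json
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda doc: json.dumps(doc).encode("utf-8"),
)
producer.send("json-events", {"id": 1, "type": "click"})
producer.flush()

consumer = KafkaConsumer(
    "json-events",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    auto_offset_reset="earliest",
)
for message in consumer:
    doc = message.value  # already a Python dict thanks to the deserializer
    print(doc["id"], doc["type"])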

I strongly suggest using Kafka for any message queue requirement.



One thought on “Why do I love Apache Kafka?”

  1. Hey Srini… while creating a spectrometer for food inspection that was designed around distributed data sharing, I could not find any streaming tool like this apart from Spark. I don't have any knowledge of the data aggregation, analytics, and augmentation side (only my student colleague was doing that stuff), so we finally settled on a simple, locally hostable Redis DB, with Python running on a BBB. It's awesome. But for a distributed inspection system, beyond the hardware instrumentation, such streaming systems are really required. It's really heartening to know that tools nowadays can accommodate such input rates. Great share.
