Apache Kafka is an open-source stream-processing platform developed by the Apache Software Foundation and written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. Its storage layer is essentially a "massively scalable pub/sub message queue architected as a distributed transaction log," which makes it highly valuable for enterprise infrastructures that process streaming data. Additionally, Kafka connects to external systems (for data import/export) via Kafka Connect and provides Kafka Streams, a Java stream-processing library.
I am using Kafka as a message queue in one of my projects, which receives a huge amount of real-time data: around 6,000 to 100,000 events per minute. I first tried to read those events with a custom Python script, but the script could not keep up with that volume and dropped many events.
I was looking for a stable data-ingestion tool, found Kafka, and explored it. To my surprise, it worked well. I stress-tested it with the tool "siege", producing millions of test events. A single Kafka server received and stored all of the data.
Kafka persists all the data in its own internal log format (optionally compressed) and keeps it around; by default, messages are retained for a week. Any number of producers can write to it and any number of consumers can read from it, in a very stable process.
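As a sketch of that write/read flow, here is what producing and consuming might look like with the kafka-python client (the client library, topic name, and broker address are my assumptions, not from the original setup):

```python
# Sketch of Kafka's pub/sub flow using the kafka-python client
# (client library, topic name, and broker address are assumptions).
try:
    from kafka import KafkaProducer, KafkaConsumer  # pip install kafka-python
except ImportError:                                 # let the sketch load even
    KafkaProducer = KafkaConsumer = None            # without the client installed

def encode_event(i):
    # Kafka messages are plain bytes, so encode each event before sending.
    return f"event-{i}".encode("utf-8")

def produce_events(topic="events", broker="localhost:9092", n=1000):
    producer = KafkaProducer(bootstrap_servers=broker)
    for i in range(n):
        producer.send(topic, encode_event(i))
    producer.flush()  # make sure everything is actually written

def consume_events(topic="events", broker="localhost:9092"):
    # Read from the beginning of the topic; any number of consumers
    # can read the same data independently.
    consumer = KafkaConsumer(topic, bootstrap_servers=broker,
                             auto_offset_reset="earliest")
    for message in consumer:
        print(message.value.decode("utf-8"))
```

Because Kafka retains messages, the consumer does not have to be running when the producer writes; it can replay everything later from the stored log.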
Logstash is a perfect fit for reading from Kafka; it can then write to S3, another Kafka cluster, or Elasticsearch.
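A minimal Logstash pipeline for that Kafka-to-Elasticsearch path might look like this (the topic name, broker address, Elasticsearch host, and index pattern are illustrative assumptions):

```
input {
  kafka {
    bootstrap_servers => "localhost:9092"
    topics => ["events"]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "events-%{+YYYY.MM.dd}"
  }
}
```

Swapping the `elasticsearch` output for an `s3` or `kafka` output block covers the other destinations mentioned above.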
Installation is very simple: just download, extract, and start it running.
With the Confluent Platform, it can read and write JSON documents easily.
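For example, JSON events can be serialized to bytes before producing; here is a small sketch (the serializer, topic name, and broker address are illustrative, and the kafka-python usage shown in the comment is an assumption):

```python
import json

# Serialize a dict to UTF-8 JSON bytes, the form a Kafka message takes
# on the wire (sorted keys keep the output deterministic).
def to_json_bytes(event):
    return json.dumps(event, sort_keys=True).encode("utf-8")

# Producing JSON with kafka-python would then look like (assumed client/broker):
#   from kafka import KafkaProducer
#   producer = KafkaProducer(bootstrap_servers="localhost:9092",
#                            value_serializer=to_json_bytes)
#   producer.send("events", {"user": "alice", "action": "login"})
```

On the consuming side, the same documents can be decoded with `json.loads(message.value)`.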
I strongly suggest using Kafka for any message-queue requirement.