A short intro to Apache Kafka

Sep 27, 2021

Nowadays, thinking of data as streams is a popular approach. There are many platforms for working with streaming data, and one of the most popular is Apache Kafka. In this article, we’ll take a concise look at Kafka. Happy learning!

What’s Kafka?

Put precisely, Kafka is a distributed streaming platform used to handle real-time data. Being distributed means that, rather than running on a single computer, Kafka runs across several servers (called brokers), which lets it draw on additional processing power and storage capacity.

History of Kafka

Kafka was developed around 2010 at LinkedIn by a team that included Jay Kreps, Jun Rao, and Neha Narkhede, to facilitate activity tracking and to collect application metrics and logs. Kafka was later open-sourced and handed over to the Apache Software Foundation. It has quickly evolved from a messaging queue into a full-fledged event streaming platform.

Before diving deeper into Kafka, let’s briefly discuss streaming data, a term that comes up often in this article :)

Streaming data is data that is generated continuously by any number of sources, with the records ordered in time. Once captured, this data can also be stored, for example in databases, and used later as historical data.

What is Kafka written in?

Kafka is written in Scala and Java, but client libraries are available for many other popular programming languages.

Which operating systems does Kafka run on?

Kafka is cross-platform: it runs on multiple operating systems, such as Windows, macOS and Linux.

What is Kafka’s stable release?

According to Wikipedia, at the time of writing Kafka’s latest stable release is 2.8.0, released on April 19, 2021 (about five months ago).

Why do we need Kafka?

The following are the key reasons we need Kafka:

Simplified Backend Architecture: Kafka is a streaming platform that can store huge amounts of data, and this data is persistent and replicated for fault tolerance. Instead of wiring every system directly to every other system, each system only needs to talk to Kafka. The following figure shows the architecture of a complex system that is simplified by using Kafka.

Real-Time Processing of Data: A real-time application needs a continuous flow of data that is processed immediately, with low latency (latency means delay: the time it takes for a data packet to travel from one designated point to another). Kafka Streams, Kafka’s stream processing library, is used for building and deploying stream processing applications without a separate stream processing cluster or heavy, expensive infrastructure.
Connects to Existing Systems: Kafka provides a framework called Kafka Connect for connecting Kafka to existing systems, such as databases, so that data can flow through one universal data pipeline.
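To make "processing data as it arrives" more concrete, here is a minimal plain-Python sketch of one common stream processing operation, a tumbling-window count. It is not the Kafka Streams API (which is a Java library); the `(timestamp, key)` event shape, the function name and the 60-second window are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

Event = Tuple[int, str]  # (timestamp in seconds, event key): hypothetical shape


def tumbling_window_counts(
    events: Iterable[Event], window_s: int = 60
) -> Iterator[Tuple[int, Counter]]:
    """Emit (window_start, counts) each time a fixed-size window closes.

    Assumes events arrive in timestamp order; real stream processors
    also have to deal with late and out-of-order events.
    """
    current_start = None
    counts: Counter = Counter()
    for ts, key in events:
        start = ts - ts % window_s  # align the event to the start of its window
        if current_start is None:
            current_start = start
        if start != current_start:
            # A newer window has begun, so the previous one is complete.
            yield current_start, counts
            current_start, counts = start, Counter()
        counts[key] += 1
    if current_start is not None:
        yield current_start, counts  # flush the last open window


stream = [(0, "click"), (10, "view"), (59, "click"), (61, "click"), (130, "view")]
for start, counts in tumbling_window_counts(stream):
    print(start, dict(counts))
```

Each result is emitted as soon as its window closes, rather than after the whole data set has been collected, which is the essence of low-latency stream processing.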

How does Kafka differ from traditional message queues?

Kafka is different from traditional message queues (like RabbitMQ).

Kafka retains messages for a configurable period, whether or not they have been consumed (the default is 7 days), while RabbitMQ removes a message as soon as it receives the consumer’s acknowledgement.
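The retention behaviour can be illustrated with a toy, in-memory append-only log in which reading never deletes anything and only an age-based cleanup removes records. This is a simplified sketch, not Kafka’s implementation: the class and method names are hypothetical, and real Kafka deletes whole log segments rather than individual records. The 7-day constant mirrors the broker default `log.retention.hours=168`.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

SEVEN_DAYS_S = 7 * 24 * 60 * 60  # mirrors Kafka's default retention of 168 hours


@dataclass
class RetainedLog:
    """Hypothetical toy log: reads never delete, only age-based cleanup does."""

    retention_s: int = SEVEN_DAYS_S
    records: List[Tuple[float, str]] = field(default_factory=list)

    def append(self, timestamp: float, value: str) -> None:
        self.records.append((timestamp, value))

    def read_all(self) -> List[str]:
        # Consuming is just reading: the records stay in the log.
        return [value for _, value in self.records]

    def enforce_retention(self, now: float) -> None:
        # Drop records older than the retention period, consumed or not.
        cutoff = now - self.retention_s
        self.records = [(ts, v) for ts, v in self.records if ts >= cutoff]


log = RetainedLog()
log.append(0, "order-1")
log.append(SEVEN_DAYS_S, "order-2")
print(log.read_all())  # ['order-1', 'order-2']
print(log.read_all())  # ['order-1', 'order-2'] (still there after a read)
log.enforce_retention(now=SEVEN_DAYS_S + 1)
print(log.read_all())  # ['order-2']
```

In a traditional queue, by contrast, the acknowledged message itself would be removed, so a second reader could never see `order-1` again.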

Also, RabbitMQ pushes messages to consumers and keeps track of their load, deciding how many messages each consumer should be processing at a time (there are settings for this behavior). In Kafka, consumers fetch (pull) messages themselves. Kafka is designed to scale horizontally, by adding more nodes.

Traditional message queues are typically scaled vertically, by adding more power to the same machine. These are the most important differences between Kafka and traditional messaging systems.
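The pull model described above can be sketched as a toy log in which each consumer group remembers its own offset (its position in the log) and asks for batches when it is ready. The `PullLog` class, `poll` and `max_records` are hypothetical names loosely inspired by Kafka’s consumer, not its actual API; real Kafka tracks offsets per partition and commits them separately from fetching.

```python
from typing import Dict, List


class PullLog:
    """Hypothetical toy log where each consumer group tracks its own offset."""

    def __init__(self) -> None:
        self.records: List[str] = []
        self.offsets: Dict[str, int] = {}  # consumer group -> next offset to read

    def append(self, value: str) -> int:
        self.records.append(value)
        return len(self.records) - 1  # offset of the new record

    def poll(self, group: str, max_records: int) -> List[str]:
        # The consumer asks for data; the log never pushes it.
        start = self.offsets.get(group, 0)
        batch = self.records[start:start + max_records]
        self.offsets[group] = start + len(batch)  # remember the new position
        return batch


log = PullLog()
for event in ["a", "b", "c"]:
    log.append(event)

print(log.poll("billing", max_records=2))    # ['a', 'b']
print(log.poll("analytics", max_records=5))  # ['a', 'b', 'c']
print(log.poll("billing", max_records=2))    # ['c']
```

Because each group controls its own pace, a slow consumer does not hold back a fast one, and the broker does not need to decide how much work each consumer receives.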

Use Cases

Kafka is used for messaging, website activity tracking, log aggregation and as a commit log. Kafka can even be used as a database, although it has no data model or indexes.

Audit Logs:

An audit log is a record of the events in an information technology system. Audit log entries usually include destination and source addresses, a timestamp and user login information. Because Kafka stores events durably and in order, it is well suited to collecting such entries.

Messaging:

Kafka works well as a replacement for traditional message brokers such as ActiveMQ and RabbitMQ. Compared with most messaging systems, Kafka has better throughput, built-in partitioning, replication and fault tolerance, which makes it a good solution for large-scale message processing applications.
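Built-in partitioning is what lets a topic spread over many brokers: each keyed message is assigned to a partition by hashing its key, so all messages with the same key land in the same partition and keep their relative order. The sketch below assumes `zlib.crc32` as a stand-in hash purely because it is deterministic and in the standard library; Kafka’s Java client uses murmur2 for keyed records. The function names are hypothetical.

```python
import zlib
from typing import Dict, List, Tuple


def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in hash (assumption): Kafka's Java client uses murmur2 for keyed
    # records; crc32 is used here only because it is deterministic and stdlib.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def distribute(
    messages: List[Tuple[str, str]], num_partitions: int
) -> Dict[int, List[str]]:
    """Place each (key, value) message into the partition chosen by its key."""
    partitions: Dict[int, List[str]] = {p: [] for p in range(num_partitions)}
    for key, value in messages:
        partitions[partition_for(key, num_partitions)].append(value)
    return partitions


msgs = [("user-1", "login"), ("user-2", "login"), ("user-1", "click"), ("user-1", "logout")]
for p, values in distribute(msgs, num_partitions=3).items():
    print(p, values)
```

Adding partitions (and brokers to host them) increases parallelism, which is how Kafka scales horizontally while still preserving order per key.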

Website Activity Tracking:

The original use case for Kafka was to be able to rebuild a user activity tracking pipeline as a set of real-time publish-subscribe feeds. This means site activity (page views, searches, or other actions users may take) is published to central topics, with one topic per activity type. These feeds are available for subscription for a range of use cases, including real-time processing and real-time monitoring.

Activity tracking is often very high volume as many activity messages are generated for each user page view.
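The "one topic per activity type" layout can be sketched with a toy in-memory hub that routes each event to the topic named after its type and lets any number of subscribers read a topic independently. The class name, event fields and topic names (`page_view`, `search`) are hypothetical; real Kafka topics are durable, partitioned logs on brokers.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List


class ActivityTopics:
    """Hypothetical toy feeds: each activity type gets its own append-only topic."""

    def __init__(self) -> None:
        self.topics: DefaultDict[str, List[Dict[str, str]]] = defaultdict(list)

    def publish(self, event: Dict[str, str]) -> None:
        # Route the event to the topic named after its activity type.
        self.topics[event["type"]].append(event)

    def read(self, topic: str, from_offset: int = 0) -> List[Dict[str, str]]:
        # Any number of subscribers can read the same topic independently.
        return self.topics[topic][from_offset:]


feeds = ActivityTopics()
feeds.publish({"type": "page_view", "user": "u1", "page": "/home"})
feeds.publish({"type": "search", "user": "u1", "query": "kafka"})
feeds.publish({"type": "page_view", "user": "u2", "page": "/pricing"})

# Two independent subscribers of the same feed:
monitoring = len(feeds.read("page_view"))                   # 2 page views so far
processing = [e["page"] for e in feeds.read("page_view")]  # ['/home', '/pricing']
print(monitoring, processing)
```

Reading does not consume anything, so a monitoring dashboard and a processing job can both see every page view.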

Metrics:

Kafka is often used for operational monitoring data. This involves aggregating statistics from distributed applications to produce centralized feeds of operational data.
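As a rough illustration of "aggregating statistics from distributed applications", the sketch below folds metric events reported by several hosts into one centralized summary per metric. The `(host, metric name, value)` event shape, the host and metric names, and the chosen statistics are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

MetricEvent = Tuple[str, str, float]  # (host, metric name, value): hypothetical shape


def aggregate(events: Iterable[MetricEvent]) -> Dict[str, Dict[str, float]]:
    """Fold per-host metric events into one centralized summary per metric."""
    values: Dict[str, List[float]] = defaultdict(list)
    for _host, name, value in events:
        values[name].append(value)
    return {
        name: {"count": len(vs), "mean": sum(vs) / len(vs), "max": max(vs)}
        for name, vs in values.items()
    }


feed = [
    ("web-1", "latency_ms", 120.0),
    ("web-2", "latency_ms", 80.0),
    ("web-1", "errors", 1.0),
    ("web-3", "latency_ms", 100.0),
]
summary = aggregate(feed)
print(summary["latency_ms"])  # count 3, mean 100.0, max 120.0
```

In a Kafka-based setup, each host would publish its events to a metrics topic and a consumer would perform this kind of aggregation.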

Log Aggregation:

Many people use Kafka as a replacement for a log aggregation solution. Log aggregation typically collects physical log files off servers and puts them in a central place (a file server) for processing. Kafka abstracts away the details of files and gives a cleaner abstraction of log or event data as a stream of messages. This allows for lower-latency processing and easier support for multiple data sources and distributed data consumption. In comparison to log-centric systems like Scribe or Flume, Kafka offers equally good performance, stronger durability guarantees due to replication, and much lower end-to-end latency.
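The idea of replacing log files with "a stream of messages" can be sketched by parsing raw log lines from several servers into structured events that downstream consumers can filter. The log line format, regular expression, host names and function names are all hypothetical; in practice each server would publish such events to a Kafka topic instead of building a Python list.

```python
import re
from typing import Dict, List, Optional

# Hypothetical log format: "2021-09-27T10:00:00 ERROR disk full"
LINE = re.compile(r"^(?P<ts>\S+) (?P<level>[A-Z]+) (?P<msg>.*)$")


def to_event(host: str, line: str) -> Optional[Dict[str, str]]:
    """Turn one raw log line into a structured message, or None if malformed."""
    match = LINE.match(line)
    if match is None:
        return None
    return {"host": host, **match.groupdict()}


def collect(files: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Merge lines from many servers into a single stream of events."""
    stream = []
    for host, lines in files.items():
        for line in lines:
            event = to_event(host, line)
            if event is not None:
                stream.append(event)
    return stream


servers = {
    "app-1": ["2021-09-27T10:00:00 INFO started", "2021-09-27T10:00:05 ERROR disk full"],
    "app-2": ["2021-09-27T10:00:02 WARN slow response", "garbage line"],
}
errors = [e for e in collect(servers) if e["level"] == "ERROR"]
print(errors)
```

Once the lines are events rather than files, consumers no longer need to know where each file lives or how it is rotated.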

There are many more use cases, such as event processing, commit logs and stream processing.

Top Competitors of Kafka

MuleSoft Anypoint Platform.
IBM MQ.
Google Cloud Pub/Sub.
RabbitMQ.
Amazon MQ.
KubeMQ.
PubSub+.
Google Cloud Dataflow.

Thousands of companies are built on Kafka

Today, Kafka is used by thousands of companies, including over 60% of the Fortune 100. Among these are Box, Goldman Sachs, Target, Cisco, Intuit, and more. As a trusted tool for innovative companies, Kafka helps organizations modernize their data strategies with an event streaming architecture.

I hope this gave you an overview of Kafka. We’ll walk through Kafka in more detail in the next article, so catch you there! Please feel free to reach out with ideas for upcoming articles.

Do check out my other articles:

Introduction to Node.js — A Quick Start


smsstemburu.medium.com

Thanks for reading :)