Today I would like to cover a few basic concepts of Kafka Streams in a way that any non-technical person can read and understand. I come from a non-technical background myself, and I used to find these technical terms difficult to understand.
So let's get started!
In today's world almost everyone uses Twitter, the popular social networking service.
Using the Twitter application, users can post messages to their account, see messages from other users, and comment on them. These messages are known as Tweets.
Do you know how many tweets Twitter handles daily? On average, it is about 500 million tweets per day.
Isn't that huge? Of course it is. So how is all this data managed and processed? The real-time nature of Twitter led the business to adopt Kafka.
Concept 1: What is Kafka?
Apache Kafka is an open-source distributed streaming platform that enables data to be transferred at high throughput with low latency. Kafka is often used in real-time streaming data architectures to provide real-time analytics. It is known for being a fast, scalable, durable, and fault-tolerant messaging system.
Concept 2: What is a Kafka Stream?
A Kafka stream represents a continuous flow of data that is updated frequently. A stream in Kafka is an ordered, replayable, and fault-tolerant sequence of immutable data records.
Take the example of Twitter: tweets constantly flow as a data stream.
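To make "ordered, replayable, and immutable" a bit more concrete, here is a minimal sketch in plain Python. It is only a toy model and does not use Kafka at all: the names `Record`, `ToyStream`, `append`, and `replay` are made up for illustration, and fault tolerance is not modelled.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)  # frozen=True makes each record immutable
class Record:
    key: str    # e.g. the user who tweeted (hypothetical field)
    value: str  # e.g. the tweet text (hypothetical field)


class ToyStream:
    """Append-only, ordered log of records: a toy stand-in for a stream."""

    def __init__(self):
        self._log = []

    def append(self, record):
        # New records always go to the end, so arrival order is preserved.
        self._log.append(record)
        return len(self._log) - 1  # position of the new record in the log

    def replay(self, from_offset=0):
        # Reading never removes anything, so the same records can be read again.
        return list(self._log[from_offset:])


stream = ToyStream()
stream.append(Record("alice", "Hello Kafka"))
stream.append(Record("bob", "Streams are fun"))

first_read = stream.replay()
second_read = stream.replay()  # same records, same order
```

Reading the stream twice gives the same records in the same order, and trying to change a record after it was written raises an error. That is the intuition behind "replayable" and "immutable".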
The stream of data needs to be handled by some node, right? That leads to our next concept, “Stream Processing”.
Concept 3: What is Stream Processing?
A stream processor is a node in the topology. It represents a processing step that transforms data in streams: it receives one input record at a time from its upstream processors, applies its operation to it, and may subsequently produce one or more output records for its downstream processors.
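The idea of "one record in, one or more records out" can be sketched with plain Python generators. This is a rough illustration under my own assumptions, not the Kafka Streams API: the function names `uppercase_processor` and `split_words_processor` are hypothetical, and the records are simple strings.

```python
def uppercase_processor(upstream):
    # Receives one record at a time and emits exactly one output record.
    for record in upstream:
        yield record.upper()


def split_words_processor(upstream):
    # Receives one record at a time and may emit several output records.
    for record in upstream:
        for word in record.split():
            yield word


tweets = ["hello kafka", "streams are fun"]

# Chain the processors: the output of one is the input of the next.
output = list(split_words_processor(uppercase_processor(tweets)))
print(output)
```

Each function only ever looks at the record currently in its hands, which is what lets a processing step stay simple no matter how long the stream is.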
Here comes a benefit of Kafka Streams: it allows parallel processing of data. Stream processing lets some applications exploit a limited form of parallel processing more simply and easily.
Concept 4: What is a Processor Topology?
A processor topology is a graph of stream processors (nodes) that are connected by streams (edges). Similar to other stream processing systems, the topology in Kafka Streams defines where to read the data from, how the data will be processed, and where to send it next in the pipeline. It has three main types of nodes: Source, Processor, and Sink, connected by edges called streams.
Source Processor: This is the starting point (input stream) of the topology; there is no other processor before it. The task of a source processor is to consume records from one or more Kafka topics and forward them to its downstream processors.
Sink Processor: This stream processor does not have any downstream processors. It sends any records received from its upstream processors to a specified Kafka topic.
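The source, processor, and sink roles described above can be sketched as a tiny pipeline in plain Python. Again, this is a toy model and not the real Kafka Streams library: the `topics` dictionary stands in for Kafka topics, and the topic names and function names are assumptions made up for this example.

```python
# A dictionary of lists stands in for Kafka topics (toy model only).
topics = {
    "tweets": ["hello kafka", "", "streams are fun"],
    "tweet-lengths": [],
}


def source(topic_name):
    # Source processor: reads records from a topic and forwards them downstream.
    for record in topics[topic_name]:
        yield record


def drop_empty(upstream):
    # Processor: passes on only the non-empty records.
    for record in upstream:
        if record:
            yield record


def to_length(upstream):
    # Processor: turns each tweet into a (tweet, length) pair.
    for record in upstream:
        yield (record, len(record))


def sink(upstream, topic_name):
    # Sink processor: writes every record it receives to the output topic.
    for record in upstream:
        topics[topic_name].append(record)


# Wiring the nodes together is the "topology": source -> processors -> sink.
sink(to_length(drop_empty(source("tweets"))), "tweet-lengths")
print(topics["tweet-lengths"])
```

The last line before the print is the whole topology in miniature: it says where the data comes from, which steps it passes through, and where the results end up.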
Concept 5: What is a Stream Processing Application?
A stream processing application is any program that puts together everything we discussed above: it uses the Kafka Streams library and defines its processing logic via one or more processor topologies.