In this post, we will implement the third part of our system, which is the Speed Layer of Lambda architecture. We will use Spark Structured Streaming to read data from Kafka’s “TwitterStreaming” topic and analyze this data in real time.
Lambda architecture is a data processing architecture introduced by Nathan Marz . It takes the advantages of both batch processing and stream-processing to handle a large amount of data effectively. Lambda architecture consists of 3 layers: Batch layer, Speed layer,
In this tutorial, we will use TwitterStreaming API to get Twitter’s tweets in real time and publish it to a Kafka topic. We then use Spark as a Consumer to read Twitter data from Kafka and analyse that data. To