Real-time Twitter Analysis

In this tutorial, we will use the Twitter Streaming API to get Twitter’s tweets in real time and publish them to a Kafka topic. We then use Spark as a Consumer to read the Twitter data from Kafka and analyse it. To

Statistical analysis with Spark DataFrame

If you have used Python or participated in a Kaggle competition, you should be familiar with the Pandas library. Pandas provides many functions that help users easily perform analysis on a given dataset. Spark DataFrames also support similar functions but process
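Both Pandas’ describe() and Spark’s DataFrame.describe() report the count, mean, standard deviation, min and max of numeric columns. As a rough illustration of what those statistics are, here is a plain-Scala sketch that computes them by hand (no Spark required; the sample values are made up for demonstration):

```scala
// Plain-Scala illustration of the summary statistics that
// Spark's DataFrame.describe() (like Pandas' describe()) reports.
// The sample values below are made up for demonstration.
object DescribeSketch {
  def main(args: Array[String]): Unit = {
    val values = Seq(1.0, 2.0, 3.0, 4.0, 5.0)

    val count = values.length
    val mean  = values.sum / count
    // describe() reports the sample standard deviation (divide by n - 1).
    val stddev = math.sqrt(values.map(v => math.pow(v - mean, 2)).sum / (count - 1))

    println(s"count  $count")
    println(s"mean   $mean")
    println(s"stddev $stddev")
    println(s"min    ${values.min}")
    println(s"max    ${values.max}")
  }
}
```

On a real Spark DataFrame the same numbers come back from a single df.describe() call, with one row per statistic.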

Publishing Twitter’s data to a Kafka topic

In the previous tutorial (A basic Kafka program), we wrote two Java classes: SimpleProducer and SimpleConsumer. SimpleProducer acts as a Kafka Producer that sends messages (the numbers 0 to 9) to the ‘test’ topic, and SimpleConsumer is a Kafka Consumer that
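SimpleProducer and SimpleConsumer are the Java classes from that tutorial; as a stand-in that needs no running broker, the same produce/consume flow can be modeled with an in-memory queue. This is purely an illustration of the pattern in plain Scala, not the real Kafka client API:

```scala
import scala.collection.mutable

// In-memory stand-in for the produce/consume flow: the "topic" is a queue,
// the producer appends the messages 0..9, the consumer drains them in order.
// (Illustrative only -- not the real Kafka client API.)
object ProduceConsumeSketch {
  private val topic = mutable.Queue.empty[String]  // plays the role of the 'test' topic

  def produce(msg: String): Unit = topic.enqueue(msg)

  def consume(): Option[String] =
    if (topic.isEmpty) None else Some(topic.dequeue())

  def main(args: Array[String]): Unit = {
    (0 to 9).foreach(n => produce(n.toString))     // what SimpleProducer does
    Iterator.continually(consume())
      .takeWhile(_.isDefined)
      .foreach(m => println(s"consumed ${m.get}")) // what SimpleConsumer does
  }
}
```

With a real broker, produce() becomes KafkaProducer.send() and the consumer loop becomes KafkaConsumer.poll(), but the first-in, first-out flow per partition is the same idea.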

Integrating Kafka with Spark using Structured Streaming

In the previous tutorial (Integrating Kafka with Spark using DStream), we learned how to integrate Kafka with Spark using an older API of Spark – Spark Streaming (DStream). In this tutorial, we will use a newer API of Spark,

Sending and receiving messages between Akka Actors

As mentioned in the tutorial Introduction to Apache Akka, Akka Actors communicate with each other using messages, which can be any type of object but must be immutable. In general, types like String, Int and Boolean are all immutable. In
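In Scala, actor messages are conventionally modeled as case classes, which are immutable by default: “modifying” one with copy() produces a new object and leaves the original untouched. A minimal sketch of that property in plain Scala (no Akka dependency; the Greet message type is invented for this illustration):

```scala
// Case classes make good actor messages because their fields are
// immutable: copy() returns a NEW message and the original is unchanged,
// so an actor can safely share it between threads.
// (The Greet type is invented for this illustration.)
final case class Greet(name: String, count: Int)

object ImmutableMessageSketch {
  def main(args: Array[String]): Unit = {
    val original = Greet("Akka", 1)
    val updated  = original.copy(count = original.count + 1)

    println(original)  // the original message is unaffected by the copy
    println(updated)
  }
}
```

This is why sending a case class between actors is safe, while sending a mutable object (for example a mutable collection) risks two actors modifying it concurrently.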

Connecting Kafka to Cassandra Sink

The connection of Kafka to other databases is normally divided into Source Connectors and Sink Connectors. A Source Connector reads data from a database and publishes it to a Kafka broker, while a Sink Connector writes from Kafka

Life cycle of an actor

In this tutorial, we will write a program that demonstrates the life cycle of an Akka actor by overriding the following four functions: preStart(), preRestart(), postStop() and postRestart(). The use of these four functions can be described using the following chart

Introduction to the SMACK Stack

The SMACK Stack is a collection of five big data tools: Spark, Mesos, Akka, Cassandra and Kafka (all open-source tools). The term was first introduced in 2015, when a group of programmers met at a conference with the participation

Lambda Architecture with SMACK Stack

This is a mini-project developed with the aim of helping readers put the content of the tutorial series into practice by developing a big data processing system. In this project, we will use the SMACK stack (Spark, Akka, Cassandra, and Kafka) to

Implementing the Batch Layer of Lambda Architecture

In the previous post, we implemented the first part of our system (data collection and storage). In this post, we will implement the second part of our system, which is the Batch Layer of Lambda Architecture. The batch processing of our

Implementing the Speed Layer of Lambda Architecture

In this post, we will implement the third part of our system, which is the Speed Layer of Lambda architecture. We will use Spark Structured Streaming to read data from Kafka’s “TwitterStreaming” topic and analyze this data in real time.

Implementing the Serving Layer of Lambda Architecture

In this post, we will implement the fourth part of our system, which is the Serving Layer of Lambda architecture. We will use Akka HTTP to create a REST API that allows users to retrieve the processing results by accessing

Writing Spark applications with Scala

In this tutorial series, we will learn how to write Spark applications to process and analyze big data. Spark supports multiple programming languages such as Scala, Java, Python and R, but its native language is Scala. Therefore, we will use

Read more

Everybody is a genius. But if you judge a fish by its ability to climb a tree, it will live its whole life believing that it is stupid.

Albert Einstein

Nothing is IMPOSSIBLE, the word itself says ‘I’M POSSIBLE’

Audrey Hepburn

Need help building your own project?

Contact us