Applying statistical and analytical techniques together with grammar rules to process the input data and extract the necessary, high-value information…
Real-time Twitter Analysis
In this tutorial, we will use the Twitter Streaming API to fetch tweets in real time and publish them to a Kafka topic. We then use Spark as a consumer to read the Twitter data from Kafka and analyse it.
Statistical analysis with Spark DataFrame
If you have used Python or taken part in a Kaggle competition, you should be familiar with the Pandas library. Pandas provides many functions that help users easily analyse a given dataset. Spark DataFrame supports similar functions, but processes the data in a distributed fashion across a cluster.
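As a rough illustration of the summary statistics that Spark's `describe()` reports per numeric column (count, mean, stddev, min, max), here is the same computation in plain Scala, with no Spark involved; the sample values are made up for illustration:

```scala
// Summary statistics over one numeric column, computed by hand.
// Spark's DataFrame.describe() reports these same five values per column.
val values = List(1.0, 2.0, 3.0, 4.0, 5.0)

val count = values.length
val mean  = values.sum / count
// describe() reports the sample standard deviation (divide by n - 1)
val stddev = math.sqrt(values.map(v => math.pow(v - mean, 2)).sum / (count - 1))
val min = values.min
val max = values.max

println(f"count=$count mean=$mean%.2f stddev=$stddev%.4f min=$min max=$max")
```

In Spark the equivalent would be a single `df.describe()` call, with the work distributed across executors instead of done on one local list.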
Publishing Tweeter’s data to a Kafka topic
In the previous tutorial (A basic Kafka program), we wrote two Java classes: SimpleProducer and SimpleConsumer. SimpleProducer acts as a Kafka producer that sends messages (the numbers 0 to 9) to the ‘test’ topic, while SimpleConsumer is a Kafka consumer that reads those messages back from the same topic.
Integrating Kafka with Spark using Structured Streaming
In the previous tutorial (Integrating Kafka with Spark using DStream), we learned how to integrate Kafka with Spark using Spark's older API, Spark Streaming (DStream). In this tutorial, we will use a newer API of Spark, Structured Streaming.
Sending and receiving messages between Akka Actors
As mentioned in the tutorial Introduction to Apache Akka, Akka actors communicate with each other using messages, which can be objects of any type but must be immutable. In general, objects such as String, Int and Boolean are all immutable.
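A Scala case class is the idiomatic way to define such an immutable message, since its fields are vals that cannot be reassigned; the `Greet` message below is a made-up example, not one from the tutorial:

```scala
// A typical immutable actor message: a case class with val fields.
case class Greet(name: String)

val msg = Greet("Akka")
// msg.name = "Spark"   // would not compile: case class fields cannot be reassigned

// Immutability also gives messages safe structural equality:
val same = msg == Greet("Akka")
println(same)
```

Because the message can never change after it is sent, the sending and receiving actors cannot race on shared mutable state.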
Connecting Kafka to Cassandra Sink
Connecting Kafka to other databases normally involves two kinds of connectors: a source connector reads data from a database and publishes it to a Kafka broker, while a sink connector writes data from Kafka to an external database such as Cassandra.
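A sink connector is usually configured with a small properties file. The fragment below is only a sketch of the general shape: `name`, `connector.class`, `tasks.max` and `topics` are standard Kafka Connect keys, but the connector class name and the Cassandra-specific keys vary between connector implementations and are assumptions here:

```properties
# Standard Kafka Connect keys
name=cassandra-sink-demo
# Assumed class name; use the one shipped by your connector
connector.class=com.example.CassandraSinkConnector
tasks.max=1
topics=test

# Cassandra-specific keys (illustrative; exact names depend on the connector)
cassandra.contact.points=127.0.0.1
cassandra.keyspace=demo_keyspace
```

The worker loads this file at startup; one sink task then polls the listed topics and writes each record into the configured keyspace.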
Life cycle of an actor
In this tutorial, we will write a program that demonstrates the life cycle of an Akka actor by overriding the following four functions: preStart(), preRestart(), postStop() and postRestart(). The use of these four functions can be described using the following chart.
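The order in which these hooks fire can be sketched without a real ActorSystem. The toy class below is not Akka; it only logs calls in the order a classic Akka actor would run them, including the default behaviour where preRestart invokes postStop on the failing instance and postRestart invokes preStart on the fresh one:

```scala
import scala.collection.mutable.ListBuffer

// Not real Akka: a minimal stand-in that records the four lifecycle
// hooks in the order a classic Akka actor would run them.
class LifecycleLog {
  val events = ListBuffer.empty[String]
  def preStart(): Unit = events += "preStart"
  def postStop(): Unit = events += "postStop"
  // Akka's default preRestart calls postStop on the old instance
  def preRestart(): Unit = { events += "preRestart"; postStop() }
  // Akka's default postRestart calls preStart on the new instance
  def postRestart(): Unit = { events += "postRestart"; preStart() }
}

val actor = new LifecycleLog
actor.preStart()      // the actor is started
actor.preRestart()    // a supervisor decides to restart it after a failure
actor.postRestart()   // the replacement instance takes over
actor.postStop()      // the actor is finally stopped
println(actor.events.mkString(" -> "))
```

Note that in real Akka a restart creates a fresh actor instance, so preRestart and postRestart actually run on two different objects; the single object here is only to make the ordering visible.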
Introduction of SMACK Stack
SMACK Stack is a collection of five big data tools: Spark, Mesos, Akka, Cassandra and Kafka (all open source). The term was first introduced in 2015, when a group of programmers met at a conference.
Lambda Architecture with SMACK Stack
This is a mini-project developed to help readers put the content of the tutorial series into practice by building a big data processing system. In this project, we will use the SMACK stack (Spark, Akka, Cassandra and Kafka) to build such a system.
Implementing the Batch Layer of Lambda Architecture
In the previous post, we implemented the first part of our system (data collection and storage). In this post, we will implement the second part, the Batch Layer of the Lambda Architecture.
Implementing the Speed Layer of Lambda Architecture
In this post, we will implement the third part of our system, which is the Speed Layer of Lambda architecture. We will use Spark Structured Streaming to read data from Kafka’s “TwitterStreaming” topic and analyze this data in real time.
Implementing the Serving Layer of Lambda Architecture
In this post, we will implement the fourth part of our system, the Serving Layer of the Lambda architecture. We will use Akka Http to create a REST API that allows users to retrieve the processing results by accessing HTTP endpoints.
Writing Spark applications with Scala
In this tutorial series, we will learn how to write Spark applications to process and analyse big data. Spark supports multiple programming languages, such as Scala, Java, Python and R, but its native language is Scala. Therefore, we will use Scala throughout this series.
Everybody is a genius. But if you judge a fish by its ability to climb a tree, it will live its whole life believing that it is stupid.
Nothing is IMPOSSIBLE; the word itself says 'I'M POSSIBLE'.