In the previous post, we implemented the first part of our system (data collection and storage). In this post, we will implement the second part, the Batch Layer of the Lambda Architecture. The Batch Processing of our
If you have used Python or participated in a Kaggle competition, you should be familiar with the Pandas library. Pandas provides many functions that help users easily perform analysis on a given dataset. Spark DataFrames also support similar functions but process
In the tutorial "A basic Spark application using DataFrame", we used a DataFrame to read and extract information from a JSON file (people.json). In this tutorial, we will learn how to use Spark to read data from different types of data sources.
In this tutorial, we will start writing an example Spark application using DataFrame. First, we create a Scala object named DataFrameEx, as described in the tutorial "Run a basic program". In order to use Spark, we
In this tutorial, we will write a WordCount program that counts the occurrences of each word in a data stream received from a data server. We will use Netcat to simulate the data server, and the WordCount program will use
In this tutorial series, we will learn how to use Spark SQL, Datasets, and DataFrames. Spark SQL is one of the five main components of Spark and is designed for structured data processing. We can interact with Spark SQL via SQL, DataFrames