Implementing the Batch Layer of Lambda Architecture

In the previous post, we implemented the first part of our system (Data collection and storage). In this post, we will implement the second part of our system, the Batch Layer of the Lambda Architecture. The Batch Processing of our …
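As a rough sketch of what such a batch job can look like, the snippet below reads previously collected events with Spark and recomputes a simple batch view. The storage paths and the eventType/timestamp column names are assumptions for illustration, not the actual schema used in this series.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object BatchViewJob {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("BatchViewJob")
          .master("local[*]")          // assumption: local mode while developing
          .getOrCreate()

        // Assumption: the collection layer stores raw events as JSON under this path
        val events = spark.read.json("hdfs:///lambda/master-dataset/events")

        // Recompute a simple batch view: event counts per type and day
        val batchView = events
          .groupBy(col("eventType"), to_date(col("timestamp")).as("day"))
          .count()

        // Assumption: the batch view is persisted as Parquet for the serving layer
        batchView.write.mode("overwrite").parquet("hdfs:///lambda/batch-views/event_counts")

        spark.stop()
      }
    }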

Statistical analysis with Spark DataFrame

If you have used Python or participated in a Kaggle competition, you should be familiar with the Pandas library. Pandas provides many functions that help users easily perform analysis on a given dataset. Spark DataFrames also support similar functions but process …
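As a small illustration of this Pandas-like functionality, the sketch below builds a DataFrame from an in-memory sequence and computes summary statistics with describe() and agg(); the sample data and column names are made up for the example.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object DataFrameStats {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("DataFrameStats")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        // A small in-memory dataset so the example is self-contained
        val df = Seq(("Alice", 29, 52.5), ("Bob", 35, 61.0), ("Carol", 41, 70.2))
          .toDF("name", "age", "weight")

        // describe() returns count, mean, stddev, min and max, much like Pandas' describe()
        df.describe("age", "weight").show()

        // Individual aggregates can also be computed explicitly
        df.agg(avg($"age"), min($"weight"), max($"weight")).show()

        spark.stop()
      }
    }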

Loading data from various sources

In the tutorial A basic Spark application using DataFrame, we used a DataFrame to read and extract information from a JSON file (people.json). In this tutorial, we will learn how to use Spark to read data from different types of data sources.
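For instance, a minimal sketch of reading from a few common sources could look like the following; the file paths (data/people.json, data/people.csv, data/people.parquet) are placeholders, not files shipped with the tutorial.

    import org.apache.spark.sql.SparkSession

    object DataSourcesEx {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("DataSourcesEx")
          .master("local[*]")
          .getOrCreate()

        // JSON, as in the previous tutorial (path is an assumption)
        val people = spark.read.json("data/people.json")

        // CSV with a header row, letting Spark infer column types
        val csvDf = spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("data/people.csv")

        // Parquet, Spark's default columnar format
        val parquetDf = spark.read.parquet("data/people.parquet")

        people.printSchema()
        csvDf.show()
        parquetDf.show()

        spark.stop()
      }
    }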

A basic Spark application using DataFrame

In this tutorial, we will start writing an example Spark application using DataFrame. First, we create a Scala object named DataFrameEx, as described in the tutorial Run a basic program. In order to use Spark, we …
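A minimal version of such a DataFrameEx object might look like the sketch below, assuming the spark-sql dependency is already on the classpath and people.json lives under a local data/ directory (both assumptions for illustration).

    import org.apache.spark.sql.SparkSession

    object DataFrameEx {
      def main(args: Array[String]): Unit = {
        // The SparkSession is the entry point for the DataFrame API
        val spark = SparkSession.builder()
          .appName("DataFrameEx")
          .master("local[*]")          // assumption: run locally while developing
          .getOrCreate()

        // Assumption: people.json is stored under a local data/ directory
        val people = spark.read.json("data/people.json")
        people.printSchema()
        people.show()

        spark.stop()
      }
    }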

A WordCount program

In this tutorial, we will write a WordCount program that counts the occurrences of each word in a data stream received from a data server. We will use Netcat to simulate the data server, and the WordCount program will use …
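A minimal sketch of such a program, using Structured Streaming's socket source, is shown below; it assumes Netcat is started with nc -lk 9999 on localhost, and the exact structure may differ from the version developed in the tutorial.

    import org.apache.spark.sql.SparkSession

    object WordCount {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("WordCount")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        // Assumption: Netcat is listening on localhost:9999 (start it with: nc -lk 9999)
        val lines = spark.readStream
          .format("socket")
          .option("host", "localhost")
          .option("port", "9999")
          .load()

        // Split each line into words and count the occurrences of each word
        val wordCounts = lines.as[String]
          .flatMap(_.split(" "))
          .groupBy("value")
          .count()

        // Print the complete, updated counts to the console after every micro-batch
        val query = wordCounts.writeStream
          .outputMode("complete")
          .format("console")
          .start()

        query.awaitTermination()
      }
    }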

Spark SQL, Dataset and DataFrames

In this tutorial series, we will learn how to use Spark SQL, Datasets and DataFrames. Spark SQL is one of the five main components of Spark and is developed for structured data processing. We can interact with Spark SQL via SQL, DataFrames …
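As a quick taste of these two styles of interaction, the sketch below runs the same query once through the DataFrame API and once through SQL over a temporary view; the sample data is invented for the example.

    import org.apache.spark.sql.SparkSession

    object SparkSQLEx {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("SparkSQLEx")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        // The same data can be queried through the DataFrame API or plain SQL
        val people = Seq(("Alice", 29), ("Bob", 35)).toDF("name", "age")

        // DataFrame API
        people.filter($"age" > 30).show()

        // SQL over a temporary view
        people.createOrReplaceTempView("people")
        spark.sql("SELECT name, age FROM people WHERE age > 30").show()

        spark.stop()
      }
    }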