In this tutorial, we will start writing an example of a Spark application using DataFrame. First, we create a Scala object named DataFrameEx as described in the tutorial Run a basic program. In order to use Spark, we need to create a Spark session using SparkSession.builder().getOrCreate(). We can also configure Spark with various properties, such as the number of cores, the number of threads, or the amount of memory. In this example, however, we only configure two properties: the name of the Spark application and the cluster manager ("local" means that Spark runs on the local machine with only one worker thread, "local[K]" means running Spark on the local machine with K worker threads, and "local[*]" means running Spark on the local machine with as many worker threads as there are logical cores on the machine). For more details on Spark properties, please visit Spark Configuration.
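As a minimal sketch, the session setup could look like the following (the application name string "DataFrameExample" is an assumption; any name works here):

```scala
import org.apache.spark.sql.SparkSession

object DataFrameEx {
  def main(args: Array[String]): Unit = {
    // Create a Spark session; "local[*]" runs Spark on the local
    // machine with as many worker threads as there are logical cores
    val spark = SparkSession.builder()
      .appName("DataFrameExample") // assumed application name
      .master("local[*]")
      .getOrCreate()
  }
}
```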

Next, we create a DataFrame that reads data from a json file, and then perform some operations on this DataFrame as follows (people.json is a sample file located in the examples/src/main/resources directory of the extracted Spark distribution):
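The following is a sketch of typical operations on that file using the standard Spark DataFrame API; the exact operations used in the original listing may differ, and the file path assumes the program is run from the extracted Spark folder:

```scala
// Read the sample json file into a DataFrame
val df = spark.read.json("examples/src/main/resources/people.json")

// Display the content and the inferred schema of the DataFrame
df.show()
df.printSchema()

// Select only the "name" column
df.select("name").show()

// Keep only the rows whose "age" is greater than 21
df.filter(df("age") > 21).show()

// Count people by age
df.groupBy("age").count().show()
```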

When we run the above code, the content of the DataFrame and the result of each operation are printed to the console.

We can also create a SQL temporary view from the DataFrame and use the sql() function of the Spark session to run SQL queries, as follows:
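Again as a sketch (the view name "people" and the query itself are assumptions):

```scala
// Register the DataFrame as a temporary view so it can be queried with SQL
df.createOrReplaceTempView("people")

// Run a SQL query against the view using the session's sql() function
val sqlDF = spark.sql("SELECT name FROM people WHERE age > 21")
sqlDF.show()
```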

When we run the above code, the result of the SQL query is printed to the console.

So, we have finished writing a simple Spark application that uses DataFrame to extract information from a given dataset. The full code of this tutorial can be found below. Note that this is only a basic example, a first step to help you get used to writing Spark applications. In the next tutorials, we will show you more complex examples that utilize Spark's more powerful features.
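Assembling the sketches above into one runnable program (the application name, file path, and SQL query are assumptions, as noted earlier):

```scala
import org.apache.spark.sql.SparkSession

object DataFrameEx {
  def main(args: Array[String]): Unit = {
    // Create a Spark session running locally with all logical cores
    val spark = SparkSession.builder()
      .appName("DataFrameExample") // assumed application name
      .master("local[*]")
      .getOrCreate()

    // Read the sample json file into a DataFrame
    val df = spark.read.json("examples/src/main/resources/people.json")

    // Basic DataFrame operations
    df.show()
    df.printSchema()
    df.select("name").show()
    df.filter(df("age") > 21).show()
    df.groupBy("age").count().show()

    // SQL queries through a temporary view
    df.createOrReplaceTempView("people")
    spark.sql("SELECT name FROM people WHERE age > 21").show()

    // Stop the session before exiting
    spark.stop()
  }
}
```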
