Apache Spark can be installed on many operating systems, such as Windows, Ubuntu, Fedora and CentOS. This tutorial shows how to install Spark on both Windows and Ubuntu. However, it is recommended to install and deploy Apache Spark on a Linux-based OS (e.g. Ubuntu, Fedora), as Spark builds on the Hadoop ecosystem, which primarily targets Linux (on Windows, for instance, the extra Winutils step described below is needed).

I. Install Apache Spark on Windows.

1. Download and install Java

– Download and install the JDK here

– Create a system variable named JAVA_HOME under Environment Variables (Control Panel -> System -> Advanced system settings in the upper left corner) whose value is the path of the JDK folder (e.g. C:/Program Files/Java/jdk1.8.0_191)

– Add the JDK’s bin folder (%JAVA_HOME%\bin) to the Path environment variable
– Click Save, then open Command Prompt and check the Java installation with the command “java -version”.

2. Download and install Scala

– Download and install Scala here (select the binary for Windows)

– Create the SCALA_HOME variable and add Scala’s bin folder to Path (similar to the Java installation)

– Check the installation using the command: scala -version.

3. Download Spark

– Download Spark here and extract the downloaded file

– Create the SPARK_HOME variable and add Spark’s bin folder (%SPARK_HOME%\bin) to Path, as above

4. Download Winutils

– Download Winutils here

– Select the version of Hadoop that is compatible with your version of Spark and download the winutils.exe file

– Save winutils.exe in a folder of your choice and create a HADOOP_HOME variable whose value is the path of that folder (if the error “Unable to load Winutils” occurs, we might need to put winutils.exe in a bin subfolder of that folder, since Hadoop looks for it at %HADOOP_HOME%\bin\winutils.exe)

5. Change the access permission of tmp/hive

– After completing all the steps above, a folder named tmp/hive will be created on the C drive, and we need to change the access permission of this folder to avoid errors when running Spark.

– To change the access permission, open Command Prompt and run Winutils with the chmod command (777 grants read, write and execute permissions), for example: winutils.exe chmod 777 C:\tmp\hive

6. Check the installation

We will run a small piece of Spark code to check that the installation is complete:

– Open Command Prompt (cmd).

– Type the command: spark-shell

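– Enter a short piece of code at the scala> prompt to test Spark, for example: sc.parallelize(1 to 100).sum() (this should return 5050.0)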

II. Install Apache Spark on Ubuntu

1. Download and install Java

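– Open a terminal and install the JDK with a command such as: sudo apt-get install openjdk-8-jdk (Java 8, matching the JDK version used in the Windows section)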

– Check the Java installation with the command: java -version

We can create the JAVA_HOME and PATH variables on Ubuntu by opening the ~/.bashrc file with the command “gedit ~/.bashrc” and adding two lines to the end of the file: one that sets JAVA_HOME to the JDK installation folder and one that appends $JAVA_HOME/bin to PATH.
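As a sketch, the two lines might look like the following. The JDK path is an assumption (it is where Ubuntu's openjdk-8-jdk package installs on 64-bit systems); adjust it to the folder where your JDK actually lives.

```shell
# Assumed JDK location; replace with the folder of your own JDK installation.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
# Append the JDK's bin folder so that java and javac are found on PATH.
export PATH="$PATH:$JAVA_HOME/bin"
```

Appending to PATH, rather than overwriting it, keeps all existing commands available.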

Save the file, then use the command “source ~/.bashrc” to apply the changes in ~/.bashrc to the current terminal session (newly opened terminals pick them up automatically).

2. Download and install Scala

– Use the following command to install Scala: sudo apt-get install scala

– Check the installation with the command: scala -version

3. Download and install Spark

– Similar to the installation on Windows, download Spark here

– Extract the downloaded file to a folder of your choice (e.g. ~/Workspace/BigData/Tools)

– Add Spark’s bin folder to PATH by opening the bashrc file with the command “gedit ~/.bashrc” and adding a line that appends that folder to PATH

– Use the command “source ~/.bashrc” to apply the changes in the bashrc file to the current system environment.

– Check the installation of Spark by starting spark-shell and running a short piece of test code, as in the Windows section.
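The PATH entry and a quick check from the steps above could be sketched as follows. The Spark folder name (spark-2.4.0-bin-hadoop2.7) is hypothetical and depends on the package you downloaded. The run-example script and the SparkPi example ship with the Spark distribution and are used here as a stand-in for typing test code into spark-shell by hand.

```shell
# Hypothetical extraction folder; adjust the name to the Spark package you downloaded.
export SPARK_HOME="$HOME/Workspace/BigData/Tools/spark-2.4.0-bin-hadoop2.7"
# Line for ~/.bashrc: make spark-shell, spark-submit and run-example available on PATH.
export PATH="$PATH:$SPARK_HOME/bin"

# Quick check: print the Spark version, then run the bundled SparkPi example.
if command -v spark-submit >/dev/null 2>&1; then
    spark-submit --version
    run-example SparkPi 10
else
    echo "spark-submit not found; check SPARK_HOME and PATH" >&2
fi
```

If the installation is working, the SparkPi run should print an estimate of Pi among its log output; if the message about spark-submit appears instead, the folder in SPARK_HOME most likely does not match the extracted package.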

                

November 8, 2018
ITechSeeker