In the previous tutorial (Install Apache Spark), we learned how to install Spark on a local machine and ran a small test to check the installation. In this tutorial, we will show you how to set up Spark in a master/slave model with one master node and two slave nodes (all running Ubuntu).

The installation is divided into two parts: the first part covers the common steps that need to be done on all three nodes, and the second part covers the steps needed only on the master node.

1. Common setup on all three nodes (to be done on each node)

– Install Spark as described in the tutorial Install Apache Spark

– Open the /etc/hosts file and add the following lines (replace <MASTER-IP> and the other placeholders with the actual IP address of each node):
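A minimal sketch of the entries, assuming the hostnames master, slave01 and slave02 (substitute your own hostnames and real IP addresses):

    <MASTER-IP>     master
    <SLAVE01-IP>    slave01
    <SLAVE02-IP>    slave02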

– Install OpenSSH with the command:
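On Ubuntu, the usual command is:

    sudo apt-get install openssh-server openssh-client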

2. Setup on master node only

a) Set up the connection between the master and the slaves with SSH as follows:

– Create a key pair on the master node with the command below:
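A typical invocation, assuming an RSA key with an empty passphrase so the master can log in to the slaves without a password prompt:

    ssh-keygen -t rsa -P ""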

– Copy the contents of the ~/.ssh/id_rsa.pub file into ~/.ssh/authorized_keys with the following command:
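A common way to do this:

    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys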

– Copy the authorized_keys file to the ~/.ssh folder on both slaves, for example as shown below
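One way to do this with scp, assuming the user name ubuntu on both slaves (replace with your own user name and hostnames; the ~/.ssh folder must already exist on each slave):

    scp ~/.ssh/authorized_keys ubuntu@slave01:~/.ssh/
    scp ~/.ssh/authorized_keys ubuntu@slave02:~/.ssh/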

– Check the connection between the master and the slaves by opening a terminal on the master node and trying the following command:
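For instance, assuming the hostnames above:

    ssh slave01
    exit
    ssh slave02
    exit

If the key setup is correct, both logins should succeed without asking for a password.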

b) Spark configuration

– In Spark's conf folder, rename spark-env.sh.template to spark-env.sh and slaves.template to slaves:
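A sketch of the commands, assuming Spark is installed under ~/spark (adjust the path to your installation):

    cd ~/spark/conf
    mv spark-env.sh.template spark-env.sh
    mv slaves.template slaves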

– Open the spark-env.sh file and add the two lines below:
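The two lines typically set the master's address and the Java installation path (both placeholders are to be replaced with your own values):

    export SPARK_MASTER_HOST='<MASTER-IP>'
    export JAVA_HOME=<path-to-your-JDK>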

For example:
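Assuming the master's IP is 192.168.0.1 and OpenJDK 8 on Ubuntu (both values are purely illustrative):

    export SPARK_MASTER_HOST='192.168.0.1'
    export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64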

– Open the slaves file, delete 'localhost', and add the following lines:
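Assuming the hostnames defined in /etc/hosts above:

    slave01
    slave02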

3. Run Spark (only on master node)

– We can run Spark on both the master and the slaves by running sbin/start-all.sh (inside the Spark installation folder) on the master node as follows (use sbin/stop-all.sh to stop Spark):
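For example, assuming the installation path used earlier:

    cd ~/spark
    ./sbin/start-all.sh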

– Check that Spark has launched by visiting http://localhost:8080/; the running slaves will be listed in the Workers section. Alternatively, we can check whether Spark is running from a terminal with the jps command, as below:
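Typical output on the master node (the process IDs shown here are illustrative and will differ; on the slaves, a Worker process appears instead of Master):

    $ jps
    4320 Master
    4411 Jps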
