Saturday, February 20, 2016

5 Steps to get started running Spark on YARN with a Hadoop Cluster




Spark and Hadoop are among the top topics these days. The good news is that Spark 1.5.2 runs on Hortonworks Data Platform (HDP) 2.3.


This blog walks through the "hello world" steps for running the Spark examples on YARN and HDP. It also explores the tools you can use to monitor the examples while they execute in ‘cluster’ mode.


Before we get into the actual steps, note that this blog assumes the following preconditions:
  • You have a Hadoop cluster with both Spark and Ambari installed, and you have access to the standard Spark examples. As a point of reference, this blog was created on a 5-node cluster running HDP 2.3 with 1 master node and 4 data nodes.
  • Your Hadoop cluster has multiple nodes.  The example will run on the Hortonworks Sandbox, but the monitoring capabilities described in this blog are much better if you have more than one node.

Running the Spark examples on YARN:
    1. Verify that Spark has been installed on your HDP cluster instance and is listed in the services list on the left side of the Ambari screen.



If Spark does not appear in the services list, click the Actions button, select Spark from the list of services, and click the Next button.


Follow the default prompts presented by Ambari, and then restart the other Ambari services when Ambari asks you to.
At this point you should have a working Spark service on HDP with YARN.
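If you prefer the command line, you can also confirm that the Spark service is registered by asking Ambari's REST API. The following is a minimal sketch with several assumptions. Ambari is assumed to listen on its default port 8080. Credentials are read from AMBARI_USER and AMBARI_PASS, falling back to Ambari's default admin/admin. The host and cluster names in the usage line are placeholders. The function name spark_service_installed is hypothetical.

```shell
# Sketch: check whether the SPARK service is defined in Ambari.
# Assumptions: Ambari REST API on port 8080; credentials in AMBARI_USER /
# AMBARI_PASS (defaulting to admin/admin); cluster name supplied by you.
spark_service_installed() {
    ambari_host=$1
    cluster=$2
    code=$(curl -s -o /dev/null -w '%{http_code}' \
        -u "${AMBARI_USER:-admin}:${AMBARI_PASS:-admin}" \
        "http://${ambari_host}:8080/api/v1/clusters/${cluster}/services/SPARK")
    # 200 means the service resource exists; anything else (e.g. 404) means
    # Spark is not installed or the request failed.
    [ "$code" = "200" ]
}

# Usage (placeholder names):
# spark_service_installed ambari.hdp mycluster && echo "Spark is installed"
```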


    2. Go into HDP Ambari and check which servers are running the Spark Client



To run our example, we need to identify which servers Spark is running on. You can do this by clicking on the ‘Spark’ service name and then selecting the ‘Spark Client’ option in the Summary section.




Once you have clicked on the Spark Client link, you will see the hosts where the Spark Client is running.


In my sample, the Spark Client is running on server5.hdp. Every cluster is different, so check for your own server name. In all of the subsequent steps, replace ‘server5.hdp’ with your actual Spark Client server name.
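The same host list can be pulled from Ambari's REST API by querying host components whose component name is SPARK_CLIENT. This is a minimal sketch with several assumptions. Ambari is assumed to run on port 8080 with credentials in AMBARI_USER and AMBARI_PASS. The response JSON is assumed to contain "host_name" fields. The cluster name is a placeholder, and spark_client_hosts is a hypothetical helper name.

```shell
# Sketch: print the hosts that run the SPARK_CLIENT component, one per line.
# Assumptions: Ambari REST API on port 8080, credentials in AMBARI_USER /
# AMBARI_PASS, and a response containing "host_name" : "<host>" pairs.
spark_client_hosts() {
    ambari_host=$1
    cluster=$2
    # Extract each "host_name" value from the JSON and de-duplicate it.
    curl -s -u "${AMBARI_USER:-admin}:${AMBARI_PASS:-admin}" \
        "http://${ambari_host}:8080/api/v1/clusters/${cluster}/host_components?HostRoles/component_name=SPARK_CLIENT" |
        grep -o '"host_name" *: *"[^"]*"' |
        sed 's/.*: *"\([^"]*\)"$/\1/' |
        sort -u
}

# Usage (placeholder names):
# spark_client_hosts ambari.hdp mycluster
```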


Before moving to the next step, open a terminal and ssh into the Spark Client machine:
ssh hdfs@server5.hdp

    3. Go to the Spark client directory



Our next step is to ‘cd’ into the ‘spark-client’ directory.
cd /usr/hdp/current/spark-client


This directory contains all of the standard Spark distribution directories, including the bin and lib directories used in the next step.
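Before submitting anything, it can save time to confirm that the two files the SparkPi command relies on are present. These are an executable bin/spark-submit and a lib/spark-examples*.jar. This is a minimal sketch. The default path is the one used in this blog, and check_spark_client is a hypothetical helper name.

```shell
# Sketch: verify the spark-client layout used by the SparkPi command.
# Defaults to /usr/hdp/current/spark-client; pass another path if needed.
check_spark_client() {
    spark_home=${1:-/usr/hdp/current/spark-client}
    if [ ! -x "$spark_home/bin/spark-submit" ]; then
        echo "missing: $spark_home/bin/spark-submit" >&2
        return 1
    fi
    # Print the first matching examples jar; if the glob matches nothing
    # it stays literal and the -f test fails.
    for jar in "$spark_home"/lib/spark-examples*.jar; do
        if [ -f "$jar" ]; then
            echo "$jar"
            return 0
        fi
    done
    echo "missing: $spark_home/lib/spark-examples*.jar" >&2
    return 1
}

# Usage:
# check_spark_client && cd /usr/hdp/current/spark-client
```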


    4. Submit the SparkPi example over YARN

A good example for checking out your Spark cluster instance is SparkPi. The example is CP
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
   --master yarn \
   --deploy-mode cluster \
   --driver-memory 4g \
   --executor-memory 2g \
   --executor-cores 4 \
   --queue default \
   lib/spark-examples*.jar \
   100


The example above is the standard SparkPi example on YARN with one important change: I increased the task count from 10 to 100 in order to have more monitoring data available for review.
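For context, SparkPi estimates π by Monte Carlo sampling. Random points are thrown into the square [-1, 1] x [-1, 1], the points that land inside the unit circle are counted, and 4 times that fraction approximates π. The trailing argument (100 above) is the number of slices, and each slice becomes one task. The single-machine awk sketch below illustrates the same computation. It is not the Spark code itself. The fixed seed and the estimate_pi name are assumptions, added for repeatability and illustration.

```shell
# Single-machine illustration of the SparkPi idea (not Spark code):
# sample $1 random points in [-1,1] x [-1,1] and print 4 * hits / total.
estimate_pi() {
    awk -v n="$1" 'BEGIN {
        srand(42)                          # fixed seed so runs are repeatable
        hits = 0
        for (i = 0; i < n; i++) {
            x = rand() * 2 - 1
            y = rand() * 2 - 1
            if (x * x + y * y < 1) hits++  # point lies inside the unit circle
        }
        printf "%.4f\n", 4 * hits / n
    }'
}

# Usage: estimate_pi 1000000   # prints a value close to 3.14
```

Spark's contribution is splitting the sampling loop into slices that run as parallel tasks on the executors, then summing the per-task counts.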


The key settings for running a Spark job on a YARN cluster are:
  • master – Determines where the job runs. Since this blog runs Spark on YARN, the example above uses the ‘yarn’ value. Other options include a Spark standalone master URL (spark://HOST:PORT), a Mesos master URL (mesos://HOST:PORT), and ‘local’ for running on a single machine.
  • deploy-mode – We selected ‘cluster’ so that the SparkPi driver runs inside the cluster. To run the driver outside the cluster, on the machine you submit from, select the ‘client’ option.
  • driver-memory – The amount of memory available to the driver process. In YARN cluster mode, the driver runs inside the YARN Application Master.
  • executor-memory – The amount of memory allocated to each executor process.
  • executor-cores – The number of cores allocated to each executor process.
  • queue – The YARN queue to which this job is submitted. If you have not already defined queues for your cluster, it is best to use the ‘default’ queue.
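One way to experiment with these settings is to map each one to a variable and rebuild the SparkPi command from them. This is a minimal sketch. The variable names (MASTER, DEPLOY_MODE, and so on) and the submit_sparkpi function are hypothetical, and the defaults mirror the command above. DRY_RUN=1 prints the command instead of running it.

```shell
# Sketch: rebuild the SparkPi submission from the settings listed above.
# Run from the spark-client directory. Variables are hypothetical names;
# defaults match the command used in this blog.
submit_sparkpi() {
    set -- ./bin/spark-submit \
        --class org.apache.spark.examples.SparkPi \
        --master "${MASTER:-yarn}" \
        --deploy-mode "${DEPLOY_MODE:-cluster}" \
        --driver-memory "${DRIVER_MEMORY:-4g}" \
        --executor-memory "${EXECUTOR_MEMORY:-2g}" \
        --executor-cores "${EXECUTOR_CORES:-4}" \
        --queue "${QUEUE:-default}" \
        lib/spark-examples*.jar \
        "${SLICES:-100}"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"   # show the command without submitting it
    else
        "$@"
    fi
}

# Usage: preview a client-mode run, then submit it for real.
# DRY_RUN=1 DEPLOY_MODE=client submit_sparkpi
# DEPLOY_MODE=client submit_sparkpi
```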


    5. SparkPi example monitoring



The SparkPi example we submitted in the last step can be monitored while it executes through the ResourceManager UI, which you can reach from Ambari -> YARN -> QuickLinks.


From the ResourceManager UI screen, you can see the job's current execution status and statistics. For example, we see that the job is consuming 47.5% of the cluster (nothing else was running at the time) and that 3 containers are being used to run the job. As with any other YARN application, we can click the Application ID or Tracking UI link associated with the job to get more information about the job's progress.
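If you submitted from a terminal, spark-submit's log output should include the YARN application ID. For example, it appears in the periodic "Application report for application_..." lines. With that ID you can follow the same job from the command line using "yarn application -status". This is a minimal sketch. It assumes the log was saved to a file, and extract_app_id and submit.log are hypothetical names.

```shell
# Sketch: pull the first YARN application ID (application_<ts>_<seq>)
# out of spark-submit log text read from standard input.
extract_app_id() {
    grep -o 'application_[0-9]*_[0-9]*' | head -n 1
}

# Usage (spark-submit logs to stderr, hence 2>&1):
# ./bin/spark-submit ... 2>&1 | tee submit.log
# APP_ID=$(extract_app_id < submit.log)
# yarn application -status "$APP_ID"
```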




Once the SparkPi job has completed execution, you will see a FinalStatus of ‘SUCCEEDED’ if everything was successful or ‘FAILED’ if the job did not complete.
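The same FinalStatus value is exposed by the ResourceManager REST API at /ws/v1/cluster/apps/<appId>. While the application is still running, the value is UNDEFINED. This is a minimal sketch with several assumptions. RM_HOST is a placeholder for the host behind the QuickLinks entry, and port 8088 is assumed to be the ResourceManager web port. The final_status name is hypothetical.

```shell
# Sketch: print an application's finalStatus (UNDEFINED while running,
# then SUCCEEDED, FAILED or KILLED) from the ResourceManager REST API.
# Assumptions: RM_HOST placeholder and ResourceManager web port 8088.
final_status() {
    curl -s "http://${RM_HOST:-localhost}:8088/ws/v1/cluster/apps/$1" |
        grep -o '"finalStatus" *: *"[A-Z]*"' |
        sed 's/.*"\([A-Z]*\)"$/\1/'
}

# Usage: poll until the job finishes, using an ID copied from the RM UI.
# while [ "$(final_status "$APP_ID")" = "UNDEFINED" ]; do sleep 10; done
# final_status "$APP_ID"
```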


Clicking on the ‘History’ link associated with this job takes you to the Spark History screen.


Selecting the job’s description link will take you to Spark's Job Detail page.
The SparkPi example job is very simple. For a real-world Spark application, however, you would want to review this screen to better understand how the job allocated resources between its stages. For more fine-grained details, click on the Completed Stages description ‘reduce at SparkPi.scala:36’.




We see in the example above that the SparkPi job used 2 of the 5 nodes in my test cluster. At first glance it looks like most of the work was allocated to the server4.hdp instance. However, the statistics show that each of the servers processed 50 tasks, and server5.hdp simply required a little more time. This could be because the Spark Client is also running on that node. The good news is that Spark and HDP provide the tooling to dig deep into Spark jobs executed on a YARN cluster, so you can diagnose and tune them as required.
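Beyond the web UI, the Spark History Server also offers a REST API (available since Spark 1.4) that returns JSON. Its /api/v1/applications endpoint lists the applications it knows about, and the same API exposes per-application job and stage data. This is a minimal sketch with two assumptions. HISTORY_HOST is a placeholder, and the server is assumed to listen on Spark's default history port 18080. The history_app_ids name is hypothetical.

```shell
# Sketch: list application IDs known to the Spark History Server.
# Assumptions: HISTORY_HOST placeholder and the default port 18080.
history_app_ids() {
    curl -s "http://${HISTORY_HOST:-localhost}:18080/api/v1/applications" |
        grep -o '"id" *: *"[^"]*"' |
        sed 's/.*: *"\([^"]*\)"$/\1/'
}

# Usage:
# HISTORY_HOST=server1.hdp history_app_ids
```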

You have now run your first Spark example on a YARN cluster with Ambari. As you can see, the process is easy, so you are ready to move forward with your own applications.
