Tuesday, August 1, 2017

Spark on DC/OS Part 2 - Installing specific versions of Spark

Spark is great for development...the problem is that with each release great new features are added (funny calling new features a problem) and the behavior of old features changes.  The Hadoop distributions and other environments that support running Spark in cluster mode run just one version, and if you are fortunate enough to be running the Hortonworks distribution, you can run two versions of Spark.  The problem is that there are lots of different Spark versions out there, each with its own quirks.
DC/OS provides a nice solution to the Spark versioning problem by allowing you to install multiple Spark versions on a single cluster.  Furthermore, you can define different roles and quotas for each Spark version deployed to the cluster, allowing for fine-grained organizational control.
This blog post describes the techniques for installing multiple Spark versions by deploying multiple Spark frameworks to your DC/OS cluster.
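As a quick illustration of the per-version role control mentioned above, each Spark framework can be installed with its own options file. This is only a sketch: the service.role key and the 'analytics-team' role name are assumptions for illustration, and the exact option schema for your package version should be checked with dcos package describe spark --config.

    spark162-options.json (hypothetical contents):
    {
      "service": {
        "role": "analytics-team"
      }
    }

    dcos package install spark --app-id=/spark162 --options=spark162-options.json

Quota for that role would then be applied on the Mesos side (for example through the master's quota endpoint), which is outside the scope of this post.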
The first thing to note is that it is not possible to install versions other than the currently supported version using the DC/OS GUI. To install a prior version, it is necessary to use the DC/OS command line.
  1. List all the Spark versions available to the cluster by executing the command shown below:
    dcos package describe spark --package-versions
    Upon successful completion of the command above, you should see the following JSON output:
    [
    "1.1.0-2.1.1",
    "1.0.9-2.1.0-1",
    "1.0.9-1.6.3-1",
    "1.0.8-2.1.0-1",
    "1.0.7-2.1.0",
    "1.0.6-2.0.2",
    "1.0.5-1.6.3",
    "1.0.4-2.0.1",
    "1.0.3-2.0.1",
    "1.0.2-2.0.0",
    "1.0.1-2.0.0",
    "1.0.1-1.6.2",
    "1.0.1-1.6.1-2",
    "1.0.0-1.6.1-2",
    "1.6.1-6",
    "1.6.1-5",
    "1.6.1-4",
    "1.6.1-3",
    "1.6.1-2",
    "1.6.1-1",
    "1.6.1",
    "1.6.0",
    "1.5.0-multi-roles-v2"
    ]
    
  2. Select the version to install and invoke the dcos package install command
    dcos package install spark --package-version=1.0.1-1.6.2 --app-id=/spark162
    
    Executing the command above instructs the DC/OS installer to select the Spark version 1.6.2 provided via the Spark framework 1.0.1. If the installation is successful, you should see the following output:
    Installing Marathon app for package [spark] version [1.0.1-1.6.2] with app id [/spark162]
    Installing CLI subcommand for package [spark] version [1.0.1-1.6.2]
    New command available: dcos spark
    DC/OS Spark is being installed!
    
     Documentation: https://docs.mesosphere.com/spark-1-7/
     Issues: https://docs.mesosphere.com/support/
    
  3. Review the list of installed Spark versions
    ~/dcos_scripts >dcos package list spark
    NAME   VERSION      APP        COMMAND  DESCRIPTION
    spark  1.0.1-1.6.2  /spark     spark    Spark is a fast and general cluster computing system for Big Data.  Documentation: https://docs.mesosphere.com/usage/managing-services/spark/
                     /spark162
    
    If everything was successful, you will see the default app created earlier in this document as well as the new Spark 1.6.2 instance.
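With more than one Spark framework installed, the CLI needs to be told which instance a job should go to. Below is a minimal sketch; the class name, jar URL, and argument are placeholders, and the --name value is the app id chosen at install time:

    dcos spark --name=/spark162 run --submit-args="--class com.example.MyJob https://example.com/my-spark-job.jar arg1"

Omitting --name sends the job to the default Spark instance (/spark).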

Spark on DC/OS Part 1 - Installing the Spark Framework using the GUI

Installing the Spark Framework using the DC/OS GUI

Installing Spark from the Universe is possible via either the DC/OS GUI or the DC/OS package installer from the command line.
The following sections describe in detail some of the different Spark framework installation options available.


Approach #1: Installing Spark from the DC/OS GUI

Installing via the GUI is by far the easiest way to quickly deploy the latest Mesosphere-supported Spark version to your DC/OS cluster.
  1. Locate the desired Spark package in the DC/OS Universe
    Selecting the 'Universe' option from the left side menu and then clicking 'Packages' provides a full list of the DC/OS frameworks available to your cluster. To locate the Spark framework, just type 'spark' in the search input box at the top of the screen. This will display a list of all DC/OS framework packages affiliated with Spark in some way. For our example, we will select the first package, titled 'spark'.
  2. Initiating the Spark Package installation process
    For this example we are going to just accept the default Spark framework configurations, as that is sufficient for most initial Spark program executions. As you get more comfortable with Spark on DC/OS, you will want to explore the 'Advanced Installation' options to configure features such as Kerberos, DC/OS roles, and default container images, among other commonly used configuration options.
    After clicking the 'INSTALL PACKAGE' button on the screen above, you will very quickly see a SUCCESS message with details on how to get more information about the framework. Make a point to copy and save the URL presented here for future reference.
  3. Monitoring the Spark Package installation process and verification that it is available for execution
    The Spark framework is not yet ready to receive your Spark programs for execution, as it is still going through a validation and deployment process. If the Spark framework cycles between 'deploying' and 'waiting', that is usually an indicator of a Spark framework configuration problem or insufficient resources for the deployment (for this example, though, everything should be fine). A CLI sketch for checking the deployment status from a terminal appears after this list.
    We can tell from the diagram above that the Spark framework was successfully deployed, as the status for the Spark service is green and it is running 1 instance.
  4. Install the Spark Command Line Interface (CLI) option
Your DC/OS cluster is now almost ready to run Spark programs; however, there is one more step required before you can actually submit a Spark job: installing the Spark CLI to facilitate job submission. To complete this last step, execute the following command:

dcos package install --cli spark
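
If you prefer to watch the deployment mentioned in step 3 from a terminal instead of the GUI, the Marathon subcommands of the dcos CLI report the same status. A rough sketch, assuming the default app id of /spark:

    # Show the Marathon application definition and current task counts for the Spark service.
    dcos marathon app show /spark

    # List any Marathon deployments still in progress; an empty list means the service has settled.
    dcos marathon deployment list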


Approach #2: Installing the Spark Framework using the DC/OS Command Line
While the GUI Universe package installation option provides an easy way to set up services such as Spark within DC/OS, a more powerful tool for installations is the command line interface (CLI). Using the CLI to install packages makes it easy to quickly set up services via bash scripts (a short script sketch appears at the end of this post).
To install the spark package with a custom 'app-id', execute the following command:
dcos package install spark --app-id=/accntngspark
Unlike package installation using the GUI, the DC/OS command line package install option installs both the Spark package and the Spark CLI in one step, as shown below.
~/dcos_scripts >dcos package install spark --app-id=/accntngspark
Installing Marathon app for package [spark] version [1.1.0-2.1.1] with app id [/accntngspark]
Installing CLI subcommand for package [spark] version [1.1.0-2.1.1]
New command available: dcos spark
DC/OS Spark is being installed!

    Documentation: https://docs.mesosphere.com/service-docs/spark/
    Issues: https://docs.mesosphere.com/support/
You now have a fully functional Spark instance named /accntngspark.
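
As mentioned above, the CLI lends itself to scripting. The following is a minimal sketch of a non-interactive install script; the app id and the 10-second polling interval are arbitrary choices, and the tasksRunning check simply greps the Marathon app JSON rather than using a dedicated health API:

    #!/usr/bin/env bash
    # Sketch: install a named Spark instance without prompts, then wait for it to come up.
    set -euo pipefail

    APP_ID=${1:-/accntngspark}

    # --yes answers the interactive confirmation prompt so the install can run unattended.
    dcos package install spark --app-id="${APP_ID}" --yes

    # Poll Marathon until the Spark dispatcher task is reported as running.
    until dcos marathon app show "${APP_ID}" | grep -q '"tasksRunning": 1'; do
      sleep 10
    done
    echo "Spark instance ${APP_ID} is up."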