Tuesday, August 1, 2017

Spark on DC/OS Part 2 - Installing specific versions of Spark

Spark is great for development. The problem is that each version adds great new features (funny, calling new features a problem) while the behavior of old features changes.  The Hadoop distributions and other environments that support running Spark in cluster mode run just one version and, if you are fortunate enough to be running the Hortonworks distribution, you can run two versions of Spark.  The problem is that there are lots of different Spark versions out there, each with its own issues.  
DC/OS provides a nice solution to the Spark versioning problem by allowing you to install multiple Spark versions on a single cluster.  Furthermore, you can define different roles and quotas for each Spark version deployed to the cluster, allowing for fine-grained organizational control.
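As a sketch of what per-version isolation might look like, an options file passed to the installer can give each instance its own service name and Mesos role. The option names below are illustrative; the actual schema depends on the Spark package version, so check `dcos package describe spark --config` before using it:

    {
      "service": {
        "name": "spark162",
        "role": "spark162-role"
      }
    }

An options file like this would be supplied at install time with `dcos package install spark --options=options.json`.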
This blog posting describes the technique for installing multiple Spark versions by installing multiple Spark frameworks on your DC/OS cluster.
The first thing to note is that it is not possible to install versions other than the currently supported version using the DC/OS GUI. To install a prior version, it is necessary to use the DC/OS command line.
  1. List all the Spark versions available to the cluster by executing the command shown below:
    dcos package describe spark --package-versions
    Upon successful completion of the command above, you should see a JSON listing of the available package versions.
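Each package version follows the pattern <framework-version>-<spark-version>, so the part after the first dash is the Spark release itself. A short Python sketch of picking the right package version for a desired Spark release (the sample data below is hypothetical, standing in for the command's real JSON output):

```python
import json

# Hypothetical sample of what `dcos package describe spark --package-versions`
# might return: a JSON list of package versions, newest first.
sample_output = json.dumps(["1.0.1-1.6.2", "1.0.0-1.6.1", "1.6.1", "1.5.0"])

def find_package_version(versions_json, spark_version):
    """Return the first package version whose Spark portion matches."""
    for v in json.loads(versions_json):
        # Versions like "1.0.1-1.6.2" are "<framework>-<spark>";
        # older entries without a dash are the bare Spark version.
        if "-" in v and v.split("-", 1)[1] == spark_version:
            return v
    return None

print(find_package_version(sample_output, "1.6.2"))  # -> 1.0.1-1.6.2
```

The returned value is exactly the string passed to `--package-version` in the next step.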
  2. Select the version to install and invoke the dcos package install command
    dcos package install spark --package-version=1.0.1-1.6.2 --app-id=/spark162
    Executing the command above instructs the DC/OS installer to select the Spark version 1.6.2 provided via the Spark framework 1.0.1. If the installation is successful, you should see the following output:
    Installing Marathon app for package [spark] version [1.0.1-1.6.2] with app id [/spark162]
    Installing CLI subcommand for package [spark] version [1.0.1-1.6.2]
    New command available: dcos spark
    DC/OS Spark is being installed!
     Documentation: https://docs.mesosphere.com/spark-1-7/
     Issues: https://docs.mesosphere.com/support/
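    With more than one Spark framework installed, a job is submitted to a specific instance by naming its app id. A sketch, assuming the DC/OS Spark CLI's --name flag; the jar URL and class are illustrative placeholders, not part of this installation:

        dcos spark --name=/spark162 run --submit-args="--class org.apache.spark.examples.SparkPi https://example.com/spark-examples.jar 100"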
  3. Review the list of installed Spark versions
    ~/dcos_scripts >dcos package list spark
    spark  1.0.1-1.6.2  /spark     spark    Spark is a fast and general cluster computing system for Big Data.  Documentation: https://docs.mesosphere.com/usage/managing-services/spark/
    If everything succeeded, you will see the default app name created earlier in this document as well as the Spark 1.6.2 instance.
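When a particular version is no longer needed, its framework can be removed individually. The same app id used at install time identifies which instance to uninstall, leaving the other Spark versions on the cluster untouched:

    dcos package uninstall spark --app-id=/spark162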