Tuesday, August 1, 2017

Spark on DC/OS Part 1 - Installing the Spark Framework using the GUI

Installing the Spark Framework on DC/OS

Spark can be installed from the Universe either through the DC/OS GUI or with the DC/OS package installer from the command line.
The following sections describe both installation options in detail.


Approach #1: Installing Spark from the DC/OS GUI

Installing from the GUI is by far the easiest way to quickly deploy the latest Mesosphere-supported Spark version to your DC/OS cluster.
  1. Locate the desired Spark package in the DC/OS Universe
    Selecting the 'Universe' option from the left side menu and then clicking 'Packages' provides a full list of the DC/OS frameworks available to your cluster. To find the Spark framework, type 'spark' in the search input box at the top of the screen. This displays a list of all DC/OS framework packages affiliated with Spark in some way. For our example, we will select the first package, titled 'spark'.
  2. Initiate the Spark package installation process
    For this example we will simply accept the default Spark framework configuration, which is sufficient for most initial Spark program executions. As you get more comfortable with Spark on DC/OS, you will want to explore the 'Advanced Installation' options to configure power features such as Kerberos, DC/OS roles, and default container images, to name a few of the more commonly used configuration options.
    After you click the 'INSTALL PACKAGE' button, a SUCCESS message appears almost immediately with details on how to get more information about the framework. Make a point of copying and saving the URL presented there for future reference.
  3. Monitor the Spark package installation and verify that it is available for execution
    At this point the Spark Framework cannot yet receive your Spark programs for execution, because it is still going through a validation and deployment process. If the framework cycles between 'deploying' and 'waiting', that probably indicates a Spark Framework configuration problem or insufficient resources for the deployment (for this example, though, everything should be fine).
    We can tell that the Spark Framework was successfully deployed because the status shown for the Spark service name is green and it is running 1 instance.
  4. Install the Spark Command Line Interface (CLI)
    Your DC/OS cluster is now ready to run Spark programs; however, one more step is required before you can actually submit a Spark job: installing the Spark CLI, which is what you use to submit Spark jobs. To complete this last step, execute the following command:

dcos package install --cli spark
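The checks from steps 3 and 4 can also be done from a terminal. The following is a minimal sketch, not part of the official installation procedure: the function name check_spark_ready is hypothetical, and it assumes the service was installed under the default app id /spark and that the first column of 'dcos marathon app list' is the app id.

```shell
# Sketch: confirm the Spark service and its CLI subcommand are in place.
# check_spark_ready is a hypothetical helper; /spark is the assumed default app id.
check_spark_ready() {
  app_id="${1:-/spark}"
  # Marathon lists every app it manages; look for an exact match on the app id
  # (assumes the id is the first column of the listing).
  if ! dcos marathon app list | awk '{print $1}' | grep -qx "$app_id"; then
    echo "service $app_id not found in Marathon" >&2
    return 1
  fi
  # The 'dcos spark' subcommand only exists once the CLI package is installed.
  if ! dcos spark --help >/dev/null 2>&1; then
    echo "Spark CLI subcommand not installed" >&2
    return 1
  fi
  echo "Spark service $app_id and CLI are available"
}
```

Calling check_spark_ready with no argument checks /spark; pass a different app id if you installed the service under a custom name.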


Approach #2: Installing the Spark Framework using the DC/OS Command Line
While the GUI Universe package installation option provides an easy way to set up services such as Spark within DC/OS, a more powerful tool for installations is the Command Line Interface (CLI). Using the CLI to install packages makes it easier to quickly set up services via bash scripts.
To install the spark package with a custom 'app-id', execute the following command:
dcos package install spark --app-id=/accntngspark
Unlike package installation from the GUI, the DC/OS command line package install option installs both the Spark package and its CLI in one step, as shown below.
~/dcos_scripts >dcos package install spark --app-id=/accntngspark
Installing Marathon app for package [spark] version [1.1.0-2.1.1] with app id [/accntngspark]
Installing CLI subcommand for package [spark] version [1.1.0-2.1.1]
New command available: dcos spark
DC/OS Spark is being installed!

    Documentation: https://docs.mesosphere.com/service-docs/spark/
    Issues: https://docs.mesosphere.com/support/
You now have a fully functional Spark instance named /accntngspark.
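Since the appeal of the CLI is that installs can be scripted, here is a minimal sketch of wrapping the command above in a reusable shell function. The function name install_spark and the optional options-file argument are assumptions for illustration; the --yes flag skips the interactive confirmation, and --options points at a JSON file of package settings whose contents are not shown here because the schema depends on the package version ('dcos package describe spark --config' prints it).

```shell
# Sketch: scripted, non-interactive Spark install with a custom app id.
# install_spark is a hypothetical helper, not part of the DC/OS CLI.
install_spark() {
  app_id="${1:-}"
  options_file="${2:-}"   # optional JSON file with package options
  if [ -z "$app_id" ]; then
    echo "usage: install_spark <app-id> [options-file]" >&2
    return 2
  fi
  if [ -n "$options_file" ]; then
    dcos package install spark --app-id="$app_id" --options="$options_file" --yes
  else
    dcos package install spark --app-id="$app_id" --yes
  fi
}
```

For example, install_spark /accntngspark reproduces the install shown above without prompting for confirmation.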