Sunday, November 26, 2017

Managing DC/OS Placement Constraints, or How to Get the Nodes You Want for Your Processes

One of the strengths of DC/OS is its ability to place services on agent nodes based on cluster placement rules or agent resource availability. The first step is to define agent attributes specifying, for example, which VM host or rack the agent node is placed on and whether the agent node has any special resources such as GPUs. As part of this example, let's assume that you have 6 agents assigned to your cluster on 2 racks (we would follow the same example if the agents were assigned to specific VM hosts), based on the table below:
IP Address for Agent (replace with your agent node IP list) | Rack Id | GPU

Setting the DC/OS Agent Node Attributes

To enable DC/OS and Apache Mesos to properly manage the cluster based on the above definitions (by the way, we can assign any sort of attribute to our agent nodes; the above represents just one of the most common cases), we need to define the attributes for each agent node in the table and then restart the agent service, as described in the steps below:
  1. SSH into the agent node using the DC/OS command line:
    dcos node ssh --master-proxy --private-ip={IP Address from table above}
    For example:
    ~/dcos_sdk/dcos-commons/frameworks/kafka >dcos node ssh --master-proxy --private-ip=
    Running ssh -A -t core@ ssh -A -t core@ cd /var/
    Last login: Mon Nov 27 00:11:50 UTC 2017 from on pts/0
    Container Linux by CoreOS stable (1235.12.0)
    Update Strategy: No Reboots
    Failed Units: 2
    core@ip-10-0-2-99 ~ $
  2. Change into the /var/lib/dcos directory and create the attribute file mesos-slave-common:
    cd /var/lib/dcos
    vi mesos-slave-common
  3. Add attributes to the file using the following pattern:
    MESOS_ATTRIBUTES=<key>:<value>;<key>:<value>
    You can define as many attributes as you like, as long as each key:value pair is separated by a ';' character. For example, for the first node in our table (rack A1, no GPU), the MESOS_ATTRIBUTES property defines a rack attribute with the value A1 and a GPU attribute indicating that the node has no GPU.
  4. Now that MESOS_ATTRIBUTES is defined, remove the agent's saved Mesos metadata file, then restart and validate the dcos-mesos-slave service with the following commands on private agent nodes:
    sudo rm -f /var/lib/mesos/slave/meta/slaves/latest
    sudo systemctl restart dcos-mesos-slave.service
    sudo systemctl status dcos-mesos-slave.service
    For public agent nodes, use the following commands instead:
    sudo rm -f /var/lib/mesos/slave/meta/slaves/latest
    sudo systemctl restart dcos-mesos-slave-public.service
    sudo systemctl status dcos-mesos-slave-public.service
    Deleting the meta/slaves/latest file forces Mesos to reset the node attributes based on the current mesos-slave-common definition, even after the cluster has been installed. Because the agent then registers as a new agent, the tasks running on that node are typically lost, so restart dcos-mesos-slave.service on only a small group of nodes at a time to ensure sufficient resources remain to keep the active services and tasks running. In other words, DON'T RESTART ALL THE AGENT NODES AT ONE TIME. Also, it is important to
    When the service has restarted, systemctl status should report it as active (running):
    sudo systemctl status dcos-mesos-slave.service
    dcos-mesos-slave.service - Mesos Agent: distributed systems kernel agent
    Loaded: loaded (/opt/mesosphere/packages/mesos--b561eb0a7d13cda8a36d4fc014e35aefe97b24d9/; enabled; vendor preset: disabled)
    Active: active (running) since Mon 2017-11-27 01:04:15 UTC; 3s ago
    Process: 20094 ExecStartPre=/bin/bash -c for i in /proc/sys/net/ipv4/conf/*/rp_filter; do echo 2 > $i; echo -n "$i: "; cat $i; done (code=exited, status=0/SUCCESS)
    Process: 20091 ExecStartPre=/opt/mesosphere/bin/ /var/lib/dcos/mesos-resources (code=exited, status=0/SUCCESS)
    Process: 20072 ExecStartPre=/opt/mesosphere/bin/bootstrap dcos-mesos-slave (code=exited, status=0/SUCCESS)
    Process: 20064 ExecStartPre=/bin/ping -c1 ready.spartan (code=exited, status=0/SUCCESS)
    Main PID: 20108 (mesos-agent)
     Tasks: 15
    Memory: 3.3M
       CPU: 514ms
    CGroup: /system.slice/dcos-mesos-slave.service
            └─20108 /opt/mesosphere/packages/mesos--b561eb0a7d13cda8a36d4fc014e35aefe97b24d9/bin/mesos-agent
  5. Now that the dcos-mesos-slave service has restarted, go to the DC/OS GUI Nodes screen to confirm that the agent node you just changed is healthy. If a recently changed node has an error indicator beside it or its HEALTH is 'unhealthy', double-check the mesos-slave-common file for typos.
As the screen capture above shows, the first node we modified is healthy, so we are now ready to click on the node we just changed ('').
Then click on the 'Details' link at the top of the page to confirm that our two new node-level attributes are defined as expected.
Here we see that the rack is correctly set to "A1" and that the node reports no GPUs. You can now repeat these 5 steps for each of the remaining agent nodes, one node at a time, to complete the example.
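With several agents to update, the manual steps above can be scripted. The following is a minimal sketch, not an official DC/OS tool: it assumes the attribute keys are named rack and gpu (use whatever keys you chose), that it runs as root on the agent itself (for example via sudo after the dcos node ssh step), and the script name, the DCOS_DIR, LATEST and DRY_RUN variables and the helper functions are hypothetical. By default it only prints what it would do.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): set the rack/gpu attributes on ONE agent and restart it.
# Assumes attribute keys "rack" and "gpu"; meant to run as root on the agent node.
set -euo pipefail

DCOS_DIR="${DCOS_DIR:-/var/lib/dcos}"                        # holds mesos-slave-common
LATEST="${LATEST:-/var/lib/mesos/slave/meta/slaves/latest}"  # saved agent metadata
DRY_RUN="${DRY_RUN:-1}"                                      # 1 = print, 0 = execute

# Print a command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# attributes_line RACK GPU -> MESOS_ATTRIBUTES=rack:RACK;gpu:GPU
attributes_line() {
  printf 'MESOS_ATTRIBUTES=rack:%s;gpu:%s\n' "$1" "$2"
}

# apply_attributes RACK GPU [private|public]
apply_attributes() {
  local rack="$1" gpu="$2" unit="dcos-mesos-slave.service"
  if [ "${3:-private}" = "public" ]; then
    unit="dcos-mesos-slave-public.service"
  fi
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ write '$(attributes_line "$rack" "$gpu")' to $DCOS_DIR/mesos-slave-common"
  else
    # Overwrites the file, matching the "create the attribute file" step above.
    attributes_line "$rack" "$gpu" > "$DCOS_DIR/mesos-slave-common"
  fi
  run rm -f "$LATEST"                      # force Mesos to pick up the new attributes
  run systemctl restart "$unit"
  run systemctl --no-pager status "$unit"
}

# Only act when called with arguments, e.g.: set-agent-attributes.sh A1 false private
if [ "$#" -ge 2 ]; then
  apply_attributes "$@"
fi
```

For example, `sudo env DRY_RUN=0 bash set-agent-attributes.sh A1 false` would configure the first node in the table. Run it on one node at a time and confirm the node is healthy in the GUI before moving on to the next.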

Defining Placement Constraints to leverage the Agent Node Attributes

To continue our example, let's define two services:
  • rack-service: each instance must be placed on a unique rack. So in our example, scaling to 2 instances results in one instance on rack A1 and the other on rack B2.
  • rack-b2-gpu-service: can only be placed on a DC/OS agent node whose 'rack' attribute equals 'B2' and that also has a GPU.
Before proceeding, make certain you have defined the attributes as specified in the agent node table at the top of this article. If the nodes do not have the attributes used by the placement constraint defined, your service will wait forever for a resource, as shown in the image below:
To specify the placement constraint for the first service, let's execute the following steps:
  1. Go to the 'Services' screen and add a 'single container' service named 'rack-service', then click 'MORE SETTINGS' to expose the 'Placement Constraints' screen.
  2. Click on 'ADD PLACEMENT CONSTRAINTS', then enter the placement rule as shown below:
    For our first service we just want to reference the attribute 'rack' defined earlier in this article, using the UNIQUE operator. The beauty of the placement constraint implementation is that we can use any combination of the agent attributes already defined on the cluster. Once we have entered the placement constraint and configured the service the way we would like it, just click 'Review & Run' followed by 'RUN SERVICE'.
    Looking at the instance definition for our new 'rack-service', we see that it has been deployed to whichever node was available, which frankly is not that interesting. However, increasing the scale to 2 should place one instance on a node whose 'rack' attribute equals "A1" and the other on a node where it equals "B2".
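The same service can also be expressed as a Marathon app definition, which is handy for scripting. This sketch assumes the GUI's placement constraint corresponds to Marathon's constraint array ["rack", "UNIQUE"]; the cmd, cpus and mem values, the file name and the DRY_RUN switch are placeholders, not values from this example.

```shell
#!/usr/bin/env bash
# Sketch: rack-service as a Marathon app definition instead of GUI clicks.
# cmd, cpus and mem are placeholders. Set DRY_RUN=0 to submit with the DC/OS CLI.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
APP_FILE="${APP_FILE:-rack-service.json}"

# ["rack", "UNIQUE"] asks Marathon to put every instance on a distinct rack value.
cat > "$APP_FILE" <<'EOF'
{
  "id": "/rack-service",
  "cmd": "sleep 3600",
  "cpus": 0.1,
  "mem": 32,
  "instances": 2,
  "constraints": [["rack", "UNIQUE"]]
}
EOF

if [ "$DRY_RUN" = "1" ]; then
  cat "$APP_FILE"
else
  dcos marathon app add "$APP_FILE"
fi
```

Because UNIQUE allows only one instance per distinct rack value, with just racks A1 and B2 defined a third instance would stay waiting for a resource.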
OK, now let's look at our second, more complex example, which has two placement constraints. We follow the same basic steps as in the earlier example, except that the service has two placement constraint rules, rack=B2 and GPU=True, as shown below:
Once we run the service with these placement constraints, it should be deployed to a node that is on rack B2 and has GPUs.
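Written as a Marathon app definition, the two rules become two CLUSTER constraints, and Marathon only places a task on an agent that satisfies every constraint in the list. This is a sketch: the "gpu" key and "true" value are assumptions, so to be safe copy the key and value exactly as you wrote them into mesos-slave-common; cmd, cpus, mem, the file name and DRY_RUN are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: rack-b2-gpu-service as a Marathon app definition with two constraints.
# The "gpu" key and "true" value are assumptions; match your mesos-slave-common.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
APP_FILE="${APP_FILE:-rack-b2-gpu-service.json}"

# Both constraints must hold: rack equals B2 AND gpu equals true.
cat > "$APP_FILE" <<'EOF'
{
  "id": "/rack-b2-gpu-service",
  "cmd": "sleep 3600",
  "cpus": 0.1,
  "mem": 32,
  "instances": 1,
  "constraints": [
    ["rack", "CLUSTER", "B2"],
    ["gpu", "CLUSTER", "true"]
  ]
}
EOF

if [ "$DRY_RUN" = "1" ]; then
  cat "$APP_FILE"
else
  dcos marathon app add "$APP_FILE"
fi
```

Note that the constraint only selects the node by its attribute; it does not by itself allocate a GPU to the task.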

Tuesday, August 1, 2017

Spark on DC/OS Part 2 - Installing specific versions of Spark

Spark is great for development. The problem is that with each version great new features are added (funny calling new features a problem) and the behavior of old features changes. The Hadoop distributions and other environments that support running Spark in cluster mode typically run just one version, and if you are fortunate enough to be running the Hortonworks distribution, you can run two versions of Spark. The problem is that there are lots of different Spark versions out there, each with its own issues.
DC/OS provides a nice solution to the Spark versioning problem by allowing you to install multiple Spark versions on a single cluster. Furthermore, you can define different roles and quotas for each Spark version deployed to the cluster, allowing for fine-grained organizational control.
This blog post describes how to install multiple Spark versions by installing multiple Spark frameworks on your DC/OS cluster.
The first thing to note is that it is not possible to install versions other than the currently supported version using the DC/OS GUI. To install a prior version, you need to use the DC/OS command line.
  1. List all the Spark versions available to the cluster by executing the command below:
    dcos package describe spark --package-versions
    Upon successful completion, the command prints a JSON list of the package versions available to the cluster, including the 1.0.1-1.6.2 version used in the next step.
  2. Select the version to install and invoke the dcos package install command:
    dcos package install spark --package-version=1.0.1-1.6.2 --app-id=/spark162
    Executing the command above instructs the DC/OS installer to select the Spark version 1.6.2 provided via the Spark framework 1.0.1. If the installation is successful, you should see the following output:
    Installing Marathon app for package [spark] version [1.0.1-1.6.2] with app id [/spark162]
    Installing CLI subcommand for package [spark] version [1.0.1-1.6.2]
    New command available: dcos spark
    DC/OS Spark is being installed!
  3. Review the list of installed Spark versions:
    ~/dcos_scripts >dcos package list spark
    spark  1.0.1-1.6.2  /spark     spark    Spark is a fast and general cluster computing system for Big Data.  Documentation:
    If everything was successful, you will see the default /spark app installed in Part 1 as well as the Spark 1.6.2 instance (/spark162).
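When you need several versions, the steps above are easy to script. This is a minimal sketch that assumes you keep this post's naming convention of /spark plus the Spark version without dots (so 1.0.1-1.6.2 becomes /spark162); the app_id_for and install_spark helpers and the DRY_RUN switch are hypothetical, and --yes only skips the CLI's confirmation prompt.

```shell
#!/usr/bin/env bash
# Sketch: install older Spark package versions side by side, one app id per version.
# Naming convention (/spark + Spark version without dots) follows this post.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"            # 1 = print the dcos commands, 0 = run them

# app_id_for 1.0.1-1.6.2 -> /spark162
app_id_for() {
  local spark_version="${1#*-}"      # drop the framework version: 1.6.2
  echo "/spark${spark_version//./}"  # strip the dots: /spark162
}

install_spark() {
  local -a cmd=(dcos package install spark "--package-version=$1"
                "--app-id=$(app_id_for "$1")" --yes)
  if [ "$DRY_RUN" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Pick versions from: dcos package describe spark --package-versions
for version in 1.0.1-1.6.2; do
  install_spark "$version"
done
```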

Spark on DC/OS Part 1 - Installing the Spark Framework using the GUI

Installing the Spark Framework on DC/OS GUI

Installing Spark from the Universe is possible via either the DC/OS GUI or the DC/OS package installer from the command line.
The following sections describe in detail some of the different Spark framework installation options available.

Approach #1: Installing Spark from the DC/OS GUI

Using the GUI is by far the easiest way to quickly deploy the latest Mesosphere-supported Spark version to your DC/OS cluster.
  1. Locate the desired Spark package in the DC/OS Universe
    Selecting the 'Universe' option from the left side menu and then clicking 'Packages' provides a full list of the DC/OS frameworks available to your cluster. To quickly locate the Spark framework, just type 'spark' in the search input box at the top of the screen. This displays a list of all DC/OS framework packages affiliated with Spark in some way. For our example, we will select the first package, titled 'spark'.
  2. Initiating the Spark Package installation process
    For this example we are going to just accept the default Spark framework configurations, as that is sufficient for most initial Spark program executions. As you get more comfortable with Spark on DC/OS, you will want to explore the 'Advanced Installation' options to configure powerful features such as Kerberos, DC/OS roles and default container images, among a few of the more commonly used configuration options.
    After clicking the 'INSTALL PACKAGE' button from the screen above, very quickly afterwards you will see a SUCCESS message with details on how to get more information about the framework. Make a point to copy and save the URL presented here for future reference.
  3. Monitoring the Spark Package installation process and verification that it is available for execution
    The Spark framework is not yet able to receive your Spark programs for execution, as it is still going through a validation and deployment process. If the Spark framework cycles between 'deploying' and 'waiting', that is probably an indicator of a Spark framework configuration problem or insufficient resources for the framework deployment (but for this example, everything should be fine).
    The screen above shows that the Spark framework was successfully deployed: the status for the Spark service is green, and it is running 1 instance.
  4. Install the Spark Command Line Interface (CLI) option
    Your DC/OS cluster is now ready to run Spark programs; however, one more step is required before you can actually submit Spark jobs: installing the Spark CLI subcommand. To complete this last step, execute the following command:

    dcos package install --cli spark

Approach #2: Installing the Spark Framework using the DC/OS Command Line
While the GUI Universe package installation option provides an easy way to set up services such as Spark within DC/OS, a more powerful tool for installations is the Command Line Interface (CLI). Using the CLI to install packages makes it easy to quickly set up services via bash scripts.
To install the spark package with a custom 'app-id', execute the following command:
dcos package install spark --app-id=/accntngspark
Unlike package installation using the GUI, the DC/OS command line package install option installs both the Spark package and its CLI subcommand in one step, as shown below.
~/dcos_scripts >dcos package install spark --app-id=/accntngspark
Installing Marathon app for package [spark] version [1.1.0-2.1.1] with app id [/accntngspark]
Installing CLI subcommand for package [spark] version [1.1.0-2.1.1]
New command available: dcos spark
DC/OS Spark is being installed!

You now have a fully functional Spark instance named /accntngspark.
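Since the CLI route lends itself to bash scripts, here is a small sketch that installs one Spark instance per team. It checks each id against a simplified version of Marathon's app id rules (lowercase letters, digits, '-' and '.', with each path segment starting and ending in a letter or digit) before calling dcos. That check, the /salesspark id and the DRY_RUN switch are assumptions for illustration, not part of the DC/OS CLI.

```shell
#!/usr/bin/env bash
# Sketch: install one Spark instance per team with a custom --app-id.
# /salesspark is a hypothetical second team; DRY_RUN=1 only prints commands.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
APP_IDS=(/accntngspark /salesspark)

# Simplified Marathon app id check: absolute path whose segments use lowercase
# letters, digits, '-' and '.', and start and end with a letter or digit.
valid_app_id() {
  local re='^(/[[:lower:][:digit:]]([[:lower:][:digit:].-]*[[:lower:][:digit:]])?)+$'
  [[ "$1" =~ $re ]]
}

for app_id in "${APP_IDS[@]}"; do
  if ! valid_app_id "$app_id"; then
    echo "invalid app id: $app_id" >&2
    exit 1
  fi
  cmd=(dcos package install spark "--app-id=$app_id" --yes)
  if [ "$DRY_RUN" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
done
```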