Setup Mesos-DNS

Over the last month I have been evaluating container clustering software: Kubernetes, Rancher (which uses Swarm), and Mesos. I am going through these evaluations to determine which one will best fit my employer’s needs.

ENVIRONMENT: CentOS 7.0 running three Mesos masters and two Mesos slaves

Masters (three hosts), each running services: zookeeper, marathon, mesos-master

Slaves (two hosts), each running services: mesos-slave

STEP 1. Prerequisites: install golang and git.

$ yum install go git
$ export GOPATH=$HOME/go
$ export PATH=$PATH:$GOPATH/bin
$ go get

$ go get
$ go get
$ go get

STEP 2. Clone the mesos-dns repository and build the mesos-dns binary.

$ git clone
$ cd ./mesos-dns
$ go build -o mesos-dns

After building mesos-dns you should have a mesos-dns binary in your ./mesos-dns directory.

STEP 3. In the ./mesos-dns directory there is a config.json.sample example file.
Copy this file and edit it for your own environment.

$ cp config.json.sample config.json

This link describes each of the fields in the config.json file.

{
  "zk": "zk://,,",
  "masters": ["","",""],
  "stateTimeoutSeconds": 300,
  "refreshSeconds": 60,
  "ttl": 60,
  "domain": "mesos",
  "ns": "ns1",
  "port": 53,
  "resolvers": [""],
  "timeout": 5,
  "listener": "",
  "SOAMname": "root.ns1.mesos",
  "SOARname": "ns1.mesos",
  "SOARefresh": 60,
  "SOARetry":   600,
  "SOAExpire":  86400,
  "SOAMinttl": 60,
  "dnson": true,
  "httpon": true,
  "httpport": 8123,
  "externalon": true,
  "recurseon": true,
  "IPSources": ["mesos", "host"],
  "EnforceRFC952": false
}
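
Before starting the service, it can help to sanity-check the edited file. The following is a minimal Python sketch and is not part of mesos-dns: the function name check_config and the list of fields it treats as required are assumptions made for illustration, picked from the fields shown above.

```python
import json

# Fields this sketch treats as mandatory (an assumption for illustration,
# not a rule enforced by mesos-dns itself).
REQUIRED_FIELDS = ["domain", "port", "resolvers", "ttl"]


def check_config(text):
    """Parse config.json text and return a list of problems found."""
    try:
        config = json.loads(text)
    except ValueError as err:
        return ["not valid JSON: %s" % err]

    if not isinstance(config, dict):
        return ["top level must be a JSON object"]

    problems = ["missing field: %s" % name
                for name in REQUIRED_FIELDS if name not in config]

    # mesos-dns has to locate the masters somehow: via "zk" or "masters".
    if not config.get("zk") and not config.get("masters"):
        problems.append("set either 'zk' or 'masters'")

    port = config.get("port")
    if port is not None and not (isinstance(port, int) and 0 < port < 65536):
        problems.append("port must be an integer between 1 and 65535")

    return problems
```

Calling check_config(open("config.json").read()) returns an empty list when the sketch finds nothing to flag. Running mesos-dns itself remains the real test.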

STEP 4. Run mesos-dns with the config.json file to verify that the file is properly formatted.

$ ./mesos-dns -config=config.json

On the Mesos slave that you have designated as the mesos-dns server, create a directory for the config.json file.

$ mkdir /etc/mesos-dns

STEP 5. Copy the mesos-dns binary to the Mesos slave that you have designated as the mesos-dns server. In this example I copy the mesos-dns binary to Mesos slave mesos04.

$ scp ./mesos-dns/mesos-dns

STEP 6. Configure the constraints for the mesos-dns service. This tells Marathon to constrain the mesos-dns service to the designated host. For example, you may want to designate two nodes in your cluster to run mesos-dns. The constraints directive ensures that mesos-dns does not try to run on other hosts.
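
As an illustration of what such a constraint can look like, here is a minimal Python sketch that builds a Marathon app definition pinned to a single host. It is not the definition used in this setup: the binary path, the cpus and mem values, and the function name mesos_dns_app are all assumptions. The hostname CLUSTER constraint is standard Marathon syntax.

```python
import json


def mesos_dns_app(hostname, binary="/usr/local/mesos-dns/mesos-dns",
                  config="/etc/mesos-dns/config.json"):
    """Build a Marathon app definition pinned to one host.

    The binary path, cpus, and mem values are placeholders (assumptions).
    """
    return {
        "id": "mesos-dns",
        "cmd": "%s -config=%s" % (binary, config),
        "cpus": 0.5,
        "mem": 128,
        "instances": 1,
        # CLUSTER on the hostname field restricts the task to that host.
        "constraints": [["hostname", "CLUSTER", hostname]],
    }


# mesos04 is the slave used in this walkthrough.
APP_JSON = json.dumps(mesos_dns_app("mesos04"), indent=2)
```

The resulting JSON could then be submitted to Marathon through its /v2/apps endpoint or pasted into the Marathon UI.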


STEP 7. Update the network-script file with the IP address of the host running mesos-dns.

$ vim /etc/sysconfig/network-scripts/ifcfg-ens160

STEP 8. After updating the network-script file, restart the network service.

$ systemctl restart network

STEP 9. If you have any applications running in Marathon, you should be able to look them up using mesos-dns. For example, I had an application named nodehello2, and I was able to resolve it using mesos-dns.

$ nslookup nodehello2.marathon.mesos

Name:   nodehello2.marathon.mesos
Name:   nodehello2.marathon.mesos


STEP 10. For additional verification, use curl to hit the Node hello world app endpoint by its application name, http://nodehello2.marathon.mesos.

[root@mesos04 mesos-dns]$ docker ps
CONTAINER ID        IMAGE                             COMMAND                  CREATED             STATUS              PORTS                     NAMES
2f6d8a4f99fd   "/bin/sh -c '/node/bi"   35 hours ago        Up 35 hours>8081/tcp   mesos-a78b235a-8427-4743-9bcc-5d6aed338412-S3.3698d9f9-a25a-457a-8602-50d9c26e70a7
38ca56e041f3   "/bin/sh -c '/node/bi"   35 hours ago        Up 35 hours>8081/tcp   mesos-a78b235a-8427-4743-9bcc-5d6aed338412-S3.e16a1e3e-a662-40da-b353-318de55178dc

[root@mesos04 mesos-dns]$ curl http://nodehello2.marathon.mesos:31495
Version 2.0
Hello World
[root@mesos04 mesos-dns]$ curl http://nodehello2.marathon.mesos:31884
Version 1.0
Hello World

STEP 11. You can also return the ports of the application. For example, nodehello2 is running on port 31472 on s2.marathon.slave.mesos and port 31495 on s3.marathon.slave.mesos.

[root@mesos04 mesos-dns]$ dig _nodehello2._tcp.marathon.mesos SRV

_nodehello2._tcp.marathon.mesos. 60 IN  SRV     0 0 31472 nodehello2-uhq4s-s2.marathon.slave.mesos.
_nodehello2._tcp.marathon.mesos. 60 IN  SRV     0 0 31495 nodehello2-sbk5j-s3.marathon.slave.mesos.
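
If a script needs those host and port pairs, the answer lines can be split into fields. This is a small Python sketch that parses the text of the two answer lines above. The function name parse_srv_line is my own, and the sketch works on dig's text output rather than querying DNS itself.

```python
def parse_srv_line(line):
    """Split one SRV answer line from dig into its fields."""
    fields = line.split()
    # Field order: name, ttl, class, type, priority, weight, port, target
    if len(fields) != 8 or fields[3] != "SRV":
        raise ValueError("not an SRV record: %r" % line)
    return {
        "name": fields[0],
        "ttl": int(fields[1]),
        "priority": int(fields[4]),
        "weight": int(fields[5]),
        "port": int(fields[6]),
        "target": fields[7].rstrip("."),
    }


ANSWER = """\
_nodehello2._tcp.marathon.mesos. 60 IN  SRV     0 0 31472 nodehello2-uhq4s-s2.marathon.slave.mesos.
_nodehello2._tcp.marathon.mesos. 60 IN  SRV     0 0 31495 nodehello2-sbk5j-s3.marathon.slave.mesos.
"""

# (target host, port) for every instance of the application
endpoints = [(record["target"], record["port"])
             for record in map(parse_srv_line, ANSWER.splitlines())]
```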

Setup OpenVPN

OpenVPN was surprisingly easy to set up in my lab environment. My setup included a CentOS 7 server running the latest version of the OpenVPN server and a Windows 7 client running the latest OpenVPN client. I also have a Netgear router, which I configured with a static route.

Before we begin you will need:
1. A computer to run OpenVPN (I used CentOS 7)
2. OpenVPN server will need a certificate
3. OpenVPN client will need a certificate
4. A home router which can be configured with static routes
5. A way to generate certificates for your vpn server and clients

Download the OpenVPN source code and compile it into an RPM:

rpmbuild -tb /root/openvpn-2.3.8.tar.gz
rpm -ivh /root/rpmbuild/RPMS/x86_64/openvpn-2.3.8-1.x86_64.rpm

OpenVPN Server Configuration

Copy the sample server.conf file to /etc/openvpn/server.conf.
Here is a list of the settings I changed from the default server.conf file to get my OpenVPN server working.

# Which TCP/UDP port should OpenVPN listen on?
# If you want to run multiple OpenVPN instances
# on the same machine, use a different port
# number for each one. You will need to
# open up this port on your firewall.
port 1194

# TCP or UDP server?
proto udp

# "dev tun" will create a routed IP tunnel,
# "dev tap" will create an ethernet tunnel.
# Use "dev tap0" if you are ethernet bridging
# and have precreated a tap0 virtual interface
# and bridged it with your ethernet interface.
# If you want to control access policies
# over the VPN, you must create firewall
# rules for the TUN/TAP interface.
# On non-Windows systems, you can give
# an explicit unit number, such as tun0.
# On Windows, use "dev-node" for this.
# On most systems, the VPN will not function
# unless you partially or fully disable
# the firewall for the TUN/TAP interface.

dev tun

# SSL/TLS root certificate (ca), certificate
# (cert), and private key (key). Each client
# and the server must have their own cert and
# key file. The server and all clients will
# use the same ca file.
# See the "easy-rsa" directory for a series
# of scripts for generating RSA certificates
# and private keys. Remember to use
# a unique Common Name for the server
# and each of the client certificates.
# Any X509 key management system can be used.
# OpenVPN can also use a PKCS #12 formatted key file
# (see "pkcs12" directive in man page).
ca lab-ca.pem
cert vpnserver.pem
key vpnserver-key-nopass.pem # This file should be kept secret

# Diffie hellman parameters.
# Generate your own with:
# openssl dhparam -out dh1024.pem 1024
# Substitute 2048 for 1024 if you are using
# 2048 bit keys.
dh dh1024.pem

# Configure server mode and supply a VPN subnet
# for OpenVPN to draw client addresses from.
# The server will take for itself,
# the rest will be made available to clients.
# Each client will be able to reach the server
# on Comment this line out if you are
# ethernet bridging. See the man page for more info.

# Push routes to the client to allow it
# to reach other private subnets behind
# the server. Remember that these
# private subnets will also need
# to know to route the OpenVPN client
# address pool (
# back to the OpenVPN server.
push "route"

# Certain Windows-specific network settings
# can be pushed to clients, such as DNS
# or WINS server addresses. CAVEAT:
push "dhcp-option DNS"

# The keepalive directive causes ping-like
# messages to be sent back and forth over
# the link so that each side knows when
# the other side has gone down.
# Ping every 10 seconds, assume that remote
# peer is down if no ping received during
# a 120 second time period.
keepalive 10 120

# Select a cryptographic cipher.
# This config item must be copied to
# the client config file as well.
cipher AES-128-CBC # AES

OpenVPN client Configuration
Download the OpenVPN client from here.
OpenVPN client configuration is saved here on Windows:
C:\Program Files\OpenVPN\config\client.ovpn

Here is a list of OpenVPN client settings I configured to get my OpenVPN client connected.

# Specify that we are a client and that we
# will be pulling certain config file directives
# from the server.
client

# Use the same setting as you are using on
# the server.
# On most systems, the VPN will not function
# unless you partially or fully disable
# the firewall for the TUN/TAP interface.
;dev tap
dev tun

# Are we connecting to a TCP or
# UDP server? Use the same setting as
# on the server.
;proto tcp
proto udp

# The hostname/IP and port of the server.
# You can have multiple remote entries
# to load balance between the servers.
# put my public IP here
remote 71.XX.XX.XXX 1194

# SSL/TLS parms.
# See the server config file for more
# description. It's best to use
# a separate .crt/.key file pair
# for each client. A single ca
# file can be used for all clients.
ca lab-ca.pem
cert usercert.pem
key key-pass.pem

# Select a cryptographic cipher.
# If the cipher option is used on the server
# then you must also specify it here.
cipher AES-128-CBC
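
Since dev, proto, and cipher have to agree on both sides, a quick comparison of the two files can catch a mismatch before you try to connect. This is a minimal Python sketch under my own assumptions: the function names and the MUST_MATCH list are hypothetical, values are compared literally, and a '#' inside a quoted argument would be misread as a comment.

```python
MUST_MATCH = ("dev", "proto", "cipher")


def read_directives(text):
    """Return {directive: arguments} for active lines in an OpenVPN config."""
    directives = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Lines starting with '#' or ';' are comments in OpenVPN configs.
        if not line or line[0] in "#;":
            continue
        # Drop trailing comments such as "cipher AES-128-CBC # AES".
        line = line.split("#", 1)[0].strip()
        parts = line.split(None, 1)
        directives[parts[0]] = parts[1] if len(parts) > 1 else ""
    return directives


def mismatches(server_text, client_text):
    """List the directives in MUST_MATCH whose values differ."""
    server = read_directives(server_text)
    client = read_directives(client_text)
    return [name for name in MUST_MATCH
            if server.get(name) != client.get(name)]
```

For the two configurations shown in this post, mismatches(open("server.conf").read(), open("client.ovpn").read()) should come back empty.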

Generate Certificates:
I generated my certificates using a Microsoft 2012 Certificate Authority. I generated one certificate for the VPN server and another for the VPN client. I exported them from the Microsoft CA in PFX format and used this guide to convert them to PEM format.

My openVPN server certificate properties:
Subject Alternative Name=vpn
Subject Alternative
Subject Alternative Name=test02

My openVPN user certificate properties:
CN=user OU=WAU OU=US DC=lab DC=net

On the OpenVPN server copy the PEM files to /etc/openvpn/
On the OpenVPN Windows client copy the PEM files to C:\Program Files\OpenVPN\config\

Router Configuration
Configure your home router with a static route to the OpenVPN server on your home network.
VPN client subnet:
OpenVPN Server:

Start the OpenVPN service on the OpenVPN server

systemctl start openvpn

Test Client Connection
On Windows 7 I noticed that the OpenVPN client had to be run as administrator.

If you were successfully connected you should see “client is now connected”.

Configure Rundeck to use Active Directory Authentication

This guide was written using the Rundeck 2.4.2 RPM installed on CentOS 6.5. I go over the steps needed to set up Active Directory authentication in Rundeck.

STEP 1. Create Active Directory Group

In Active Directory create a new group named “rundeckusers.” Then add your users to that AD group.

STEP 2. Create jaas-activedirectory.conf file

touch /etc/rundeck/jaas-activedirectory.conf
chown rundeck:rundeck /etc/rundeck/jaas-activedirectory.conf

Enter the following configuration settings into your jaas-activedirectory.conf file. You will need to configure the username and password of the user that will bind to Active Directory. You will also need to configure the userBaseDn, which is the OU that is searched recursively for users, and the roleBaseDn, which is the OU that contains your “rundeckusers” AD group.

activedirectory {
    com.dtolabs.rundeck.jetty.jaas.JettyCachingLdapLoginModule required
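
To show how the settings discussed above fit into a JAAS entry, here is a hedged Python sketch that renders one from a dictionary. Every value is a placeholder I made up, and only the options named in the text are included, so the output is not a complete working configuration. See the Rundeck LDAP authentication documentation for the full option list.

```python
def render_jaas(name, module, options):
    """Render one JAAS login-module entry as text."""
    body = ["    %s required" % module]
    body += ['    %s="%s"' % (key, value) for key, value in options.items()]
    # JAAS ends the module's option list with a semicolon.
    body[-1] += ";"
    return "\n".join(["%s {" % name] + body + ["};"]) + "\n"


# All values below are placeholders (assumptions), not this lab's settings.
EXAMPLE = render_jaas(
    "activedirectory",
    "com.dtolabs.rundeck.jetty.jaas.JettyCachingLdapLoginModule",
    {
        "providerUrl": "ldap://dc01.example.net:389",
        "bindDn": "CN=binduser,OU=Service Accounts,DC=example,DC=net",
        "bindPassword": "changeme",
        "userBaseDn": "OU=Users,DC=example,DC=net",
        "roleBaseDn": "OU=Groups,DC=example,DC=net",
    },
)
```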

STEP 3. Modify /etc/rundeck/profile

You’ll need to modify two lines. Add the path to the jaas-activedirectory.conf file and the login module name, “activedirectory.” The login module name is the same as the name used in the jaas-activedirectory.conf file.

export RDECK_JVM=" \"

STEP 4. Create file /etc/rundeck/rundeckusers.aclpolicy
Add the ACL policy below for the admin in Rundeck. The group field should be the Active Directory user group “rundeckusers.” All users in that AD group will have admin access in Rundeck.

touch /etc/rundeck/rundeckusers.aclpolicy
chown rundeck:rundeck /etc/rundeck/rundeckusers.aclpolicy
description: Admin project level access control. Applies to resources within a specific project.
context:
  project: '.*' # all projects
for:
  resource:
    - equals:
        kind: job
      allow: [create] # allow create jobs
    - equals:
        kind: node
      allow: [read,create,update,refresh] # allow refresh node sources
    - equals:
        kind: event
      allow: [read,create] # allow read/create events
  adhoc:
    - allow: [read,run,runAs,kill,killAs] # allow running/killing adhoc jobs
  job:
    - allow: [create,read,update,delete,run,runAs,kill,killAs] # allow create/read/write/delete/run/kill of all jobs
  node:
    - allow: [read,run] # allow read/run for nodes
by:
  group: [rundeckusers]

---

description: Admin Application level access control, applies to creating/deleting projects, admin of user profiles, viewing projects and reading system information.
context:
  application: 'rundeck'
for:
  resource:
    - equals:
        kind: project
      allow: [create] # allow create of projects
    - equals:
        kind: system
      allow: [read] # allow read of system info
    - equals:
        kind: user
      allow: [admin] # allow modify user profiles
  project:
    - match:
        name: '.*'
      allow: [read,import,export,configure,delete] # allow full access of all projects or use 'admin'
  storage:
    - allow: [read,create,update,delete] # allow access for /ssh-key/* storage content

by:
  group: [rundeckusers]

STEP 5. Configure Secure LDAP
To secure the LDAP connection between the Rundeck server and the AD domain controller, import and trust the CA certificate that was used to set up Secure LDAP on the domain controller. Then configure the jaas-activedirectory.conf file to use ldaps.

keytool -import -alias -file /root/CA.pem -keystore /usr/lib/jvm/java-1.7.0-openjdk- -storepass changeit

Quickly create selinux policies using audit2allow

Recently I was configuring MySQL in a high availability configuration when I encountered problems with getting my keepalived health check script to work.

I have two MySQL servers configured in Master/Master replication with a VIP (keepalived) which floats between the two servers. We only write to one of the masters using the VIP. The goal is to have a fail over of the VIP occur if the primary server becomes unreachable.

I created my health check script and configured Keepalived to use it to check on MySQL. Below is a snippet from my keepalived.conf config file. I tested the fail over by shutting down MySQL to force the VIP to move, but the fail over did not occur. When I ran keepalived as root from the console, the VIP fail over worked. I started to suspect a permissions or SELinux issue.

vrrp_script check_mysql {
script /opt/mysql/
interval 2
timeout 3

track_script {
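
For readers who do not have a health check yet, here is a minimal Python sketch of one. It is not the script used in this setup: it only tests whether the MySQL TCP port accepts connections, and the host, port, and function names are assumptions. Keepalived treats an exit status of 0 from a vrrp_script as healthy and any non-zero status as a failure.

```python
import socket


def mysql_is_up(host="127.0.0.1", port=3306, timeout=2.0):
    """Return True if a TCP connection to the MySQL port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def exit_status():
    """Map the check result to the exit status keepalived expects."""
    return 0 if mysql_is_up() else 1
```

A script file would finish with sys.exit(exit_status()) so that keepalived sees the result. A check that runs a real query would catch more failure modes than a port probe.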

Enter audit2allow. This tool reads the audit logs and creates SELinux allow policies from the denials it finds.

yum install /usr/bin/audit2allow 

I grepped the audit.log file to find failures, then wrote down each context that was being denied.

grep /var/log/audit/audit.log 
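
Writing the contexts down by hand gets tedious with a busy log. As a rough Python sketch, the denied source and target types can be pulled out of AVC lines with a regular expression. The pattern and the function name denied_contexts are my own and assume the usual user:role:type:level context layout.

```python
import re

# Captures the SELinux type (third field) of the source and target contexts.
AVC_PATTERN = re.compile(
    r"avc:\s+denied\s+\{(?P<perms>[^}]*)\}.*?"
    r"scontext=[^:\s]+:[^:\s]+:(?P<source>[^:\s]+)\S*\s+"
    r"tcontext=[^:\s]+:[^:\s]+:(?P<target>[^:\s]+)\S*\s+"
    r"tclass=(?P<tclass>\S+)"
)


def denied_contexts(lines):
    """Return sorted, de-duplicated (source, target, class, perms) tuples."""
    found = set()
    for line in lines:
        match = AVC_PATTERN.search(line)
        if match:
            perms = " ".join(match.group("perms").split())
            found.add((match.group("source"), match.group("target"),
                       match.group("tclass"), perms))
    return sorted(found)
```

Passing it open("/var/log/audit/audit.log") yields one tuple per distinct denial, which maps onto the types grepped for in the next step.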

After finding all the denied contexts I used audit2allow to create allow policies.

grep keepalived_t /var/log/audit/audit.log | audit2allow -M keepalived_t
grep root_t /var/log/audit/audit.log | audit2allow -M root_t
grep tmp_t /var/log/audit/audit.log | audit2allow -M tmp_t
grep mysqld_port_t /var/log/audit/audit.log | audit2allow -M mysqld_port_t

semodule -i keepalived_t.pp
semodule -i root_t.pp
semodule -i tmp_t.pp
semodule -i mysqld_port_t.pp

After I created the allow policies, the health check script ran successfully and a VIP fail over occurred whenever MySQL went down.

CentOS 7 Join Active Directory Domain

Before you begin, ensure that DNS on the Linux computer you wish to join to the domain points to the Active Directory server. Active Directory relies heavily on DNS to function.

STEP 1. Ensure the following packages are installed

yum -y install realmd sssd oddjob oddjob-mkhomedir adcli samba-common

STEP 2. From the computer you will join to the domain run realm discover to verify connectivity to the domain controllers.

[root@test02 ~] realm discover LAB.NET
  type: kerberos
  realm-name: LAB.NET
  configured: kerberos-member
  server-software: active-directory
  client-software: sssd
  required-package: oddjob
  required-package: oddjob-mkhomedir
  required-package: sssd
  required-package: adcli
  required-package: samba-common
  login-formats: %U
  login-policy: allow-realm-logins
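
If you want to check the discover output from a script, for example to compare the required-package lines against what is installed, the output is easy to parse. This is a small Python sketch. The function name parse_realm_discover is hypothetical, and it reads the command's text output as shown above.

```python
def parse_realm_discover(text):
    """Collect key/value pairs from `realm discover` output.

    Keys that repeat (such as required-package) are gathered into lists.
    """
    info = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if not sep:
            continue  # skip the realm header, prompts, and blank lines
        info.setdefault(key, []).append(value)
    return info
```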

STEP 3. Join the Active Directory domain. You must use an account that has privileges to join a computer to the domain.

[root@test02 ~] realm join -U adminuser LAB.NET

STEP 4. Verify that you can retrieve directory information for a user

[root@test02 ~] id LAB\\ktest
uid=522401118(ktest) gid=522400513(domain users) groups=522400513(domain users)

STEP 5. Verify the ability to perform a su to an Active Directory user

[root@test02 ~] su - ktest
Last login: Sun Sep 20 05:21:42 CDT 2015 on pts/0
[ktest@test02 ~]$

STEP 6. To remove the requirement of fully qualifying the Active Directory username edit the sssd.conf file. After this change you will not be required to use DOMAIN\\ when logging in as an Active Directory user.

[root@test02 ~] vi /etc/sssd/sssd.conf
use_fully_qualified_names = False
[root@test02 ~] systemctl restart sssd 
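
When the same change has to be made on several hosts, the edit can be scripted. Below is a minimal Python sketch using configparser. The function name disable_fqn is hypothetical, and the sketch assumes the setting lives in the [domain/...] sections of sssd.conf. Note that configparser drops comments when it rewrites the file.

```python
import configparser
import io


def disable_fqn(conf_text):
    """Set use_fully_qualified_names = False in every [domain/...] section."""
    # interpolation=None keeps values such as /home/%u@%d untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    for section in parser.sections():
        if section.startswith("domain/"):
            parser[section]["use_fully_qualified_names"] = "False"
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

The returned text would be written back to /etc/sssd/sssd.conf, followed by the sssd restart shown above.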

Adding nodes to rundeck

I am still gaining operational knowledge of Rundeck, which is an awesome job scheduling tool. Recently I had to set up a job that is scheduled to run on a remote node. To do this you must edit the resources.xml file under the project directory. For this to work you must set up ssh key pairs between the two servers. Check out this link from Digital Ocean on setting up ssh key pairs.


Sample node added to the resources.xml file

  <node name="servername" description="Dev MySQL" tags="" hostname="servername" osArch="amd64" osFamily="unix" osName="Linux" osVersion="2.6.32-504.8.1.el6.x86_64" username="userAccount"/>
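
If you add nodes often, the entry can be generated rather than typed by hand. Here is a minimal Python sketch using the standard library's ElementTree. The function name add_node is hypothetical, and the sketch assumes resources.xml has a <project> root element holding the <node> entries.

```python
import xml.etree.ElementTree as ET


def add_node(resources_xml, **attributes):
    """Return resources.xml text with one extra <node> element appended."""
    root = ET.fromstring(resources_xml)
    if root.tag != "project":
        raise ValueError("expected a <project> root element")
    # Each keyword argument becomes an attribute on the new <node>.
    ET.SubElement(root, "node", attributes)
    return ET.tostring(root, encoding="unicode")
```

For example, add_node(text, name="servername", hostname="servername", osFamily="unix", username="userAccount") appends an entry like the sample above.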

After adding the node to rundeck you must restart the service for the node to be recognized.

service rundeckd restart

Resolved: rundeck out of memory errors

I recently resolved an issue where Rundeck would fail to run a Talend ETL job. Because Rundeck and Talend both run on the JVM, I was unsure where the message was bubbling up from. I first increased -Xmx in the Talend job and reran it, but the job still failed. I then increased -Xmx in /etc/rundeck/profile, which resulted in the job completing successfully.

Below is the error message I received in the rundeck console.

Failed dispatching to node localhost: java.lang.OutOfMemoryError: Java heap space
14:10:16			Execution failed: 8155: [Workflow result: , step failures: {1=Dispatch failed on 1 nodes: [localhost: Unknown: java.lang.OutOfMemoryError: Java heap space]}, Node failures: {localhost=[Unknown: java.lang.OutOfMemoryError: Java heap space]}, flow control: Continue, status: failed

I increased the maximum JVM heap size (-Xmx) from 1024m to 6144m.
vim /etc/rundeck/profile

RDECK_JVM="$RDECK_JVM -Xmx6144m -Xms256m -XX:MaxPermSize=256m -server"
#RDECK_JVM="$RDECK_JVM -Xmx1024m -Xms256m -XX:MaxPermSize=256m -server"