Couchbase and NUMA

This analysis was performed on CentOS 6.5 and Couchbase 2.5.

NUMA (Non-Uniform Memory Access) is something many system administrators overlook when configuring memory-intensive applications on multi-socket, multi-core machines.
NUMA is a computer memory design in which memory access time depends on the physical layout of the circuitry: memory attached to a remote CPU socket is slower to reach than local memory. Fig. 1 provides a good illustration of how remote memory access slows processing in applications that cache large amounts of data in memory. Couchbase, for example, aggressively caches all data documents in memory. Each NUMA node is composed of memory banks and one or more CPU sockets.

Fig. 1: NUMA (diagram of local vs. remote memory access)

What is Couchbase's stance on configuring numactl for Couchbase?
At present Couchbase does not implement any NUMA-related optimization, which can be observed by issuing the numactl command. In Fig. 2, NUMA node 0 has 21,422 MB free while node 1 has 27,385 MB free, a memory imbalance across nodes.

To check whether your system is affected, install numactl.

[root@cb02] yum install numactl

After installing numactl, issue the command “numactl --hardware”.
Fig. 2

[root@cb02] numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22
node 0 size: 32722 MB
node 0 free: 21422 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23
node 1 size: 32768 MB
node 1 free: 27385 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

On the second node in the Couchbase cluster (cb01) the NUMA imbalance is even worse.

[root@cb01] numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22
node 0 size: 32722 MB
node 0 free: 19500 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23
node 1 size: 32768 MB
node 1 free: 27532 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

To correct the issue we must modify the couchbase-server startup script to enable NUMA interleave.

vi /etc/init.d/couchbase-server

The relevant line in the couchbase-server init script before editing:

daemon --user couchbase "$DAEMON -- -noinput -detached > /opt/couchbase/var/lib/couchbase/logs/start.log 2>&1"

After adding “numactl --interleave all” to the beginning of the command:

daemon --user couchbase "numactl --interleave all $DAEMON -- -noinput -detached > /opt/couchbase/var/lib/couchbase/logs/start.log 2>&1"

From the numactl man page, the interleave policy allocates memory round-robin across the nodes:
“Set a memory interleave policy. Memory will be allocated using round robin on
nodes. When memory cannot be allocated on the current interleave target fall
back to other nodes.”

After updating the couchbase-server init script, restart the service:

[root@cb01] service couchbase-server restart

Rerun numactl to verify that the NUMA memory nodes are in balance.

[root@cb01] numactl --hardware
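
As an additional check, you can inspect the memory policy of the running Couchbase data service. The sketch below assumes the data service is the memcached process (as on Couchbase 2.5); mappings in its numa_maps prefixed with "interleave:0-1" confirm that the interleave policy took effect.

# PID of the Couchbase data service (assumed here to be memcached)
pid=$(pgrep -x memcached | head -n 1)
# interleaved mappings are prefixed with "interleave:0-1"
grep -m 5 interleave /proc/$pid/numa_maps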

Resources: http://support.couchbase.com/entries/44029184-NUMA-Configuration


Creating a keytab file for Kerberos authentication on Linux

This guide was created on CentOS 6
You will need the krb5-workstation package installed

yum install krb5-workstation

Create a keytab file for Kerberos authentication for the user testuser1:

[user1@vm01 ~]$ ktutil
ktutil:  addent -password -p testuser1@CORP.COMPANY.NET -k 1 -e aes256-cts
Password for testuser1@CORP.COMPANY.NET: [enter your password]
ktutil:  wkt testuser1.keytab
ktutil:  quit
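
You can list the entries written to the keytab to confirm the principal and encryption type (klist is also part of krb5-workstation):

[user1@vm01 ~]$ klist -kt testuser1.keytab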

Use the keytab file to initialize a Kerberos ticket:

[user1@vm01 ~]$ kinit testuser1@CORP.COMPANY.NET -k -t ./testuser1.keytab 

Verify that the Kerberos ticket has been initialized:

[user1@vm01 ~]$ klist
Ticket cache: FILE:/tmp/krb5cc_16777216_kbQnZ2
Default principal: testuser1@CORP.COMPANY.NET

Valid starting     Expires            Service principal
10/22/14 07:23:58  10/22/14 17:23:58  krbtgt/CORP.COMPANY.NET@CORP.COMPANY.NET
	renew until 10/29/14 07:23:58

Applications running under the profile in which the Kerberos ticket was initialized should now be able to use it.

reference: https://kb.iu.edu/d/aumh

Enable auditing for Windows Firewall

Recently, while troubleshooting a new IIS application deployment, I realized how helpful the Windows Firewall auditing feature is. The IIS application was having difficulty connecting to SQL Server. I had allowed outbound connections from IIS to the SQL Server; however, after enabling Windows auditing of packet filtering I discovered that connections back to the IIS server on port 1434 were being blocked.
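
On recent Windows versions the same auditing can be enabled from an elevated command prompt with auditpol; this is a sketch assuming you want both success and failure events for dropped packets and blocked connections (the subcategory names may vary slightly by Windows version):

auditpol /set /subcategory:"Filtering Platform Packet Drop" /success:enable /failure:enable
auditpol /set /subcategory:"Filtering Platform Connection" /success:enable /failure:enable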

(Screenshot: Windows Firewall audit settings)

Example of a failure audit:
(Screenshot: failure audit event)

Configuring an NFS auto mount

This guide was created using CentOS 6.
The example configuration consists of SERVER01 and SERVER02.
This guide was written for people who have some basic Linux experience.

Connect to the Linux server acting as the NFS server, SERVER01, using your favorite SSH client.
Then use vi to open the /etc/exports file.

vi /etc/exports

Add the following entry to the /etc/exports file.
This configuration accomplishes the following:
/mnt/volume is exposed to the NFS client SERVER02
The NFS client SERVER02 can read and write to /mnt/volume on SERVER01
rw = read/write access
ro = read-only access
no_root_squash = allow the client's root user to retain root privileges on the export
sync = all buffered modifications to file data and metadata are written to the underlying file system before the server replies
Additional information can be found in the nfs and exports man pages

/mnt/volume               SERVER02(rw,sync)

Run the following command on SERVER01 to reread the exports file (/etc/exports):

/usr/sbin/exportfs -ra
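
To confirm the export is active, list the exports on SERVER01, or query them remotely from SERVER02 (showmount is part of the nfs-utils package):

/usr/sbin/exportfs -v
showmount -e SERVER01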

SSH into SERVER02 to set up auto mounting of the NFS export on SERVER01.
Auto mounting automatically mounts the NFS export when it is accessed and unmounts it after a period of inactivity. In the example below we are connecting to the NFS mount point on SERVER01 with read/write access.
The directory volume will automatically be created under /mnt on SERVER02 when you access it.

vi /etc/auto.misc
volume           -rw,soft,rsize=8192,wsize=8192,timeo=14,intr SERVER01:/mnt/volume

Use vi to edit the /etc/auto.master file. The master file defines the root directory under which remote file systems are auto mounted. It functions as a key/value pair: /mnt is the root mount point for file systems defined in /etc/auto.misc, with a timeout value of 500 seconds.

vi /etc/auto.master
/mnt    /etc/auto.misc  --timeout=500

Reload the autofs service, so that the new configuration becomes active

service autofs reload

To verify the NFS auto mount is working, change into the directory:

cd /mnt/volume 
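
If the automount succeeded, the export shows up in the mount table (paths as in the example above):

mount | grep SERVER01
df -h /mnt/volume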

If you run into problems, check the Linux system log. This is the first place I look when troubleshooting:

tail -n 30 /var/log/messages

Configure a load balancer using Keepalived and Nginx

The following web load balancer configuration is made up of Nginx, CentOS 6.5 and Keepalived.

Nginx is a highly scalable web server.

From Keepalived's website: “The main goal of this project is to provide simple and robust facilities for loadbalancing and high-availability to Linux system and Linux based infrastructures. Loadbalancing framework relies on well-known and widely used Linux Virtual Server (IPVS) kernel module providing Layer4 loadbalancing.”

Check out my previous post for compiling and configuring Keepalived

Load balancers:

lb01.lab.net - 172.16.1.2
lb02.lab.net - 172.16.1.3
virtual IP - 172.16.1.4

Web Servers:

web01.lab.net - 172.16.1.20
web02.lab.net - 172.16.1.30

Syslog Server:

syslog.lab.net - 172.16.1.31

STEP 1. Configure Keepalived
Primary Keepalived configuration file: /etc/keepalived/keepalived.conf
Once keepalived.conf is configured on lb01 you can copy it to the second load balancer: scp ./keepalived.conf user@lb02:/etc/keepalived/keepalived.conf
On lb02.lab.net adjust router_id, notification_email_from, and the priority, and add nopreempt (see STEP 2). The server with the higher priority will act as the master and the other as the backup.


! Configuration File for keepalived

global_defs {
   notification_email {
     ctatro@lab.net
   }
   notification_email_from loadbalancer01@lab.net
   smtp_server smtp.lab.net
   smtp_connect_timeout 30
   router_id loadbalancer01
}

vrrp_script chk_proxy {
        script "/usr/bin/killall -0 nginx"
        interval 2
        weight 2
}

vrrp_instance VI_169 {
    state BACKUP
    interface eth0
    virtual_router_id 169
    priority 150
    advert_int 5
    smtp_alert
    authentication {
        auth_type PASS
        auth_pass 9999
    }
    virtual_ipaddress {
        172.16.1.4/24 brd 172.16.1.255 dev eth0
        }

    track_script {
        chk_proxy
    }

        preempt_delay 300

}

STEP 2. Backup Keepalived config file: /etc/keepalived/keepalived.conf

! Configuration File for keepalived

global_defs {
   notification_email {
     ctatro@lab.net
   }
   notification_email_from loadbalancer02@lab.net
   smtp_server smtp.lab.net
   smtp_connect_timeout 30
   router_id loadbalancer02
}

vrrp_script chk_proxy {
        script "/usr/bin/killall -0 nginx"
        interval 2
        weight 2
}

vrrp_instance VI_169 {
    state BACKUP
    interface eth0
    virtual_router_id 169
    priority 100
    advert_int 5
    smtp_alert
    authentication {
        auth_type PASS
        auth_pass 9999
    }
    virtual_ipaddress {
        172.16.1.4/24 brd 172.16.1.255 dev eth0
        }

    track_script {
        chk_proxy
    }

        nopreempt

        preempt_delay 300

}

STEP 3. Start Keepalived
On lb01 and lb02 run the following to start keepalived

service keepalived start
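
If you want Keepalived to start automatically at boot (CentOS 6 uses SysV init scripts), also enable it with chkconfig:

chkconfig keepalived on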

STEP 4. Verify Virtual IP is UP
Verify that the virtual IP address is up by running ip addr show.
You should see a secondary IP on eth0.

[root@lb01 ctatro]$ ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:50:56:89:30:54 brd ff:ff:ff:ff:ff:ff
    inet 172.16.1.2/24 brd 172.16.1.255 scope global eth0
    inet 172.16.1.4/24 brd 172.16.1.255 scope global secondary eth0
    inet6 fe80::250:56ff:fe89:3054/64 scope link
       valid_lft forever preferred_lft forever
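
If the secondary IP does not appear, confirm that VRRP advertisements are actually being sent on the wire; a quick check, assuming the interface is eth0 as above:

tcpdump -n -i eth0 vrrp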

STEP 5. nginx.conf configuration
Nginx global configuration file: /etc/nginx/nginx.conf
The nginx.conf file should be identical on lb01.lab.net and lb02.lab.net.

# configure one worker process per CPU core
user  nginx;
worker_processes  4;

error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;


# worker_rlimit_nofile: maximum number of open files per worker process
# make sure your ulimit nofile in limits.conf is set high enough
worker_rlimit_nofile 40960;

events {
        use epoll;
        # accept X connections per worker process
        worker_connections  10240;
        # accept more than 1 connection at a time
        multi_accept on;
}
timer_resolution  500ms;


http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;
    log_format  main  '$remote_addr\t$remote_user\t[$time_local]\t$request\t'
                      '$status\t$body_bytes_sent\t$http_referer\t'
                      '$http_user_agent\t$http_x_forwarded_for\t'
                      'Req_Time\t$request_time\tWeb_srv_rsp_time\t$upstream_response_time\t'
                      'Web_srv_IP:\t$upstream_addr\tReq_size:\t$request_length\t'
                      'HTTP_content_size:\t$http_content_length\tSSL_cipher:\t$ssl_cipher\tSSL_protocol:\t$ssl_protocol';

        access_log  /var/log/nginx/access.log  main;

        sendfile        on;
        tcp_nopush     on;
        # timeout during which a keep-alive client connection will stay open on the server side
        keepalive_timeout  30;
        #Sets the maximum number of requests that can be served through one keep-alive
        #connection. After the maximum number of requests are made, the connection is closed.
        keepalive_requests 100000;
        tcp_nodelay on;


        # Caches information about open FDs, frequently accessed files.
        open_file_cache max=10000 inactive=20s;
        open_file_cache_valid 30s;
        open_file_cache_min_uses 2;
        open_file_cache_errors on;

        #gzip  on;

        include /etc/nginx/conf.d/*.conf;
}


STEP 6. Nginx load balancer configuration
The cluster01.conf file should be identical on lb01.lab.net and lb02.lab.net.
Nginx site configuration file: /etc/nginx/conf.d/cluster01.conf
In this configuration file we have HTTP and HTTPS backend web servers configured.
HTTP requests will be forwarded to port 80 on the upstream servers, while HTTPS requests will be terminated by Nginx, re-encrypted, and forwarded to the upstream HTTPS web servers.


# HTTPS WEB POOL
upstream secureProd {
        server web01.lab.net:443 weight=10 max_fails=3 fail_timeout=3s;
        server web02.lab.net:443 weight=10 max_fails=3 fail_timeout=3s;

        keepalive 100;
}

# HTTP WEB POOL
upstream httpProd {
        server web01.lab.net:80 weight=10 max_fails=3 fail_timeout=3s;
        server web02.lab.net:80 weight=10 max_fails=3 fail_timeout=3s;

        keepalive 100;
}


### START HTTPS SERVER
server {

        # backlog should not exceed net.core.somaxconn (65535, see the kernel tuning in STEP 8)
        listen 172.16.1.4:443 ssl backlog=65535;

        ssl on;
        ssl_certificate /etc/ssl/certs/cert.pem;
        ssl_certificate_key /etc/ssl/certs/cert_private_key.pem;

        ssl_protocols  SSLv3 TLSv1 TLSv1.1 TLSv1.2;
        ssl_ciphers RC4:HIGH:!aNULL:!MD5:!kEDH;
        ssl_prefer_server_ciphers   on;

        server_name webcluster.corp.domain.net;

        access_log      /var/log/nginx/webcluster_https.access.log main;
        error_log       /var/log/nginx/webcluster_https.error.log debug;

        access_log     syslog:server=syslog.lab.net main;


        # do not transfer http request to next server on timeout
        proxy_next_upstream off;

        client_max_body_size 10m;
        proxy_buffering on;
        client_body_buffer_size 10m;
        proxy_buffer_size 32k;
        proxy_buffers 1024 32k;
        large_client_header_buffers 20 8k;

        location / {
                index   index.html;

                proxy_set_header X-Forwarded-Proto https;
                proxy_redirect off;
                proxy_set_header Host $host;
                proxy_set_header  X-Real-IP  $remote_addr;
                proxy_set_header  X-Forwarded-For $proxy_add_x_forwarded_for;
                proxy_http_version 1.1;
                proxy_max_temp_file_size 0;
                proxy_pass https://secureProd;

        } #end location


}
### END HTTPS SERVER




### START HTTP SERVER
server {

        # backlog should not exceed net.core.somaxconn (65535, see the kernel tuning in STEP 8)
        listen 172.16.1.4:80 backlog=65535;

        server_name webcluster.corp.domain.net;

        access_log      /var/log/nginx/webcluster_http.access.log main;
        error_log       /var/log/nginx/webcluster_http.error.log debug;

        access_log     syslog:server=syslog.lab.net main;


        # do not transfer http request to next server on timeout
        proxy_next_upstream off;

        client_max_body_size 10m;
        proxy_buffering on;
        client_body_buffer_size 10m;
        proxy_buffer_size 32k;
        proxy_buffers 1024 32k;
        large_client_header_buffers 20 8k;

        location / {
                index   index.html;

                proxy_set_header X-Forwarded-Proto http;
                proxy_redirect off;
                proxy_set_header Host $host;
                proxy_set_header  X-Real-IP  $remote_addr;
                proxy_set_header  X-Forwarded-For $proxy_add_x_forwarded_for;
                proxy_http_version 1.1;
                proxy_max_temp_file_size 0;
                proxy_pass http://httpProd;

        } #end location


}
### END HTTP SERVER
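
Before starting or reloading Nginx it is worth validating the combined configuration:

nginx -t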

STEP 7. Start Nginx
On lb01 and lb02 run the following to start Nginx

service nginx start
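
As with Keepalived, you can enable Nginx at boot with chkconfig:

chkconfig nginx on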

STEP 8. Tune kernel parameters
I also tuned some of the Linux kernel parameters to open up the TCP/IP stack. The default TCP/IP settings are fairly restrictive and need to be opened up if the system will be handling a high number of TCP connections. I was able to substantially increase the throughput of my load balancer by tuning these settings.

[root@lb01 ~]$ sysctl -p
net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 0
kernel.core_uses_pid = 1
net.ipv4.tcp_syncookies = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
net.ipv4.ip_nonlocal_bind = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_max_orphans = 60000
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syn_retries = 2
net.ipv4.icmp_echo_ignore_broadcasts = 1
kernel.exec-shield = 1
kernel.randomize_va_space = 1
net.ipv4.icmp_ignore_bogus_error_responses = 1
net.ipv4.conf.default.send_redirects = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.default.secure_redirects = 0
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_window_scaling = 1
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 16384
net.ipv4.tcp_max_syn_backlog = 65536
net.ipv4.tcp_max_tw_buckets = 1440000
net.core.rmem_default = 8388608
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_congestion_control = cubic
fs.file-max = 3000000
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_fin_timeout = 15
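
The nginx.conf above sets worker_rlimit_nofile to 40960, which also requires a sufficiently high nofile ulimit. A minimal sketch of matching /etc/security/limits.conf entries, assuming the worker processes run as the nginx user:

# /etc/security/limits.conf
nginx    soft    nofile    65535
nginx    hard    nofile    65535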

Locking cron jobs

I was working with a developer this week who was struggling with what seemed to be a simple problem. He wanted to run a cron job every X minutes; however, sometimes the job would run long, resulting in overlapping runs. After some Google-fu I suggested using the utility flock.

flock places a lock on cronjob.lockfile while the ping 4.2.2.2 is running. -n tells any other process trying to place a lock on cronjob.lockfile to fail immediately. You can test this by opening another terminal session and firing off the same flock command; the second process should fail immediately.

flock -n /tmp/cronjob.lockfile -c "ping 4.2.2.2"
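
In a crontab the same pattern looks like the sketch below, where /opt/scripts/batch_job.sh stands in for the developer's actual job (a hypothetical path):

# run every 5 minutes; skip this run if the previous one still holds the lock
*/5 * * * * /usr/bin/flock -n /tmp/cronjob.lockfile -c "/opt/scripts/batch_job.sh"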

Creating scheduled events in MySQL

Unknown to many, MySQL has the ability to schedule events. This feature can be useful for scheduling activities like maintenance and batch processes. To get started, the MySQL user creating the event needs the EVENT privilege.

1. The following will grant the user jsmith the ability to create events on all databases in MySQL.

GRANT EVENT ON *.* TO jsmith;

2. Alternatively, grant the EVENT privilege only on the database myschema.

GRANT EVENT ON myschema.* TO jsmith;

3. Enter the following code to create a scheduled event which runs every 1 hour starting at 2014-01-07 13:00:00.

CREATE EVENT evntTruncate -- event name
ON SCHEDULE EVERY '1' HOUR -- run every 1 hour
STARTS '2014-01-07 13:00:00' -- should be in the future
DO
TRUNCATE TABLE City; -- SQL statement to execute

4. Run the following query to verify that the event has been scheduled.

SELECT * FROM information_schema.events;
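
Note that events only fire when the event scheduler is enabled on the instance. A quick check and, if needed, a way to turn it on for the running server (add event_scheduler=ON to my.cnf to make it permanent; SET GLOBAL requires the SUPER privilege):

SHOW VARIABLES LIKE 'event_scheduler';
SET GLOBAL event_scheduler = ON;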