Seven Story Rabbit Hole

Sometimes awesome things happen in deep rabbit holes. Or not.

Running a Sync Gateway Cluster Under CoreOS on AWS

Follow the steps below to create a Sync Gateway + Couchbase Server cluster running under AWS with the following architecture:

architecture diagram

There is a YouTube video (12 minutes) that walks through this entire setup process, or you can follow the instructions below.

Kick off Couchbase Server + Sync Gateway cluster

Launch EC2 instances

Go to the CloudFormation Wizard

Recommended values:

  • ClusterSize: 3 nodes (default)
  • Discovery URL: as it says, you need to grab a new token from https://discovery.etcd.io/new and paste it in the box.
  • KeyPair: the name of the AWS keypair you want to use. If you haven’t already, you’ll want to upload your local ssh key into AWS and create a named keypair.

ssh into a CoreOS instance

Go to the AWS console under EC2 instances and find the public ip of one of your newly launched CoreOS instances.

Choose any one of them (it doesn’t matter which), and ssh into it as the core user with the cert provided in the previous step:

$ ssh -i aws.cer -A core@ec2-54-83-80-161.compute-1.amazonaws.com

Sanity check

Let’s make sure the CoreOS cluster is healthy first:

$ fleetctl list-machines

This should return a list of machines in the cluster, like this:

MACHINE          IP              METADATA
03b08680...     10.33.185.16    -
209a8a2e...     10.164.175.9    -
25dd84b7...     10.13.180.194   -

Kick off cluster

From the CoreOS machine you ssh’d into in the previous step:

$ wget https://raw.githubusercontent.com/tleyden/sync-gateway-coreos/master/scripts/sync-gw-cluster-init.sh
$ chmod +x sync-gw-cluster-init.sh
$ SG_CONFIG_URL=https://raw.githubusercontent.com/couchbaselabs/ToDoLite-iOS/master/sync-gateway-config.json
$ ./sync-gw-cluster-init.sh -n 1 -c master -b "todos" -z 512 -g $SG_CONFIG_URL -v 3.0.1 -m 3 -u user:passw0rd

You’ll want to use your own config URL for the SG_CONFIG_URL value. For example, a file hosted on GitHub or on your own web server.

View cluster

After the above script finishes, run fleetctl list-units to list the services in your cluster, and you should see:

UNIT                     MACHINE             ACTIVE  SUB
couchbase_bootstrap_node.service                281cd575.../10.150.73.56        active    running
couchbase_bootstrap_node_announce.service       281cd575.../10.150.73.56        active    running
couchbase_node.1.service                        36ab135c.../10.79.132.157       active    running
couchbase_node.2.service                        f815a846.../10.51.179.214       active    running
sync_gw_announce@1.service                      36ab135c.../10.79.132.157       active    running
sync_gw_node@1.service                          36ab135c.../10.79.132.157       active    running

Verify internal

Find internal ip

$ fleetctl list-units
sync_gw_node.1.service                209a8a2e.../10.164.175.9    active  running

Curl

On the CoreOS instance you are already ssh’d into, use the ip found above and run a curl request against the server root:

$ curl 10.164.175.9:4985
{"couchdb":"Welcome","vendor":{"name":"Couchbase Sync Gateway","version":1},"version":"Couchbase Sync Gateway/master(6356065)"}

Verify external

Find external ip

Using the internal ip found above, go to the EC2 Instances section of the AWS console, find the instance with that internal ip, and then get the public DNS name for that instance, e.g. ec2-54-211-206-18.compute-1.amazonaws.com

Curl

From your laptop, use the public DNS name found above and run a curl request against the server root:

$ curl ec2-54-211-206-18.compute-1.amazonaws.com:4984
{"couchdb":"Welcome","vendor":{"name":"Couchbase Sync Gateway","version":1},"version":"Couchbase Sync Gateway/master(6356065)"}

Congratulations! You now have a Couchbase Server + Sync Gateway cluster running.

Appendix A: Kicking off more Sync Gateway nodes

To launch two more Sync Gateway nodes, run the following command:

$ fleetctl start sync_gw_node@{2..3}.service && fleetctl start sync_gw_announce@{2..3}.service

Appendix B: Setting up an Elastic Load Balancer

Setup an Elastic Load Balancer with the following settings:

elb screenshot

Note that it forwards to port 4984.

Once the Load Balancer has been created, go to its configuration to get its DNS name:

elb screenshot

Now you should be able to run curl against that:

$ curl http://coreos-322270867.us-east-1.elb.amazonaws.com/
{"couchdb":"Welcome","vendor":{"name":"Couchbase Sync Gateway","version":1},"version":"Couchbase Sync Gateway/master(b47aee8)"}

Up and Running With Couchbase Lite Phonegap Android on OSX

This will walk you through the steps to install the TodoLite-Phonegap sample app that uses Couchbase Lite Android. After you’re finished, you’ll end up with this app.

Install Homebrew

Install Android Studio

Install Phonegap

Install Node.js

Phonegap is installed with the Node Package Manager (npm), so we need to get Node.js first.

brew install node

Install Phonegap

$ sudo npm install -g phonegap

You should see npm output as it downloads and installs the phonegap package.

Check your version with:

$ phonegap -v
4.1.2-0.22.9

Install Ant

$ brew install ant

Check your Ant version with:

$ ant -version
Apache Ant(TM) version 1.9.4 compiled on April 29 2014

Note: according to Stack Overflow, you may have to install Xcode and the Command Line Tools for this to work.

Create new Phonegap App

$ phonegap create todo-lite com.couchbase.TodoLite TodoLite

You should see the following output:

Creating a new cordova project with name "TodoLite" and id "com.couchbase.TodoLite" at location "/Users/tleyden/Development/todo-lite"
Using custom www assets from https://github.com/phonegap/phonegap-app-hello-world/archive/master.tar.gz
Downloading com.phonegap.hello-world library for www...
Download complete

cd into the newly created directory:

$ cd todo-lite

Add the Couchbase Lite plugin

$ phonegap local plugin add https://github.com/couchbaselabs/Couchbase-Lite-PhoneGap-Plugin.git

You should see the following output:

[warning] The command `phonegap local <command>` has been DEPRECATED.
[warning] The command has been delegated to `phonegap <command>`.
[warning] The command `phonegap local <command>` will soon be removed.
Fetching plugin "https://github.com/couchbaselabs/Couchbase-Lite-PhoneGap-Plugin.git" via git clone

Add additional plugins required by TodoLite-Phonegap

$ phonegap local plugin add https://git-wip-us.apache.org/repos/asf/cordova-plugin-camera.git
$ phonegap local plugin add https://github.com/apache/cordova-plugin-inappbrowser.git 
$ phonegap local plugin add https://git-wip-us.apache.org/repos/asf/cordova-plugin-network-information.git

Clone the example app source code

$ rm -rf www
$ git clone https://github.com/couchbaselabs/TodoLite-PhoneGap.git www

Verify ANDROID_HOME environment variable

If you don’t already have it set, you will need to set your ANDROID_HOME environment variable:

$ export ANDROID_HOME="/Applications/Android Studio.app/sdk"
$ export PATH=$PATH:$ANDROID_HOME/tools:$ANDROID_HOME/platform-tools

Run app

$ phonegap run android

You should see the following output:

[phonegap] executing 'cordova platform add android'...
[phonegap] completed 'cordova platform add android'
[phonegap] executing 'cordova run android'...
[phonegap] completed 'cordova run android'

Verify app

TodoLite-Phonegap should launch on the emulator and look like this:

screenshot

Facebook login

Hit the happy face in the top right, and it will prompt you to login via Facebook.

Screenshot

View data

After logging in, it will sync any data for your user stored on the Couchbase Mobile demo cluster.

For example, if you’ve previously used TodoLite-iOS or TodoLite-Android, your data should appear here.

screenshot

Test Sync via single device

  • Login with Facebook as described above
  • Add a new Todo List
  • Add an item to your Todo List
  • Uninstall the app
  • Re-install the app by running phonegap run android again
  • Login with Facebook
  • Your Todo List and item added above should now appear

Test Sync via 2 apps

Note: you could also set up two emulators and run the apps separately

Appendix A: Using a more recent build of the Phonegap Plugin

Reset state

$ cd .. 
$ rm -rf todo-lite

Create another phonegap app

$ phonegap create todo-lite com.couchbase.TodoLite TodoLite
$ cd todo-lite

Download zip file

$ mkdir Couchbase-Lite-PhoneGap-Plugin && cd Couchbase-Lite-PhoneGap-Plugin
$ wget http://cbfs-ext.hq.couchbase.com/builds/Couchbase-Lite-PhoneGap-Plugin_1.0.4-41.zip
$ unzip Couchbase-Lite-PhoneGap-Plugin_1.0.4-41.zip

Add local plugin

$ phonegap local plugin add Couchbase-Lite-PhoneGap-Plugin

You should see output:

[warning] The command `phonegap local <command>` has been DEPRECATED.
[warning] The command has been delegated to `phonegap <command>`.
[warning] The command `phonegap local <command>` will soon be removed.

Now just follow the rest of the steps above.

Getting Started With Go and Protocol Buffers

I found the official docs on using Google Protocol Buffers from Go a bit confusing, and couldn’t find any other clearly written blog posts on the subject, so I figured I’d write my own.

This will walk you through the following:

  • Install golang/protobuf and required dependencies
  • Generating Go wrappers for a test protocol buffer definition
  • Using those Go wrappers to marshal and unmarshal an object

Install protoc binary

Since the protocol buffer compiler protoc is required later, we must install it.

Ubuntu 14.04

If you want to use an older version (v2.5), simply do:

$ apt-get install protobuf-compiler

Otherwise if you want the latest version (v2.6):

$ apt-get install build-essential
$ wget https://protobuf.googlecode.com/svn/rc/protobuf-2.6.0.tar.gz
$ tar xvfz protobuf-2.6.0.tar.gz
$ cd protobuf-2.6.0
$ ./configure && make install

OSX

$ brew install protobuf

Install Go Protobuf library

This assumes you already have Go 1.2 or later installed, and your $GOPATH variable set.

In order to generate Go wrappers, we need to install the following:

$ go get -u -v github.com/golang/protobuf/proto
$ go get -u -v github.com/golang/protobuf/protoc-gen-go

Download a test .proto file

In order to generate wrappers, we need a .proto file with object definitions.

This one is a slightly modified version of the one from the official docs.

$ wget https://gist.githubusercontent.com/tleyden/95de4bfe34321c79e91b/raw/f8696fe0f1462f377d6bd13c5f20cccfa182578a/test.proto

Generate Go wrappers

$ protoc --go_out=. *.proto

You should end up with a new file generated: test.pb.go

Marshalling and unmarshalling an object

Open a new file main.go in emacs or your favorite editor, and paste the following:

package main

import (
  "log"

  "github.com/golang/protobuf/proto"
)

func main() {

  test := &Test{
      Label: proto.String("hello"),
      Type:  proto.Int32(17),
      Optionalgroup: &Test_OptionalGroup{
          RequiredField: proto.String("good bye"),
      },
  }
  data, err := proto.Marshal(test)
  if err != nil {
      log.Fatal("marshaling error: ", err)
  }
  newTest := &Test{}
  err = proto.Unmarshal(data, newTest)
  if err != nil {
      log.Fatal("unmarshaling error: ", err)
  }
  // Now test and newTest contain the same data.
  if test.GetLabel() != newTest.GetLabel() {
      log.Fatalf("data mismatch %q != %q", test.GetLabel(), newTest.GetLabel())
  }

  log.Printf("Unmarshalled to: %+v", newTest)

}

Explanation:

  • test := &Test{...}: creates a new object suitable for protobuf marshalling and populates its fields. Note that using proto.String(..) / proto.Int32(..) isn’t strictly required; they are just convenience wrappers to get string / int pointers.
  • proto.Marshal(test): marshals the object to a byte array.
  • newTest := &Test{}: creates a new empty object.
  • proto.Unmarshal(data, newTest): unmarshals the previously marshalled byte array into the new object.
  • The test.GetLabel() comparison: verifies that the "label" field made the marshal/unmarshal round trip safely.

Run it via:

$ go run main.go test.pb.go

and you should see the output:

Unmarshalled to: label:"hello" type:17 OptionalGroup{RequiredField:"good bye" }  
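
One nicety of the generated code: the Get* accessors are nil-safe, so reading an unset optional field returns the type’s zero value instead of dereferencing a nil pointer. Here is a minimal sketch to illustrate (compile it together with test.pb.go, in place of the main.go above):

package main

import "fmt"

func main() {
  // GetLabel() on a message whose Label field was never set returns ""
  // rather than panicking on a nil *string.
  empty := &Test{}
  fmt.Println(empty.GetLabel() == "") // prints: true
}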

Congratulations! You are now using protocol buffers from Go.

Running a CBFS Cluster on CoreOS

This will walk you through getting a cbfs cluster up and running.

What is CBFS?

cbfs is a distributed filesystem on top of Couchbase Server, not unlike MongoDB’s GridFS or Riak CS.

Here’s a typical deployment architecture:

cbfs overview

Although not shown, all cbfs daemons can communicate with all Couchbase Server instances.

It is not required to run cbfs on the same machine as Couchbase Server, but it is meant to be run in the same data center as Couchbase Server.

If you want a deeper understanding of how cbfs works, check the cbfs presentation or this blog post.

Kick off a Couchbase Cluster

cbfs depends on having a Couchbase cluster running.

Follow all of the steps in Running Couchbase Cluster Under CoreOS on AWS to kick off a 3 node Couchbase cluster.

Add security groups

A few ports will need to be opened up for cbfs.

Go to the AWS console and edit the Couchbase-CoreOS-CoreOSSecurityGroup-xxxx security group and add the following rules:

Type             Protocol  Port Range Source  
----             --------  ---------- ------
Custom TCP Rule  TCP       8484       Custom IP: sg-6e5a0d04 (copy and paste from port 4001 rule)
Custom TCP Rule  TCP       8423       Custom IP: sg-6e5a0d04 

At this point your security group should look like this:

security group

Create a new bucket for cbfs

Open Couchbase Server Admin UI

In the AWS EC2 console, find the public IP of one of the instances (it doesn’t matter which)

In your browser, go to http://<public_ip>:8091/

Create Bucket

Go to Data Buckets / Create New Bucket

Enter cbfs for the name of the bucket.

Leave all other settings as default.

create bucket

ssh in

In the AWS EC2 console, find the public IP of one of the instances (it doesn’t matter which)

ssh into one of the machines:

$ ssh -A core@<public_ip>

Run cbfs

Create a volume dir

Since the filesystem of a docker container is not meant for high-throughput I/O, a volume should be used for cbfs.

Create a directory on the host OS (i.e., on the CoreOS instance):

$ sudo mkdir -p /var/lib/cbfs/data
$ sudo chown -R core:core /var/lib/cbfs

This will be mounted by the docker container in the next step.

Generate fleet unit files

$ wget https://gist.githubusercontent.com/tleyden/d70161c3827cb8b788a8/raw/8f6c81f0095b0007565e9b205e90afb132552060/cbfs_node.service.template
$ for i in `seq 1 3`; do cp cbfs_node.service.template cbfs_node.$i.service; done

Start cbfs on all cluster nodes

$ fleetctl start cbfs_node.*.service

Run fleetctl list-units to list the units running in your cluster. You should have the following:

$ fleetctl list-units
UNIT                                            MACHINE                         ACTIVE    SUB
cbfs_node.1.service                             6ecff20c.../10.51.177.81        active    running
cbfs_node.2.service                             b8eb6653.../10.79.155.153       active    running
cbfs_node.3.service                             02d48afd.../10.186.172.24       active    running
couchbase_bootstrap_node.service                02d48afd.../10.186.172.24       active    running
couchbase_bootstrap_node_announce.service       02d48afd.../10.186.172.24       active    running
couchbase_node.1.service                        6ecff20c.../10.51.177.81        active    running
couchbase_node.2.service                        b8eb6653.../10.79.155.153       active    running

View cbfs output

$ fleetctl journal cbfs_node.1.service
2014/11/14 23:18:58 Connecting to couchbase bucket cbfs at http://10.51.177.81:8091/
2014/11/14 23:18:58 Error checking view version: MCResponse status=KEY_ENOENT, opcode=GET, opaque=0, msg: Not found
2014/11/14 23:18:58 Installing new version of views (old version=0)
2014/11/14 23:18:58 Listening to web requests on :8484 as server 10.51.177.81
2014/11/14 23:18:58 Error removing 10.51.177.81's task list: MCResponse status=KEY_ENOENT, opcode=DELETE, opaque=0, msg: Not found
2014/11/14 23:19:05 Error updating space used: Expected 1 result, got []

Run cbfs client

Run a bash shell in a docker container that has cbfsclient pre-installed:

$ sudo docker run -ti --net=host tleyden5iwx/cbfs /bin/bash

Upload a file

From within the docker container launched in the previous step:

# echo "foo" > foo
# ip=$(hostname -i | tr -d ' ')
# cbfsclient http://$ip:8484/ upload foo /foo

There should be no errors. If you run fleetctl journal cbfs_node.1.service again on the CoreOS instance, you should see log messages like:

2014/11/14 21:51:43 Recorded myself as an owner of e242ed3bffccdf271b7fbaf34ed72d089537b42f: result=success

List directory

# cbfsclient http://$ip:8484/ ls /
foo

It should list the foo file we uploaded earlier.

Congratulations! You now have cbfs up and running.

An Example of Using NSQ From Go

NSQ is a message queue, similar to RabbitMQ. I decided I’d give it a whirl.

Install Nsq

$ wget https://s3.amazonaws.com/bitly-downloads/nsq/nsq-0.2.31.darwin-amd64.go1.3.1.tar.gz
$ tar xvfz nsq-0.2.31.darwin-amd64.go1.3.1.tar.gz
$ sudo mv nsq-0.2.31.darwin-amd64.go1.3.1/bin/* /usr/local/bin

Launch Nsq

$ nsqlookupd & 
$ nsqd --lookupd-tcp-address=127.0.0.1:4160 &
$ nsqadmin --lookupd-http-address=127.0.0.1:4161 &

Get Go client library

$ go get -u -v github.com/bitly/go-nsq

Create a producer

Add the following code to main.go:

package main

import (
  "log"
  "github.com/bitly/go-nsq"
)

func main() {
  config := nsq.NewConfig()
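  // Point the producer at the local nsqd TCP port (4150 by default).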
  w, _ := nsq.NewProducer("127.0.0.1:4150", config)

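  // Publish a single message to the "write_test" topic; nsqd creates the topic on first publish.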
  err := w.Publish("write_test", []byte("test"))
  if err != nil {
      log.Panic("Could not connect")
  }

  w.Stop()
}

and then run it with:

$ go run main.go

If you go to your NSQAdmin at http://localhost:4171, you should see a single message in the write_test topic.

NSQAdmin

Create a consumer

package main

import (
  "log"
  "sync"

  "github.com/bitly/go-nsq"
)

func main() {

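  // Use a WaitGroup so main doesn't exit until the handler has processed one message.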
  wg := &sync.WaitGroup{}
  wg.Add(1)

  config := nsq.NewConfig()
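  // Subscribe to the "write_test" topic via the "ch" channel (both are created on first use).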
  q, _ := nsq.NewConsumer("write_test", "ch", config)
  q.AddHandler(nsq.HandlerFunc(func(message *nsq.Message) error {
      log.Printf("Got a message: %v", message)
      wg.Done()
      return nil
  }))
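  // Connect directly to nsqd's TCP port; see the nsqlookupd variant below.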
  err := q.ConnectToNSQD("127.0.0.1:4150")
  if err != nil {
      log.Panic("Could not connect")
  }
  wg.Wait()

}

and then run it with:

$ go run main.go

You should see output:

2014/11/12 08:37:29 INF    1 [write_test/ch] (127.0.0.1:4150) connecting to nsqd
2014/11/12 08:37:29 Got a message: &{[48 55 54 52 48 57 51 56 50 100 50 56 101 48 48 55] [116 101 115 116] 1415810020571836511 2 0xc208042118 0 0}

Congratulations! You just pushed a message through NSQ.

Enhanced consumer: use NSQLookupd

The above example hardcoded the address of nsqd into the consumer code, which is not a best practice. A better way to go about it is to point the consumer at nsqlookupd, which will transparently connect to the appropriate nsqd that happens to be publishing that topic.

In our example, we only have a single nsqd, so it’s an extraneous lookup. But it’s good to get into the right habits early, especially if you are a habitual copy/paster.

The consumer example only needs a one-line change to get this enhancement:

err := q.ConnectToNSQLookupd("127.0.0.1:4161")

This connects to the HTTP port of nsqlookupd (4161) rather than directly to the TCP port of nsqd.

CoreOS With Nvidia CUDA GPU Drivers

This will walk you through installing the Nvidia GPU kernel module and CUDA drivers on a docker container running inside of CoreOS.

architecture diagram

Launch CoreOS on an AWS GPU instance

  • Launch a new EC2 instance

  • Under “Community AMIs”, search for ami-f669f29e (CoreOS stable 494.4.0 (HVM))

  • Select the GPU instance type: g2.2xlarge

  • Increase the root EBS volume from 8 GB to 20 GB to give yourself some breathing room

ssh into CoreOS instance

Find the public ip of the EC2 instance launched above, and ssh into it:

$ ssh -A core@ec2-54-80-24-46.compute-1.amazonaws.com

Run Ubuntu 14 docker container in privileged mode

$ sudo docker run --privileged=true -i -t ubuntu:14.04 /bin/bash

After the above command, you should be inside a root shell in your docker container. The rest of the steps will assume this.

Install build tools + other required packages

The gcc version must match the one used to build the CoreOS kernel (gcc 4.7).

# apt-get update
# apt-get install gcc-4.7 g++-4.7 wget git make dpkg-dev

Set gcc 4.7 as default

# update-alternatives --remove gcc /usr/bin/gcc-4.8
# update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.7 60 --slave /usr/bin/g++ g++ /usr/bin/g++-4.7
# update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.8 40 --slave /usr/bin/g++ g++ /usr/bin/g++-4.8

Verify

# update-alternatives --config gcc

It should list gcc 4.7 with an asterisk next to it:

* 0            /usr/bin/gcc-4.7   60        auto mode

Prepare CoreOS kernel source

Clone CoreOS kernel repository

# mkdir -p /usr/src/kernels
# cd /usr/src/kernels
# git clone https://github.com/coreos/linux.git

Find CoreOS kernel version

# uname -a
Linux ip-10-11-167-200.ec2.internal 3.17.2+ #2 SMP Tue Nov 4 04:15:48 UTC 2014 x86_64 Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz GenuineIntel GNU/Linux

The CoreOS kernel version is 3.17.2

Switch to the correct branch for this kernel version

# cd linux
# git checkout remotes/origin/coreos/v3.17.2

Create kernel configuration file

# zcat /proc/config.gz > /usr/src/kernels/linux/.config

Prepare kernel source for building modules

# make modules_prepare

Now you should be ready to install the nvidia driver.

Hack the kernel version

In order to avoid "nvidia: version magic" errors, the following hack is required:

# sed -i -e 's/3.17.2/3.17.2+/' include/generated/utsrelease.h

I’ve posted to the CoreOS Group to ask why this hack is needed.

Install nvidia driver

Download

# mkdir -p /opt/nvidia
# cd /opt/nvidia
# wget http://developer.download.nvidia.com/compute/cuda/6_5/rel/installers/cuda_6.5.14_linux_64.run

Unpack

# chmod +x cuda_6.5.14_linux_64.run
# mkdir nvidia_installers
# ./cuda_6.5.14_linux_64.run -extract=`pwd`/nvidia_installers

Install

# cd nvidia_installers
# ./NVIDIA-Linux-x86_64-340.29.run --kernel-source-path=/usr/src/kernels/linux/

Installer Questions

  • Install NVIDIA’s 32-bit compatibility libraries? YES
  • Would you like to run nvidia-xconfig? NO

If everything worked, you should see:

nvidia drivers installed

Your /var/log/nvidia-installer.log should look something like this.

Load nvidia kernel module

# modprobe nvidia

No errors should be returned. Verify it’s loaded by running:

# lsmod | grep -i nvidia

and you should see:

nvidia              10533711  0
i2c_core               41189  2 nvidia,i2c_piix4

Install CUDA

In order to fully verify that the kernel module is working correctly, install the CUDA drivers + library and run a device query.

To install CUDA:

# ./cuda-linux64-rel-6.5.14-18749181.run
# ./cuda-samples-linux-6.5.14-18745345.run

Verify CUDA

# cd /usr/local/cuda/samples/1_Utilities/deviceQuery
# make
# ./deviceQuery   

You should see the following output:

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.5, CUDA Runtime Version = 6.5, NumDevs = 1, Device0 = GRID K520
Result = PASS

Congratulations! You now have a docker container running under CoreOS that can access the GPU.

Appendix: Expose GPU to other docker containers

If you need other docker containers on this CoreOS instance to be able to access the GPU, you can do the following steps.

Exit docker container

# exit

You should be back to your CoreOS shell.

Add nvidia device nodes

$ wget https://gist.githubusercontent.com/tleyden/74f593a0beea300de08c/raw/95ed93c5751a989e58153db6f88c35515b7af120/nvidia_devices.sh
$ chmod +x nvidia_devices.sh
$ sudo ./nvidia_devices.sh

Verify device nodes

$ ls -alh /dev | grep -i nvidia
crw-rw-rw-  1 root root  251,   0 Nov  5 16:37 nvidia-uvm
crw-rw-rw-  1 root root  195,   0 Nov  5 16:37 nvidia0
crw-rw-rw-  1 root root  195, 255 Nov  5 16:37 nvidiactl

Launch docker containers

When you launch other docker containers on the same CoreOS instance, to allow them to access the GPU device you will need to add the following arguments:

$ sudo docker run -ti --device /dev/nvidia0:/dev/nvidia0 --device /dev/nvidiactl:/dev/nvidiactl --device /dev/nvidia-uvm:/dev/nvidia-uvm tleyden5iwx/ubuntu-cuda /bin/bash

Running Couchbase Cluster Under CoreOS on AWS

Here are instructions on how to fire up a Couchbase Server cluster running under CoreOS on AWS, launched via CloudFormation. You will end up with the following system:

architecture diagram

Launch CoreOS instances via AWS Cloud Formation

Click the “Launch Stack” button to launch your CoreOS instances via AWS Cloud Formation:

NOTE: this is hardcoded to use the us-east-1 region, so if you need a different region, you should edit the URL accordingly

Use the following parameters in the form:

  • ClusterSize: 3 nodes (default)
  • Discovery URL: as it says, you need to grab a new token from https://discovery.etcd.io/new and paste it in the box.
  • KeyPair: use whatever you normally use to start EC2 instances. For this discussion, let’s assume you used aws, which corresponds to a file you have on your laptop called aws.cer

ssh into a CoreOS instance

Go to the AWS console under EC2 instances and find the public ip of one of your newly launched CoreOS instances.

Choose any one of them (it doesn’t matter which), and ssh into it as the core user with the cert provided in the previous step:

$ ssh -i aws.cer -A core@ec2-54-83-80-161.compute-1.amazonaws.com

Sanity check

Let’s make sure the CoreOS cluster is healthy first:

$ fleetctl list-machines

This should return a list of machines in the cluster, like this:

MACHINE          IP              METADATA
03b08680...     10.33.185.16    -
209a8a2e...     10.164.175.9    -
25dd84b7...     10.13.180.194   -

Download cluster-init script

$ wget https://raw.githubusercontent.com/couchbaselabs/couchbase-server-docker/master/scripts/cluster-init.sh
$ chmod +x cluster-init.sh

This script is not much. I wrapped things up in a script because the instructions were getting long, but all it does is:

  • Downloads a few fleet init files from github.
  • Generates a few more fleet init files based on a template and the number of nodes you want.
  • Stashes the username/password argument you give it into etcd.
  • Tells fleetctl to kick everything off. Whee!

Launch cluster

Run the script you downloaded in the previous step:

$ ./cluster-init.sh -v 3.0.1 -n 3 -u "user:passw0rd"

Where:

  • -v the version of Couchbase Server to use. Valid values are 3.0.1 or 2.2.0.
  • -n the total number of Couchbase nodes to start; this should correspond to the number of EC2 instances (e.g., 3)
  • -u the username and password as a single string, delimited by a colon (:)

Replace user:passw0rd with a sensible username and password. It must be colon separated, with no spaces. The password itself must be at least 6 characters.

Once this command completes, your cluster will be in the process of launching.

Verify

To check the status of your cluster, run:

$ fleetctl list-units

You should see four units, all active.

UNIT                     MACHINE             ACTIVE  SUB
couchbase_bootstrap_node.service                375d98b9.../10.63.168.35  active  running
couchbase_bootstrap_node_announce.service       375d98b9.../10.63.168.35  active  running
couchbase_node.1.service                        8cf54d4d.../10.187.61.136 active  running
couchbase_node.2.service                        b8cf0ed6.../10.179.161.76 active  running

Rebalance Couchbase Cluster

Login to Couchbase Server Web Admin

  • Find the public ip of any of your CoreOS instances via the AWS console
  • In a browser, go to http://<instance_public_ip>:8091
  • Login with the username/password you provided above

After logging in, your Server Nodes tab should look like this:

screenshot

Kick off initial rebalance

  • Click server nodes
  • Click “Rebalance”

After the rebalance is complete, you should see:

screenshot

Congratulations! You now have a 3 node Couchbase Server cluster running under CoreOS / Docker.

Goroutines vs Threads

Here are some of the advantages of Goroutines over threads:

  • You can run more goroutines on a typical system than you can threads.
  • Goroutines have growable segmented stacks.
  • Goroutines have a faster startup time than threads.
  • Goroutines come with built-in primitives to communicate safely between themselves (channels).
  • Goroutines allow you to avoid having to resort to mutex locking when sharing data structures.
  • Goroutines are multiplexed onto a small number of OS threads, rather than a 1:1 mapping.
  • You can write massively concurrent servers without having to resort to evented programming.

You can run more of them

On Java you can run thousands or tens of thousands of threads. On Go you can run hundreds of thousands or millions of goroutines.

Java threads map directly to OS threads, and are relatively heavyweight. Part of the reason they are heavyweight is their rather large fixed stack size. This caps the number of them you can run in a single VM due to the increasing memory overhead.

Goroutines, OTOH, have segmented stacks that grow as needed. They are "Green threads", which means the Go runtime does the scheduling, not the OS. The runtime multiplexes the goroutines onto real OS threads, the number of which is controlled by GOMAXPROCS. Typically you’ll want to set this to the number of cores on your system, to maximize potential parallelism.
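
A minimal sketch of doing that explicitly (note: Go 1.5 and later already default GOMAXPROCS to the number of cores, so this mainly matters on older runtimes):

package main

import (
  "fmt"
  "runtime"
)

func main() {
  // Let the runtime schedule goroutines across all available cores.
  runtime.GOMAXPROCS(runtime.NumCPU())
  fmt.Println("GOMAXPROCS =", runtime.GOMAXPROCS(0)) // passing 0 just reads the current value
}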

They let you avoid locking hell

One of the biggest drawbacks of threaded programming is the complexity and brittleness of many codebases that use threads to achieve high concurrency. There can be latent deadlocks and race conditions, and it can become nearly impossible to reason about the code.

Go OTOH gives you primitives that allow you to avoid locking completely. The mantra is don’t communicate by sharing memory, share memory by communicating. In other words, if two goroutines need to share data, they can do so safely over a channel. Go handles all of the synchronization for you, and it’s much harder to run into things like deadlocks.
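
As a small illustration (a sketch, not from the original post) of sharing data over a channel instead of guarding it with a mutex:

package main

import "fmt"

func main() {
  results := make(chan int)

  // Worker goroutine: computes values and hands them to main over the channel.
  go func() {
    for i := 1; i <= 3; i++ {
      results <- i * i
    }
    close(results)
  }()

  // Receiving from the channel is the synchronization point; the loop ends
  // when the worker closes the channel.
  for v := range results {
    fmt.Println("got", v)
  }
}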

No callback spaghetti, either

There are other approaches to achieving high concurrency with a small number of threads. Python Twisted was one of the early ones that got a lot of attention. Node.js is currently the most prominent evented framework out there.

The problem with these evented frameworks is that the code complexity is also high, and difficult to reason about. Rather than “straightline” coding, the programmer is forced to chain callbacks, which gets interleaved with error handling. While refactoring can help tame some of the mental load, it’s still an issue.

Running Caffe on AWS GPU Instance via Docker

This is a tutorial to help you get the Caffe deep learning framework up and running on a GPU-powered AWS instance running inside a Docker container.

Architecture

architecture diagram

Setup host

Before you can start your docker container, you will need to go deeper down the rabbit hole.

You’ll first need to complete the steps here:

Setting up an Ubuntu 14.04 box running on a GPU-enabled AWS instance

After you’re done, you’ll end up with a host OS with the following properties:

  • A GPU enabled AWS instance running Ubuntu 14.04
  • Nvidia kernel module
  • Nvidia device drivers
  • CUDA 6.5 installed and verified

Install Docker

Once your host OS is set up, you’re ready to install Docker (version 1.3 at the time of this writing).

Set up the key for the docker repo:

$ sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 36A1D7869245C8950F966E92D8576A8BA88D21E9

Add the docker repo:

$ sudo sh -c "echo deb https://get.docker.com/ubuntu docker main > /etc/apt/sources.list.d/docker.list"
$ sudo apt-get update

Install docker:

$ sudo apt-get install lxc-docker

Run the docker container

Find your nvidia devices

$ ls -la /dev | grep nvidia

You should see:

crw-rw-rw-  1 root root    195,   0 Oct 25 19:37 nvidia0
crw-rw-rw-  1 root root    195, 255 Oct 25 19:37 nvidiactl
crw-rw-rw-  1 root root    251,   0 Oct 25 19:37 nvidia-uvm

You’ll have to adapt the DOCKER_NVIDIA_DEVICES variable below to match your particular devices.

Here’s how to start the docker container:

$ DOCKER_NVIDIA_DEVICES="--device /dev/nvidia0:/dev/nvidia0 --device /dev/nvidiactl:/dev/nvidiactl --device /dev/nvidia-uvm:/dev/nvidia-uvm"
$ sudo docker run -ti $DOCKER_NVIDIA_DEVICES tleyden5iwx/caffe-gpu /bin/bash

It’s a large docker image, so this might take a few minutes, depending on your network connection.

Run caffe test suite

After the above docker run command completes, your shell will now be inside a docker container that has Caffe installed.

You’ll want to run the Caffe test suite and make sure it passes. This will validate your environment, including your GPU drivers.

$ cd /opt/caffe
$ make test && make runtest

Expected Result: ... [ PASSED ] 838 tests.

Run the MNIST LeNet example

A more comprehensive way to verify your environment is to train the MNIST LeNet example:

$ cd /opt/caffe/data/mnist
$ ./get_mnist.sh
$ cd /opt/caffe
$ ./examples/mnist/create_mnist.sh
$ ./examples/mnist/train_lenet.sh

This will take a few minutes.

Expected output:

libdc1394 error: Failed to initialize libdc1394 
I1018 17:02:23.552733    66 caffe.cpp:90] Starting Optimization 
I1018 17:02:23.553583    66 solver.cpp:32] Initializing solver from parameters:
... lots of output ...
I1018 17:17:58.684598    66 caffe.cpp:102] Optimization Done.

Congratulations, you’ve got GPU-powered Caffe running in a docker container — celebrate with a cup of Philz!

Docker on AWS GPU Ubuntu 14.04 / CUDA 6.5

Architecture

After going through the steps in this blog post, you’ll end up with this:

architecture diagram

Setup host

Before you can start your docker container, you will need to go deeper down the rabbit hole.

You’ll first need to complete the steps here:

Setting up an Ubuntu 14.04 box running on a GPU-enabled AWS instance

After you’re done, you’ll end up with a host OS with the following properties:

  • A GPU enabled AWS instance running Ubuntu 14.04
  • Nvidia kernel module
  • Nvidia device drivers
  • CUDA 6.5 installed and verified

Install Docker

Once your host OS is set up, you’re ready to install Docker (version 1.3 at the time of this writing).

Set up the key for the docker repo:

$ sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 36A1D7869245C8950F966E92D8576A8BA88D21E9

Add the docker repo:

$ sudo sh -c "echo deb https://get.docker.com/ubuntu docker main > /etc/apt/sources.list.d/docker.list"
$ sudo apt-get update

Install docker:

$ sudo apt-get install lxc-docker

Run GPU enabled docker image

Find all your nvidia devices

$ ls -la /dev | grep nvidia

You should see:

crw-rw-rw-  1 root root    195,   0 Oct 25 19:37 nvidia0
crw-rw-rw-  1 root root    195, 255 Oct 25 19:37 nvidiactl
crw-rw-rw-  1 root root    251,   0 Oct 25 19:37 nvidia-uvm

Launch docker container

The easiest way to get going is to use this pre-built docker image that has the CUDA drivers pre-installed. Or if you want to build your own, the accompanying dockerfile will be a useful starting point.

You’ll have to adapt the DOCKER_NVIDIA_DEVICES variable below to match your particular devices.

To start the docker container, run:

$ DOCKER_NVIDIA_DEVICES="--device /dev/nvidia0:/dev/nvidia0 --device /dev/nvidiactl:/dev/nvidiactl --device /dev/nvidia-uvm:/dev/nvidia-uvm"
$ sudo docker run -ti $DOCKER_NVIDIA_DEVICES tleyden5iwx/ubuntu-cuda /bin/bash

After running the above command, you should be at a shell inside your docker container:

root@1149788c731c:# 

Verify CUDA access from inside the docker container

Install CUDA samples

$ cd /opt/nvidia_installers
$ ./cuda-samples-linux-6.5.14-18745345.run -noprompt -cudaprefix=/usr/local/cuda-6.5/

Build deviceQuery sample

$ cd /usr/local/cuda/samples/1_Utilities/deviceQuery
$ make
$ ./deviceQuery   

You should see the following output:

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.5, CUDA Runtime Version = 6.5, NumDevs = 1, Device0 = GRID K520
Result = PASS
