As a warning, this blog post is pretty messy because of all those stumbling blocks, so you’re probably better off just following the official autoware installation docs and referring to this in case you run into the same problems.
Good luck!!
Autoware currently supports both 20.04 and 22.04 (but not 18.04), and I decided to go with 20.04 since it was the next LTS version after the version I had installed (18.04).
I also noticed that Autoware recommends CUDA 11.6, which only has official downloads for 20.04 and not 22.04, which made Ubuntu 20.04 seem like the better choice.
Here are the steps to upgrade to Ubuntu 20.04: official instructions.
I decided to go with the easier docker install until I had a need to use the source install.
Install the Docker engine based on these instructions. This links to the snapshot of the instructions that I used (as do the links below). If you want the latest instructions, change the 0423b84ee8d763879bbbf910d249728410b16943 commit hash in the URL to main.
Install the nvidia container toolkit based on these instructions.
After this step I was able to run nvidia-smi within the container:
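If you want a quick way to reproduce that check, a one-off CUDA base container works; the exact image tag below is just an assumption, so pick one matching your CUDA version:

```bash
# Run nvidia-smi inside a throwaway CUDA container to verify that the
# nvidia container toolkit is wired up correctly.
sudo docker run --rm --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
```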
Rocker is an alternative to Docker compose used by Autoware.
I installed rocker based on these instructions.
After this step, running rocker shows the rocker help.
I ran:
but got this error:
Using the approach suggested in this github post of adding -e NVIDIA_DISABLE_REQUIRE=true, I ran the new command:
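I don't have the exact command preserved here, but it was the rocker invocation from the Autoware Docker installation docs with the extra -e flag added. Roughly (the image name and volume path are assumptions):

```bash
rocker -e NVIDIA_DISABLE_REQUIRE=true --nvidia --x11 --user \
  --volume $HOME/autoware -- ghcr.io/autowarefoundation/autoware-universe:latest-cuda
```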
which seemed to work, as it dropped me into a container:
Based on the response from the super helpful folks at Autoware in this discussion, I determined I needed to upgrade my CUDA version based on these instructions (see the later step below).
The source installation instructions mention that Autoware depends on vcstool, a tool that makes it easy to manage code from multiple repos.
Install with:
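One common way to install it is via pip (this is an assumption about how I installed it; the ROS apt package python3-vcstool also works):

```bash
pip3 install vcstool
```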
In the container shell (started above):
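The build steps follow the Autoware source installation docs; they look roughly like this (paths and the exact colcon flags may differ slightly from what I actually ran):

```bash
git clone https://github.com/autowarefoundation/autoware.git
cd autoware
mkdir src
vcs import src < autoware.repos
rosdep update
rosdep install -y --from-paths src --ignore-src --rosdistro $ROS_DISTRO
colcon build --symlink-install --cmake-args -DCMAKE_BUILD_TYPE=Release
```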
It took about 40 minutes to build.
According to the docs: "Ad hoc simulation is a flexible method for running basic simulations on your local machine, and is the recommended method for anyone new to Autoware." But there are no docs on how to run an ad hoc simulation, so I am going to try a planning simulation based on the planning simulation docs.
This tool is needed to download the map data.
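Assuming the tool in question is gdown (which the Autoware tutorials use to fetch the sample maps from Google Drive), installing it is a one-liner:

```bash
pip3 install gdown
```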
In the container started above:
From inside the container:
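The launch command from the Autoware planning simulation docs looks roughly like this (the map path is an assumption, whatever directory you downloaded the sample map into):

```bash
source ~/autoware/install/setup.bash
ros2 launch autoware_launch planning_simulator.launch.xml \
  map_path:=$HOME/autoware_map/sample-map-planning \
  vehicle_model:=sample_vehicle sensor_model:=sample_sensor_kit
```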
I’m seeing a ton of errors like:
and a red-herring error.
and a possible red-herring error.
These issues seem related:
I think this error matters the most, since I get it if I try to launch rviz directly:
The same error was reported in https://github.com/ros2/rviz/issues/753 and https://github.com/NVIDIA/nvidia-docker/issues/1438.
I will update my nvidia driver as alluded to above, remove the -e NVIDIA_DISABLE_REQUIRE=true workaround, and retry.
I erroneously used the official NVIDIA instructions for installing CUDA; you should instead use the official Autoware instructions to install CUDA rather than following the steps below.
It failed on the last step:
Based on this advice, I’m going to re-install. First I am purging:
Reboot.
I still had some libnvidia packages, so I purged them with:
Since I'm running a System76 laptop, I went to them for support to upgrade the NVIDIA drivers.
and now I’m running nvidia 515.65.01:
Again, for this step I erroneously used the official NVIDIA instructions for installing CUDA; use the official Autoware instructions to install CUDA instead of the steps below.
This succeeded, but now nvidia-smi does not work:
I later realized that I diverged from the autoware instructions in two ways:
cuda_version=11-4 apt install cuda-${cuda_version} --no-install-recommends
Post-installation actions:
I simply rebooted, and now nvidia-smi works. Note that the cuda version went from 11.7 to 11.6. The strange thing is that previously I didn't have the cuda packages installed.
but got an error:
I realized I’m still missing several requirements:
I installed the nvidia container toolkit based on these autoware instructions, and now it's able to start a container and run nvidia-smi:
This error was caused by another divergence from the autoware instructions, where I didn’t run this step:
I re-ran all of these steps from the autoware docs:
and now this step worked:
Pin the libraries at those versions with:
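The pinning is done with apt-mark hold; the exact package list depends on what you installed above, so treat these names as placeholders:

```bash
sudo apt-mark hold cuda-11-6 libcudnn8 libcudnn8-dev libnvinfer8
```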
Using these instructions:
This worked, but at first I was very confused about whether it had actually worked.
It drops you back at the prompt with no meaningful output, but if you look closely, it's a different prompt. The hostname changes from your actual hostname (apollo in my case) to a cryptic container name (0b1ce9ed54bd).
Note that if you run this in the container:
you will see meaningful output, whereas if you run that on your host, you will most likely see ros2: command not found, unless you had installed ros2 on your host previously.
(also requires maps download, see above)
From inside the container:
But the same errors are showing up:
Note that these errors are also shown if I run rviz2 from within the container.
Relevant github issues:
The recommended fix is:
I don’t currently have either of those libraries installed:
I installed these packages on the host (outside the container), but that didn’t fix the issue.
I tried installing the packages in the container, but that didn’t work either.
There is a discrepancy between glxinfo on the host vs. in the container.
In the container, glxinfo returns an error:
Whereas on the host, glxinfo works fine.
This Stack Overflow post suggested a workaround, and according to the rocker docs: "For Intel integrated graphics support you will need to mount the /dev/dri directory as follows:"
After restarting a container with that flag, it no longer shows the "libGL error: MESA-LOADER: failed to retrieve device information" error.
I posted a question on the autoware forum to find out why this workaround was needed. Apparently there is another way to solve this problem by forcing the use of the nvidia gpu rather than the intel graphics card:
but I haven’t verified this yet.
Add the --devices /dev/dri flag:
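The full command is the same rocker invocation as before with the extra flag appended; again, the image name and volume path are assumptions:

```bash
rocker --nvidia --x11 --user --devices /dev/dri \
  --volume $HOME/autoware -- ghcr.io/autowarefoundation/autoware-universe:latest-cuda
```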
And now it finally works!! Running rviz2 from within the container shows the rviz window.
(also requires maps download, see above)
From inside the container:
and rviz launched with autoware configured:
Phew! That was a lot harder than I thought it was going to be! It would have gone smoother if:
I hadn't needed the --devices /dev/dri or nvidia "prime-select" workarounds.
Unfortunately, after these steps, rviz2 is still using the integrated graphics driver rather than the GPU.
See this follow-up post to see how to get it running on the GPU.
Non-requirements:
And here is the tech stack:
SQLite was chosen over MySQL since this is one less “moving part” and slightly easier to manage. See this blog post for the rationale.
Lightsail seems like a good value since you can get a decent sized instance and a static IP for $5/month.
Login to the AWS console and create a Lightsail instance with the following specs:
Upload your SSH public key, ~/.ssh/id_rsa.pub (maybe make a copy and rename it with a better name to easily find it in the AWS console later). You should see the following:
Go to the Lightsail Networking section and choose "Attach static ip". Associate the static ip with the Lightsail instance, and make a note of it, as you will need it in the next step.
Go to the DNS registrar where you registered your blog domain name (eg, Namecheap), and add a new A record as follows:
ssh in via ssh ubuntu@<your lightsail instance ip>
Update the apt package list:
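Nothing surprising here:

```bash
sudo apt-get update
```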
Install nginx:
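Per the Ghost install docs, something along these lines (the ufw line only applies if you use ufw):

```bash
sudo apt-get install -y nginx
sudo ufw allow 'Nginx Full'
```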
Install nodejs:
Add the NodeSource APT repository for Node 12, then install nodejs:
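Using the NodeSource setup script for Node 12 (this mirrors the NodeSource docs):

```bash
curl -sL https://deb.nodesource.com/setup_12.x | sudo -E bash -
sudo apt-get install -y nodejs
```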
Install Ghost-CLI:
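Ghost-CLI is installed globally via npm:

```bash
sudo npm install ghost-cli@latest -g
```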
Create a directory to hold the blog:
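Following the Ghost docs; the directory name below is just my choice for the first blog, and the ubuntu user is the default Lightsail user:

```bash
sudo mkdir -p /var/www/blog1
sudo chown ubuntu:ubuntu /var/www/blog1
sudo chmod 775 /var/www/blog1
cd /var/www/blog1
```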
Install Ghost:
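From inside that directory:

```bash
ghost install
```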
If you get an error about the node.js version being out of date, see the “Error installing ghost due to node.js being out of date” section below.
Here is how I answered the setup questions, but you can customize to your needs:
I decided to set up SSL in a separate step rather than initially, but the more secure approach would be to use https from the start, eg https://blog1.domainA.com for the blog URL, and answer Yes to the setup SSL question, which will trigger SSL setup during installation.
If you do set up SSL, you will need to open port 443 in the Lightsail console, otherwise it won't work. See the "Setup SSL" section below for instructions.
This part is a little scary (and ghosts are scary): Ghost initially exposes your blog to the world without an admin user. The first person who stumbles across it gets to become the admin user. You want that to be you!
Quickly go to http://blog1.domainA.com and create the Ghost admin user.
Go to the DNS registrar where you registered your blog domain name (eg, Namecheap), and add a new A record as follows:
Install Ghost:
Use the same steps above, except for the blog URL use: http://blog.domainB.com
You now have two separate Ghost blogging sites set up on a single $5/month AWS Lightsail instance.
If you see this error:
Find an older version of ghost that is compatible with the node.js you have installed, then specify that version of ghost when installing it:
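With Ghost-CLI you can pass the version to install; the placeholder below is whatever version you determined is compatible with your node.js:

```bash
ghost install <compatible-ghost-version>
```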
How do you find that version? I happened to have another blog folder that I had previously installed, so I just used that. The Ghost website may also have a compatibility chart.
The downside of this approach is that you won’t have the latest and greatest version of ghost, including security updates. The upside though is that you won’t break any existing ghost blogs on the same machine by upgrading node.js.
In the error above, it mentions that ghost requires node.js 14.17.0 or above.
The downside is that this could potentially break other existing ghost blogs on the same machine that are not compatible with the later version of node.js. Using containers to isolate dependencies would be beneficial here.
Upgrade to that version of node.js based on these instructions:
Run node -v to verify that you're running a recent enough version:
Update the ghost cli version:
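The CLI itself is updated through npm:

```bash
sudo npm install -g ghost-cli@latest
```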
Retry the ghost install command:
and this time it should not complain about the node.js version.
During installation, you can answer “Yes” to setup SSL, and it will ask you for your email and use letsencrypt to generate a certificate for you. See this page for more details.
But you must also open port 443 in your Lightsail firewall, otherwise it won’t work.
Let's Encrypt certificates expire after 90 days. To avoid downtime on your site, you should auto-renew the certificates. See this blog post for details.
I tried to follow the blog post and ran ghost setup ssl-renew in my blog folder, but after switching to root with sudo su, I noticed this existing cron entry:
So it looks like it is already setup to renew the certs every day.
You should see reams of output, followed by:
Now you can access the OpenWhisk CLI:
Re-run the “Hello world” via:
I tried following the instructions on James Thomas’ blog for running Go within Docker, but ran into an error (see Disqus comment), and so here’s how I worked around it.
First create a simple Go program and cross compile it. Save the following to exec.go:
Cross compile it for Linux:
Pull the upstream Docker image:
Create a custom docker image based on openwhisk/dockerskeleton:
Build and test:
Push up the docker image to dockerhub:
Create the OpenWhisk action:
Invoke the action to verify it works:
Save this to main.go:
Build and package it into a docker image, and push it up to Docker Hub:
Create an OpenWhisk action:
Invoke it:
Create a Cloudant database via the Bluemix web admin.
Under the Permissions control panel section for the database, choose Generate a new API key.
Check the _writer permission and make a note of the Key and Password.
Verify connectivity by making a curl request:
I’m currently getting this error:
At this point I switched to OpenWhisk on Bluemix, downloaded the wsk CLI from the Bluemix website, and configured it with my API key per the instructions. Then I re-installed the action via:
and made sure it worked by running:
Following these instructions:
You can get your Bluemix Org name (maybe the first part of your email address by default) and BlueMix space (dev by default) from the Bluemix web admin.
Refresh packages:
It didn’t work according to the docs, and no bindings were created even though I had created a Cloudant database in the Bluemix admin earlier.
I retried the package bind command that had failed earlier:
and this time success!!
Try writing to the db with:
and you should get a response like:
The /yournamespace/myCloudant/write action expects a dbname parameter, but the upstream fetch_aws_keys action doesn't provide that parameter (and it's better that it doesn't, to reduce coupling). So if you try to connect the two actions in a sequence at this point, it will fail.
Create a sequence that will invoke these actions in sequence:
This assumes a testdb database bound to the myCloudantTestDb package. Try it out:
To view the resulting document:
Let’s say we wanted this to run every minute.
First create an alarm trigger that will fire every minute:
Now create a rule that will invoke the fetch_and_write_aws_keys action (which is a sequence action) whenever the everyMinute feed is triggered:
To verify that it is working, check your cloudant database to look for new docs:
Or you can also monitor the activations:
The main parameter you will need to provide to postgres is a root db password. Replace ********* with a good password and run this command:
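Using the official postgres image, something like:

```bash
docker run --name postgres-server -e POSTGRES_PASSWORD=********* -d postgres
```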
You now have a working postgres database server.
When running postgres under docker, you most likely want to persist the database files on the host, rather than having them in the container.
First, remove the previous container with:
Go into the /tmp directory:
Launch a container, using /tmp/pgdata as the host directory to mount as a volume; it will be mounted in the container at /var/lib/postgresql/data, which is the default location where Postgres stores its data. The /tmp/pgdata directory will be created on the host if it doesn't already exist.
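The same run command as before, with the volume mount added (container name reused from above):

```bash
docker run --name postgres-server -e POSTGRES_PASSWORD=********* -d \
  -v /tmp/pgdata:/var/lib/postgresql/data postgres
```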
List the contents of /tmp/pgdata and you should see several Postgres files:
Now you will be in a shell inside the docker container.
In your browser, open http://localhost:8080/ and you should see the phpPgAdmin login screen:
Log in with the user/pass credentials created earlier (testuser / **********).
Security warning! This is not a secure deployment and it’s not recommended to run this in production without a thorough audit by a security specialist.
Create a new stack and paste the following into the box:
For example:
Find the postgres-server container and hit the Terminal menu to get a shell on that container.
Enter:
Find the phppgadmin service in the Docker Cloud Web UI, and look for the service endpoint, which should look something like this:
http://phppgadmin.postgres.071a32d40.svc.dockerapp.io:8085/
Log in with the user/pass credentials created earlier (testuser / **********).
Essentially you can think of them like stateful functions, in the sense that they encapsulate state. The state that they happen to capture (or “close over” — hence the name “closure”) is everything that’s in scope when they are defined.
First some very basic higher order functions.
Functions that take other functions and call them are called higher order functions. Here’s a trivial example:
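A minimal sketch of that trivial example, reconstructed from the description that follows (the println body is an assumption):

```go
package main

import "fmt"

// sendLoop is a higher order function: it takes another function as an
// argument (sender) and calls it.
func sendLoop(sender func()) {
	sender()
}

func main() {
	mySender := func() {
		fmt.Println("sending...")
	}
	sendLoop(mySender)
}
```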
In the main() function, we define a function called mySender and pass it to the sendLoop() function. sendLoop() takes a confusing looking argument called sender func() — the parameter name is sender, and the parameter type is func(), which is a function that takes no arguments and returns no values.
To make this slightly less confusing, we can define a named SenderFunc function type and use that:
sendLoop() has been updated to take SenderFunc as an argument, which is easier to read than taking a func() as an argument (which looks a bit like a function call!). If the SenderFunc type took more parameters and/or returned more values, having this in a defined type would be crucial for readability.
Let's make it slightly more realistic — let's say that the sendLoop() might need to retry calling the SenderFunc passed to it a few times until it actually works. So the SenderFunc definition will need to be updated so that it returns a boolean that indicates whether a retry is necessary.
One thing to note here is the clean separation of concerns — all sendLoop() knows is that it gets a SenderFunc which it should call, and that it returns a boolean indicating whether or not it worked. It knows absolutely nothing about the inner workings of the SenderFunc, nor does it care.
You have a new requirement: you need to only retry the SenderFunc 10 times, and then you should give up.
Your first inclination might be to take this approach:
This will work, but it makes the sendLoop() less generally useful. What happens when your co-worker hears about this nifty sendLoop() you wrote, and wants to use it with their own SenderFunc, but wants it to retry 100 times? (Side note: your SenderFunc implementation simply prints to the console, whereas theirs might write to a Slack channel, yet the sendLoop() will still work!)
To make it more generic, you could take this approach:
Which will work — but there's a catch. Now that you've changed the function signature of sendLoop() to take a second argument, all of the code that consumes sendLoop() will now be broken. If this were an exported function, it would be an even worse problem.
Luckily there is a much better way.
Rather than making sendLoop() do the retry-related accounting and passing it parameters for that accounting, you can make the SenderFunc handle this and encapsulate the state via a function closure. In this case, the state is the number of retries that have been attempted, which will start at 0 and then increase on every call to the SenderFunc.
How can SenderFunc keep internal state? It can "close over" any values that are in scope, which become associated with the function instance (I'm calling it an "instance" because it has state, as we shall see) and will be bound to the function instance as long as it is around.
Here’s what the final code looks like:
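A sketch of what that final version looks like, based on the names used throughout this post (the send itself is faked out so the example is self-contained):

```go
package main

import (
	"errors"
	"fmt"
)

// SenderFunc reports whether the send should be retried.
type SenderFunc func() bool

// sendLoop keeps calling sender until it says no retry is needed.
func sendLoop(sender SenderFunc) {
	for sender() {
	}
}

// doSend stands in for the real send operation; here it always fails.
func doSend() error {
	return errors.New("send failed")
}

func main() {
	counter := 0 // state that mySender closes over
	mySender := SenderFunc(func() bool {
		if err := doSend(); err == nil {
			return false // success, no retry needed
		}
		counter++
		return counter < 10 // give up after 10 failed attempts
	})
	sendLoop(mySender)
	fmt.Printf("gave up after %d failed attempts\n", counter)
}
```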
The counter state variable is bound to the mySender function instance, which is able to update counter on every failed send attempt, since the function "closes over" the counter variable that is in scope when the function instance is created. This is the heart of the idea of a function closure.
The sendLoop() doesn't know anything about the internals of the SenderFunc in terms of how it tracks whether or not it should retry; it just treats it as a black box. Different SenderFunc implementations could use vastly different rules and/or state for deciding whether the sendLoop() should retry a failed send.
If you wanted to make it even more flexible, you could update the SenderFunc to return a time.Duration in addition to a bool to indicate retry, which would allow you to implement "backoff retry" strategies and so forth.
If you’re passing the same function instances that have internal state (aka function closures) to multiple goroutines that are calling it, you’re going to end up causing data races. There’s nothing special about function closures that protect you from this.
The simplest way to deal with this is to make a new function instance for each goroutine you are sending the function instance to, which is probably what you want. In theory, you could also wrap the state update in a mutex, but that is probably not what you want, since it will cause goroutines to block each other trying to grab the mutex.
This post is about Go programs ending up with lots of connections stuck in the TIME_WAIT state.
Here are a few ways to get into this situation and how to fix each one.
Run the following code on a linux machine:
and in a separate terminal while the program is running, run:
and you will see this number constantly growing:
Update the startLoadTest() method to add the following line of code (and related imports):
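The added line is the one that drains the response body; without reading and closing the body, the connection can't go back into the keep-alive pool. The fix is likely along these lines (add io, io/ioutil, and log to the imports):

```go
// Drain the response body so the underlying TCP connection can be
// returned to the transport's idle pool and re-used, then close it.
if _, err := io.Copy(ioutil.Discard, resp.Body); err != nil {
	log.Fatalf("error reading response body: %v", err)
}
resp.Body.Close()
```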
Now when you re-run it, calling netstat -n | grep -i 8080 | grep -i time_wait | wc -l while it's running will return 0.
Another way to end up with excessive connections in the TIME_WAIT state is to consistently exceed the connection pool and cause many short-lived connections to be opened.
Here’s some code which starts up 100 goroutines which are all trying to make requests concurrently, and each request has a 50 ms delay:
In another shell run netstat, and note that the number of connections in the TIME_WAIT state is growing again, even though the response is being read:
To understand what's going on, we'll need to dig a little deeper into the TIME_WAIT state.
So what is this TIME_WAIT state anyway, and what's going on here?
What’s happening is that we are creating lots of short lived TCP connections, and the Linux kernel networking stack is keeping tabs on the closed connections to prevent certain problems.
From The TIME-WAIT state in TCP and Its Effect on Busy Servers:
The purpose of TIME-WAIT is to prevent delayed packets from one connection being accepted by a later connection. Concurrent connections are isolated by other mechanisms, primarily by addresses, ports, and sequence numbers[1].
By default, the Golang HTTP client will do connection pooling. Rather than closing a socket connection after an HTTP request, it will add it to an idle connection pool, and if you try to make another HTTP request before the idle connection timeout (90 seconds by default), then it will re-use that existing connection rather than creating a new one.
This will keep the number of total socket connections low, as long as the pool doesn’t fill up. If the pool is full of established socket connections, then it will just create a new socket connection for the HTTP request and use that.
So how big is the connection pool? A quick look into transport.go tells us:
The MaxIdleConns: 100 setting sets the size of the connection pool to 100 connections, but with one major caveat: this is on a per-host basis. See the notes on DefaultMaxIdleConnsPerHost below for more details on the implications of this.
IdleConnTimeout is set to 90 seconds, meaning that after a connection stays in the pool unused for 90 seconds, it will be removed from the pool and closed.
Notice the DefaultMaxIdleConnsPerHost = 2 setting below it. What this means is that even though the entire connection pool is set to 100, there is a per-host cap of only 2 connections!
In the above example, there are 100 goroutines trying to concurrently make requests to the same host, but the connection pool can only hold 2 sockets. So in the first "round" of the goroutines finishing their http request, 2 of the sockets will remain open in the pool, while the remaining 98 connections will be closed and end up in the TIME_WAIT state.
Since this is happening in a loop, you will quickly accumulate thousands or tens of thousands of connections in the TIME_WAIT state. Eventually, for that particular host at least, you will run out of ephemeral ports and not be able to open new client connections. For a load testing tool, this is bad news.
Here’s how to fix this issue.
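The key change is giving the http.Client a Transport whose per-host idle connection limit matches the concurrency, along these lines (this goes where the client is constructed):

```go
transport := &http.Transport{
	MaxIdleConns:        100,
	MaxIdleConnsPerHost: 100, // the default of 2 is what caused the pile-up
	IdleConnTimeout:     90 * time.Second,
}
client := &http.Client{Transport: transport}
```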
This bumps the total maximum idle connections (connection pool size) and the per-host connection pool size to 100.
Now when you run this and check the netstat output, the number of TIME_WAIT connections stays at 0:
The problem is now fixed!
If you have higher concurrency requirements, you may want to bump this number to something higher than 100.
Also available as a screencast.
Launch a node cluster with the following settings:
Go to Services and hit the Create button:
Click the globe icon and search Docker Hub for couchbase/server. You should select the couchbase/server image:
Hit the Select button and fill out the following values on the Services Wizard:
In the Ports section: Enable published on each port and set the Node Port to match the Container Port
Hit the Create and Deploy button. After a few minutes, you should see the Couchbase Server service running:
Go to the Container section and choose couchbaseserver-1.
Copy and paste the domain name (eca0fe88-7fee-446b-b006-99e8cae0dabf.node.dockerapp.io) into your browser, adding 8091 at the end (eca0fe88-7fee-446b-b006-99e8cae0dabf.node.dockerapp.io:8091).
You should now see the Couchbase Server setup screen:
You will need to find the container IP of Couchbase Server in order to configure it. To do that, go to the Terminal section of Containers/couchbaseserver-1, and enter ifconfig.
Look for the ethwe1 interface and make a note of the ip: 10.7.0.2 — you will need it in the next step.
Switch back to the browser on the Couchbase Server setup screen. Leave the Start a new cluster button checked. Enter the 10.7.0.2 ip address (or whatever was returned for your ethwe1 interface) under the Hostname field.
and hit the Next button.
For the rest of the wizard, you can:
Create a new bucket for your application:
Go to the Container section and choose couchbaseserver-2.
As in the previous step, copy and paste the domain name (4d8c7be0-3f47-471b-85df-d2471336af75.node.dockerapp.io) into your browser, adding 8091 at the end (4d8c7be0-3f47-471b-85df-d2471336af75.node.dockerapp.io:8091).
Hit Setup and choose Join a cluster now with settings:
(You can find this by running ifconfig and looking for the ip of the ethwe1 interface.) Trigger a rebalance by hitting the Rebalance button:
Now create a Sync Gateway service.
Before going through the steps in the Docker Cloud web UI, you will need to have a Sync Gateway configuration somewhere on the publicly accessible internet.
Warning: This is not a secure solution! Do not use any sensitive passwords if you follow these steps
To make it more secure, you could:
Create a Sync Gateway configuration on a github gist and get the raw url for the gist.
Set the server value to http://couchbaseserver:8091 so that it can connect to the Couchbase service set up in a previous step.
In the Docker Cloud web UI, go to Services and hit the Create button again.
Click the globe icon and search Docker Hub for couchbase/sync-gateway. You should select the couchbase/sync-gateway image.
Hit the Select button and fill out the following values on the Services Wizard:
In the Container Configuration section, customize the Run Command to use the raw URL of your gist, eg: https://gist.githubusercontent.com/tleyden/f260b2d9b2ef828fadfad462f0014aed/raw/8f544be6b265c0b57848
In the Ports section, use the following values:
In the Links section, choose couchbaseserver and hit the Plus button
Click the Create and Deploy button.
Click the Containers section and you should have two Couchbase Server and two Sync Gateway containers running.
Click the sync-gateway-1 container and get the domain name (eca0fe88-7fee-446b-b006-99e8cae0dabf.node.dockerapp.io), then paste it in your browser with a trailing :4984, eg eca0fe88-7fee-446b-b006-99e8cae0dabf.node.dockerapp.io:4984.
You should see the following JSON response:
Click the Services section and hit the Create button. In the bottom right hand corner look for Proxies and choose dockercloud/haproxy
General Settings:
Ports:
80
Links:
Hit the Create and Deploy button
Click the Containers section and choose sgloadbalancer-1.
Copy and paste the domain name (eg, eca0fe88-7fee-446b-b006-99e8cae0dabf.node.dockerapp.io) into your browser.
You should see the following JSON response:
Congratulations! You have just setup a Couchbase Server + Sync Gateway cluster on Docker Cloud.
Remember back in the day when you wanted to know what time it was, and you picked up your phone and dialed 853-1212 and it said "At the tone, the time will be 8:53 AM"?
Those days are over, but the idea lives on. The time service is identical in principle to an internet server. You ask it something, and it gives you an answer.
A well designed service does one thing, and one thing well.
With the time service, you can only ask one kind of question: “What time is it?”
With a DNS server, you can only ask one kind of question: “What is the IP address of organic-juice-for-dogs.io”
Clients vs Servers:
A “Client” can essentially be thought of as being a “Customer”. In the case of calling the time, it’s the person dialing the phone number. In the case of DNS, it’s the Google Chrome browser asking for the IP address.
A “Server” can be thought of as being a “Service”. In the case of calling the time, it’s something running at the phone company. In the case of DNS, it’s a service run by a combination of universities, business, and governments.
The following programs are all web browsers, which are all technically HTTP Clients, meaning they are on the client end of the HTTP tube.
What web browsers do:
In the previous post, there were a few “protocols” mentioned, like HTTP.
What are protocols really?
Any protocol is something to make it possible for things that speak the same protocol to speak to each other over that protocol.
A protocol is just a language, and just like everyone in English-speaking countries agree to speak English and can therefore intercommunicate without issues, many things on the internet agree to speak HTTP to each other.
Here’s what a conversation looks like in the HTTP protocol:
Almost everything that happens on the Internet looks something like this:
Let’s look at a few protocols.
You can think of the internet as being made up of tubes. Two very common types of tubes are:
Here’s what you might imagine an internet tube looking like:
Really, you can think of TCP and UDP as internet tubes that are built from the same kind of concrete — and that concrete is called IP (Internet Protocol)
TCP wraps IP, in the sense that it is built on top of IP. If you took a slice of a TCP internet tube, it would look like this:
Ditto for UDP — it’s also built on top of IP. The slice of a UDP internet tube would look like this:
IP, or “Internet Protocol”, is fancy way of saying “How machines on the Internet talk to each other”, and IP addresses are their equivalent of phone numbers.
Why do we need two types of tubes built on top of IP? They have different properties:
UDP tubes are the ¯\_(ツ)_/¯ of internet tubes. If you send something down a UDP internet tube, you actually have no idea whether it will make it down the tube or not. It might seem useless, but it's not. Pretty much all real time gaming, voice, and video transmissions go through UDP tubes.
If you take a slice of an HTTP tube, it looks like this:
Because HTTP sits on top of TCP, which in turn sits on top of IP.
DNS tubes are very similar to HTTP tubes, except they sit on top of UDP tubes. Here’s what a slice might look like:
So when your Google Chrome web browser gets a web page over an HTTP tube, it actually looks more like this:
Each of these random computers in between are called routers, and they basically shuttle traffic across the internet. They make it possible that any two computers on the internet can communicate with each other, without having a direct connection.
If you're curious to know which computers are in the middle of your connection between you and another computer on the internet, you can run a nifty little utility called traceroute:
So from my computer to the computer at google.com, it goes through all of those intermediate computers. Some have DNS names, like be-33651-cr01.sunnyvale.ca.ibone.comcast.net, but some only have IP addresses, like 162.151.78.93.
Any one of those computers could sniff the traffic going through the tubes (even the IP tubes that all the other ones sit on top of!). That’s one of the reasons you don’t want to send your credit cards over the internet without using encryption.
It all starts with a DNS lookup.
Your Google Chrome software contacts a server on the Internet called a DNS server and asks it “Hey what’s the IP of organic-juice-for-dogs.io?”.
DNS has an official sounding acronym, and for good reason, because it’s a very authoritative and fundamental Internet service.
So what exactly is DNS useful for?
It transforms Domain names into IP addresses
A Domain name, also referred to as a “Dot com name”, is an easy-to-remember word or group of words, so people don’t have to memorize a list of meaningless numbers. You could think of it like dialing 1-800-FLOWERS, which is a lot easier to remember than 1-800-901-1111
The IP address 63.120.10.5 is just like a phone number. If you are a human being and want to call someone, you might dial 415-555-1212. But if you're a thing on the internet and you want to talk to another thing on the internet, you instead dial the IP address 63.120.10.5 — same concept though.
So, that’s DNS in a nutshell. Not very complicated on the surface.
In this step, Google Chrome sends a GET / HTTP request to the HTTP Server software running on a computer somewhere on the Internet that has the IP address 63.120.10.5.
You can think of the GET / as "Get me the top-most web page from the website". This is known as the root of the website, in contrast to things deeper into the website, like GET /juices/oakland, which might return a list of dog juice products local to Oakland, CA. Since the root is at the top, that means the tree is actually upside down, and folks tend to think of websites as being structured as inverted trees.
The back-and-forth is going to look something like this:
These things are speaking HTTP to each other. What is HTTP?
You can think of things that communicate with each other over the internet as using tubes. There are lots of different types of tubes, and in this case it’s an HTTP tube. As long as the software on both ends agree on the type of tube they’re using, everything just works and they can send stuff back and forth. HTTP is a really common type of tube, but it’s not the only one — for example the DNS lookup in the previous step used a completely different type of tube.
Usually the stuff sent back from the HTTP Server is something called HTML, which stands for HyperText Markup Language.
But HTML is not the only kind of stuff that can be sent through an HTTP tube. JSON (Javascript Object Notation) and XML (eXtensible Markup Language) are also very common, and there are tons of other types of things that can be sent through HTTP tubes.
So at this point in our walk through, the Google Chrome web browser software has some HTML text, and it needs to render it in order for it to appear on your screen in a nice easy to view format. That’s the next step.
HTML is technically a markup language, which means that the text contains formatting directives which has an agreed upon standard on how it should be formatted. You can think of HTML as being similar to a Microsoft Word document, but MS Word is obfuscated while HTML is very transparent and simple:
For example, here is some HTML:
Which gets rendered into:
So, you'll notice that the <Header> element is in a larger font, and the <Paragraph> has spaces in between it and the other text.
How does the Google Chrome Web Browser do the rendering? It's just a piece of software, and rendering HTML is one of its primary responsibilities. There are tons of poor engineers at Google who do nothing all day but fix bugs in the Google Chrome rendering code.
Of course, there’s a lot more to it, but that’s the essence of rendering HTML into a web page.
So this step is optional because not all web pages will execute JavaScript in your web browser software, however it’s getting more and more common these days. When you open the Gmail website in your browser, it’s running tons of Javascript code to make the website as fast and responsive as possible.
Essentially, JavaScript adds another level of dynamic abilities to HTML, because when the browser is given HTML and it renders it .. that’s it! There’s no more action, it just sits there — it’s completely inert.
JavaScript, on the other hand, is basically a program-within-a-program.
How does the JavaScript get to the web browser? It sneaks in over the HTML! It’s embedded in the HTML, since it’s just another form of text, and your Web Browser (Google Chrome) executes it.
What can JavaScript do exactly? The list is really, really long. But as a simple example, if you click a button on a webpage:
A JavasScript program can pop up a little “Alert Box”, like this:
And that's the World Wide Web! You just went from typing a URL in your browser to a shiny web page in your Google Chrome. Soup to nuts.
And you can finally buy some juice for your dog!
So that’s it for the high level stuff.
If you’re dying to know more, continue on to Deep Dive of What Happens Under The Hood When You Open A Web Page
Versions at the time of this writing:
Create db named “db”
Open /usr/local/etc/telegraf.conf in your favorite text editor and uncomment the entire statsd server section:
Set the database to use the "db" database created earlier, under the outputs.influxdb section of the telegraf config:
In order for the field we want to appear on the grafana dashboard, we need to push some data points to the telegraf statsd daemon.
Run this in a shell to push the foo:1|c data point, which is a counter with value increasing by 1 on the key named "foo".
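Assuming telegraf's statsd listener is on the default UDP port 8125, netcat can push that data point:

```bash
echo "foo:1|c" | nc -u -w 1 localhost 8125
```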
Here's how to update your golang application to push new datapoints.
Push data points to the statsd telegraf process from your go program:
This will push statsd "timing" data points under the key "open_website", with the normal sample rate (set to 0.1 to downsample and only take every 10th sample). Run the code in a loop and it will start pushing stats to statsd.
Now, create a new Grafana dashboard with the steps above, but from the select measurement field choose open_website, and under SELECT choose field (mean) instead of field (value).
Goroutine 44 was running this code:
and nil’ing out the r.EventChan field.
While goroutine 27 was calling this code on the same *Replication instance:
It didn't make sense, because they were accessing different fields of the Replication — one was writing to r.EventChan while the other was reading from r.Stats.
Then I changed the GetStats() method to this:
and it still failed!
I started wandering around the Couchbase office looking for help, and got Steve Yen to help me.
He was asking me about using a pointer receiver vs a value receiver here, and then we realized that by using a value receiver it was copying all the fields, and therefore reading all of the fields, including the r.EventChan field that the other goroutine was concurrently writing to! Hence, the data race that was subtly caused by using a value receiver.
The fix was to convert this over to a pointer receiver, and the data race disappeared!
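A stripped-down sketch of the fix (the field types here are assumptions): with a value receiver, the whole Replication struct gets copied on every call, which reads every field, including the one the other goroutine writes; a pointer receiver avoids the copy.

```go
package main

type ReplicationStats struct {
	DocsRead int
}

type Replication struct {
	EventChan chan struct{}
	Stats     ReplicationStats
}

// Pointer receiver: only r.Stats is read here. The original value
// receiver version, func (r Replication) GetStats(), implicitly copied
// EventChan too, racing with the goroutine that nil'd it out.
func (r *Replication) GetStats() ReplicationStats {
	return r.Stats
}

func main() {
	r := &Replication{EventChan: make(chan struct{})}
	_ = r.GetStats()
}
```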
ssh ubuntu@<aws-instance> and install docker.
Go to github and register a new OAuth application using the following values:
It will give you a Client ID and Client Secret
Create the /etc/drone/dronerc config file. On the ubuntu host:
Configure Remote Driver
Add these values:
and replace client_id and client_secret with the values returned from github.
Configure Database
Add these values:
Check the logs via docker logs <container-id>, and they should look something like this:
With your instance selected, look for the security groups in the instance details:
Add a new inbound port with the following settings:
It should look like this when you’re done:
Paste the hostname of your aws instance into your browser (eg, http://ec2-54-163-185-45.compute-1.amazonaws.com
), and you should see a page like this:
If you click the login button, you should see:
And then:
Click one of the repositories you have access to, and you should get an “activate now” option:
which will take you to your project home screen:
Add a .drone.yml file to the root of the repository. In the repository you have chosen (in my case I'm using tleyden/sync_gateway, which is a golang project and may be referred to later), add a .drone.yml file to the root of the repository with:
Commit your change, but do not push to github yet, that will be in the next step.
Now push your change up to github.
and in your drone UI you should see a build in progress:
when it finishes, you’ll see either a pass or a failure. If you get a failure (which I did), it will look like this:
In my case, the above failure was due to a dependency not building. Since nothing else needs to be pushed to the repo to fix the build, I’m just going to manually trigger a build.
On the build failure screen above, there is a Restart button, which triggers a new build.
Now it works!
I could run this on my OSX workstation, but I decided to run this on a linux docker container. The rest of the steps assume you have spun up and are inside a linux docker container.
Go to your Profile page in the drone UI, and click Show Token.
Now set these environment variables
Query repos
To test the CLI tool works, try the following commands:
After doing some research, I decided to try gvt, since it seemed simple and well documented, and integrated well with existing tools like go get.
I'm going to update todolite-appserver to use vendored dependencies for some of its dependencies, just to see how things go.
I'm going to vendor the dependency on kingpin since it has transitive dependencies of its own (github.com/alecthomas/units, etc). gvt handles this by automatically pulling all of the transitive dependencies.
Now my directory structure looks like this:
Here is the manifest
gvt list shows the following:
I opened up vendor/github.com/alecthomas/kingpin/global.go and made the following change:
Now verify that code is getting compiled and run:
(note: export GO15VENDOREXPERIMENT=1 is still in effect in my shell)
Before I check in the vendor directory to git, I want to reset it to its previous state, before I made the above change to the global.go source file.
Now if I open global.go again, it's back to its original state. Nice!
Also, I updated the README to tell users to set the GO15VENDOREXPERIMENT=1 variable:
but the instructions otherwise remained the same. If someone tries to use this but forgets to set GO15VENDOREXPERIMENT=1 in Go 1.5, it will still work; it will just use the kingpin dependency in the $GOPATH rather than the vendor/ directory. Ditto for someone using go 1.4 or earlier.
As it turns out, I don’t even need kingpin in this project, since I’m using cobra. The kingpin dependency was caused by some leftover code I forgot to cleanup.
To remove it, I ran:
In this case, since it was my only dependency, it was easy to identify the transitive dependencies. In general though it looks like it’s up to you as a user to track down which ones to remove. I filed gvt issue 16 to hopefully address that.
I have emacs setup using the steps in this blog post, and I’m running into the following annoyances:
When I use godef to jump into the code of a vendored dependency, it takes me to source code that lives in the GOPATH, which might be different than what's under vendor/. Also, if I edit it there, my changes won't be reflected when I rebuild.
When I use M-x rgrep, it now searches through every repo under vendor/ and returns things I'm not interested in, since most of the time I only want to search within my project.
I like the taming-mr-arneson-theme
, so let’s install that one. Feel free to browse the emacs themes and find one that you like more.
1 2 |
|
Update your ~/emacs.d/init.el
to add the following lines to the top of the file:
1 2 |
|
Now when you restart emacs it should look like this:
## Directory Tree
1 2 |
|
Update your ~/emacs.d/init.el
to add the following lines:
1 2 |
|
Open a .go
file and the enter M-x neotree-dir
to show a directory browser:
Ref: NeoTree
That's beyond the scope of this blog post, but what I ended up doing on my new OSX installation was to:
What's in ~/Documents/blog/? Basically, the octopress instance I'd set up as described in Octopress Setup Part I.
From inside the docker container:
On OSX, open up ~/Documents/blog/source/_posts/path-to-post and make some minor edits.
Attempt 1
I have no idea why this is happening, but I just conceded defeat against these ruby weirdisms, wished I was using Go (and thought about converting my blog to Hugo), and took their advice and prefixed every command thereafter with bundle exec.
Attempt 2
Success!
mkdir -p volumes/uniqush
wget https://git.io/vgSYM -O volumes/uniqush/uniqush-push.conf
Security note: the above config has Uniqush listening on all interfaces, but depending on your setup you probably want to change that to localhost or something more restrictive.
Copy and paste this content into docker-compose.yml
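Then bring up the stack; adding -d runs it in the background:

```bash
docker-compose up -d
```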
Run this curl command outside of the docker container to verify that Uniqush is responding to HTTP requests:
In my case, I already had an app id for my app (com.couchbase.todolite), but push notifications were not enabled, so I needed to enable them:
Create a new push cert:
Choose the correct app id:
Generate CSR according to instructions in keychain:
This will save a CSR on your file system, and the next wizard step will ask you to upload this CSR and generate the certificate. Now you can download it:
Double click the downloaded cert and it will be added to your keychain.
This is where I got a bit confused, since I had to also download the cert from the app id section — go to the app id and hit “Edit”, then download the cert and double click it to add to your keychain. (I’m confused because I thought these were the same certs and this second step felt redundant)
Go to the Provisioning Profiles / Development section and hit the “+” button:
Choose all certs and all devices, and then give your provisioning profile an easy to remember name.
Download this provisioning profile and double click it to install it.
In xcode under Build Settings, choose this provisioning profile:
Add the following code to your didFinishLaunchingWithOptions: method:
And the following callback methods which will be called if remote notification is successful:
and this callback, which will be called if it's unsuccessful:
If you now run this app on a simulator, you can expect an error like "Error registering device token. Push notifications will not work".
Run the app on a device and you should see a popup dialog in the app asking if it's OK to receive push notifications, and the following log messages in the xcode console:
Open keychain, select the login keychain and the My Certificates category:
Export the certificate to an apns-prod-cert.p12 file somewhere you can access it, and export the private key as apns-prod-key.p12.
to .pem
format.
Remove the PEM passphrase:
When you call the Uniqush REST API to add a Push Service Provider, it expects to find the PEM files on its local file system. Use the following commands to get these files into the running container in the /tmp directory:
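docker cp will do it; the container name and the pem file names here are assumptions based on the steps above:

```bash
docker cp apns-prod-cert.pem uniqush:/tmp/apns-prod-cert.pem
docker cp apns-prod-key-noenc.pem uniqush:/tmp/apns-prod-key-noenc.pem
```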
(Note: I'm using a development cert, but if this was a distribution cert you'd want to use sandbox=false.)
You should get a 200 OK response with:
Using the cleaned up device token from the previous step (281c87101b029fdb16c8e13439436336116001cebf6519e68edefab523dab1e9), create a subscriber with the name mytestsubscriber via:
You should receive a 200 OK response with:
The moment of truth!
First, you need to either background your app by pressing the home button, or add some code like this so that an alert will be shown if the app is foregrounded.
You should get a 200 OK response with:
And a push notification on the device!
You will get a dialog regarding the menu.lst file; just choose the default option it gives you.
Do some cleanup:
For an explanation of why this is needed, see Caffe on EC2 Ubuntu 14.04 Cuda 7 and search for this command.
You should see:
Make sure kernel module and devices are present:
Follow these instructions to install CUDA 7.5 on AWS GPU Instance Running Ubuntu 14.04.
As the post-install message suggests, enable docker for non-root users:
Verify correct install via:
Mount
You should see something like this:
Verify: Find all your nvidia devices
You should see:
As reported in the Torch7 Google Group and in Kaixhin/dockerfiles, there is an API version mismatch between the docker container and the host's version of CUDA.
The workaround is to re-install CUDA 7.5 via:
Running:
Should show info about the GPU driver and not return any errors.
Running this torch command:
Should produce this output:
The following should be run inside the docker container:
Download models
First, grab a few images to test with
Run it:
CuDNN can potentially speed things up.
Install via:
Install the torch bindings for cuDNN:
Make a note of the instance's public hostname, eg ec2-54-161-201-224.compute-1.amazonaws.com; the rest of the instructions will refer to this as <instance public ip>. ssh in via ssh ec2-user@<instance public ip> (this should let you in without prompting you for a password; if not, you chose a key when you launched that you don't have locally).
From your workstation:
You should get a response like:
For more advanced Sync Gateway configuration, you will want to create a JSON config file on the EC2 instance itself and pass that to Sync Gateway when you launch it, or host your config JSON on the internet somewhere and pass Sync Gateway the URL to the file.
In order to log in to the Couchbase Server UI, go to <aws instance id, eg: i-8a9f8335>