As a warning, this blog post is pretty messy because of all those stumbling blocks, so you’re probably better off just following the official autoware installation docs and referring to this in case you run into the same problems.
Good luck!!
Autoware currently supports both 20.04 and 22.04 (but not 18.04), and I decided to go with 20.04 since it was the next LTS version after the version I had installed (18.04).
I also noticed that Autoware recommends CUDA 11.6, which only has official downloads for 20.04 and not 22.04, which made Ubuntu 20.04 seem like the better choice.
Here are the steps to upgrade to Ubuntu 20.04: official instructions.
I decided to go with the easier docker install until I had a need to use the source install.
Install the Docker engine based on these instructions. This links to the snapshot of the instructions that I used (as do the links below). If you want the latest instructions, change the 0423b84ee8d763879bbbf910d249728410b16943 commit hash in the URL to main.
Install the nvidia container toolkit based on these instructions.
After this step I was able to run nvidia-smi within the container:
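If you want a quick way to reproduce that check, a one-off CUDA base container works; the exact image tag below is just an assumption, so pick one matching your CUDA version:

```bash
# Run nvidia-smi inside a throwaway CUDA container to verify that the
# nvidia container toolkit is wired up correctly.
sudo docker run --rm --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
```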
Rocker is an alternative to Docker compose used by Autoware.
I installed rocker based on these instructions.
After this step, running rocker shows the rocker help.
I ran:
but got this error:
Using the approach suggested in this github post of adding -e NVIDIA_DISABLE_REQUIRE=true, I ran the new command:
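I don't have the exact command preserved here, but it was the rocker invocation from the Autoware Docker installation docs with the extra -e flag added. Roughly (the image name and volume path are assumptions):

```bash
rocker -e NVIDIA_DISABLE_REQUIRE=true --nvidia --x11 --user \
  --volume $HOME/autoware -- ghcr.io/autowarefoundation/autoware-universe:latest-cuda
```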
which seemed to work, as it dropped me into a container:
Based on the response from the super helpful folks at Autoware in this discussion, I determined I needed to upgrade my CUDA version based on these instructions (see the later step below).
The source installation instructions mention that Autoware depends on vcstool, a tool that makes it easy to manage code from multiple repos.
Install with:
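One common way to install it is via pip (this is an assumption about how I installed it; the ROS apt package python3-vcstool also works):

```bash
pip3 install vcstool
```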
In the container shell (started above):
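The build steps follow the Autoware source installation docs; they look roughly like this (paths and the exact colcon flags may differ slightly from what I actually ran):

```bash
git clone https://github.com/autowarefoundation/autoware.git
cd autoware
mkdir src
vcs import src < autoware.repos
rosdep update
rosdep install -y --from-paths src --ignore-src --rosdistro $ROS_DISTRO
colcon build --symlink-install --cmake-args -DCMAKE_BUILD_TYPE=Release
```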
It took about 40 minutes to build.
According to the docs: "Ad hoc simulation is a flexible method for running basic simulations on your local machine, and is the recommended method for anyone new to Autoware." But there are no docs on how to run an ad hoc simulation, so I am going to try a planning simulation based on the planning simulation docs.
This tool is needed to download the map data.
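Assuming the tool in question is gdown (which the Autoware tutorials use to fetch the sample maps from Google Drive), installing it is a one-liner:

```bash
pip3 install gdown
```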
In the container started above:
From inside the container:
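The launch command from the Autoware planning simulation docs looks roughly like this (the map path is an assumption, whatever directory you downloaded the sample map into):

```bash
source ~/autoware/install/setup.bash
ros2 launch autoware_launch planning_simulator.launch.xml \
  map_path:=$HOME/autoware_map/sample-map-planning \
  vehicle_model:=sample_vehicle sensor_model:=sample_sensor_kit
```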
I’m seeing a ton of errors like:
and a red-herring error.
and a possible red-herring error.
These issues seem related:
I think this error matters the most, since I get it if I try to launch rviz directly:
The same error was reported in https://github.com/ros2/rviz/issues/753 and https://github.com/NVIDIA/nvidia-docker/issues/1438.
I will update my nvidia driver as alluded to above, remove the -e NVIDIA_DISABLE_REQUIRE=true workaround, and retry.
I erroneously used the official NVIDIA instructions for installing CUDA; you should instead use the official Autoware instructions to install CUDA rather than following the steps below.
It failed on the last step:
Based on this advice, I’m going to re-install. First I am purging:
Reboot.
I still had some libnvidia packages, so I purged them with:
Since I'm running a System76 laptop, I went to them for support to upgrade the NVIDIA drivers.
and now I’m running nvidia 515.65.01:
Again, for this step I erroneously used the official NVIDIA instructions for installing CUDA; use the official Autoware instructions to install CUDA instead of the steps below.
This succeeded, but now nvidia-smi does not work:
I later realized that I diverged from the autoware instructions in two ways:
cuda_version=11-4 apt install cuda-${cuda_version} --no-install-recommends
Post-installation actions:
I simply rebooted, and now nvidia-smi works. Note that the cuda version went from 11.7 to 11.6. The strange thing is that previously I didn't have the cuda packages installed.
but got an error:
I realized I’m still missing several requirements:
I installed the nvidia container toolkit based on these autoware instructions, and now it's able to start a container and run nvidia-smi:
This error was caused by another divergence from the autoware instructions, where I didn’t run this step:
I re-ran all of these steps from the autoware docs:
and now this step worked:
Pin the libraries at those versions with:
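The pinning is done with apt-mark hold; the exact package list depends on what you installed above, so treat these names as placeholders:

```bash
sudo apt-mark hold cuda-11-6 libcudnn8 libcudnn8-dev libnvinfer8
```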
Using these instructions:
This worked, but at first I was very confused about whether it had actually worked.
It drops you back at the prompt with no meaningful output, but if you look closely, it's a different prompt. The hostname changes from your actual hostname (apollo in my case) to a cryptic container name (0b1ce9ed54bd).
Note that if you run this in the container:
you will see meaningful output, whereas if you run that on your host, you will most likely see ros2: command not found, unless you had installed ros2 on your host previously.
(also requires maps download, see above)
From inside the container:
But the same errors are showing up:
Note that these errors are also shown if I run rviz2 from within the container.
Relevant github issues:
The recommended fix is:
I don’t currently have either of those libraries installed:
I installed these packages on the host (outside the container), but that didn’t fix the issue.
I tried installing the packages in the container, but that didn’t work either.
There is a discrepancy between glxinfo on the host vs. in the container.
In the container, glxinfo returns an error:
Whereas on the host, glxinfo works fine.
This Stack Overflow post suggested a workaround, and according to the rocker docs: "For Intel integrated graphics support you will need to mount the /dev/dri directory as follows:"
After restarting a container with that flag, it no longer shows the "libGL error: MESA-LOADER: failed to retrieve device information" error.
I posted a question on the autoware forum to find out why this workaround was needed. Apparently there is another way to solve this problem by forcing the use of the nvidia gpu rather than the intel graphics card:
but I haven’t verified this yet.
Add the --devices /dev/dri flag:
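The full command is the same rocker invocation as before with the extra flag appended; again, the image name and volume path are assumptions:

```bash
rocker --nvidia --x11 --user --devices /dev/dri \
  --volume $HOME/autoware -- ghcr.io/autowarefoundation/autoware-universe:latest-cuda
```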
And now it finally works!! Running rviz2 from within the container shows the rviz window.
(also requires maps download, see above)
From inside the container:
and rviz launched with autoware configured:
Phew! That was a lot harder than I thought it was going to be! It would have gone smoother if:
I hadn't needed the --devices /dev/dri or nvidia "prime-select" workarounds.
Unfortunately, after these steps, rviz2 is still using the integrated graphics driver rather than the GPU.
See this follow-up post to see how to get it running on the GPU.
Non-requirements:
And here is the tech stack:
SQLite was chosen over MySQL since this is one less “moving part” and slightly easier to manage. See this blog post for the rationale.
Lightsail seems like a good value since you can get a decent sized instance and a static IP for $5/month.
Login to the AWS console and create a Lightsail instance with the following specs:
Upload your SSH public key, ~/.ssh/id_rsa.pub (maybe make a copy and rename it with a better name to easily find it in the AWS console later). You should see the following:
Go to the Lightsail Networking section and choose "Attach static ip". Associate the static ip with the Lightsail instance, and make a note of it, as you will need it in the next step.
Go to the DNS registrar where you registered your blog domain name (eg, Namecheap), and add a new A record as follows:
ssh in via ssh ubuntu@<your lightsail instance ip>
Update the apt package list:
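Nothing surprising here:

```bash
sudo apt-get update
```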
Install nginx:
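Per the Ghost install docs, something along these lines (the ufw line only applies if you use ufw):

```bash
sudo apt-get install -y nginx
sudo ufw allow 'Nginx Full'
```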
Install nodejs:
Add the NodeSource APT repository for Node 12, then install nodejs:
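Using the NodeSource setup script for Node 12 (this mirrors the NodeSource docs):

```bash
curl -sL https://deb.nodesource.com/setup_12.x | sudo -E bash -
sudo apt-get install -y nodejs
```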
Install Ghost-CLI:
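Ghost-CLI is installed globally via npm:

```bash
sudo npm install ghost-cli@latest -g
```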
Create a directory to hold the blog:
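Following the Ghost docs; the directory name below is just my choice for the first blog, and the ubuntu user is the default Lightsail user:

```bash
sudo mkdir -p /var/www/blog1
sudo chown ubuntu:ubuntu /var/www/blog1
sudo chmod 775 /var/www/blog1
cd /var/www/blog1
```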
Install Ghost:
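From inside that directory:

```bash
ghost install
```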
If you get an error about the node.js version being out of date, see the “Error installing ghost due to node.js being out of date” section below.
Here is how I answered the setup questions, but you can customize to your needs:
I decided to set up SSL in a separate step rather than initially, but the more secure approach would be to use https from the start, eg https://blog1.domainA.com for the blog URL, and answer Yes to the setup SSL question, which will trigger SSL setup during installation.
If you do set up SSL, you will need to open port 443 in the Lightsail console, otherwise it won't work. See the "Setup SSL" section below for instructions.
This part is a little scary (and ghosts are scary): Ghost initially exposes your blog to the world without an admin user. The first person who stumbles across it gets to become the admin user. You want that to be you!
Quickly go to http://blog1.domainA.com and create the Ghost admin user.
Go to the DNS registrar where you registered your blog domain name (eg, Namecheap), and add a new A record as follows:
Install Ghost:
Use the same steps above, except for the blog URL use: http://blog.domainB.com
You now have two separate Ghost blogging sites set up on a single $5/month AWS Lightsail instance.
If you see this error:
Find an older version of ghost that is compatible with the node.js you have installed, then specify that version of ghost when installing it:
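With Ghost-CLI you can pass the version to install; the placeholder below is whatever version you determined is compatible with your node.js:

```bash
ghost install <compatible-ghost-version>
```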
How do you find that version? I happened to have another blog folder that I had previously installed, so I just used that. The Ghost website may also have a compatibility chart.
The downside of this approach is that you won’t have the latest and greatest version of ghost, including security updates. The upside though is that you won’t break any existing ghost blogs on the same machine by upgrading node.js.
In the error above, it mentions that ghost requires node.js 14.17.0 or above.
The downside is that this could potentially break other existing ghost blogs on the same machine that are not compatible with the later version of node.js. Using containers to isolate dependencies would be beneficial here.
Upgrade to that version of node.js based on these instructions:
Run node -v to verify that you're running a recent enough version:
Update the ghost cli version:
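The CLI itself is updated through npm:

```bash
sudo npm install -g ghost-cli@latest
```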
Retry the ghost install command:
and this time it should not complain about the node.js version.
During installation, you can answer “Yes” to setup SSL, and it will ask you for your email and use letsencrypt to generate a certificate for you. See this page for more details.
But you must also open port 443 in your Lightsail firewall, otherwise it won’t work.
Let's Encrypt certificates expire after 90 days. To avoid downtime on your site, you should auto-renew the certificates. See this blog post for details.
I tried to follow the blog post and ran ghost setup ssl-renew in my blog folder, but after switching to root with sudo su, I noticed this existing cron entry:
So it looks like it is already setup to renew the certs every day.
You should see reams of output, followed by:
Now you can access the OpenWhisk CLI:
Re-run the “Hello world” via:
I tried following the instructions on James Thomas’ blog for running Go within Docker, but ran into an error (see Disqus comment), and so here’s how I worked around it.
First create a simple Go program and cross compile it. Save the following to exec.go:
Cross compile it for Linux:
Pull the upstream Docker image:
Create a custom docker image based on openwhisk/dockerskeleton:
Build and test:
Push up the docker image to dockerhub:
Create the OpenWhisk action:
Invoke the action to verify it works:
Save this to main.go:
Build and package it into a docker image, and push it up to Docker Hub:
Create an OpenWhisk action:
Invoke it:
Create a Cloudant database via the Bluemix web admin.
Under the Permissions control panel section for the database, choose Generate a new API key.
Check the _writer permission and make a note of the Key and Password.
Verify connectivity by making a curl request:
I’m currently getting this error:
At this point I switched to OpenWhisk on Bluemix, downloaded the wsk CLI from the Bluemix website, and configured it with my API key per the instructions. Then I re-installed the action via:
and made sure it worked by running:
Following these instructions:
You can get your Bluemix Org name (maybe the first part of your email address by default) and BlueMix space (dev by default) from the Bluemix web admin.
Refresh packages:
It didn’t work according to the docs, and no bindings were created even though I had created a Cloudant database in the Bluemix admin earlier.
I retried the package bind command that had failed earlier:
and this time success!!
Try writing to the db with:
and you should get a response like:
The /yournamespace/myCloudant/write action expects a dbname parameter, but the upstream fetch_aws_keys action doesn't provide that parameter (and it's better that it doesn't, to reduce coupling). So if you try to connect the two actions in a sequence at this point, it will fail.
Create a sequence that will invoke these actions in sequence:
This assumes a testdb database bound to the myCloudantTestDb package. Try it out:
To view the resulting document:
Let’s say we wanted this to run every minute.
First create an alarm trigger that will fire every minute:
Now create a rule that will invoke the fetch_and_write_aws_keys action (which is a sequence action) whenever the everyMinute feed is triggered:
To verify that it is working, check your cloudant database to look for new docs:
Or you can also monitor the activations:
The main parameter you will need to provide to postgres is a root db password. Replace ********* with a good password and run this command:
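Using the official postgres image, something like:

```bash
docker run --name postgres-server -e POSTGRES_PASSWORD=********* -d postgres
```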
You now have a working postgres database server.
When running postgres under docker, you most likely want to persist the database files on the host, rather than having them in the container.
First, remove the previous container with:
Go into the /tmp directory:
Launch a container, using /tmp/pgdata as the host directory to mount as a volume; it will be mounted in the container at /var/lib/postgresql/data, which is the default location where Postgres stores its data. The /tmp/pgdata directory will be created on the host if it doesn't already exist.
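The same run command as before, with the volume mount added (container name reused from above):

```bash
docker run --name postgres-server -e POSTGRES_PASSWORD=********* -d \
  -v /tmp/pgdata:/var/lib/postgresql/data postgres
```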
List the contents of /tmp/pgdata and you should see several Postgres files:
Now you will be in a shell inside the docker container.
In your browser, open http://localhost:8080/ and you should see the phpPgAdmin login screen:
Log in with the user/pass credentials created earlier (testuser / **********).
Security warning! This is not a secure deployment and it’s not recommended to run this in production without a thorough audit by a security specialist.
Create a new stack and paste the following into the box:
For example:
Find the postgres-server container and hit the Terminal menu to get a shell on that container.
Enter:
Find the phppgadmin service in the Docker Cloud Web UI, and look for the service endpoint, which should look something like this:
http://phppgadmin.postgres.071a32d40.svc.dockerapp.io:8085/
Log in with the user/pass credentials created earlier (testuser / **********).
Essentially you can think of them like stateful functions, in the sense that they encapsulate state. The state that they happen to capture (or “close over” — hence the name “closure”) is everything that’s in scope when they are defined.
First some very basic higher order functions.
Functions that take other functions and call them are called higher order functions. Here’s a trivial example:
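A minimal sketch of that trivial example, reconstructed from the description that follows (the println body is an assumption):

```go
package main

import "fmt"

// sendLoop is a higher order function: it takes another function as an
// argument (sender) and calls it.
func sendLoop(sender func()) {
	sender()
}

func main() {
	mySender := func() {
		fmt.Println("sending...")
	}
	sendLoop(mySender)
}
```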
In the main() function, we define a function called mySender and pass it to the sendLoop() function. sendLoop() takes a confusing looking argument called sender func() — the parameter name is sender, and the parameter type is func(), which is a function that takes no arguments and returns no values.
To make this slightly less confusing, we can define a named SenderFunc function type and use that:
sendLoop() has been updated to take SenderFunc as an argument, which is easier to read than taking a func() as an argument (which looks a bit like a function call!). If the SenderFunc type took more parameters and/or returned more values, having this in a defined type would be crucial for readability.
Let's make it slightly more realistic — let's say that the sendLoop() might need to retry calling the SenderFunc passed to it a few times until it actually works. So the SenderFunc definition will need to be updated so that it returns a boolean that indicates whether a retry is necessary.
One thing to note here is the clean separation of concerns — all sendLoop() knows is that it gets a SenderFunc which it should call, and that it returns a boolean indicating whether or not it worked. It knows absolutely nothing about the inner workings of the SenderFunc, nor does it care.
You have a new requirement: you need to only retry the SenderFunc 10 times, and then you should give up.
Your first inclination might be to take this approach:
This will work, but it makes the sendLoop() less generally useful. What happens when your co-worker hears about this nifty sendLoop() you wrote, and wants to use it with their own SenderFunc, but wants it to retry 100 times? (Side note: your SenderFunc implementation simply prints to the console, whereas theirs might write to a Slack channel, yet the sendLoop() will still work!)
To make it more generic, you could take this approach:
Which will work — but there's a catch. Now that you've changed the function signature of sendLoop() to take a second argument, all of the code that consumes sendLoop() will now be broken. If this were an exported function, it would be an even worse problem.
Luckily there is a much better way.
Rather than making sendLoop() do the retry-related accounting and passing it parameters for that accounting, you can make the SenderFunc handle this and encapsulate the state via a function closure. In this case, the state is the number of retries that have been attempted, which will start at 0 and then increase on every call to the SenderFunc.
How can SenderFunc keep internal state? It can "close over" any values that are in scope, which become associated with the function instance (I'm calling it an "instance" because it has state, as we shall see) and will be bound to the function instance as long as it is around.
Here’s what the final code looks like:
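A sketch of what that final version looks like, based on the names used throughout this post (the send itself is faked out so the example is self-contained):

```go
package main

import (
	"errors"
	"fmt"
)

// SenderFunc reports whether the send should be retried.
type SenderFunc func() bool

// sendLoop keeps calling sender until it says no retry is needed.
func sendLoop(sender SenderFunc) {
	for sender() {
	}
}

// doSend stands in for the real send operation; here it always fails.
func doSend() error {
	return errors.New("send failed")
}

func main() {
	counter := 0 // state that mySender closes over
	mySender := SenderFunc(func() bool {
		if err := doSend(); err == nil {
			return false // success, no retry needed
		}
		counter++
		return counter < 10 // give up after 10 failed attempts
	})
	sendLoop(mySender)
	fmt.Printf("gave up after %d failed attempts\n", counter)
}
```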
The counter state variable is bound to the mySender function instance, which is able to update counter on every failed send attempt, since the function "closes over" the counter variable that is in scope when the function instance is created. This is the heart of the idea of a function closure.
The sendLoop() doesn't know anything about the internals of the SenderFunc in terms of how it tracks whether or not it should retry; it just treats it as a black box. Different SenderFunc implementations could use vastly different rules and/or state for deciding whether the sendLoop() should retry a failed send.
If you wanted to make it even more flexible, you could update the SenderFunc to return a time.Duration in addition to a bool to indicate retry, which would allow you to implement "backoff retry" strategies and so forth.
If you’re passing the same function instances that have internal state (aka function closures) to multiple goroutines that are calling it, you’re going to end up causing data races. There’s nothing special about function closures that protect you from this.
The simplest way to deal with this is to make a new function instance for each goroutine you are sending the function instance to, which is probably what you want. In theory, you could also wrap the state update in a mutex, but that is probably not what you want, since it will cause goroutines to block each other trying to grab the mutex.
This post is about Go programs ending up with lots of connections stuck in the TIME_WAIT state.
Here are a few ways to get into this situation and how to fix each one.
Run the following code on a linux machine:
and in a separate terminal while the program is running, run:
and you will see this number constantly growing:
Update the startLoadTest() method to add the following line of code (and related imports):
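The added line is the one that drains the response body; without reading and closing the body, the connection can't go back into the keep-alive pool. The fix is likely along these lines (add io, io/ioutil, and log to the imports):

```go
// Drain the response body so the underlying TCP connection can be
// returned to the transport's idle pool and re-used, then close it.
if _, err := io.Copy(ioutil.Discard, resp.Body); err != nil {
	log.Fatalf("error reading response body: %v", err)
}
resp.Body.Close()
```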
Now when you re-run it, calling netstat -n | grep -i 8080 | grep -i time_wait | wc -l while it's running will return 0.
Another way to end up with excessive connections in the TIME_WAIT state is to consistently exceed the connection pool and cause many short-lived connections to be opened.
Here’s some code which starts up 100 goroutines which are all trying to make requests concurrently, and each request has a 50 ms delay:
In another shell run netstat, and note that the number of connections in the TIME_WAIT state is growing again, even though the response is being read:
To understand what's going on, we'll need to dig a little deeper into the TIME_WAIT state.
So what is this TIME_WAIT state anyway, and what's going on here?
What’s happening is that we are creating lots of short lived TCP connections, and the Linux kernel networking stack is keeping tabs on the closed connections to prevent certain problems.
From The TIME-WAIT state in TCP and Its Effect on Busy Servers:
The purpose of TIME-WAIT is to prevent delayed packets from one connection being accepted by a later connection. Concurrent connections are isolated by other mechanisms, primarily by addresses, ports, and sequence numbers[1].
By default, the Golang HTTP client will do connection pooling. Rather than closing a socket connection after an HTTP request, it will add it to an idle connection pool, and if you try to make another HTTP request before the idle connection timeout (90 seconds by default), then it will re-use that existing connection rather than creating a new one.
This will keep the number of total socket connections low, as long as the pool doesn’t fill up. If the pool is full of established socket connections, then it will just create a new socket connection for the HTTP request and use that.
So how big is the connection pool? A quick look into transport.go tells us:
The MaxIdleConns: 100 setting sets the size of the connection pool to 100 connections, but with one major caveat: this is on a per-host basis. See the notes on DefaultMaxIdleConnsPerHost below for more details on the implications of this.
IdleConnTimeout is set to 90 seconds, meaning that after a connection stays in the pool unused for 90 seconds, it will be removed from the pool and closed.
Notice the DefaultMaxIdleConnsPerHost = 2 setting below it. What this means is that even though the entire connection pool is set to 100, there is a per-host cap of only 2 connections!
In the above example, there are 100 goroutines trying to concurrently make requests to the same host, but the connection pool can only hold 2 sockets. So in the first "round" of the goroutines finishing their http request, 2 of the sockets will remain open in the pool, while the remaining 98 connections will be closed and end up in the TIME_WAIT state.
Since this is happening in a loop, you will quickly accumulate thousands or tens of thousands of connections in the TIME_WAIT state. Eventually, for that particular host at least, you will run out of ephemeral ports and not be able to open new client connections. For a load testing tool, this is bad news.
Here’s how to fix this issue.
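The key change is giving the http.Client a Transport whose per-host idle connection limit matches the concurrency, along these lines (this goes where the client is constructed):

```go
transport := &http.Transport{
	MaxIdleConns:        100,
	MaxIdleConnsPerHost: 100, // the default of 2 is what caused the pile-up
	IdleConnTimeout:     90 * time.Second,
}
client := &http.Client{Transport: transport}
```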
This bumps the total maximum idle connections (connection pool size) and the per-host connection pool size to 100.
Now when you run this and check the netstat output, the number of TIME_WAIT connections stays at 0:
The problem is now fixed!
If you have higher concurrency requirements, you may want to bump this number to something higher than 100.
Also available as a screencast.
Launch a node cluster with the following settings:
Go to Services and hit the Create button:
Click the globe icon and search Docker Hub for couchbase/server. You should select the couchbase/server image:
Hit the Select button and fill out the following values on the Services Wizard:
In the Ports section: Enable published on each port and set the Node Port to match the Container Port
Hit the Create and Deploy button. After a few minutes, you should see the Couchbase Server service running:
Go to the Container section and choose couchbaseserver-1.
Copy and paste the domain name (eca0fe88-7fee-446b-b006-99e8cae0dabf.node.dockerapp.io) into your browser, adding 8091 at the end (eca0fe88-7fee-446b-b006-99e8cae0dabf.node.dockerapp.io:8091).
You should now see the Couchbase Server setup screen:
You will need to find the container IP of Couchbase Server in order to configure it. To do that, go to the Terminal section of Containers/couchbaseserver-1, and enter ifconfig.
Look for the ethwe1 interface and make a note of the ip: 10.7.0.2 — you will need it in the next step.
Switch back to the browser on the Couchbase Server setup screen. Leave the Start a new cluster button checked. Enter the 10.7.0.2 ip address (or whatever was returned for your ethwe1 interface) under the Hostname field.
and hit the Next button.
For the rest of the wizard, you can:
Create a new bucket for your application:
Go to the Container section and choose couchbaseserver-2.
As in the previous step, copy and paste the domain name (4d8c7be0-3f47-471b-85df-d2471336af75.node.dockerapp.io) into your browser, adding 8091 at the end (4d8c7be0-3f47-471b-85df-d2471336af75.node.dockerapp.io:8091).
Hit Setup and choose Join a cluster now with settings:
(You can find this by running ifconfig and looking for the ip of the ethwe1 interface.) Trigger a rebalance by hitting the Rebalance button:
Now create a Sync Gateway service.
Before going through the steps in the Docker Cloud web UI, you will need to have a Sync Gateway configuration somewhere on the publicly accessible internet.
Warning: This is not a secure solution! Do not use any sensitive passwords if you follow these steps
To make it more secure, you could:
Create a Sync Gateway configuration on a github gist and get the raw url for the gist.
Set the server value to http://couchbaseserver:8091 so that it can connect to the Couchbase service set up in a previous step.
In the Docker Cloud web UI, go to Services and hit the Create button again.
Click the globe icon and search Docker Hub for couchbase/sync-gateway. You should select the couchbase/sync-gateway image.
Hit the Select button and fill out the following values on the Services Wizard:
In the Container Configuration section, customize the Run Command to use the raw URL of your gist, eg: https://gist.githubusercontent.com/tleyden/f260b2d9b2ef828fadfad462f0014aed/raw/8f544be6b265c0b57848
In the Ports section, use the following values:
In the Links section, choose couchbaseserver and hit the Plus button
Click the Create and Deploy button.
Click the Containers section and you should have two Couchbase Server and two Sync Gateway containers running.
Click the sync-gateway-1 container and get the domain name (eca0fe88-7fee-446b-b006-99e8cae0dabf.node.dockerapp.io), then paste it in your browser with a trailing :4984, eg eca0fe88-7fee-446b-b006-99e8cae0dabf.node.dockerapp.io:4984.
You should see the following JSON response:
Click the Services section and hit the Create button. In the bottom right hand corner look for Proxies and choose dockercloud/haproxy
General Settings:
Ports:
80
Links:
Hit the Create and Deploy button
Click the Containers section and choose sgloadbalancer-1.
Copy and paste the domain name (eg, eca0fe88-7fee-446b-b006-99e8cae0dabf.node.dockerapp.io) into your browser.
You should see the following JSON response:
Congratulations! You have just setup a Couchbase Server + Sync Gateway cluster on Docker Cloud.
Remember back in the day when you wanted to know what time it was, and you picked up your phone and dialed 853-1212 and it said "At the tone, the time will be 8:53 AM"?
Those days are over, but the idea lives on. The time service is identical in principle to an internet server. You ask it something, and it gives you an answer.
A well designed service does one thing, and one thing well.
With the time service, you can only ask one kind of question: “What time is it?”
With a DNS server, you can only ask one kind of question: “What is the IP address of organic-juice-for-dogs.io”
Clients vs Servers:
A “Client” can essentially be thought of as being a “Customer”. In the case of calling the time, it’s the person dialing the phone number. In the case of DNS, it’s the Google Chrome browser asking for the IP address.
A “Server” can be thought of as being a “Service”. In the case of calling the time, it’s something running at the phone company. In the case of DNS, it’s a service run by a combination of universities, business, and governments.
The following programs are all web browsers, which are all technically HTTP Clients, meaning they are on the client end of the HTTP tube.
What web browsers do:
In the previous post, there were a few “protocols” mentioned, like HTTP.
What are protocols really?
Any protocol is something to make it possible for things that speak the same protocol to speak to each other over that protocol.
A protocol is just a language, and just like everyone in English-speaking countries agree to speak English and can therefore intercommunicate without issues, many things on the internet agree to speak HTTP to each other.
Here’s what a conversation looks like in the HTTP protocol:
Almost everything that happens on the Internet looks something like this:
Let’s look at a few protocols.
You can think of the internet as being made up of tubes. Two very common types of tubes are:
Here’s what you might imagine an internet tube looking like:
Really, you can think of TCP and UDP as internet tubes that are built from the same kind of concrete — and that concrete is called IP (Internet Protocol)
TCP wraps IP, in the sense that it is built on top of IP. If you took a slice of a TCP internet tube, it would look like this:
Ditto for UDP — it’s also built on top of IP. The slice of a UDP internet tube would look like this:
IP, or “Internet Protocol”, is fancy way of saying “How machines on the Internet talk to each other”, and IP addresses are their equivalent of phone numbers.
Why do we need two types of tubes built on top of IP? They have different properties:
UDP tubes are the ¯\_(ツ)_/¯ of internet tubes. If you send something down a UDP internet tube, you actually have no idea whether it will make it down the tube or not. It might seem useless, but it's not. Pretty much all real time gaming, voice, and video transmissions go through UDP tubes.
If you take a slice of an HTTP tube, it looks like this:
Because HTTP sits on top of TCP, which in turn sits on top of IP.
DNS tubes are very similar to HTTP tubes, except they sit on top of UDP tubes. Here’s what a slice might look like:
So when your Google Chrome web browser gets a web page over an HTTP tube, it actually looks more like this:
Each of these random computers in between are called routers, and they basically shuttle traffic across the internet. They make it possible that any two computers on the internet can communicate with each other, without having a direct connection.
If you're curious to know which computers are in the middle of your connection between you and another computer on the internet, you can run a nifty little utility called traceroute:
So from my computer to the computer at google.com, it goes through all of those intermediate computers. Some have DNS names, like be-33651-cr01.sunnyvale.ca.ibone.comcast.net, but some only have IP addresses, like 162.151.78.93.
Any one of those computers could sniff the traffic going through the tubes (even the IP tubes that all the other ones sit on top of!). That’s one of the reasons you don’t want to send your credit cards over the internet without using encryption.
It all starts with a DNS lookup.
Your Google Chrome software contacts a server on the Internet called a DNS server and asks it “Hey what’s the IP of organic-juice-for-dogs.io?”.
DNS has an official sounding acronym, and for good reason, because it’s a very authoritative and fundamental Internet service.
So what exactly is DNS useful for?
It transforms Domain names into IP addresses
A Domain name, also referred to as a “Dot com name”, is an easy-to-remember word or group of words, so people don’t have to memorize a list of meaningless numbers. You could think of it like dialing 1-800-FLOWERS, which is a lot easier to remember than 1-800-901-1111
The IP address 63.120.10.5 is just like a phone number. If you are a human being and want to call someone, you might dial 415-555-1212. But if you're a thing on the internet and you want to talk to another thing on the internet, you instead dial the IP address 63.120.10.5 — same concept though.
So, that’s DNS in a nutshell. Not very complicated on the surface.
In this step, Google Chrome sends a GET / HTTP request to the HTTP Server software running on a computer somewhere on the Internet that has the IP address 63.120.10.5.
You can think of the GET / as "Get me the top-most web page from the website". This is known as the root of the website, in contrast to things deeper into the website, like GET /juices/oakland, which might return a list of dog juice products local to Oakland, CA. Since the root is at the top, that means the tree is actually upside down, and folks tend to think of websites as being structured as inverted trees.
The back-and-forth is going to look something like this:
These things are speaking HTTP to each other. What is HTTP?
You can think of things that communicate with each other over the internet as using tubes. There are lots of different types of tubes, and in this case it’s an HTTP tube. As long as the software on both ends agree on the type of tube they’re using, everything just works and they can send stuff back and forth. HTTP is a really common type of tube, but it’s not the only one — for example the DNS lookup in the previous step used a completely different type of tube.
Usually the stuff sent back from the HTTP Server is something called HTML, which stands for HyperText Markup Language.
But HTML is not the only kind of stuff that can be sent through an HTTP tube. JSON (Javascript Object Notation) and XML (eXtensible Markup Language) are also very common, and there are tons of other types of things that can be sent through HTTP tubes.
So at this point in our walk through, the Google Chrome web browser software has some HTML text, and it needs to render it in order for it to appear on your screen in a nice easy to view format. That’s the next step.
HTML is technically a markup language, which means that the text contains formatting directives which has an agreed upon standard on how it should be formatted. You can think of HTML as being similar to a Microsoft Word document, but MS Word is obfuscated while HTML is very transparent and simple:
For example, here is some HTML:
Which gets rendered into:
So, you'll notice that the <Header> element is in a larger font, and the <Paragraph> has spaces in between it and the other text.
How does the Google Chrome Web Browser do the rendering? It's just a piece of software, and rendering HTML is one of its primary responsibilities. There are tons of poor engineers at Google who do nothing all day but fix bugs in the Google Chrome rendering code.
Of course, there’s a lot more to it, but that’s the essence of rendering HTML into a web page.
So this step is optional because not all web pages will execute JavaScript in your web browser software, however it’s getting more and more common these days. When you open the Gmail website in your browser, it’s running tons of Javascript code to make the website as fast and responsive as possible.
Essentially, JavaScript adds another level of dynamic abilities to HTML, because when the browser is given HTML and it renders it .. that’s it! There’s no more action, it just sits there — it’s completely inert.
JavaScript, on the other hand, is basically a program-within-a-program.
How does the JavaScript get to the web browser? It sneaks in over the HTML! It’s embedded in the HTML, since it’s just another form of text, and your Web Browser (Google Chrome) executes it.
What can JavaScript do exactly? The list is really, really long. But as a simple example, if you click a button on a webpage:
A JavasScript program can pop up a little “Alert Box”, like this:
And that's the World Wide Web! You just went from typing a URL in your browser to a shiny web page in your Google Chrome. Soup to nuts.
And you can finally buy some juice for your dog!
So that’s it for the high level stuff.
If you’re dying to know more, continue on to Deep Dive of What Happens Under The Hood When You Open A Web Page
Versions at the time of this writing:
Create db named “db”
Open /usr/local/etc/telegraf.conf in your favorite text editor and uncomment the entire statsd server section:
Set the database to use the "db" database created earlier, under the outputs.influxdb section of the telegraf config:
In order for the field we want to appear on the grafana dashboard, we need to push some data points to the telegraf statsd daemon.
Run this in a shell to push the foo:1|c data point, which is a counter with value increasing by 1 on the key named "foo".
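Assuming telegraf's statsd listener is on the default UDP port 8125, netcat can push that data point:

```bash
echo "foo:1|c" | nc -u -w 1 localhost 8125
```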
Here's how to update your golang application to push new datapoints.
Push data points to the statsd telegraf process from your go program:
This will push statsd "timing" data points under the key "open_website", with the normal sample rate (set to 0.1 to downsample and only take every 10th sample). Run the code in a loop and it will start pushing stats to statsd.
Now, create a new Grafana dashboard with the steps above, but from the select measurement field choose open_website, and under SELECT choose field (mean) instead of field (value).
Goroutine 44 was running this code:
and nil’ing out the r.EventChan field.
While goroutine 27 was calling this code on the same *Replication instance:
It didn't make sense, because they were accessing different fields of the Replication — one was writing to r.EventChan while the other was reading from r.Stats.
Then I changed the GetStats() method to this:
and it still failed!
I started wandering around the Couchbase office looking for help, and got Steve Yen to help me.
He was asking me about using a pointer receiver vs a value receiver here, and then we realized that by using a value receiver it was copying all the fields, and therefore reading all of the fields, including the r.EventChan field that the other goroutine was concurrently writing to! Hence, the data race that was subtly caused by using a value receiver.
The fix was to convert this over to a pointer receiver, and the data race disappeared!
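A stripped-down sketch of the fix (the field types here are assumptions): with a value receiver, the whole Replication struct gets copied on every call, which reads every field, including the one the other goroutine writes; a pointer receiver avoids the copy.

```go
package main

type ReplicationStats struct {
	DocsRead int
}

type Replication struct {
	EventChan chan struct{}
	Stats     ReplicationStats
}

// Pointer receiver: only r.Stats is read here. The original value
// receiver version, func (r Replication) GetStats(), implicitly copied
// EventChan too, racing with the goroutine that nil'd it out.
func (r *Replication) GetStats() ReplicationStats {
	return r.Stats
}

func main() {
	r := &Replication{EventChan: make(chan struct{})}
	_ = r.GetStats()
}
```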
ssh ubuntu@<aws-instance> and install docker.
Go to github and register a new OAuth application using the following values:
It will give you a Client ID and Client Secret
Create the /etc/drone/dronerc config file. On the ubuntu host:
Configure Remote Driver
Add these values:
and replace client_id and client_secret with the values returned from github.
Configure Database
Add these values:
Check the logs via docker logs <container-id>, and they should look something like this:
With your instance selected, look for the security groups in the instance details:
Add a new inbound port with the following settings:
It should look like this when you’re done:
Paste the hostname of your aws instance into your browser (eg, http://ec2-54-163-185-45.compute-1.amazonaws.com
), and you should see a page like this:
If you click the login button, you should see:
And then:
Click one of the repositories you have access to, and you should get an “activate now” option:
which will take you to your project home screen:
Add a .drone.yml file to the root of the repository. In the repository you have chosen (in my case I'm using tleyden/sync_gateway, which is a golang project and may be referred to later), add a .drone.yml file to the root of the repository with:
Commit your change, but do not push to github yet, that will be in the next step.
Now push your change up to github.
and in your drone UI you should see a build in progress:
when it finishes, you’ll see either a pass or a failure. If you get a failure (which I did), it will look like this:
In my case, the above failure was due to a dependency not building. Since nothing else needs to be pushed to the repo to fix the build, I’m just going to manually trigger a build.
On the build failure screen above, there is a Restart button, which triggers a new build.
Now it works!
I could run this on my OSX workstation, but I decided to run this on a linux docker container. The rest of the steps assume you have spun up and are inside a linux docker container.
Go to your Profile page in the drone UI, and click Show Token.
Now set these environment variables
Query repos
To test the CLI tool works, try the following commands:
After doing some research, I decided to try gvt, since it seemed simple and well documented, and integrated well with existing tools like go get.
I'm going to update todolite-appserver to use vendored dependencies for some of its dependencies, just to see how things go.
I'm going to vendor the dependency on kingpin since it has transitive dependencies of its own (github.com/alecthomas/units, etc). gvt handles this by automatically pulling all of the transitive dependencies.
Now my directory structure looks like this:
Here is the manifest
gvt list shows the following:
I opened up vendor/github.com/alecthomas/kingpin/global.go and made the following change:
Now verify that code is getting compiled and run:
(note: export GO15VENDOREXPERIMENT=1 is still in effect in my shell)
Before I check in the vendor directory to git, I want to reset it to its previous state, before I made the above change to the global.go source file.
Now if I open global.go again, it's back to its original state. Nice!
Also, I updated the README to tell users to set the GO15VENDOREXPERIMENT=1 variable:
but the instructions otherwise remained the same. If someone tries to use this but forgets to set GO15VENDOREXPERIMENT=1 in Go 1.5, it will still work; it will just use the kingpin dependency in the $GOPATH rather than the vendor/ directory. Ditto for someone using go 1.4 or earlier.
As it turns out, I don’t even need kingpin in this project, since I’m using cobra. The kingpin dependency was caused by some leftover code I forgot to cleanup.
To remove it, I ran:
In this case, since it was my only dependency, it was easy to identify the transitive dependencies. In general though it looks like it’s up to you as a user to track down which ones to remove. I filed gvt issue 16 to hopefully address that.
I have emacs setup using the steps in this blog post, and I’m running into the following annoyances:
When I use godef to jump into the code of a vendored dependency, it takes me to source code that lives in the GOPATH, which might be different than what's under vendor/. Also, if I edit it there, my changes won't be reflected when I rebuild.
When I use M-x rgrep, it now searches through every repo under vendor/ and returns things I'm not interested in, since most of the time I only want to search within my project.
I like the taming-mr-arneson-theme
, so let’s install that one. Feel free to browse the emacs themes and find one that you like more.
1 2 |
|
Update your ~/emacs.d/init.el
to add the following lines to the top of the file:
1 2 |
|
Now when you restart emacs it should look like this:
## Directory Tree
1 2 |
|
Update your ~/emacs.d/init.el
to add the following lines:
1 2 |
|
Open a .go
file and the enter M-x neotree-dir
to show a directory browser:
Ref: NeoTree
That's beyond the scope of this blog post, but what I ended up doing on my new OSX installation was to:
What's in ~/Documents/blog/? Basically, the octopress instance I'd set up as described in Octopress Setup Part I.
From inside the docker container:
On OSX, open up ~/Documents/blog/source/_posts/path-to-post and make some minor edits.
Attempt 1
I have no idea why this is happening, but I just conceded defeat against these ruby weirdisms, wished I was using Go (and thought about converting my blog to Hugo), and took their advice and prefixed every command thereafter with bundle exec.
Attempt 2
Success!
mkdir -p volumes/uniqush
wget https://git.io/vgSYM -O volumes/uniqush/uniqush-push.conf
Security note: the above config has Uniqush listening on all interfaces, but depending on your setup you probably want to change that to localhost or something more restrictive.
Copy and paste this content into docker-compose.yml
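Then bring up the stack; adding -d runs it in the background:

```bash
docker-compose up -d
```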
Run this curl command outside of the docker container to verify that Uniqush is responding to HTTP requests:
In my case, I already had an app id for my app (com.couchbase.todolite), but push notifications were not enabled, so I needed to enable them:
Create a new push cert:
Choose the correct app id:
Generate CSR according to instructions in keychain:
This will save a CSR on your file system, and the next wizard step will ask you to upload this CSR and generate the certificate. Now you can download it:
Double click the downloaded cert and it will be added to your keychain.
This is where I got a bit confused, since I had to also download the cert from the app id section — go to the app id and hit “Edit”, then download the cert and double click it to add to your keychain. (I’m confused because I thought these were the same certs and this second step felt redundant)
Go to the Provisioning Profiles / Development section and hit the “+” button:
Choose all certs and all devices, and then give your provisioning profile an easy to remember name.
Download this provisioning profile and double click it to install it.
In xcode under Build Settings, choose this provisioning profile:
Add the following code to your didFinishLaunchingWithOptions: method:
And the following callback methods which will be called if remote notification is successful:
and this callback, which will be called if it's unsuccessful:
If you now run this app on a simulator, you can expect an error like "Error registering device token. Push notifications will not work".
Run the app on a device and you should see a popup dialog in the app asking if it's OK to receive push notifications, and the following log messages in the xcode console:
Open keychain, select the login keychain and the My Certificates category:
Export the certificate to an apns-prod-cert.p12 file somewhere you can access it, and export the private key as apns-prod-key.p12.
to .pem
format.
Remove the PEM passphrase:
When you call the Uniqush REST API to add a Push Service Provider, it expects to find the PEM files on its local file system. Use the following commands to get these files into the running container in the /tmp directory:
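docker cp will do it; the container name and the pem file names here are assumptions based on the steps above:

```bash
docker cp apns-prod-cert.pem uniqush:/tmp/apns-prod-cert.pem
docker cp apns-prod-key-noenc.pem uniqush:/tmp/apns-prod-key-noenc.pem
```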
(Note: I'm using a development cert, but if this was a distribution cert you'd want to use sandbox=false.)
You should get a 200 OK response with:
Using the cleaned up device token from the previous step (281c87101b029fdb16c8e13439436336116001cebf6519e68edefab523dab1e9), create a subscriber with the name mytestsubscriber via:
You should receive a 200 OK response with:
The moment of truth!
First, you need to either background your app by pressing the home button, or add some code like this so that an alert will be shown if the app is foregrounded.
You should get a 200 OK response with:
And a push notification on the device!
You will get a dialog regarding the menu.lst file; just choose the default option it gives you.
Do some cleanup:
For an explanation of why this is needed, see Caffe on EC2 Ubuntu 14.04 Cuda 7 and search for this command.
You should see:
Make sure kernel module and devices are present:
Follow these instructions to install CUDA 7.5 on AWS GPU Instance Running Ubuntu 14.04.
As the post-install message suggests, enable docker for non-root users:
Verify correct install via:
Mount
You should see something like this:
Verify: Find all your nvidia devices
You should see:
As reported in the Torch7 Google Group and in Kaixhin/dockerfiles, there is an API version mismatch between the docker container and the host's version of CUDA.
The workaround is to re-install CUDA 7.5 via:
Running:
Should show info about the GPU driver and not return any errors.
Running this torch command:
Should produce this output:
The following should be run inside the docker container:
Download models
First, grab a few images to test with
Run it:
CuDNN can potentially speed things up.
Install via:
Install the torch bindings for cuDNN:
Make a note of the instance's public hostname, eg ec2-54-161-201-224.compute-1.amazonaws.com; the rest of the instructions will refer to this as <instance public ip>. ssh in via ssh ec2-user@<instance public ip> (this should let you in without prompting you for a password; if not, you chose a key when you launched that you don't have locally).
From your workstation:
You should get a response like:
For more advanced Sync Gateway configuration, you will want to create a JSON config file on the EC2 instance itself and pass that to Sync Gateway when you launch it, or host your config JSON on the internet somewhere and pass Sync Gateway the URL to the file.
In order to log in to the Couchbase Server UI, go to <aws instance id, eg: i-8a9f8335>