Seven Story Rabbit Hole

Sometimes awesome things happen in deep rabbit holes. Or not.


Tuning the Go HTTP Client Settings for Load Testing

While working on a load testing tool in Go, I ran into a situation where I was seeing tens of thousands of sockets in the TIME_WAIT state.

Here are a few ways to get into this situation and how to fix each one.

Repro #1: Create excessive TIME_WAIT connections by forgetting to read the response body

Run the following code on a Linux machine:

package main

import (
  "fmt"
  "html"
  "log"
  "net/http"
)

func startWebserver() {

  http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
      fmt.Fprintf(w, "Hello, %q", html.EscapeString(r.URL.Path))
  })

  go http.ListenAndServe(":8080", nil)

}

func startLoadTest() {
  count := 0
  for {
      resp, err := http.Get("http://localhost:8080/")
      if err != nil {
          panic(fmt.Sprintf("Got error: %v", err))
      }
      resp.Body.Close()
      log.Printf("Finished GET request #%v", count)
      count += 1
  }

}

func main() {

  // start a webserver in a goroutine
  startWebserver()

  startLoadTest()

}

In a separate terminal, while the program is running, run:

netstat -n | grep -i 8080 | grep -i time_wait | wc -l

and you will see this number constantly growing:

root@14952c2356a7:/# netstat -n | grep -i 8080 | grep -i time_wait | wc -l
166
root@14952c2356a7:/# netstat -n | grep -i 8080 | grep -i time_wait | wc -l
231
root@14952c2356a7:/# netstat -n | grep -i 8080 | grep -i time_wait | wc -l
293
root@14952c2356a7:/# netstat -n | grep -i 8080 | grep -i time_wait | wc -l
349
... 

Fix: Read Response Body

Update the startLoadTest() function to add the following line of code (and the related "io" and "io/ioutil" imports):

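func startLoadTest() {
  count := 0
  for {
      resp, err := http.Get("http://localhost:8080/")
      if err != nil {
          panic(fmt.Sprintf("Got error: %v", err))
      }
      io.Copy(ioutil.Discard, resp.Body)  // <-- add this line: read the body to EOF so the connection can be reused
      resp.Body.Close()
      log.Printf("Finished GET request #%v", count)
      count += 1
  }

}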

Now when you re-run it, calling netstat -n | grep -i 8080 | grep -i time_wait | wc -l while it’s running will return 0.
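One way to confirm that draining the body is what makes the difference: this sketch uses net/http/httptrace, whose GotConn hook reports whether a request was handed a reused pooled connection, against a throwaway httptest server. The helper names are made up for illustration, and it uses io.Discard, which requires Go 1.16 or later (on older versions, use ioutil.Discard as above).

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
)

// getAndCheckReuse issues a GET through client and reports whether the
// transport handed it a reused (pooled) connection. If drain is true, the
// body is read to EOF before Close, like the fix above.
func getAndCheckReuse(client *http.Client, url string, drain bool) (reused bool, err error) {
	trace := &httptrace.ClientTrace{
		GotConn: func(info httptrace.GotConnInfo) { reused = info.Reused },
	}
	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return false, err
	}
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))
	resp, err := client.Do(req)
	if err != nil {
		return false, err
	}
	if drain {
		io.Copy(io.Discard, resp.Body) // same effect as ioutil.Discard
	}
	resp.Body.Close()
	return reused, nil
}

// secondRequestReused sends two sequential requests to a fresh local test
// server and reports whether the second one reused the first's connection.
func secondRequestReused(drain bool) (bool, error) {
	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "Hello, world")
	}))
	defer server.Close()

	client := &http.Client{Transport: &http.Transport{}}
	defer client.CloseIdleConnections()

	if _, err := getAndCheckReuse(client, server.URL, drain); err != nil {
		return false, err
	}
	return getAndCheckReuse(client, server.URL, drain)
}

func main() {
	for _, drain := range []bool{false, true} {
		reused, err := secondRequestReused(drain)
		if err != nil {
			panic(err)
		}
		fmt.Printf("drain=%v: second request reused a pooled connection: %v\n", drain, reused)
	}
}
```

With drain=true the second request should report a reused connection; with drain=false the transport generally closes the connection instead of returning it to the pool.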

Repro #2: Create excessive TIME_WAIT connections by exceeding connection pool

Another way to end up with excessive connections in the TIME_WAIT state is to consistently exceed the connection pool and cause many short-lived connections to be opened.

Here’s some code that starts 100 goroutines, all making requests concurrently, against a server that adds a 50 ms delay to each request:


package main

import (
  "fmt"
  "html"
  "io"
  "io/ioutil"
  "log"
  "net/http"
  "time"
)

func startWebserver() {

  http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {

      time.Sleep(time.Millisecond * 50)

      fmt.Fprintf(w, "Hello, %q", html.EscapeString(r.URL.Path))
  })

  go http.ListenAndServe(":8080", nil)

}

func startLoadTest() {
  count := 0
  for {
      resp, err := http.Get("http://localhost:8080/")
      if err != nil {
          panic(fmt.Sprintf("Got error: %v", err))
      }
      io.Copy(ioutil.Discard, resp.Body)
      resp.Body.Close()
      log.Printf("Finished GET request #%v", count)
      count += 1
  }

}

func main() {

  // start a webserver in a goroutine
  startWebserver()

  for i := 0; i < 100; i++ {
      go startLoadTest()
  }

  time.Sleep(time.Second * 2400)

}

In another shell, run netstat and note that the number of connections in the TIME_WAIT state is growing again, even though the response body is being read:

root@14952c2356a7:/# netstat -n | grep -i 8080 | grep -i time_wait | wc -l
166
root@14952c2356a7:/# netstat -n | grep -i 8080 | grep -i time_wait | wc -l
231
root@14952c2356a7:/# netstat -n | grep -i 8080 | grep -i time_wait | wc -l
293
root@14952c2356a7:/# netstat -n | grep -i 8080 | grep -i time_wait | wc -l
349
... 

To understand what’s going on, we’ll need to dig a little deeper into the TIME_WAIT state.

What is the socket TIME_WAIT state anyway?

So what’s going on here?

What’s happening is that we are creating lots of short-lived TCP connections, and the Linux kernel networking stack is keeping tabs on the closed connections to prevent delayed packets from an old connection being accepted by a new one.

From The TIME-WAIT state in TCP and Its Effect on Busy Servers:

The purpose of TIME-WAIT is to prevent delayed packets from one connection being accepted by a later connection. Concurrent connections are isolated by other mechanisms, primarily by addresses, ports, and sequence numbers[1].

Why so many TIME_WAIT sockets? What about connection re-use?

By default, the Golang HTTP client will do connection pooling. Rather than closing a socket connection after an HTTP request, it will add it to an idle connection pool, and if you try to make another HTTP request before the idle connection timeout (90 seconds by default), then it will re-use that existing connection rather than creating a new one.

This will keep the total number of socket connections low, as long as the pool doesn’t fill up. If no idle connection is available, the client dials a new socket connection for the HTTP request; when the request finishes, that connection goes back into the pool only if there is room, and otherwise it is closed.

So how big is the connection pool? A quick look into transport.go tells us:


var DefaultTransport RoundTripper = &Transport{
        ... 
  MaxIdleConns:          100,
  IdleConnTimeout:       90 * time.Second,
        ... 
}

// DefaultMaxIdleConnsPerHost is the default value of Transport's
// MaxIdleConnsPerHost.
const DefaultMaxIdleConnsPerHost = 2
  • The MaxIdleConns: 100 setting sets the size of the entire connection pool, across all hosts, to 100 idle connections, but with one major caveat: there is also a much smaller per-host limit. See DefaultMaxIdleConnsPerHost below for the implications.
  • The IdleConnTimeout is set to 90 seconds, meaning that after a connection stays in the pool and is unused for 90 seconds, it will be removed from the pool and closed.
  • The DefaultMaxIdleConnsPerHost = 2 setting below it applies because DefaultTransport leaves MaxIdleConnsPerHost at zero, and a zero value falls back to this default. What this means is that even though the entire connection pool is set to 100, there is a per-host cap of only 2 idle connections!

In the above example, there are 100 goroutines trying to concurrently make requests to the same host, but the connection pool will only keep 2 idle sockets for that host. So in the first “round” of the goroutines finishing their HTTP requests, 2 of the sockets will remain open in the pool, while the remaining 98 connections will be closed and end up in the TIME_WAIT state.

Since this is happening in a loop, you will quickly accumulate thousands or tens of thousands of connections in the TIME_WAIT state. Eventually, for that particular host at least, you will run out of ephemeral ports and not be able to open new client connections. For a load testing tool, this is bad news.

Fix: Tuning the HTTP client to increase the connection pool size

Here’s how to fix this issue.

import (
     .. 
)

var myClient *http.Client

func startWebserver() {
      ... same code as before

}

func startLoadTest() {
        ... 
  for {
      resp, err := myClient.Get("http://localhost:8080/")  // <-- use a custom client with custom *http.Transport
                ... everything else is the same
  }

}


func main() {

  // Customize the Transport to have a larger connection pool.
  // Clone() (Go 1.13+) returns a copy of DefaultTransport's settings.
  // Dereferencing the pointer and copying the struct instead would also
  // copy the Transport's internal mutexes and connection state, which
  // `go vet` reports as copying a lock value.
  defaultTransport, ok := http.DefaultTransport.(*http.Transport)
  if !ok {
      panic("http.DefaultTransport is not an *http.Transport")
  }
  customTransport := defaultTransport.Clone()
  customTransport.MaxIdleConns = 100
  customTransport.MaxIdleConnsPerHost = 100

  myClient = &http.Client{Transport: customTransport}

  // start a webserver in a goroutine
  startWebserver()

  for i := 0; i < 100; i++ {
      go startLoadTest()
  }

  time.Sleep(time.Second * 2400)

}

This bumps the total maximum idle connections (connection pool size) and the per-host connection pool size to 100.

Now when you run this and check the netstat output, the number of TIME_WAIT connections stays at 0:

root@bbe9a95545ae:/# netstat -n | grep -i 8080 | grep -i time_wait | wc -l
0
root@bbe9a95545ae:/# netstat -n | grep -i 8080 | grep -i time_wait | wc -l
0
root@bbe9a95545ae:/# netstat -n | grep -i 8080 | grep -i time_wait | wc -l
0
root@bbe9a95545ae:/# netstat -n | grep -i 8080 | grep -i time_wait | wc -l
0

The problem is now fixed!

If you have higher concurrency requirements, you may want to bump this number to something higher than 100.

Install Couchbase Server + Mobile on Docker Cloud

Deploy Couchbase Server and Sync Gateway on Docker Cloud behind a load balancer.

Also available as a screencast

Launch node cluster

Launch a node cluster with the following settings:

  • Provider: AWS
  • Region: us-east-1 (or whatever region makes sense for you)
  • VPC: Auto (if you don’t choose auto, you will need to customize your security group)
  • Type/Size: m3.medium or greater
  • IAM Roles: None

Create Couchbase Server service

Go to Services and hit the Create button:

Click the globe icon and search Docker Hub for couchbase/server. You should select the couchbase/server image.

Hit the Select button and fill out the following values on the Services Wizard:

  • Service Name: couchbaseserver
  • Containers: 2
  • Deployment strategy: High Availability
  • Autorestart: On failure
  • Network: bridge

In the Ports section: Enable published on each port and set the Node Port to match the Container Port

Hit the Create and Deploy button. After a few minutes, you should see the Couchbase Server service running.

Configure Couchbase Server Container 1 + Create Buckets

Go to the Container section and choose couchbaseserver-1.

Copy and paste the domain name (eca0fe88-7fee-446b-b006-99e8cae0dabf.node.dockerapp.io) into your browser, adding 8091 at the end (eca0fe88-7fee-446b-b006-99e8cae0dabf.node.dockerapp.io:8091)

You should now see the Couchbase Server setup screen:

You will need to find the container IP of Couchbase Server in order to configure it. To do that, go to the Terminal section of Containers/couchbaseserver-1, and enter ifconfig.

Look for the ethwe1 interface and make a note of its IP address (10.7.0.2 in this example); you will need it in the next step.
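If you’d rather script this lookup than read ifconfig output by eye, here’s a minimal sketch using Go’s net package. It would need to run inside the container (for example, as a binary copied into it), and it simply returns the first IPv4 address on the named interface. ethwe1 is the default only because it’s the interface name seen in this walkthrough; ipv4ForInterface is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// ipv4ForInterface returns the first IPv4 address assigned to the named
// network interface, roughly what `ifconfig <name>` shows as "inet".
func ipv4ForInterface(name string) (net.IP, error) {
	iface, err := net.InterfaceByName(name)
	if err != nil {
		return nil, err
	}
	addrs, err := iface.Addrs()
	if err != nil {
		return nil, err
	}
	for _, a := range addrs {
		if ipnet, ok := a.(*net.IPNet); ok {
			if ip4 := ipnet.IP.To4(); ip4 != nil {
				return ip4, nil
			}
		}
	}
	return nil, fmt.Errorf("interface %q has no IPv4 address", name)
}

func main() {
	name := "ethwe1" // pass a different interface name as the first argument
	if len(os.Args) > 1 {
		name = os.Args[1]
	}
	ip, err := ipv4ForInterface(name)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(ip)
}
```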

Switch back to the browser on the Couchbase Server setup screen. Leave the Start a new cluster button checked. Enter the 10.7.0.2 IP address (or whatever was returned for your ethwe1 interface) in the Hostname field and hit the Next button.

For the rest of the wizard, you can:

  • skip adding the samples
  • skip adding the default bucket
  • uncheck Update Notifications
  • leave Product Registration fields blank
  • check “I agree ..”
  • make sure to write down your password somewhere, otherwise you will be locked out of the web interface

Create a new bucket for your application:

Configure Couchbase Server Container 2

Go to the Container section and choose couchbaseserver-2.

As in the previous step, copy and paste the domain name (4d8c7be0-3f47-471b-85df-d2471336af75.node.dockerapp.io) into your browser, adding 8091 at the end (4d8c7be0-3f47-471b-85df-d2471336af75.node.dockerapp.io:8091)

Hit Setup and choose Join a cluster now with settings:

  • IP Address: 10.7.0.2 (the IP address you set up the first Couchbase Server node with)
  • Username: Administrator (unless you used a different username in the previous step)
  • Password: enter the password you used in the previous step
  • Configure Server Hostname: 10.7.0.3 (you can double check this by going to the Terminal for Containers/couchbaseserver-2 and running ifconfig and looking for the ip of the ethwe1 interface)

Trigger a rebalance by hitting the Rebalance button:

Sync Gateway Service

Now create a Sync Gateway service.

Before going through the steps in the Docker Cloud web UI, you will need to have a Sync Gateway configuration somewhere on the publicly accessible internet.

Warning: This is not a secure solution! Do not use any sensitive passwords if you follow these steps.

To make it more secure, you could:

  • Use a Volume mount and have Sync Gateway read the configuration from the container filesystem
  • Use HTTPS + Basic Auth for the URL that hosts the Sync Gateway configuration

Create a Sync Gateway configuration in a GitHub gist and get the raw URL for the gist.

  • Make sure to set the server value to http://couchbaseserver:8091 so that it can connect to the Couchbase Server service set up in a previous step.
  • Use the bucket created in the Couchbase Server setup step above

In the Docker Cloud web UI, go to Services and hit the Create button again.

Click the globe icon and search Docker Hub for couchbase/sync-gateway. You should select the couchbase/sync-gateway image.

Hit the Select button and fill out the following values on the Services Wizard:

  • Service Name: sync-gateway
  • Containers: 2
  • Deployment strategy: High Availability
  • Autorestart: On failure
  • Network: bridge

In the Container Configuration section, customize the Run Command to use the raw URL of your gist, e.g.: https://gist.githubusercontent.com/tleyden/f260b2d9b2ef828fadfad462f0014aed/raw/8f544be6b265c0b57848

In the Ports section, use the following values:

In the Links section, choose couchbaseserver and hit the Plus button

Click the Create and Deploy button.

Verify Sync Gateway

Click the Containers section and you should have two Couchbase Server and two Sync Gateway containers running.

Click the sync-gateway-1 container and get the domain name (eca0fe88-7fee-446b-b006-99e8cae0dabf.node.dockerapp.io) and paste it into your browser with a trailing :4984, e.g. eca0fe88-7fee-446b-b006-99e8cae0dabf.node.dockerapp.io:4984

You should see the following JSON response:

{
   "couchdb":"Welcome",
   "vendor":{
      "name":"Couchbase Sync Gateway",
      "version":1.3
   },
   "version":"Couchbase Sync Gateway/1.3.1(16;f18e833)"
}
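To script this check instead of using a browser, here’s a sketch that fetches the URL and decodes the welcome document. The struct mirrors only the fields in the sample response above, and checkSyncGateway and parseWelcome are hypothetical helper names. It uses io.ReadAll, which requires Go 1.16 or later.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// sgWelcome mirrors the fields of the Sync Gateway welcome JSON.
type sgWelcome struct {
	CouchDB string `json:"couchdb"`
	Vendor  struct {
		Name    string  `json:"name"`
		Version float64 `json:"version"`
	} `json:"vendor"`
	Version string `json:"version"`
}

// parseWelcome decodes the welcome document and checks the vendor name.
func parseWelcome(data []byte) (*sgWelcome, error) {
	var w sgWelcome
	if err := json.Unmarshal(data, &w); err != nil {
		return nil, err
	}
	if w.Vendor.Name != "Couchbase Sync Gateway" {
		return nil, fmt.Errorf("unexpected vendor %q", w.Vendor.Name)
	}
	return &w, nil
}

// checkSyncGateway GETs baseURL and verifies it answers like Sync Gateway.
func checkSyncGateway(baseURL string) (*sgWelcome, error) {
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Get(baseURL)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return parseWelcome(body)
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: checksg http://<container-domain>:4984/")
		os.Exit(2)
	}
	w, err := checkSyncGateway(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "check failed:", err)
		os.Exit(1)
	}
	fmt.Println("OK:", w.Version)
}
```

Pass it the :4984 URL from this step; the same check should also work against the load balancer on port 80.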

Setup Load Balancer

Click the Services section and hit the Create button. In the bottom right-hand corner, look for Proxies and choose dockercloud/haproxy.

General Settings:

  • Service Name: sgloadbalancer
  • Containers: 1
  • Deployment Strategy: High Availability
  • Autorestart: Always
  • Network: Bridge

Ports:

  • Port 80 should be Published and the Node Port should be set to 80

Links:

  • Choose sync-gateway and hit the Plus button

Hit the Create and Deploy button.

Verify Load Balancer

Click the Containers section and choose sgloadbalancer-1.

Copy and paste the domain name (e.g., eca0fe88-7fee-446b-b006-99e8cae0dabf.node.dockerapp.io) into your browser.

You should see the following JSON response:

{
   "couchdb":"Welcome",
   "vendor":{
      "name":"Couchbase Sync Gateway",
      "version":1.3
   },
   "version":"Couchbase Sync Gateway/1.3.1(16;f18e833)"
}

Congratulations! You have just set up a Couchbase Server + Sync Gateway cluster on Docker Cloud.

Deep Dive of What Happens Under the Hood When You Open a Web Page

This is a continuation of What Happens Under The Hood When You Open A Web Page, and it’s meant to be a deeper dive.

Clients and Servers

Remember back in the day when you wanted to know what time it was, and you picked up your phone and dialed 853-1212 and it said “At the tone, the time will be 8:53 AM”?

Those days are over, but the idea lives on. The time service is identical in principle to an internet server. You ask it something, and it gives you an answer.

A well designed service does one thing, and one thing well.

  • With the time service, you can only ask one kind of question: “What time is it?”

  • With a DNS server, you can only ask one kind of question: “What is the IP address of organic-juice-for-dogs.io?”

Clients vs Servers:

  • A “Client” can essentially be thought of as being a “Customer”. In the case of calling the time, it’s the person dialing the phone number. In the case of DNS, it’s the Google Chrome browser asking for the IP address.

  • A “Server” can be thought of as being a “Service”. In the case of calling the time, it’s something running at the phone company. In the case of DNS, it’s a service run by a combination of universities, businesses, and governments.

Web Browsers

The following programs are all web browsers, which are all technically HTTP Clients, meaning they are on the client end of the HTTP tube.

  • Google Chrome
  • Safari
  • Firefox
  • Internet Explorer
  • Etc.

What web browsers do:

  • Look up IP addresses from DNS servers over the DNS protocol (which in turn sits on top of the UDP protocol)
  • Retrieve web pages, images, and more from web servers over the HTTP protocol (which in turn sits on top of the TCP protocol)
  • Render HTML into formatted “pages”
  • Execute JavaScript code to add a level of dynamic behavior to web pages

Protocols

In the previous post, there were a few “protocols” mentioned, like HTTP.

What are protocols really?

A protocol is an agreed-upon set of rules that lets any two things that speak it talk to each other.

A protocol is just a language, and just like everyone in English-speaking countries agree to speak English and can therefore intercommunicate without issues, many things on the internet agree to speak HTTP to each other.

Here’s what a conversation looks like in the HTTP protocol:

HTTP Client: GET /
HTTP Server: <html>I'm a <blink>amazing</blink> HTML web page!!</html>

Almost everything that happens on the Internet looks something like this:

                                                                                  
 ┌────────────────────┐                                         ┌────────────────────┐
 │                    │                                         │                    │
 │                    │                                         │                    │
 │                    │                                         │                    │
 │     Internet       ◀──────────────Protocol───────────────────▶    Internet        │
 │     Thing 1        │                                         │    Thing 2         │
 │                    │                                         │                    │
 │                    │                                         │                    │
 │                    │                                         │                    │
 └────────────────────┘                                         └────────────────────┘

Let’s look at a few protocols.

TCP and UDP

You can think of the internet as being made up of tubes. Two very common types of tubes are:

  • TCP (Transmission Control Protocol)
  • UDP (User Datagram Protocol)

Here’s what you might imagine an internet tube looking like:

[Image: an internet tube]

IP

Really, you can think of TCP and UDP as internet tubes built from the same kind of concrete, and that concrete is called IP (Internet Protocol).

TCP wraps IP, in the sense that it is built on top of IP. If you took a slice of a TCP internet tube, it would look like this:

 ┌───────────────────────────────────────────┐
 │   TCP - (Transmission Control Protocol)   │
 │                                           │
 │                                           │
 │       ┌──────────────────────────┐        │
 │       │ IP - (Internet Protocol) │        │
 │       │                          │        │
 │       │                          │        │
 │       │                          │        │
 │       └──────────────────────────┘        │
 │                                           │
 └───────────────────────────────────────────┘

Ditto for UDP — it’s also built on top of IP. The slice of a UDP internet tube would look like this:

 ┌───────────────────────────────────────────┐
 │      UDP - (User Datagram Protocol)       │
 │                                           │
 │                                           │
 │       ┌──────────────────────────┐        │
 │       │ IP - (Internet Protocol) │        │
 │       │                          │        │
 │       │                          │        │
 │       │                          │        │
 │       └──────────────────────────┘        │
 │                                           │
 └───────────────────────────────────────────┘

IP, or “Internet Protocol”, is a fancy way of saying “How machines on the Internet talk to each other”, and IP addresses are their equivalent of phone numbers.

Why do we need two types of tubes built on top of IP? They have different properties:

  • TCP tubes are heavyweight: they take a while to build and a while to tear down, but they are super reliable, delivering everything you send, in order.
  • UDP tubes are lightweight and have no guarantees. They’re like the ¯\_(ツ)_/¯ of internet tubes. If you send something down a UDP internet tube, you actually have no idea whether it will make it down the tube or not. It might seem useless, but it’s not. Most real-time gaming, voice, and video calls go through UDP tubes.
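Here’s a small sketch of the UDP side using Go’s net package on localhost. Notice there is no handshake: net.Dial for "udp" just records the destination, and UDP itself gives the sender no acknowledgment. The read deadline is there because a lost datagram would otherwise leave the receiver waiting forever. udpRoundTrip is a hypothetical helper for illustration.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// udpRoundTrip starts a UDP listener on localhost, sends msg to it as a
// single datagram, and returns what the listener received.
func udpRoundTrip(msg string) (string, error) {
	listener, err := net.ListenPacket("udp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer listener.Close()

	// "Dialing" UDP performs no handshake; it only fixes the destination.
	sender, err := net.Dial("udp", listener.LocalAddr().String())
	if err != nil {
		return "", err
	}
	defer sender.Close()
	if _, err := sender.Write([]byte(msg)); err != nil {
		return "", err
	}

	// Give up if nothing arrives: UDP will never tell us it was lost.
	listener.SetReadDeadline(time.Now().Add(2 * time.Second))
	buf := make([]byte, 1500)
	n, _, err := listener.ReadFrom(buf)
	if err != nil {
		return "", err
	}
	return string(buf[:n]), nil
}

func main() {
	got, err := udpRoundTrip("hello over UDP")
	if err != nil {
		fmt.Println("no datagram received:", err)
		return
	}
	fmt.Println("received:", got)
}
```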

HTTP tubes

If you take a slice of an HTTP tube, it looks like this:

┌───────────────────────────────────────────────────────────┐
│           HTTP - (HyperText Transfer Protocol)            │
│                                                           │
│       ┌───────────────────────────────────────────┐       │
│       │   TCP - (Transmission Control Protocol)   │       │
│       │                                           │       │
│       │        ┌──────────────────────────┐       │       │
│       │        │ IP - (Internet Protocol) │       │       │
│       │        │                          │       │       │
│       │        └──────────────────────────┘       │       │
│       │                                           │       │
│       └───────────────────────────────────────────┘       │
│                                                           │
└───────────────────────────────────────────────────────────┘

That’s because HTTP sits on top of TCP, which in turn sits on top of IP.

DNS tubes

DNS tubes are very similar to HTTP tubes, except they usually sit on top of UDP tubes (DNS can also use TCP, for example for large responses). Here’s what a slice might look like:

┌───────────────────────────────────────────────────────────┐
│                DNS - (Domain Name System)                 │
│                                                           │
│       ┌───────────────────────────────────────────┐       │
│       │      UDP - (User Datagram Protocol)       │       │
│       │                                           │       │
│       │        ┌──────────────────────────┐       │       │
│       │        │ IP - (Internet Protocol) │       │       │
│       │        │                          │       │       │
│       │        └──────────────────────────┘       │       │
│       │                                           │       │
│       └───────────────────────────────────────────┘       │
│                                                           │
└───────────────────────────────────────────────────────────┘

Actually, internet tubes are more complicated

So when your Google Chrome web browser gets a web page over an HTTP tube, it actually looks more like this:

                                             
          ┌────────────────────┐             
          │                    │             
          │       Chrome       │             
          │       Browser      │             
          │                    │             
          └─────────┬────▲─────┘             
                    │    │                   
                    │    │                   
          ┌─────────▼────┴─────┐             
          │                    │             
          │   Some random      │             
          │  computer in WA    │             
          │                    │             
          └─────────┬─────▲────┘             
          ┌─────────▼─────┴────┐             
          │                    │             
          │   Some random      │             
          │  computer in IL    │             
          │                    │             
          └────────┬───▲───────┘             
          ┌────────▼───┴───────┐             
          │                    │             
          │   Some random      │             
          │  computer in MA    │             
          │                    │             
          └──────────┬───▲─────┘             
                     │   │                   
                     │   │                   
                     │   │                   
 Send me the HTML    │   │ <html>stuff</html>
                     │   │                   
                     │   │                   
                     │   │                   
                     │   │                   
          ┌──────────▼───┴─────┐             
          │                    │             
          │    HTTP Server     │             
          │                    │             
          └────────────────────┘

Each of these random computers in between is called a router, and routers basically shuttle traffic across the internet. They make it possible for any two computers on the internet to communicate with each other without having a direct connection.

If you’re curious to know which computers are in the middle of your connection between you and another computer on the internet, you can run a nifty little utility called traceroute:

$ traceroute google.com
traceroute to google.com (172.217.5.110), 64 hops max, 52 byte packets
 1  dd-wrt (192.168.11.1)  1.605 ms  1.049 ms  0.953 ms
 2  96.120.90.157 (96.120.90.157)  9.334 ms  8.796 ms  8.850 ms
 3  te-0-7-0-18-sur03.oakland.ca.sfba.comcast.net (68.87.227.209)  9.744 ms  9.416 ms  9.120 ms
 4  162.151.78.93 (162.151.78.93)  12.310 ms  11.559 ms  11.662 ms
 5  be-33651-cr01.sunnyvale.ca.ibone.comcast.net (68.86.90.93)  11.276 ms  11.187 ms  12.426 ms
 6  hu-0-13-0-1-pe02.529bryant.ca.ibone.comcast.net (68.86.84.14)  11.624 ms
    hu-0-12-0-1-pe02.529bryant.ca.ibone.comcast.net (68.86.87.14)  11.637 ms
    hu-0-13-0-0-pe02.529bryant.ca.ibone.comcast.net (68.86.86.94)  12.404 ms
 7  as15169-3-c.529bryant.ca.ibone.comcast.net (23.30.206.102)  11.024 ms  11.498 ms  11.148 ms
 8  108.170.243.1 (108.170.243.1)  11.037 ms
    108.170.242.225 (108.170.242.225)  12.246 ms
    108.170.243.1 (108.170.243.1)  11.482 ms

So from my computer to the computer at google.com, it goes through all of those intermediate computers. Some have DNS names, like be-33651-cr01.sunnyvale.ca.ibone.comcast.net, but some only have IP addresses, like 162.151.78.93.

Any one of those computers could sniff the traffic going through the tubes (even the IP tubes that all the other ones sit on top of!). That’s one of the reasons you don’t want to send your credit card numbers over the internet without using encryption.

The End

What Happens Under the Hood When You Open a Web Page?

First, the bird’s eye view:

                                                                        
┌────┐                   ┌────────────────┐               ┌────────────────┐
│You │                   │ Google Chrome  │               │    Internet    │
└────┘                   └────────────────┘               └────────────────┘
 │                               │                                  │   
 │    Show me the website for    │                                  │   
 │───organic-juice-for-dogs.io──▶│       1. Hey what's the IP of    │   
 │                               │─────organic-juice-for-dogs.io?──▶│   
 │                               │                                  │   
 │                               │                                  │   
 │                               │◀───────────63.120.10.5───────────│   
 │                               │                                  │   
 │                               │                                  │   
 │                               │        2. HTTP GET / to          │   
 │                               │───────────63.120.10.5───────────▶│   
 │                               │                                  │   
 │                               │                                  │   
 │                               │     HTML Content for homepage    │   
 │                               │◀───────────────of ───────────────│   
 │                               │     organic-juice-for-dogs.io    │   
 │                               │                                  │   
 │                               │                                  │   
 │         3. Render HTML into   │                                  │   
 │◀────────────a Web Page────────│                                  │   
 │                               │                                  │   
 │                               │                                  │   
 │      Click stuff in Google    │                                  │   
 │─────────────Chrome───────────▶│                                  │   
 │                               │                                  │   
 │                               │                                  │   
 │         4. Execute JavaScript │                                  │   
 │◀─────────and update Web Page──┤                                  │   
 │                               │                                  │   
 ▼                               ▼                                  ▼

It all starts with a DNS lookup.

Step 1. The DNS Lookup

Your Google Chrome software contacts a server on the Internet called a DNS server and asks it “Hey what’s the IP of organic-juice-for-dogs.io?”.

DNS has an official-sounding acronym, and for good reason, because it’s a very authoritative and fundamental Internet service.

So what exactly is DNS useful for?

It transforms domain names into IP addresses.

                                                                               
 ┌────────────────────┐                                     ┌────────────────────┐
 │                    │      What's the IP address of       │                    │
 │                    │─────organic-juice-for-dogs.io?──────▶                    │
 │                    │                                     │                    │
 │       Chrome       │                                     │      DNS Server    │
 │       Browser      ◀───────────63.120.10.5───────────────│                    │
 │                    │                                     │                    │
 │                    │                                     │                    │
 │                    │                                     │                    │
 └────────────────────┘                                     └────────────────────┘
 

A Domain name, also referred to as a “Dot com name”, is an easy-to-remember word or group of words, so people don’t have to memorize a list of meaningless numbers. You could think of it like dialing 1-800-FLOWERS, which is a lot easier to remember than 1-800-356-9377, the same number spelled out in digits.

The IP address 63.120.10.5 is just like a phone number. If you are a human being and want to call someone, you might dial 415-555-1212. But if you’re a thing on the internet and you want to talk to another thing on the internet, you instead dial the IP address 63.120.10.5 — same concept though.

So, that’s DNS in a nutshell. Not very complicated on the surface.
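If you want to poke at this step yourself, here is a minimal Go sketch using the standard library’s net.LookupHost, which hands the question to the operating system’s resolver (and from there, typically, to a DNS server). Since organic-juice-for-dogs.io is a made-up domain, the sketch looks up example.com instead; the resolve helper is just an illustrative name.

```go
package main

import (
	"fmt"
	"net"
)

// resolve asks the operating system's resolver, which typically queries a
// DNS server, for the IP addresses behind a domain name.
func resolve(domain string) ([]string, error) {
	return net.LookupHost(domain)
}

func main() {
	// organic-juice-for-dogs.io is made up, so look up a real domain instead.
	addrs, err := resolve("example.com")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	for _, addr := range addrs {
		fmt.Println(addr) // an IP address, playing the role of 63.120.10.5
	}
}
```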

Step 2. Contact the IP address and fetch the HTML over HTTP

In this step, Google Chrome sends an HTTP GET / request to the HTTP Server software running on a computer somewhere on the Internet that has the IP address 63.120.10.5.

You can think of the GET / as “Get me the top-most web page from the website”. This is known as the root of the website, in contrast to things deeper into the website, like GET /juices/oakland, which might return a list of dog juice products local to Oakland, CA. Since the root is at the top, the tree is actually upside down, which is why folks tend to think of websites as being structured as inverted trees.

The back-and-forth is going to look something like this:


 ┌────────────────────┐                                         ┌────────────────────┐
 │                    │          What's the HTML for            │                    │
 │                    ├──────────http://63.120.10.5/?───────────▶                    │
 │                    │                                         │                    │
 │       Chrome       │                                         │    HTTP Server     │
 │       Browser      ◀──────────────<html>stuff</html>─────────│                    │
 │                    │                                         │                    │
 │    HTTP CLIENT     │                                         │                    │
 │                    │                                         │                    │
 └────────────────────┘                                         └────────────────────┘
 

These things are speaking HTTP to each other. What is HTTP?

You can think of things that communicate with each other over the internet as using tubes. There are lots of different types of tubes, and in this case it’s an HTTP tube. As long as the software on both ends agrees on the type of tube it’s using, everything just works and they can send stuff back and forth. HTTP is a really common type of tube, but it’s not the only one: for example, the DNS lookup in the previous step used a completely different type of tube.

Usually the stuff sent back from the HTTP Server is something called HTML, which stands for HyperText Markup Language.

But HTML is not the only kind of stuff that can be sent through an HTTP tube. JSON (JavaScript Object Notation) and XML (eXtensible Markup Language) are also very common, and in fact there are tons of different types of things that can be sent through HTTP tubes.

So at this point in our walkthrough, the Google Chrome web browser software has some HTML text, and it needs to render it so that it appears on your screen in a nice, easy-to-view format. That’s the next step.

Step 3. Render HTML in a Web page

HTML is technically a markup language, which means that the text contains formatting directives that follow an agreed-upon standard for how it should be displayed. You can think of HTML as being similar to a Microsoft Word document, but while MS Word is obfuscated, HTML is very transparent and simple.

For example, here is some HTML:

<html>
   <Header>My first web page, circa, 1993!</Header>
   <Paragraph>
        I am so proud to have made my very first web page, I <blink>Love</blink> the World Wide Web
   </Paragraph>
   <Footer>Best Viewed on NCSA Mosaic</Footer>
</html>

Which gets rendered into:

image

So, you’ll notice that the <Header> element is in a larger font. And the <Paragraph> has spaces in between it and the other text.

How does the Google Chrome Web Browser do the rendering? It’s just a piece of software, and rendering HTML is one of its primary responsibilities. There are tons of poor engineers at Google who do nothing all day but fix bugs in the Google Chrome rendering code.

Of course, there’s a lot more to it, but that’s the essence of rendering HTML into a web page.

Step 4: Execute JavaScript in your Google Chrome Web Browser

So this step is optional, since not all web pages execute JavaScript in your web browser software, but it’s getting more and more common these days. When you open the Gmail website in your browser, it’s running tons of JavaScript code to make the website as fast and responsive as possible.

Essentially, JavaScript adds another level of dynamic abilities to HTML, because when the browser is given HTML and renders it... that’s it! There’s no more action; it just sits there, completely inert.

JavaScript, on the other hand, is basically a program-within-a-program.

                                                              
 ┌───────────────────────────────────────────────────────────────┐
 │                         Google Chrome                         │
 │           (A program written in C++ you downloaded)           │
 │                                                               │
 │                                                               │
 │      ┌──────────────────────────────────────────────────┐     │
 │      │                                                  │     │
 │      │                                                  │     │
 │      │     JavaScript for organic-juice-for-dogs.io     │     │
 │      │  (A program in JavaScript that snuck in via the  │     │
 │      │                  HTML document)                  │     │
 │      │                                                  │     │
 │      │                                                  │     │
 │      └──────────────────────────────────────────────────┘     │
 │                                                               │
 │                                                               │
 └───────────────────────────────────────────────────────────────┘

How does the JavaScript get to the web browser? It sneaks in over the HTML! It’s embedded in the HTML, since it’s just another form of text, and your Web Browser (Google Chrome) executes it.

<html>
     <Javascript>
          if (Paragraph == CLICKED) {
              Window.Alert("YOU MAY BE INFECTED BY A VIRUS, CLICK HERE IMMEDIATELY")
          }
     </Javascript>
     ...
</html>

What can JavaScript do exactly? The list is really, really long. But as a simple example, if you click a button on a webpage:

html button

A JavaScript program can pop up a little “Alert Box”, like this:

javascript alert

Done!

And that’s the World Wide Web! You just went from typing a URL in your browser to a shiny web page in your Google Chrome. Soup to nuts.

And you can finally buy some juice for your dog!

dogecoin dog

So that’s it for the high level stuff.

If you’re dying to know more, continue on to Deep Dive of What Happens Under The Hood When You Open A Web Page

Configuring InfluxDB and Grafana With Go Client Library

Create a beautiful Grafana dashboard with realtime performance stats:

screen shot 2016-09-13 at 3 33 20 pm

Install InfluxDB and Grafana

brew install influxdb grafana telegraf
brew services start influxdb
brew services start grafana
brew services start telegraf

Versions at the time of this writing:

  • InfluxDB: 1.0
  • Grafana: 3.1.1

Verify

  • The Grafana Web UI should be available at localhost:3000 — login with admin/admin
  • The InfluxDB Web UI should be available at localhost:8083

Create database on influx

Create a database named “db”:

$ influx
> create database db

Edit telegraf conf

Open /usr/local/etc/telegraf.conf in your favorite text editor and uncomment the entire statsd server section:

# Statsd Server
[[inputs.statsd]]
  ## Address and port to host UDP listener on
  service_address = ":8125"

  ## ... etc ...

Under the outputs.influxdb section of the telegraf config, set the database to the “db” database created earlier:

[[outputs.influxdb]]
  ## The full HTTP or UDP endpoint URL for your InfluxDB instance.
  ## Multiple urls can be specified as part of the same cluster,
  ## this means that only ONE of the urls will be written to each interval.
  # urls = ["udp://localhost:8089"] # UDP endpoint example
  urls = ["http://localhost:8086"] # required
  ## The target database for metrics (telegraf will create it if not exists).
  database = "db" # required

Restart telegraf

brew services restart telegraf

Create Grafana Data Source

  • Open the Grafana Web UI in your browser (login with admin/admin)
  • Use the following values:

screen shot 2016-09-13 at 3 39 43 pm

Create Grafana Dashboard

  • Go to Dashboards / + New
  • Click the green thing on the left, and choose Add Panel / Graph

screen shot 2016-09-13 at 3 43 50 pm

  • Delete the test metric, which is not needed, by clicking the trash can to the right of “Test Metric”

screen shot 2016-09-13 at 3 45 04 pm

  • Under Panel / Datasource, choose db, and then hit + Add Query; you will end up with this:

screen shot 2016-09-13 at 3 47 21 pm

Push sample data point from command line

For the field we want to show up on the Grafana dashboard, we first need to push some data points to the telegraf statsd daemon.

Run this in a shell to push the foo:1|c data point, which is a counter with value increasing by 1 on the key named “foo”.

while true; do echo "foo:1|c" | nc -u -w0 127.0.0.1 8125; sleep 1; echo "pushed data point"; done

Create Grafana Dashboard, Part 2

  • Under select measurement, choose foo from the pulldown
  • On the top right of the screen near the clock icon, choose “Last 5 minutes” and set Refreshing every to 5 seconds
  • You should see your data point counter being increased!

screen shot 2016-09-13 at 3 51 16 pm

Add Go client library and push data points

Here’s how to update your Go application to push new data points.

  • Install the g2s client library via:
$ go get github.com/peterbourgon/g2s
  • Here is some sample code to push data points to the statsd telegraf process from your Go program:
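// Dial the telegraf statsd listener: UDP takes host:port, not an http:// URL
statsdClient, err := g2s.Dial("udp", "localhost:8125")
if err != nil {
  panic("Couldn't connect to statsd!")
}

startTime := time.Now()
req, err := http.NewRequest("GET", "http://waynechain.com/", nil)
if err != nil {
  return err
}
resp, err := http.DefaultClient.Do(req)
if err != nil {
  return err
}
resp.Body.Close()

// Push a statsd "timing" data point for this request under the "open_website" key
statsdClient.Timing(1.0, "open_website", time.Since(startTime))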

This will push statsd “timing” data points under the key “open_website”, with a sample rate of 1.0, meaning every sample is sent (set it to 0.1 to downsample and send only about one in every ten samples). Run the code in a loop and it will start pushing stats to statsd.

Now, create a new Grafana dashboard with the steps above, but from the select measurement field choose open_website, and under SELECT choose field (mean) instead of field (value).

Go Race Detector Gotcha With Value Receivers

I ran into the following race detector error:

WARNING: DATA RACE
Write by goroutine 44:
  github.com/couchbaselabs/sg-replicate.stateFnActiveFetchCheckpoint()
      /Users/tleyden/Development/gocode/src/github.com/couchbaselabs/sg-replicate/replication_state.go:53 +0xb1d
  github.com/couchbaselabs/sg-replicate.(*Replication).processEvents()
      /Users/tleyden/Development/gocode/src/github.com/couchbaselabs/sg-replicate/synctube.go:120 +0xa3

Previous read by goroutine 27:
  github.com/couchbaselabs/sg-replicate.(*Replication).GetStats()
      <autogenerated>:24 +0xef
  github.com/couchbase/sync_gateway/base.(*Replicator).populateActiveTaskFromReplication()
      /Users/tleyden/Development/gocode/src/github.com/couchbase/sync_gateway/base/replicator.go:241 +0x145

Goroutine 44 was running this code:

func (r *Replication) shutdownEventChannel() {
  r.EventChan = nil
}

which sets the r.EventChan field to nil.

While goroutine 27 was calling this code on the same *Replication instance:

func (r Replication) GetStats() ReplicationStats {
  return r.Stats
}

It didn’t make sense, because they were accessing different fields of the Replication — one was writing to r.EventChan while the other was reading from r.Stats.

Then I changed the GetStats() method to this:

func (r Replication) GetStats() ReplicationStats {
  return ReplicationStats{}
}

and it still failed!

I started wandering around the Couchbase office looking for help, and got Steve Yen to help me.

He asked me about using a pointer receiver vs. a value receiver here, and then we realized that by using a value receiver, every call copied the entire struct, and therefore read all of its fields, including the r.EventChan field that the other goroutine was concurrently writing to! Hence the data race, subtly caused by using a value receiver.

The fix was to convert this over to a pointer receiver, and the data race disappeared!

func (r *Replication) GetStats() ReplicationStats {
     return r.Stats
}
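To see why the pointer receiver helps, here is a stripped-down, self-contained Go sketch. The struct layout and the DocsRead field are hypothetical stand-ins, not the real sg-replicate types. With the pointer receiver, GetStats reads only r.Stats, so running it under go run -race should stay quiet; switching GetStats to a value receiver makes each call copy the whole struct, including EventChan, which would likely trigger the same kind of report shown above.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical, stripped-down stand-ins for the sg-replicate types.
type ReplicationStats struct {
	DocsRead int
}

type Replication struct {
	EventChan chan string
	Stats     ReplicationStats
}

// Pointer receiver: only r.Stats is read. A value receiver,
// func (r Replication) GetStats(), would copy (and so read) every field,
// including r.EventChan.
func (r *Replication) GetStats() ReplicationStats {
	return r.Stats
}

func (r *Replication) shutdownEventChannel() {
	r.EventChan = nil
}

func main() {
	r := &Replication{EventChan: make(chan string), Stats: ReplicationStats{DocsRead: 3}}

	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		r.shutdownEventChannel() // writes r.EventChan
	}()
	go func() {
		defer wg.Done()
		fmt.Println(r.GetStats().DocsRead) // reads only r.Stats; prints 3
	}()
	wg.Wait()
}
```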

Setting Up a Self-hosted drone.io CI Server

Spin up AWS server

  • Ubuntu Server 14.04 LTS (HVM), SSD Volume Type – ami-fce3c696
  • m3.medium
  • 250MB magnetic storage

Install docker

ssh ubuntu@<aws-instance> and install docker

Register github application

Go to github and register a new OAuth application using the following values:

It will give you a Client ID and Client Secret.

Create /etc/drone/dronerc config file

On the ubuntu host:

$ sudo mkdir /etc/drone
$ sudo emacs /etc/drone/dronerc

Configure Remote Driver

Add these values:

REMOTE_DRIVER=github
REMOTE_CONFIG=https://github.com?client_id=${client_id}&client_secret=${client_secret}

and replace client_id and client_secret with the values returned from github.

Configure Database

Add these values:

DATABASE_DRIVER=sqlite3
DATABASE_CONFIG=/var/lib/drone/drone.sqlite

Run Docker container

sudo docker run \
  --volume /var/lib/drone:/var/lib/drone \
  --volume /var/run/docker.sock:/var/run/docker.sock \
  --env-file /etc/drone/dronerc \
  --restart=always \
  --publish=80:8000 \
  --detach=true \
  --name=drone \
  drone/drone:0.4

Check the logs via docker logs <container-id> and they should look something like this

Edit AWS security group

With your instance selected, look for the security groups in the instance details:

screenshot

Add a new inbound port with the following settings:

  • Protocol TCP
  • Port Range 80
  • Source 0.0.0.0/0

It should look like this when you’re done:

screenshot

Verify it’s running

Paste the hostname of your AWS instance into your browser (e.g., http://ec2-54-163-185-45.compute-1.amazonaws.com), and you should see a page like this:

screenshot

Login

If you click the login button, you should see:

screenshot

And then:

screenshot

Activate a repository

Click one of the repositories you have access to, and you should get an “activate now” option:

screenshot

which will take you to your project home screen:

screenshot

Add a .drone.yml file to the root of the repository

In the repository you have chosen (in my case I’m using tleyden/sync_gateway, which is a golang project, and may refer to it later), add a .drone.yml file to the root of the repository with:

build:
  image: golang
  commands:
    - go get
    - go build
    - go test

Commit your change, but do not push to GitHub yet; that will happen in the next step.

$ git add .drone.yml
$ git commit -m "Add drone.yml"

Kickoff a build

Now push your change up to github.

$ git push origin master

and in your drone UI you should see a build in progress:

screenshot

when it finishes, you’ll see either a pass or a failure. If you get a failure (which I did), it will look like this:

screenshot

Manually triggering another build

In my case, the above failure was due to a dependency not building. Since nothing else needs to be pushed to the repo to fix the build, I’m just going to manually trigger a build.

On the build failure screen above, there is a Restart button, which triggers a new build.

screenshot

Now it works!

Setup the Drone CLI

I could run this on my OSX workstation, but I decided to run it in a Linux Docker container. The rest of the steps assume you have spun one up and are working inside it.

$ curl http://downloads.drone.io/drone-cli/drone_linux_amd64.tar.gz | tar zx
$ install -t /usr/local/bin drone

Go to your Profile page in the drone UI, and click Show Token.

Now set these environment variables:

$ export DRONE_SERVER=http://ec2-54-163-185-45.compute-1.amazonaws.com
$ export DRONE_TOKEN=eyJhbGci...

Query repos

To test the CLI tool works, try the following commands:

# drone repo ls
couchbase/sync_gateway
tleyden/sync_gateway
# drone repo info tleyden/sync_gateway
tleyden/sync_gateway

Adding Vendoring to a Go Project

Install gvt

After doing some research, I decided to try gvt since it seemed simple and well documented, and integrated well with existing tools like go get.

$ export GO15VENDOREXPERIMENT=1
$ go get -u github.com/FiloSottile/gvt

Go get target project to be updated

I’m going to update todolite-appserver to use vendored dependencies for some of its dependencies, just to see how things go.

$ go get -u github.com/tleyden/todolite-appserver

Vendor dependencies

I’m going to vendor the dependency on kingpin since it has transitive dependencies of its own (github.com/alecthomas/units, etc). gvt handles this by automatically pulling all of the transitive dependencies.

$ gvt fetch github.com/alecthomas/kingpin

Now my directory structure looks like this:

├── main.go
└── vendor
    ├── github.com
    │   └── alecthomas
    ├── gopkg.in
    │   └── alecthomas
    └── manifest

Here is the manifest

gvt list shows the following:

$  gvt list
github.com/alecthomas/kingpin  https://github.com/alecthomas/kingpin  master 46aba6af542541c54c5b7a71a9dfe8f2ab95b93a
github.com/alecthomas/template https://github.com/alecthomas/template master 14fd436dd20c3cc65242a9f396b61bfc8a3926fc
github.com/alecthomas/units    https://github.com/alecthomas/units    master 2efee857e7cfd4f3d0138cc3cbb1b4966962b93a
gopkg.in/alecthomas/kingpin.v2 https://gopkg.in/alecthomas/kingpin.v2 master 24b74030480f0aa98802b51ff4622a7eb09dfddd

Verify it’s using the vendor folder

I opened up the vendor/github.com/alecthomas/kingpin/global.go and made the following change:

// Errorf prints an error message to stderr.
func Errorf(format string, args ...interface{}) {
  fmt.Println("CALLED IT!!")
  CommandLine.Errorf(format, args...)
}

Now verify that code is getting compiled and run:

$ go run main.go changesfollower
CALLED IT!!
main: error: URL is empty

(note: export GO15VENDOREXPERIMENT=1 is still in effect in my shell)

Restore the dependency

Before I check the vendor directory in to git, I want to reset it to its previous state, before I made the above change to the global.go source file.

$ gvt restore

Now if I open global.go again, it’s back to its original state. Nice!

Add the vendor folder and push

$ git add vendor
$ git commit -m "..."
$ git push origin master

Also, I updated the README to tell users to set the GO15VENDOREXPERIMENT=1 variable:

$ export GO15VENDOREXPERIMENT=1
$ go get -u github.com/tleyden/todolite-appserver
$ todolite-appserver --help

but the instructions otherwise remained the same. If someone tries to use this but forgets to set GO15VENDOREXPERIMENT=1 in Go 1.5, it will still work; it will just use the kingpin dependency in the $GOPATH rather than the vendor/ directory. Ditto for someone using Go 1.4 or earlier.

Removing a vendored dependency

As it turns out, I don’t even need kingpin in this project, since I’m using cobra. The kingpin dependency was caused by some leftover code I forgot to clean up.

To remove it, I ran:

$ gvt delete github.com/alecthomas/kingpin
$ gvt delete github.com/alecthomas/template
$ gvt delete github.com/alecthomas/units
$ gvt delete gopkg.in/alecthomas/kingpin.v2

In this case, since it was my only dependency, it was easy to identify the transitive dependencies. In general though it looks like it’s up to you as a user to track down which ones to remove. I filed gvt issue 16 to hopefully address that.

Editor annoyances

I have emacs set up using the steps in this blog post, and I’m running into the following annoyances:

  • When I use godef to jump into the code of vendored dependency, it takes me to source code that lives in the GOPATH, which might be different than what’s under vendor/. Also, if I edit it there, my changes won’t be reflected when I rebuild.
  • I usually search for things in the project via M-x rgrep, but now it’s searching through every repo under vendor/ and returning things I’m not interested in .. since most of the time I only want to search within my project.

Configure Emacs as a Go Editor From Scratch Part 3

This is a continuation from a previous blog post. In this post I’m going to focus on making emacs look a bit better.

Currently:

screenshot

Install a nicer theme

I like the taming-mr-arneson-theme, so let’s install that one. Feel free to browse the emacs themes and find one that you like more.

$ mkdir ~/.emacs.d/color-themes
$ wget -P ~/.emacs.d/color-themes https://raw.githubusercontent.com/emacs-jp/replace-colorthemes/d23b086141019c76ea81881bda00fb385f795048/taming-mr-arneson-theme.el

Update your ~/.emacs.d/init.el to add the following lines to the top of the file:

(add-to-list 'custom-theme-load-path "/Users/tleyden/.emacs.d/color-themes/")
(load-theme 'taming-mr-arneson t)

Now when you restart emacs it should look like this:

screenshot

Directory Tree

$ cd ~/DevLibraries
$ git clone https://github.com/jaypei/emacs-neotree.git neotree

Update your ~/.emacs.d/init.el to add the following lines:

(add-to-list 'load-path "/some/path/neotree")
(require 'neotree)

Open a .go file and then enter M-x neotree-dir to show a directory browser:

screenshot

Ref: NeoTree

Octopress Under Docker

I’m setting up a clean install of El Capitan, and want to get my Octopress blog going. However, I don’t want to install it directly on my OSX workstation — I want to have it contained in a docker container.

Install Docker

That’s beyond the scope of this blog post, but what I ended up doing on my new OSX installation was to:

Run tleyden5iwx/octopress

$ docker run -itd -v ~/Documents/blog/:/blog tleyden5iwx/octopress /bin/bash

What’s in ~/Documents/blog/? Basically, the octopress instance I’d setup as described in Octopress Setup Part I.

Bundle install

From inside the docker container:

# cd /blog/octopress
# bundle install

Edit a blog post

On OSX, open up ~/Documents/blog/source/_posts/path-to-post and make some minor edits

Push source

# git push origin source
Username for 'https://github.com': [enter your username]
Password for 'https://username@github.com': [enter your password]

Generate and push to master

Attempt 1

# rake generate
rake aborted!
Gem::LoadError: You have already activated rake 10.4.2, but your Gemfile requires rake 0.9.6. Using bundle exec may solve this.
/blog/octopress/Rakefile:2:in `<top (required)>'
(See full trace by running task with --trace) 

I have no idea why this is happening, but I just conceded defeat against these ruby weirdisms, wished I was using Go (and thought about converting my blog to Hugo), and took their advice and prefixed every command thereafter with bundle exec.

Attempt 2

# bundle exec rake generate && bundle exec rake deploy
Username for 'https://github.com': [enter your username]
Password for 'https://username@github.com': [enter your password]

Success!