/etc/nsswitch.conf and /etc/hosts woes with the Alpine (and others) Docker image and Golang

[Message from the author: are you, dear reader, looking for someone with Python, C, Go, Linux, DevOps expertise? Look no further! I am always looking for new remote work opportunities and part-time freelancing gigs. If you are interested, contact me: giedriuswork (at) gmail (dot) com]

By default, the alpine Docker image, which is widely used for Golang programs, does not contain /etc/nsswitch.conf. Back in the day (before April 30, 2015), Golang’s net package did a sane thing when that file did not exist: it checked the /etc/hosts file first and only then moved on to querying the DNS servers.

However, this was changed later: when the file is missing, Go’s resolver now follows the default described in the manual page of nsswitch.conf(5) and queries the DNS servers first, consulting /etc/hosts only afterwards. You can find the commit here.

Most distributions have a file /etc/nsswitch.conf installed by default which is set to use the files database for name resolution first:

hosts: files dns

Thus, the easiest way to fix it is to mount your own local, normal nsswitch.conf inside the Alpine container by passing this extra parameter to docker run: docker run -v /etc/nsswitch.conf:/etc/nsswitch.conf.

To further customize the name resolution, configure /etc/nsswitch.conf as per your needs.

Even forcing Go to use the cgo resolver wouldn’t help much in most cases, because glibc (the most popular libc) follows exactly the same steps when /etc/nsswitch.conf does not exist.
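For reference, the resolver choice discussed above can be made per process through the GODEBUG environment variable (netdns=go or netdns=cgo), and the pure Go resolver can also be requested in code through the PreferGo field of net.Resolver. Here is a minimal sketch of the latter; the helper name newGoResolver and the 2 second timeout are my own assumptions, picked only for illustration:

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// newGoResolver is a hypothetical helper: it returns a resolver that
// prefers Go's built-in DNS client over the cgo (libc) one.
func newGoResolver() *net.Resolver {
	return &net.Resolver{PreferGo: true}
}

func main() {
	// Bound the lookup so a broken resolver setup cannot hang forever.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	addrs, err := newGoResolver().LookupHost(ctx, "localhost")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println("localhost resolves to:", addrs)
}
```

Running the same binary with GODEBUG=netdns=cgo or GODEBUG=netdns=go in the environment is a quick way to compare both resolvers inside a container.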

What makes it more painful is that, if you use Go’s internal resolver, you have to restart your Go programs after adding /etc/nsswitch.conf or making a change in it, because the resolver does not watch for changes and does not automatically reload what it has in memory. I guess that Go, again, follows the principle outlined in the aforementioned manual:

Within each process that uses nsswitch.conf, the entire file is read only once.  If the file is later changed, the process will continue using the old configuration.

This affects a lot of publicly available container images that are based on Alpine. You can find an example list here.

Some other popular images had this issue too. For example, two months ago the Prometheus Docker image quay.io/prometheus/busybox did not have that file either. This was fixed here.

So, in any case, tread carefully in the GNU/Linux container world if you are developing a Go program. You might run into some confusing behavior if /etc/nsswitch.conf does not exist. At least add a minimal one to get proper name resolution.
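If you want your program to notice the problem by itself, a startup check along these lines could help. This is only a sketch: the function name hostsSources is hypothetical, and the parsing is deliberately naive (it keeps action items such as [NOTFOUND=return] as plain fields):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// hostsSources returns the fields listed on the "hosts:" line of an
// nsswitch.conf-style text, or nil if there is no such line.
func hostsSources(conf string) []string {
	sc := bufio.NewScanner(strings.NewReader(conf))
	for sc.Scan() {
		line := sc.Text()
		// Drop comments, which start with '#'.
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i]
		}
		line = strings.TrimSpace(line)
		if !strings.HasPrefix(line, "hosts:") {
			continue
		}
		return strings.Fields(strings.TrimPrefix(line, "hosts:"))
	}
	return nil
}

func main() {
	const path = "/etc/nsswitch.conf"
	data, err := os.ReadFile(path)
	if err != nil {
		// Typically the file is simply missing, as in the images above.
		fmt.Fprintf(os.Stderr, "warning: cannot read %s: %v\n", path, err)
		return
	}
	fmt.Printf("hosts sources: %v\n", hostsSources(string(data)))
}
```

On a typical distribution with the hosts: files dns line shown earlier, the reported sources would be files and dns; in a bare Alpine container the warning branch is taken instead.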

Go’s http.Transport and 408 response code – what’s the relationship?

Intro

The http package in Golang’s standard library provides a type, http.Transport, which implements the low-level machinery for transporting HTTP requests, hence the name. It is very useful, since other libraries usually accept any value that implements the http.RoundTripper interface, which http.Transport does. However, you might have noticed that using it in combination with haproxy and HTTP keep-alive connections sometimes makes these kinds of messages appear in your program:

2018/08/21 11:22:33 Unsolicited response received on idle HTTP channel starting with "HTTP/1.0 408 Request Time-out\r\nCache-Control: no-cache\r\nConnection: close\r\nContent-Type: text/html\r\n\r\n<html><body><h1>408 Request Time-out</h1>\nYour browser didn't send a complete request in time.\n</body></html>\n"; err=<nil>

And in your haproxy logs:

<142>Aug 21 11:22:33 myhost[32584]: 1.2.3.4:44444 [21/Aug/2018:11:22:32.041] http~ http/<NOSRV> -1/-1/-1/-1/10050 408 212 - - cR-- 0/0/0/0/0 0/0 "<BADREQ>"

This blog post goes through the reasons why that happens, whether it is harmful, and how you can avert these kinds of things.


In case you did not know: once you start using the net/http package with keep-alive connections, Go’s embedded (albeit rudimentary) deadlock checker is effectively disabled, because the package internally spawns new goroutines to keep track of those connections. Otherwise, that tracking would have to “bubble up” back to the caller. With the checker disabled, spurious messages about all goroutines being locked are avoided when the main goroutine is blocked as well while performing some other action.

To actually keep the HTTP connections alive, a certain liveness check is needed. These checks are called HTTP probes: bytes sent periodically on HTTP sockets to see if the other end is “still alive”.

However, if there is a mismatch between the keep-alive timeouts on both sides, it can happen that our client, which uses http.Transport, still expects the HTTP probe to work. On the client side, this is controlled by IdleConnTimeout, which is a field of http.Transport. On the haproxy end, it is controlled by timeout http-keep-alive <timeout> in the configuration file.

The error message mentioned in the intro comes up when such an HTTP probe is sent to the server (in our case, haproxy) while, at the same time, with the packets still on the wire, the connection has already been closed on the other end because it had timed out. Thus, to fix it, the keep-alive timeouts on both ends need to be aligned.

Let’s also analyze the haproxy log message which is printed whenever this happens. The cR termination state means that a client-side timeout expired (the lowercase c) while haproxy was still waiting for a complete request (the R), so the connection was closed from the haproxy side. This happened because the keep-alive connection reached the timeout value and was then closed automatically. As you can see, the duration of the connection (10050 ms) is very close to the timeout value of 10 seconds. Of course, it is not a real-time system, so a margin of error of a few dozen milliseconds is OK.

The error message in our Golang program reports the same thing: our “browser”, in other words the client using http.Transport, did not send any request, so the 408 response was delivered to a background goroutine that keeps track of the keep-alive connection. Because no request was outstanding, the response was treated as an error and printed to the console to inform the user about what had happened.


How to fix this?

Make sure that the keep-alive connection timeout is lower on the client end, i.e. the one that initiates the HTTP connections. In Golang programs this can be done by setting IdleConnTimeout in http.Transport to a value lower than the one on the server. Obviously, if you use some other library for making HTTP requests (you should really use the stdlib one, though, unless you have some serious issues with it), then modify whatever option or field controls the timeout there.
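A minimal sketch of that client-side fix could look like this. The 10 second value is an assumption taken from the haproxy log line above, and both the helper name newTransport and the choice of using half of the server’s timeout as a safety margin are mine; pick whatever margin suits your setup:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// haproxyKeepAlive mirrors "timeout http-keep-alive" on the proxy.
// Assumption: 10s, as suggested by the log line in this post.
const haproxyKeepAlive = 10 * time.Second

// newTransport is a hypothetical helper: it clones the default transport
// and makes the client drop idle connections well before the server does.
func newTransport(serverKeepAlive time.Duration) *http.Transport {
	tr := http.DefaultTransport.(*http.Transport).Clone()
	tr.IdleConnTimeout = serverKeepAlive / 2 // arbitrary safety margin
	return tr
}

func main() {
	tr := newTransport(haproxyKeepAlive)
	client := &http.Client{Transport: tr}
	_ = client // use client.Get, client.Do and so on as usual

	fmt.Println("idle connections are closed after", tr.IdleConnTimeout)
}
```

Libraries that accept a custom http.RoundTripper or http.Client can be handed this transport, so the same setting applies to the requests they make.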

Also, haproxy provides an option to ignore HTTP probes: option http-ignore-probes. However, it seems that it could make haproxy ignore other, legitimate errors as well, so your mileage may vary. Use it with caution. I recommend the first option if you can modify the program.


I wrote this since my proposals to include such documentation in minio-go were met with a negative response and I feel like this should be public knowledge so that others would know.