/ Compiler says no!

Kubernetes and nsswitch.conf in nix containers

Those who do not know about nsswitch.conf are going to have a bad time.

Sometimes ops-related issues turn into a veritable black hole. Things that should have worked from the get-go but do not end up eating time and supplying plenty of frustration. Turnaround time increases, and sometimes one has to throw in the towel, chalk it up to fate and commit a workaround. Luckily, this is not one of those stories.

Kubernetes and its little cousin k3s, for all the good they offer, often slow down development when running a binary becomes building a container and scheduling a pod. The most frustrating variation is when things break at the very last minute, usually due to subtle differences in container runtime environments or builders. One such issue is DNS failure.

A curious DNS failure

Taking an example from the wild, imagine a container in a pod failing due to DNS issues. After replacing the command in the pod with /bin/sh (it is rarely wrong to include busybox in even the most bare containers) to stop it from crashing

command: ["/bin/sh", "-c", "sleep 3600"]

we can experiment:

$ kubectl exec -it example-pod -- /bin/sh
/ # hostname
example-pod
/ # hostname -i
hostname: example-pod: Unknown host
/ # ping example-pod
ping: bad address 'example-pod'

The pod knows its own hostname, but cannot resolve it — even though it is listed in /etc/hosts! Swapping out the image entirely for busybox

image: busybox:latest

and repeating the experiment works:

/ # hostname
example-pod
/ # hostname -i
10.42.1.33
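The claim that the name is listed in /etc/hosts can be checked directly instead of by eyeballing the file. The following is a minimal sketch; the helper name in_hosts_file is made up for this note, and it assumes an awk is available in the container (busybox ships one).

```shell
# in_hosts_file NAME [FILE]
# Succeeds if NAME appears as a hostname or alias in a hosts(5)-style file.
# Hypothetical helper for illustration; FILE defaults to /etc/hosts.
in_hosts_file() {
    awk -v name="$1" '
        { sub(/#.*/, "") }   # drop comments before looking at the fields
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }
    ' "${2:-/etc/hosts}"
}

# Usage inside the container:
#   in_hosts_file "$(hostname)" && echo "listed in /etc/hosts"
```

If this succeeds while hostname -i still fails, the hosts file itself is fine and the problem lies in how the lookup is performed.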

Going local

At this point, we can stop experimenting on Kubernetes and switch to local containers to gain insight faster. One caveat not mentioned so far is that the original image was built using the nix dockerTools, so we use them to build a minimal example for experimentation:

{ pkgs ? (import <nixpkgs>) { } }:
pkgs.dockerTools.buildImage {
  name = "nixtest";
  tag = "latest";

  contents = with pkgs; [ busybox strace ];
}

$ nix-build experiment.nix && docker load -i result
Loaded image: nixtest:latest

This image will fail in the same manner as our problematic one above. In contrast, a simple debian:stable works just fine.

Conveniently, strace is included in our derivation, so after installing it via apt in the debian container as well, we can compare the strace output of hostname in both. For the broken container, part of the output is:

/ # strace -e file hostname -i
[...]
stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=79, ...}) = 0
openat(AT_FDCWD, "/etc/host.conf", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/resolv.conf", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/etc/nsswitch.conf", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[...]
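Full strace output is long, so one way to narrow the comparison is to list only the files that could not be found. The filter below is a sketch; the function name missing_files is made up here, and it assumes the line format shown above, where the path is the first quoted string on a line.

```shell
# missing_files: read strace output on stdin and print each path whose
# open/stat failed with ENOENT, one per line, sorted and de-duplicated.
# Hypothetical helper; relies on the path being the first quoted string.
missing_files() {
    awk -F'"' '/= -1 ENOENT/ { print $2 }' | sort -u
}

# Usage (strace writes to stderr, hence the redirect):
#   strace -e file hostname -i 2>&1 | missing_files
```

Running this in both containers and comparing the two lists makes it easier to spot files that exist in one image but not in the other.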

Comparing with the debian output, we see that any attempt to open /etc/hosts is notably absent. The key difference [1] is the failed open of nsswitch.conf, whose secrets man nsswitch.conf [2] quickly spills:

The Name Service Switch (NSS) configuration file, /etc/nsswitch.conf, is used by the GNU C Library and certain other applications to determine the sources from which to obtain name-service information in a range of categories, and in what order.

[…]

hosts Host names and numbers, used by gethostbyname(3) and related functions.
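To see which sources a given system consults for host lookups, and in what order, the hosts line can be read back out of the file. This is a rough sketch with a made-up function name, nss_hosts_sources; it prints every token after "hosts:", so action items such as [NOTFOUND=return] show up in the output as well.

```shell
# nss_hosts_sources [FILE]
# Print the lookup sources configured for the "hosts" database, one per
# line, in the order listed. Fails if FILE is unreadable or has no hosts
# line. Hypothetical helper; FILE defaults to /etc/nsswitch.conf.
nss_hosts_sources() {
    [ -r "${1:-/etc/nsswitch.conf}" ] || return 1
    awk '
        { sub(/#.*/, "") }   # drop comments
        $1 == "hosts:" { for (i = 2; i <= NF; i++) print $i; found = 1 }
        END { exit !found }
    ' "${1:-/etc/nsswitch.conf}"
}

# Usage:
#   nss_hosts_sources || echo "no hosts configuration found"
```

For a file containing the single line "hosts: files dns" this prints files and then dns, meaning /etc/hosts is consulted before DNS.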

Solving the DNS issue

Manually adding a minimal nsswitch.conf instantly fixes our problem:

/ # echo 'hosts:     files dns' > /etc/nsswitch.conf
/ # hostname -i
172.17.0.2
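If the file has to be patched when the container starts rather than at build time, an idempotent variant of the command above may be handy. The following is only a sketch; the function name ensure_hosts_entry is made up, and it assumes the file is writable.

```shell
# ensure_hosts_entry [FILE]
# Append a minimal hosts line unless FILE already configures the hosts
# database. Safe to call repeatedly. Hypothetical helper; FILE defaults
# to /etc/nsswitch.conf.
ensure_hosts_entry() {
    if [ -f "${1:-/etc/nsswitch.conf}" ] &&
       grep -q '^[[:space:]]*hosts:' "${1:-/etc/nsswitch.conf}"; then
        return 0   # already configured, leave the file alone
    fi
    echo 'hosts:     files dns' >> "${1:-/etc/nsswitch.conf}"
}

# Usage, e.g. at the top of an entrypoint script:
#   ensure_hosts_entry
```

Appending instead of overwriting keeps any other databases that are already configured in the file.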

We can make this fix permanent in our minimal image. Unfortunately, dockerTools lacks an uncomplicated way to declaratively place arbitrary data at paths inside the container, so we resort to extraCommands:

{ pkgs ? (import <nixpkgs>) { } }:
pkgs.dockerTools.buildImage {
  name = "nixtest";
  tag = "latest";

  # Add `nsswitch.conf` to ensure DNS queries are resolved properly.
  extraCommands = ''
    mkdir -p etc
    echo 'hosts:     files dns' > etc/nsswitch.conf
  '';

  contents = with pkgs; [ busybox ];
}

Conclusion

Containers are in many cases convenient, but the container ecosystem surprises the unsuspecting user quite often. Somewhat obscure features of a Linux system are cast into the limelight, prompting an informative but time-consuming search for answers.

While /etc/hosts is fairly common knowledge, I would wager nsswitch.conf is not; hopefully these notes will change that. As usual, in the end the simple fix in no way indicates how much of a ruckus it caused in the first place.


  1. Not shown here: a painful dive into the glibc sources that yielded no results.

  2. I will readily admit that I googled it first.