Docker, Kubernetes and the Walking Dead

Yesterday, I was rewriting the screenshot utility that is used across MailerLite and MailerSend. I wrote it several years ago using Express.js and Puppeteer as a quick hack: we needed a microservice that would return a PNG or PDF of a page via an API. As we grew, we noticed problems with the microservice, such as memory leaks and occasional unresponsiveness. We also wanted to rewrite it in Go so that it could make use of goroutines and concurrency. While rewriting it in Go with the exceptional go-rod library, I noticed that we still had occasional zombie Chromium processes.

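Then it hit me: we’re running Docker and Kubernetes, and a Docker container has no init system by default. Whatever command the container starts runs as PID 1, which in our case is the application itself. When a process exits, it stays in the process table as a zombie until its parent collects its exit status with wait(). Processes whose parent has already died are re-parented to PID 1, which is expected to reap them. Our app was never written to do that, so Chromium processes that exited without being waited on became zombies, piled up, and caused instability. Without an init system, nothing in the container cleans them up.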

Since Docker 1.13.0, you can pass the --init flag to docker run, which starts a small init process (tini) as PID 1 in the container. That is fine for local development, but in production we use GKE and needed a solution there as well. There are a few ways to do this in Kubernetes:

1. Use tini, dumb-init, or another init system as the ENTRYPOINT in your Dockerfile, with your program as the CMD:

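FROM debian:stable-slim
...
# Install tini from the Debian repositories and clean up the apt cache
RUN apt-get update \
    && apt-get install -y --no-install-recommends tini \
    && rm -rf /var/lib/apt/lists/*
...
# tini runs as PID 1, reaps zombies and forwards signals to the app
ENTRYPOINT ["tini", "--"]
CMD ["/app/screenshoter"]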

2. Use an init container that starts before your main app in the Kubernetes Pod spec:

apiVersion: v1
kind: Pod
metadata:
  name: screenshoter
spec:
  initContainers:
  - name: init-tini
    # sample image with tini
    image: krallin/ubuntu-tini:latest
    command: ["tini", "--", "true"]
  containers:
  - name: screenshoter
    image: screenshoter:latest
    command: ["/app/screenshoter"]

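Be aware that this doesn't do what it looks like: init containers run to completion before the app containers start, and each container gets its own PID namespace by default. tini exits as soon as true does, so it is long gone by the time the screenshoter container produces any zombies, and it could not see them anyway.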

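3. (Theoretically) set shareProcessNamespace: true in the Pod spec. It is a Pod-level field rather than part of securityContext; with it enabled, the pod's pause container becomes PID 1 and reaps zombies for all containers in the pod, at the cost of every container being able to see the others' processes. I haven’t tried this and I do not plan to, as I only touch pod-level security settings when there is an absolute need for it.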

I chose option 1 because it was the simplest and because I like having the same setup locally and in every other environment: the same Docker image with the init system, both locally with Docker (actually, OrbStack) and in dev/prod on GKE.

If you read the documentation more carefully than I did, you will find that even Node.js recommends against running Node.js processes as PID 1, because they would not respond to SIGINT. This isn't specific to Node.js: any program running as PID 1 takes on init's responsibilities, and as our zombie Chromium processes show, a Go app doesn't handle them out of the box either.