Last month, our kubernetes cluster for development was being unstable. Pods cannot be deleted or created, and even can’t SSH into the node. We found a lots of defunct process in the cluster. By running the below command we can show the process tree in the node and trace back to their parent.

ps -ef

We found that many defunct process are created by the RabbitMQ healthcheck command rabbitmqctl node_health_check. We just used this command as pod healthcheck in 10 seconds interval. Normally, a few defunct process is no harm to the system, but in our case, all available PIDs in the system can be exhausted in short time if new defunct process generated every 10 seconds.

To solve the problem, we have to understand what is a defunct process. A defunct process is a process that complete executation but still have entry in the process table. This is needed for child process to allow parent process to read its exit status. The parent process can remove the entry by reading the exit status by the wait system call. The process is said to be “reaped” when entry in process table is removed. Normally, those defunct process will be destroyed by “init” when the parent process exit.

docker run on your local machine use PID 1 for the command/entrypoint, but in kubernetes PID 1 is assigned to a special process “pause”. In this way, we can’t use tini as entrypoint to kill defunct process. “pause” is a infrastructure container used to setup namespace and we expect it work like “init” to kill zombie process. We found the v3.0 “pause” container didn’t handle the reaped of defunct process but was fixed in v3.1. So we decided to bump the “pause” container in all nodes from v3.0 to v3.1 and the problem is finally resolved.

References

https://github.com/kubernetes/kubernetes/issues/50865 https://github.com/kubernetes/kubernetes/pull/36853 https://github.com/kubernetes/kubernetes/blob/release-1.10/build/pause/CHANGELOG.md https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/