This document discusses how to debug pods in Kubernetes that are difficult to debug. It begins by introducing the author and their background. It then covers common causes of pod problems like Kubernetes, node, and application issues. Specific techniques are presented for debugging pods that continuously restart or do not have sufficient tools available. These include adding debugging containers, using the container host's process information, and inserting debugging binaries. The challenges of read-only filesystems are also addressed. Overall, the document provides guidance on debugging pods in different difficult situations.
How to debug a pod that is hard to debug
2. Eohyung Lee, Cloud Engineer, Kakao Enterprise
How to debug the pod
which is hard to debug
3. Who am I?
● Eohyung Lee
○ A.K.A. 어형부형 on Facebook
● Bio
○ Kakao Enterprise (now)
■ Build the Kakao cloud service
○ Line Plus (~2019)
■ Build a cloud native platform with Kubernetes
○ Kakao (~2017)
■ Build a private cloud service with OpenStack
○ KT (2010~2014)
■ Build a public cloud storage service with OpenStack Swift
4. Problems while using Kubernetes
● Kubernetes problem
○ CNI
○ CSI
○ Kubernetes API
○ ...
● Node problem
○ kubelet
○ kernel
○ physical network
○ ...
● Application problem
○ code
○ config
5. Cause of POD problems
● A POD problem can come from any of these
○ Kubernetes problem: CNI, CSI, Kubernetes API, ...
○ Node problem: kubelet, kernel, physical network, ...
○ Application problem: code, config
6. Kinds of POD problems
● Stuck in some status
○ Pending, Waiting, Unknown status
● Dying repeatedly
○ CrashLoopBackOff, Error status
7. How to solve general problems
● Kubernetes problem
○ Check control plane logs
○ Check events
○ Check important component logs
○ ...
● Node problem
○ Check kubelet logs
○ Check kernel logs
○ Test physical network
○ ...
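The cluster and node checks above can be sketched as small shell helpers. This is a minimal sketch, not a definitive procedure: the function names are hypothetical, and it assumes kubectl access to the cluster and a kubelet that runs as a systemd unit named kubelet on the node.

```shell
# Hypothetical helpers; names and defaults are illustrative assumptions.

# List events in a namespace, sorted by last timestamp.
# $1 = namespace (defaults to "default")
k8s_events() {
    kubectl get events --namespace "${1:-default}" --sort-by=.lastTimestamp
}

# Show recent kubelet logs on the node.
# Assumes kubelet runs as the systemd unit "kubelet".
# $1 = systemd time spec (defaults to "1 hour ago")
kubelet_logs() {
    journalctl -u kubelet --since "${1:-1 hour ago}" --no-pager
}

# Show kernel log messages with human-readable timestamps.
kernel_logs() {
    dmesg -T
}
```

For example, `k8s_events kube-system` would be run from a workstation, while `kubelet_logs` and `kernel_logs` would be run on the affected node.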
8. How to solve general problems (2)
● Application problem
○ Check code
○ Check config
○ Check container logs
○ ...
● Basic know-how
○ Debug Pods and ReplicationControllers
■ https://kubernetes.io/docs/tasks/debug-application-cluster/debug-pod-replication-controller/
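For application problems, the container log check can be sketched as follows. The function names are hypothetical, and the sketch assumes kubectl access; `--previous` asks for the logs of the last terminated instance of the container, which is often the interesting one when a POD keeps dying.

```shell
# Hypothetical helpers; names and defaults are illustrative assumptions.

# Show the POD's status, conditions, and recent events.
# $1 = pod name, $2 = namespace (defaults to "default")
pod_describe() {
    kubectl describe pod "$1" --namespace "${2:-default}"
}

# Show the last lines logged by the previous (terminated) container instance.
# $1 = pod name, $2 = container name, $3 = namespace (defaults to "default")
pod_previous_logs() {
    kubectl logs "$1" --container "$2" --namespace "${3:-default}" --previous --tail=100
}
```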
10. How to debug mixed POD problems?
● Watch the POD's logs, events, and status
● Add more logs to the application code
● Reproduce the problem in the staging environment
11. If reproducing the problem fails
● Run a command inside the container
$ kubectl exec ${POD_NAME} -c ${CONTAINER_NAME} -- ${CMD} ${ARG1} ${ARG2} ... ${ARGN}
● Or ssh into the container host node, then run the command there
$ ssh ${CONTAINER_HOST}
CONTAINER_HOST:~$ docker exec ${CONTAINER_ID} ${CMD} ${ARG1} ${ARG2} ... ${ARGN}
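The host name and container ID used above can be looked up from the POD itself. The following is a minimal sketch with hypothetical function names, assuming kubectl access; the container ID reported by Kubernetes carries a runtime prefix such as docker:// that has to be stripped before passing it to docker or crictl.

```shell
# Hypothetical helpers; names and defaults are illustrative assumptions.

# Print the node that a POD is scheduled on.
# $1 = pod name, $2 = namespace (defaults to "default")
pod_node() {
    kubectl get pod "$1" --namespace "${2:-default}" -o jsonpath='{.spec.nodeName}'
}

# Print "<container name><TAB><runtime>://<container id>" for each container.
# $1 = pod name, $2 = namespace (defaults to "default")
pod_container_ids() {
    kubectl get pod "$1" --namespace "${2:-default}" \
        -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.containerID}{"\n"}{end}'
}

# Strip the "<runtime>://" prefix from a container ID.
strip_runtime_prefix() {
    echo "${1#*://}"
}
```

For example, `strip_runtime_prefix docker://0cf670cd0f25` prints `0cf670cd0f25`.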
13. When is it hard to debug a POD?
● Continuously restarting POD (CrashLoopBackOff, Error, ...)
● POD without enough of an environment for debugging
14. How to debug a continuously restarting POD?
● Continuously restarting POD (CrashLoopBackOff, Error)
○ Command failed
■ Replace it with a command that does not fail while debugging the POD
● e.g. sleep
○ Liveness probe has failed
■ Remove the liveness configuration temporarily while debugging the POD
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
  - name: mycontainer
    image: k8s.gcr.io/busybox
    command:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 3
      periodSeconds: 5
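One way to apply both changes at once is a JSON Patch. This is only a sketch under several assumptions that are not in the slides: the POD is managed by a Deployment (the command of a bare Pod can not be changed in place), the container to debug is the first one in the template, it currently defines a livenessProbe (the remove operation fails otherwise), it has no args that would be appended to sleep, and the image actually contains a sleep binary. The function names are hypothetical.

```shell
# Hypothetical helpers; names and defaults are illustrative assumptions.

# Build a JSON Patch that sets the first container's command to "sleep <seconds>"
# and removes its liveness probe.
# $1 = seconds to sleep (defaults to 3600)
debug_patch() {
    printf '[{"op":"add","path":"/spec/template/spec/containers/0/command","value":["sleep","%s"]},' "${1:-3600}"
    printf '{"op":"remove","path":"/spec/template/spec/containers/0/livenessProbe"}]\n'
}

# Apply the patch to a Deployment.
# $1 = deployment name, $2 = namespace (defaults to "default"), $3 = seconds
apply_debug_patch() {
    kubectl patch deployment "$1" --namespace "${2:-default}" \
        --type=json -p "$(debug_patch "${3:-3600}")"
}
```

After debugging, the original command and probe have to be restored, for example by re-applying the original manifest.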
15. How to debug a POD without enough environment?
● Representative difficult case
○ Based on the scratch image
■ No debugging tools, not even a shell
● General solution (requires restarting the container)
○ Deploy a new image with debugging tools
■ After debugging, the original image must be redeployed, which is inconvenient
● Proposed solutions (without restarting the container)
○ Use container host information
○ Enter the container namespace from the container host
○ Insert debugging tools into the POD
16. Using container host information
● Gather information by checking directories under /proc/${CONTAINER_PROCESS_ID}/
○ Check the container root directory
$ cd /proc/${CONTAINER_PROCESS_ID}/root/
○ Check container network information
$ cd /proc/${CONTAINER_PROCESS_ID}/net/
○ More information
■ https://www.linux.com/news/discover-possibilities-proc-directory/
[Figure: the container's / directory]
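The /proc lookups above need the host PID of the container's main process. The sketch below shows one way to get it on a Docker host and then read the container's files from the host; the function names are hypothetical. With crictl, the PID is usually found in the output of `crictl inspect`, but the exact field is an assumption to verify on your runtime.

```shell
# Hypothetical helpers; names are illustrative assumptions.

# Print the host PID of a container's main process (Docker runtime).
# $1 = container ID
container_pid_docker() {
    docker inspect --format '{{.State.Pid}}' "$1"
}

# Print the host path that exposes the container's root directory.
# $1 = container PID on the host
container_root() {
    echo "/proc/$1/root"
}

# List the container's root directory and dump its TCP socket table
# (raw kernel format with hex-encoded addresses and ports).
# $1 = container PID on the host
inspect_container_proc() {
    ls -l "$(container_root "$1")/"
    cat "/proc/$1/net/tcp"
}
```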
17. Enter container namespace in container host
● Enter the container namespace using nsenter
○ Use lsns to list namespaces
○ Use the network namespace with container host binaries
$ nsenter -t ${CONTAINER_PROCESS_ID} -n ss
○ But when entering the mount namespace, container host binaries can not be used
○ More information
■ http://man7.org/linux/man-pages/man1/nsenter.1.html
[Figure: the container has its own UTS, IPC, PID, USR, NET, and MNT namespaces; the container host has its own UTS, IPC, PID, USR, and MNT namespaces; nsenter enters only the container network namespace]
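The nsenter usage above can be wrapped so that any host binary runs inside only the container's network namespace. This is a minimal sketch with hypothetical function names; it assumes root privileges on the container host and that lsns and nsenter from util-linux are installed there.

```shell
# Hypothetical helpers; names are illustrative assumptions.

# List the namespaces that a process belongs to.
# $1 = container PID on the host
container_namespaces() {
    lsns -p "$1"
}

# Run a host binary inside only the network namespace of the container.
# The mount namespace is left alone, so host binaries stay available.
# $1 = container PID on the host, remaining arguments = command to run
in_container_netns() {
    target_pid="$1"
    shift
    nsenter -t "$target_pid" -n "$@"
}
```

For example, `in_container_netns 33608 ss -tln` lists the container's listening TCP sockets using the host's ss binary.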
18. Insert debugging tools into the POD
● scratch-debugger
○ Inserts a busybox binary into a POD based on the scratch image
■ https://github.com/kubernetes-retired/contrib/tree/master/scratch-debugger
● But sometimes it can not work
○ When using the containerd runtime
○ When using a read-only file system
[Figure: on the HOST, step 1 CREATE a POD with a busybox container next to the POD with the target container; step 2 COPY BUSYBOX from the busybox container into the target container with docker cp]
19. When using containerd runtime
● Container Runtime Interface (CRI)
○ No feature for copying binaries into a container like docker cp
○ All other container runtimes have the same problem
● Solution
○ Copy debugging tools into the /proc/${CONTAINER_PROCESS_ID}/root directory
$ cp busybox /proc/${CONTAINER_PROCESS_ID}/root
$ crictl exec -ti ${CONTAINER_ID} /busybox sh
[Figure: on the HOST, /proc/${CONTAINER_PROCESS_ID}/root (host process data) and the container's / directory are the same location on disk, so COPY BUSYBOX into one makes the same busybox appear in the other]
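The copy step above can be wrapped with a basic sanity check. This is a minimal sketch: the function name and the optional third argument are hypothetical, the latter added only so that the helper can be tried against a scratch directory instead of the real /proc.

```shell
# Hypothetical helper; name and arguments are illustrative assumptions.

# Copy a debugging binary into a container's root directory through /proc
# and make it executable.
# $1 = binary on the host
# $2 = container PID on the host
# $3 = base directory (defaults to /proc; override only for dry runs)
copy_into_container() {
    container_root_dir="${3:-/proc}/$2/root"
    if [ ! -d "$container_root_dir" ]; then
        echo "no such container root: $container_root_dir" >&2
        return 1
    fi
    cp "$1" "$container_root_dir/" &&
        chmod +x "$container_root_dir/$(basename "$1")"
}
```

After `copy_into_container busybox ${CONTAINER_PROCESS_ID}`, the binary is reachable inside the container as /busybox.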
21. When using read only file system
● Can not copy a binary into a read-only file system
● The docker cp command does not work
$ docker cp binary 0cf670cd0f25:/
Error response from daemon: container rootfs is marked read-only
● Copying under the /proc directories does not work either
$ cp binary /proc/33608/root
cp: cannot create regular file `binary': Read-only file system
[Figure: on the HOST, the container's / directory is on a read only disk, so COPY BUSYBOX through /proc/${CONTAINER_PROCESS_ID}/root FAILs]
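Whether a container is in this situation can be checked before attempting the copy. The sketch below uses hypothetical function names and assumes the read-only root comes from the readOnlyRootFilesystem setting in the container's securityContext; the second helper simply probes whether a directory accepts a new file.

```shell
# Hypothetical helpers; names and defaults are illustrative assumptions.

# Print "<container name><TAB><readOnlyRootFilesystem value>" for each container.
# The value is empty when the field is not set.
# $1 = pod name, $2 = namespace (defaults to "default")
pod_readonly_rootfs() {
    kubectl get pod "$1" --namespace "${2:-default}" \
        -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.securityContext.readOnlyRootFilesystem}{"\n"}{end}'
}

# Return success if a file can be created in the directory.
# The probe file is removed again immediately.
# $1 = directory, e.g. /proc/${CONTAINER_PROCESS_ID}/root
can_write_dir() {
    probe_file="$1/.write-probe.$$"
    if ( : > "$probe_file" ) 2>/dev/null; then
        rm -f "$probe_file"
        return 0
    fi
    return 1
}
```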
22. When using read only file system (2)
● Use the mount point directory
$ cd /run/containerd/io.containerd.runtime.v1.linux/k8s.io/${CONTAINER_ID}/rootfs
$ wget https://busybox.net/downloads/binaries/1.31.0-i686-uclibc/busybox
$ chmod +x busybox
$ mkdir bin
$ ./busybox --install ./bin
$ crictl exec -ti ${CONTAINER_ID} /busybox sh
[Figure: on the HOST, the ephemeral container data directory /run/containerd/io.containerd.runtime.v1.linux/k8s.io/${CONTAINER_ID}/rootfs (disk) is the src of a bind mount whose dst is the container's / directory (read only disk, also visible as host process data under /proc/${CONTAINER_PROCESS_ID}/root); INSTALL BUSYBOX into the src and the same busybox appears in the container]
23. Important thing!
● Most dynamically-linked executables do not work
$ crictl exec -ti 0cf670cd0f25 /busybox sh
starting container process caused "exec: "/busybox": stat /busybox: no such file or directory": unknown
● Need to use a statically-linked executable
$ ldd busybox
not a dynamic executable
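The ldd check above can be turned into a small guard to run before copying a binary into a container. This is a minimal sketch with a hypothetical function name; it assumes the glibc ldd, which prints "not a dynamic executable" for static binaries, and other ldd implementations may word the message differently.

```shell
# Hypothetical helper; the name is an illustrative assumption.

# Return success if ldd reports that the file is not dynamically linked.
# Assumes the glibc ldd message "not a dynamic executable".
# $1 = path to the binary
is_static_binary() {
    ldd_output="$(ldd "$1" 2>&1 || true)"
    case "$ldd_output" in
        *"not a dynamic executable"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `is_static_binary busybox && cp busybox /proc/${CONTAINER_PROCESS_ID}/root` copies the binary only when it is statically linked.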