# Kubernetes Troubleshooting
This guide covers common issues when running pyproc on Kubernetes and how to diagnose them.
## CrashLoopBackOff

### Symptoms

`kubectl get pods` shows the Pod status as `CrashLoopBackOff`, and the restart count keeps climbing.

### Diagnosis
Check container logs:
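For example, assuming the `app`/`worker` container names used elsewhere in this guide — the `--previous` flag retrieves logs from the crashed instance:

```bash
# Logs from the previous (crashed) container instance
kubectl logs <pod-name> -c worker --previous
```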
Check events:
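The Events section of `describe` usually names the restart reason directly:

```bash
# Recent events for the Pod, including restart reasons
kubectl describe pod <pod-name>
kubectl get events --field-selector involvedObject.name=<pod-name>
```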
### Common Causes
- Missing socket directory: the container cannot create the UDS socket.
  Fix: verify the `emptyDir` volume is mounted at the socket path.
- Permission denied on socket: the container user cannot write to the socket directory.
  Fix: set `fsGroup` in the Pod `securityContext` to match the container user's group.
- OOMKilled: the container exceeds its memory limit.
  Fix: increase `resources.limits.memory` for the affected container.
- Python worker binary not found: `pyproc-worker` is not installed or not in `PATH`.
  Fix: verify the Docker image installs `pyproc-worker` correctly.
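As a sketch, the shared-socket-directory fix typically looks like an `emptyDir` volume mounted at the same path in both containers (container names and the mount path here are assumptions, not pyproc requirements):

```yaml
spec:
  containers:
    - name: app
      volumeMounts:
        - name: sockets
          mountPath: /var/run/pyproc
    - name: worker
      volumeMounts:
        - name: sockets
          mountPath: /var/run/pyproc
  volumes:
    - name: sockets
      emptyDir: {}
```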
## Probe Failures

### Liveness Probe Failure
Symptoms: the Pod keeps restarting, and Events show `Liveness probe failed`.
Diagnosis:
```bash
# Test the health endpoint from inside the container
kubectl exec <pod-name> -c app -- wget -qO- http://localhost:8080/healthz
```
Common fixes:
- Increase `initialDelaySeconds` if the app needs more startup time
- Increase `timeoutSeconds` if the health check is slow under load
- Increase `failureThreshold` to tolerate transient failures
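A hedged example of loosening the liveness probe — the numbers are illustrative starting points, not recommendations:

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 15   # give the worker pool time to start
  timeoutSeconds: 5         # tolerate slow health checks under load
  failureThreshold: 5       # tolerate transient failures
```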
### Readiness Probe Failure

Symptoms: the Pod is `Running` but not `Ready`, and traffic is not routed to it.
Diagnosis:
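One way to inspect readiness, assuming the Pod layout used above:

```bash
# The READY column shows how many containers pass their readiness probes
kubectl get pod <pod-name>

# The Events section reports readiness probe failures and their output
kubectl describe pod <pod-name>
```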
Common causes:
- Python worker pool is still initializing
- Worker processes failed to connect to UDS
- Insufficient resources causing slow startup
## UDS Permission Issues

### Socket Not Found

Symptoms: Go application logs show `connection refused` or `socket not found` errors.
Diagnosis checklist:
- Both containers mount the same volume at the same path
- The `PYPROC_SOCKET_DIR` environment variable matches the mount path
- In sidecar mode, the worker container creates the socket before the Go app connects
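These checks can be run directly with `kubectl exec`; `PYPROC_SOCKET_DIR` comes from the checklist above, and the container names are assumed:

```bash
# Compare the configured socket directory across both containers
kubectl exec <pod-name> -c app -- printenv PYPROC_SOCKET_DIR
kubectl exec <pod-name> -c worker -- printenv PYPROC_SOCKET_DIR

# Confirm the socket file actually exists at that path
kubectl exec <pod-name> -c app -- ls -la /var/run/pyproc/
```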
### Permission Denied on Socket

Symptoms: `permission denied` errors when connecting to the UDS.
```bash
# Check socket permissions
kubectl exec <pod-name> -c app -- ls -la /var/run/pyproc/worker.sock

# Check user identity in each container
kubectl exec <pod-name> -c app -- id
kubectl exec <pod-name> -c worker -- id
```
Fix: both containers must run as the same user and group, or `fsGroup` must be set in the Pod `securityContext`.
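A minimal sketch of the `fsGroup` approach — the GID `1000` is an assumed value; match it to the group reported by `id` in both containers:

```yaml
spec:
  securityContext:
    fsGroup: 1000          # assumed GID; must match both container users' group
  containers:
    - name: app
      securityContext:
        runAsUser: 1000
        runAsGroup: 1000
    - name: worker
      securityContext:
        runAsUser: 1000
        runAsGroup: 1000
```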
## Debug Commands

### Pod Status and Events
```bash
# Overview
kubectl get pods -l app=pyproc-app -o wide

# Detailed status
kubectl describe pod <pod-name>

# Events for the namespace
kubectl get events --sort-by=.metadata.creationTimestamp
```
### Container Logs
```bash
# Current logs
kubectl logs <pod-name> -c app
kubectl logs <pod-name> -c worker

# Previous container logs (after a restart)
kubectl logs <pod-name> -c app --previous

# Follow logs
kubectl logs <pod-name> -c app -f

# Last 100 lines
kubectl logs <pod-name> -c app --tail=100
```
### Interactive Debugging
```bash
# Shell into the container
kubectl exec -it <pod-name> -c app -- /bin/sh

# Check processes
kubectl exec <pod-name> -c app -- ps aux

# Check network (UDS)
kubectl exec <pod-name> -c app -- ls -la /var/run/pyproc/

# Check resource usage
kubectl top pod <pod-name> --containers
```
### Ephemeral Debug Container
When the container has a read-only filesystem or minimal tooling:
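With ephemeral containers, `kubectl debug` can attach a throwaway container that shares the target container's process namespace; the `busybox` image is an arbitrary choice:

```bash
# Attach an ephemeral debug container targeting the app container
kubectl debug -it <pod-name> --image=busybox:1.36 --target=app -- /bin/sh
```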
## Resource Issues

### Throttling (CPU)
Symptoms: High latency, slow responses, but no crashes.
Check whether CPU usage is near the limit. If the container is being throttled, increase `resources.limits.cpu`.
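Throttling can be confirmed from the cgroup stats inside the container; the path below assumes cgroup v2:

```bash
# Compare usage against the limit
kubectl top pod <pod-name> --containers

# Rising nr_throttled / throttled_usec indicates CPU throttling (cgroup v2)
kubectl exec <pod-name> -c app -- cat /sys/fs/cgroup/cpu.stat
```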
### OOMKilled (Memory)

Symptoms: the container restarts with reason `OOMKilled`.
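The kill reason can be confirmed from the container's last terminated state:

```bash
# Should print OOMKilled if the container was killed by the OOM killer
kubectl get pod <pod-name> \
  -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
```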
Fix:
- Increase `resources.limits.memory`
- Investigate whether the Python worker has a memory leak
- Check payload sizes (large JSON payloads consume memory)
## Related Documentation
- Kubernetes Deployment: Pod configuration and manifests
- Docker Deployment: Container image building
- Monitoring: Metrics and observability