Troubleshooting Guide¶
Quick solutions to common problems when using pyproc.
Quick Checklist¶
Before diving into specific issues, verify these basics:
- Python dependencies installed: pip list | grep pyproc-worker
- Socket path exists and is writable: ls -la /tmp/pyproc.sock
- Worker script has no syntax errors: python3 worker.py
- Go version is 1.22+: go version
- Python version is 3.9+: python3 --version
- No firewall/SELinux blocking: check system logs
Worker Won't Start¶
Symptom¶
Common Causes & Solutions¶
1. Python Not Found¶
Diagnosis:
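A quick stdlib check of what a bare "python3" in PythonExec resolves to on this machine (a sketch; which interpreter pyproc actually execs depends on its environment's PATH):

```python
import os
import shutil
import sys

# What would a relative "python3" resolve to on this machine's PATH?
resolved = shutil.which("python3")
print("python3 resolves to:", resolved)

# The interpreter currently running is always a safe absolute fallback
print("current interpreter:", sys.executable)
```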
Solution:
WorkerConfig{
    PythonExec: "/usr/bin/python3", // use an absolute path
    // or, for a virtual environment:
    // PythonExec: "/path/to/venv/bin/python",
}
2. Missing pyproc-worker Package¶
Diagnosis:
Solution:
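A hedged way to confirm the package is importable from the same interpreter pyproc will launch (assuming the package imports as pyproc_worker; adjust the name if it differs):

```python
import importlib.util

# Hypothetical module name; adjust to however pyproc-worker actually imports
spec = importlib.util.find_spec("pyproc_worker")
if spec is None:
    print("pyproc-worker is not installed in this interpreter")
else:
    print("found at:", spec.origin)
```

Run this with the exact interpreter configured in PythonExec, since each virtual environment has its own site-packages.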
3. Worker Script Syntax Error¶
Diagnosis:
Solution: Fix Python syntax errors
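Syntax can be checked without executing the worker's import-time code; a minimal sketch using the stdlib ast module:

```python
import ast
import sys

def check_syntax(path: str) -> bool:
    """Parse the worker script without executing it."""
    try:
        with open(path) as f:
            ast.parse(f.read(), filename=path)
        return True
    except SyntaxError as e:
        print(f"{path}:{e.lineno}: {e.msg}", file=sys.stderr)
        return False
```

python3 -m py_compile worker.py does the same from the command line.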
4. Socket Path Not Writable¶
Diagnosis:
Solution:
# Ensure the socket directory is writable
mkdir -p /tmp
chmod 755 /tmp

# Or use a user-specific path
export XDG_RUNTIME_DIR=${XDG_RUNTIME_DIR:-/tmp}

// Go side: build the path at runtime; a Go string literal does not
// expand shell variables like $XDG_RUNTIME_DIR
SocketPath: filepath.Join(os.Getenv("XDG_RUNTIME_DIR"), "pyproc.sock"),
5. Stale Socket File¶
Diagnosis:
Solution:
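A leftover socket file from a crashed worker can be removed before the next bind; a minimal sketch (the stat check avoids deleting a regular file that happens to share the name):

```python
import os
import stat

def remove_stale_socket(path: str) -> None:
    """Remove a leftover Unix socket file so the worker can bind again."""
    try:
        if stat.S_ISSOCK(os.stat(path).st_mode):
            os.unlink(path)
    except FileNotFoundError:
        pass  # nothing to clean up
```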
High Latency¶
Symptom¶
Diagnosis¶
Enable metrics to identify bottleneck:
pool, _ := pyproc.NewPool(pyproc.PoolOptions{
    Config: pyproc.PoolConfig{
        Workers:              4,
        MaxInFlight:          10,
        MaxInFlightPerWorker: 1,
    },
    // ...
}, logger)

// Check pool health
health := pool.Health()
fmt.Printf("Workers: %d, Active: %d\n", health.Workers, health.ActiveRequests)
Common Causes & Solutions¶
1. Too Few Workers¶
Symptom: All workers constantly busy
Diagnosis:
health := pool.Health()
if health.ActiveRequests >= health.Workers*maxInFlight {
    // Pool is saturated
}
Solution: Increase worker count
2. Python GIL Contention¶
Symptom: Single worker performing poorly
Solution: Use multiple processes (already the default in pyproc)
// pyproc bypasses the GIL by running each worker as a separate Python process
Workers: 4, // each worker is its own OS process
3. Large Payloads¶
Diagnosis: Log request/response sizes
Solution: Consider MessagePack for large payloads
// Switch to MessagePack codec (requires Python msgpack)
import "github.com/YuminosukeSato/pyproc/pkg/pyproc/codec"
transport := pyproc.NewUnixTransport(cfg)
transport.SetCodec(codec.NewMsgpackCodec())
See Codec Performance Reference for benchmarks.
4. Slow Python Logic¶
Diagnosis: Profile Python worker
import time

@expose
def predict(req):
    start = time.time()
    # Your logic
    result = expensive_operation(req)
    elapsed = time.time() - start
    print(f"Processing took {elapsed*1000:.2f}ms")
    return {"result": result}
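The inline timing above can be factored into a reusable decorator; time.perf_counter is the preferable clock for measuring intervals (a sketch, with a stand-in handler body):

```python
import functools
import sys
import time

def timed(fn):
    """Log each call's wall-clock duration to stderr."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            print(f"{fn.__name__} took {elapsed_ms:.2f}ms", file=sys.stderr)
    return wrapper

@timed
def predict(req):
    # Stand-in for the real handler logic
    return {"result": req["value"] * 2}
```

Stack it under @expose so the exposed callable is the timed wrapper.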
Solution: Optimize the Python code:
- Cache model loading
- Use numpy vectorization
- Consider multiprocessing for CPU-bound tasks
Memory Leaks¶
Symptom¶
Worker memory grows unbounded over time.
Diagnosis¶
Go Side¶
import _ "net/http/pprof"

// In main()
go func() {
    log.Println(http.ListenAndServe("localhost:6060", nil))
}()
Python Side¶
import tracemalloc

tracemalloc.start()

@expose
def predict(req):
    # Your logic
    result = process(req)
    # Check memory
    current, peak = tracemalloc.get_traced_memory()
    print(f"Current memory: {current / 10**6:.2f} MB, Peak: {peak / 10**6:.2f} MB")
    return result
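When memory only ever grows, comparing two tracemalloc snapshots shows which lines are allocating; a sketch with a deliberate stand-in "leak":

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# Stand-in for allocations that survive between requests
leaked = list(range(100_000))

after = tracemalloc.take_snapshot()

# The biggest positive size_diff entries point at the leaking lines
for stat in after.compare_to(before, "lineno")[:5]:
    print(stat)
```

In a real worker, take the first snapshot after warm-up and the second after the leak has had time to accumulate.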
Common Causes & Solutions¶
1. Model Not Cached in Python¶
Problem: Loading model on every request
@expose
def predict(req):
    model = load_model("model.pkl")  # ❌ Reloads every time!
    return model.predict(req)
Solution: Load once at module level
# Load model once when worker starts
MODEL = load_model("model.pkl")

@expose
def predict(req):
    return MODEL.predict(req)  # ✅ Reuse cached model
2. Large Response Accumulation¶
Problem: Keeping references to responses
Solution: Let Python GC clean up
@expose
def batch_predict(req):
    samples = req["samples"]
    # Process in batches to avoid accumulation
    results = []
    for batch in chunks(samples, 100):
        batch_results = process_batch(batch)
        results.extend(batch_results)
        # batch_results is freed by GC here
    return {"results": results}
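The chunks helper used above is not defined in the snippet; a minimal version:

```python
def chunks(seq, size):
    """Yield successive size-length slices of seq."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]
```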
3. Go Connection Leaks¶
Problem: Not closing connections
Solution: Always defer pool shutdown
Connection Errors¶
Symptom¶
Causes & Solutions¶
1. Worker Not Started¶
Diagnosis: Check worker process
Solution: Ensure pool.Start() is called
2. Socket Path Mismatch¶
Diagnosis: Check actual socket path
Solution: Ensure paths match
// Go side
SocketPath: "/tmp/pyproc.sock"
// Python side (via env var)
export PYPROC_SOCKET_PATH=/tmp/pyproc.sock
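On the Python side, printing the effective path at startup makes mismatches obvious (a sketch; the /tmp/pyproc.sock fallback here is a hypothetical default, not necessarily pyproc-worker's):

```python
import os

def socket_path() -> str:
    """Resolve the socket path the worker will use."""
    return os.environ.get("PYPROC_SOCKET_PATH", "/tmp/pyproc.sock")

print("worker will use socket:", socket_path())
```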
3. SELinux/AppArmor Blocking¶
Diagnosis: Check audit logs
# SELinux
sudo ausearch -m avc -ts recent | grep pyproc
# AppArmor
sudo dmesg | grep DENIED | grep python
Solution: Add SELinux/AppArmor rules or use permissive mode (dev only)
Type Errors¶
Symptom¶
Cause¶
The Python worker returns the wrong type for a field.
Solution¶
Check Python return types:
@expose
def predict(req):
    # ❌ Bad: Returns string
    return {"result": "84.0"}
    # ✅ Good: Returns float
    return {"result": 84.0}
Use type hints in Python:
from typing import TypedDict

class PredictRequest(TypedDict):
    value: float

class PredictResponse(TypedDict):
    result: float
    model: str

@expose
def predict(req: PredictRequest) -> PredictResponse:
    return {"result": req["value"] * 2.0, "model": "test"}
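TypedDict annotations are not enforced at runtime, so a mismatch still reaches the Go side. A small validator can catch it before the response crosses the boundary (a sketch; PredictResponse here mirrors the example above):

```python
from typing import TypedDict, get_type_hints

class PredictResponse(TypedDict):
    result: float
    model: str

def validate(resp: dict, schema) -> list:
    """Return a list of type-mismatch messages (empty means OK)."""
    errors = []
    for field, expected in get_type_hints(schema).items():
        if field not in resp:
            errors.append(f"missing field: {field}")
        elif not isinstance(resp[field], expected):
            errors.append(f"{field}: expected {expected.__name__}, "
                          f"got {type(resp[field]).__name__}")
    return errors
```

This only handles flat, simple-typed responses; nested or generic fields would need a real validation library.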
Worker Crashes¶
Symptom¶
Worker process exits unexpectedly.
Diagnosis¶
Check Worker Logs¶
logger := pyproc.NewLogger(pyproc.LoggingConfig{
    Level:        "debug", // Enable debug logs
    Format:       "json",
    TraceEnabled: true,
})

pool, _ := pyproc.NewPool(opts, logger)
Check Python Stderr¶
import sys
import traceback

@expose
def predict(req):
    try:
        return process(req)
    except Exception:
        # Log full traceback
        traceback.print_exc(file=sys.stderr)
        raise
Common Causes¶
1. Uncaught Python Exception¶
Solution: Add error handling in Python
@expose
def predict(req):
    try:
        return {"result": req["value"] * 2}
    except KeyError as e:
        # Return error response instead of crashing
        return {"error": f"Missing field: {e}"}
    except Exception as e:
        return {"error": f"Unexpected error: {e}"}
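The try/except pattern can be applied to every handler with a decorator, so one uncaught exception never kills the worker (a sketch; @expose itself may already do something similar):

```python
import functools

def safe(fn):
    """Convert uncaught exceptions into error responses."""
    @functools.wraps(fn)
    def wrapper(req):
        try:
            return fn(req)
        except Exception as e:
            return {"error": f"{type(e).__name__}: {e}"}
    return wrapper

@safe
def predict(req):
    return {"result": req["value"] * 2}
```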
2. Out of Memory¶
Diagnosis: Check system logs
Solution: Configure memory limits
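On the Python side, one option is capping the worker's address space with the stdlib resource module, so runaway allocations raise MemoryError instead of triggering the OS OOM killer (Linux-focused sketch; the 2 GiB figure is a hypothetical value, not a pyproc default):

```python
import resource

# Cap this process's virtual address space (hypothetical 2 GiB limit);
# allocations beyond it raise MemoryError instead of invoking the OOM killer
limit_bytes = 2 * 1024**3
resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))
```

In containerized deployments, cgroup memory limits on the worker process serve the same purpose from outside.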
3. Segfault in Native Library¶
Symptom: Worker exits with signal 11 (SIGSEGV)
Solution: Update native dependencies
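Python's stdlib faulthandler can at least capture a traceback when a native extension crashes the interpreter; enable it at worker startup (a sketch):

```python
import faulthandler
import sys

# Dump all thread tracebacks to stderr on fatal signals
# (SIGSEGV, SIGFPE, SIGABRT, SIGBUS, SIGILL)
faulthandler.enable(file=sys.stderr)
```

The same effect is available without code changes via the PYTHONFAULTHANDLER=1 environment variable.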
Debugging Techniques¶
Enable Verbose Logging¶
logger := pyproc.NewLogger(pyproc.LoggingConfig{
    Level:        "debug",
    Format:       "json",
    TraceEnabled: true,
})
Test Worker Manually¶
# Start worker manually
export PYPROC_SOCKET_PATH=/tmp/test.sock
python3 worker.py

# In another terminal, test with netcat
echo '{"id": 1, "method": "predict", "body": {"value": 42}}' | \
  nc -U /tmp/test.sock
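If nc is unavailable, the same smoke test can be done with a short Python client (a sketch assuming the newline-delimited JSON framing shown in the netcat example; pyproc's actual wire format may differ):

```python
import json
import socket

def call(sock_path: str, payload: dict) -> dict:
    """Send one newline-delimited JSON request and read one reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        s.sendall(json.dumps(payload).encode() + b"\n")
        buf = b""
        while not buf.endswith(b"\n"):
            chunk = s.recv(4096)
            if not chunk:
                break
            buf += chunk
        return json.loads(buf)
```

Example: call("/tmp/test.sock", {"id": 1, "method": "predict", "body": {"value": 42}}).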
Use Python pdb Debugger¶
@expose
def predict(req):
    import pdb; pdb.set_trace()  # Breakpoint (blocks the worker while active)
    result = req["value"] * 2
    return {"result": result}
Check Health Endpoint¶
health := pool.Health()
log.Printf("Workers: %d, Active: %d, Total Requests: %d",
    health.Workers, health.ActiveRequests, health.TotalRequests)
Getting Help¶
If you're still stuck:
- Search existing issues: GitHub Issues
- Ask in Discussions: GitHub Discussions
- File a bug report: New Issue
Bug Report Template¶
When filing an issue, include:
**Environment**:
- OS: Linux/macOS/Windows
- Go version: go version
- Python version: python3 --version
- pyproc version: go list -m github.com/YuminosukeSato/pyproc
**Config**:
- Workers: 4
- MaxInFlight: 10
- MaxInFlightPerWorker: 1
- Socket path: /tmp/pyproc.sock
**Code**:
[Minimal reproducible example]
**Error**:
[Full error message and logs]
**Steps to Reproduce**:
1. ...
2. ...
Next Steps¶
- Performance Tuning: Optimize for production
- Monitoring Guide: Set up observability
- Architecture Reference: Understand internals