pyproc Operations Guide¶
Deployment Models¶
Single Host Deployment¶
# Standard deployment on a single machine
pool:
  workers: 4                    # Number of Python processes
  max_in_flight: 10             # Max concurrent requests across the pool
  max_in_flight_per_worker: 1   # Max in-flight requests per worker
  health_interval: 30s          # Health check frequency

python:
  executable: python3
  worker_script: /app/worker.py
  env:
    PYTHONUNBUFFERED: "1"

socket:
  dir: /tmp
  prefix: pyproc
  permissions: 0600
Kubernetes Deployment¶
Place the Go app and its Python workers in the same pod so they can communicate over Unix domain sockets:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  template:
    spec:
      volumes:
        - name: pyproc-sockets
          emptyDir: {}
      containers:
        - name: app
          image: myapp:latest
          volumeMounts:
            - name: pyproc-sockets
              mountPath: /var/run/pyproc
          env:
            - name: PYPROC_SOCKET_DIR
              value: /var/run/pyproc
            - name: PYPROC_POOL_WORKERS
              value: "4"
Docker Compose¶
version: '3.8'

services:
  app:
    build: .
    volumes:
      - sockets:/var/run/pyproc
    environment:
      PYPROC_SOCKET_DIR: /var/run/pyproc
      PYPROC_POOL_WORKERS: 4

volumes:
  sockets:
    driver: local
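Both manifests pass PYPROC_SOCKET_DIR and PYPROC_POOL_WORKERS as environment variables. Whether pyproc reads these itself is not assumed here; a minimal sketch of wiring them into the pool configuration from your own Go code, using only the standard library (os, strconv):

// Read the deployment env vars and fall back to the defaults used in this guide.
socketDir := os.Getenv("PYPROC_SOCKET_DIR")
if socketDir == "" {
    socketDir = "/tmp"
}

workers := 4
if v := os.Getenv("PYPROC_POOL_WORKERS"); v != "" {
    if n, err := strconv.Atoi(v); err == nil && n > 0 {
        workers = n
    }
}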
Process Model¶
- One Go process manages one or more Python workers
- Each worker listens on a dedicated Unix domain socket
- Workers are isolated: a crash in one worker does not affect the others
- Automatic restart on worker failure (configurable)
Configuration¶
Worker Configuration¶
cfg := pyproc.WorkerConfig{
    ID:           "worker-1",
    SocketPath:   "/tmp/pyproc.sock",
    PythonExec:   "python3",
    WorkerScript: "worker.py",
    StartTimeout: 30 * time.Second,
    Env: map[string]string{
        "PYTHONUNBUFFERED": "1",
        "MODEL_PATH":       "/models/latest",
    },
}
Pool Configuration¶
poolCfg := pyproc.PoolConfig{
    Workers:              4,                // Number of workers
    MaxInFlight:          10,               // Global concurrency across the pool
    MaxInFlightPerWorker: 1,                // Per-worker in-flight cap
    HealthInterval:       30 * time.Second, // Health check frequency
    Restart: pyproc.RestartConfig{
        MaxAttempts:    5,
        InitialBackoff: 1 * time.Second,
        MaxBackoff:     30 * time.Second,
        Multiplier:     2.0,
    },
}
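How this configuration is wired into a running pool depends on the pyproc API. The sketch below is purely illustrative: NewPool, Start, and Call are assumed names (only Shutdown appears elsewhere in this guide), so check the pyproc reference for the actual constructors and call signatures.

// Hypothetical wiring; NewPool, Start, and Call are placeholder names,
// not confirmed pyproc APIs.
ctx := context.Background()

pool, err := pyproc.NewPool(poolCfg) // assumed constructor taking the PoolConfig above
if err != nil {
    log.Fatalf("create pool: %v", err)
}
if err := pool.Start(ctx); err != nil { // assumed to spawn workers and begin health checks
    log.Fatalf("start pool: %v", err)
}
defer pool.Shutdown(ctx)

// Assumed call shape: a function name registered in worker.py plus
// JSON-serializable input and output.
var out map[string]any
if err := pool.Call(ctx, "predict", map[string]any{"value": 42}, &out); err != nil {
    log.Printf("call failed: %v", err)
}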
Health and Monitoring¶
Health Checks¶
Python workers automatically expose a health endpoint:
# Automatically registered by pyproc_worker
def health(req):
    return {
        "status": "healthy",
        "pid": os.getpid(),
        "uptime": time.time() - start_time,
        "requests_handled": request_count,
    }
Metrics Collection¶
Export Prometheus metrics from Go:
// Recommended metrics endpoint
http.Handle("/metrics", promhttp.Handler())
http.ListenAndServe(":9090", nil)
Key metrics to track:

- pyproc_worker_requests_total - Total requests per worker
- pyproc_worker_request_duration_seconds - Request latency
- pyproc_worker_errors_total - Error count by type
- pyproc_worker_restarts_total - Worker restart count
- pyproc_pool_inflight_requests - Current in-flight requests
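If these metrics are not already exported for you, registering one with prometheus/client_golang is straightforward; a minimal sketch (the metric name comes from the list above, while the label set is an assumption, not a pyproc convention):

// Register a per-worker request counter and increment it around each pool call.
var workerRequests = prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Name: "pyproc_worker_requests_total",
        Help: "Total requests per worker",
    },
    []string{"worker_id", "method"}, // assumed labels
)

func init() {
    prometheus.MustRegister(workerRequests)
}

// Example increment in the request path:
// workerRequests.WithLabelValues("worker-1", "predict").Inc()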
Logging¶
Structured logging with trace IDs:
logger := pyproc.NewLogger(pyproc.LoggingConfig{
    Level:        "info",
    Format:       "json",
    TraceEnabled: true,
})
Log aggregation recommendations:

- Use structured JSON logging
- Include trace IDs for request correlation
- Ship logs to a centralized system (ELK, Datadog, etc.)
Lifecycle Management¶
Startup Sequence¶
- Go application starts
- Worker pool initialized
- Python workers spawned
- Socket connections established
- Health checks begin
- Ready to serve requests
Graceful Shutdown¶
// Handle shutdown signals
sigCh := make(chan os.Signal, 1)
signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM)
<-sigCh

ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

if err := pool.Shutdown(ctx); err != nil {
    log.Printf("Shutdown error: %v", err)
}
Worker Restart Strategy¶
Configure automatic restart with exponential backoff:
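The restart policy is the RestartConfig shown in the pool configuration above. With the values below, the delay between restart attempts doubles each time (1s, 2s, 4s, ...), is capped at 30s, and the worker is given up on after five attempts; whether the attempt counter resets after a healthy run is not specified here, so treat the comments as intent rather than guaranteed semantics:

Restart: pyproc.RestartConfig{
    MaxAttempts:    5,                // stop retrying after five failed restarts
    InitialBackoff: 1 * time.Second,  // delay before the first retry
    MaxBackoff:     30 * time.Second, // ceiling on the delay
    Multiplier:     2.0,              // delay doubles after each failure
},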
Resource Management¶
Memory Considerations¶
- Python processes can consume significant memory
- Monitor RSS (Resident Set Size) per worker (see the sketch after this list)
- Set memory limits in container deployments
- Consider worker recycling after N requests
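One way to watch per-worker RSS from the Go side on Linux is to read /proc/<pid>/status for each worker process. A minimal sketch; how you obtain the worker PIDs is left open, since this guide does not assume a pyproc API for listing them:

// Read VmRSS for a given PID from /proc on Linux (value in kB as reported by the kernel).
func workerRSSKB(pid int) (int64, error) {
    data, err := os.ReadFile(fmt.Sprintf("/proc/%d/status", pid))
    if err != nil {
        return 0, err
    }
    for _, line := range strings.Split(string(data), "\n") {
        if strings.HasPrefix(line, "VmRSS:") {
            fields := strings.Fields(line) // e.g. ["VmRSS:", "123456", "kB"]
            if len(fields) >= 2 {
                return strconv.ParseInt(fields[1], 10, 64)
            }
        }
    }
    return 0, fmt.Errorf("VmRSS not found for pid %d", pid)
}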
CPU Allocation¶
# Kubernetes resource limits
resources:
  requests:
    cpu: "2"
    memory: "4Gi"
  limits:
    cpu: "4"
    memory: "8Gi"
File Descriptors¶
Ensure sufficient file descriptors:
# Check current limit
ulimit -n
# Increase limit (add to systemd service or container)
ulimit -n 65536
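The Go process holds at least one descriptor per worker connection. To sanity-check the limit from inside the process at startup, a small sketch using syscall.Getrlimit:

// Warn at startup if the file-descriptor limit looks too low.
var rl syscall.Rlimit
if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err == nil && rl.Cur < 65536 {
    log.Printf("warning: RLIMIT_NOFILE is %d; consider raising it to 65536", rl.Cur)
}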
Performance Tuning¶
Worker Count¶
# Rule-of-thumb starting points for worker count
workers = cpu_cores * 2   # CPU-bound workloads
workers = cpu_cores * 4   # I/O-bound workloads
Socket Buffer Sizes¶
// Tune socket buffers for large payloads
conn.SetReadBuffer(1024 * 1024) // 1MB
conn.SetWriteBuffer(1024 * 1024) // 1MB
Connection Pool Size¶
// Match MaxInFlight to expected concurrency
MaxInFlight: runtime.NumCPU() * 2
// Keep per-worker at 1 unless the Python worker can process concurrent requests
MaxInFlightPerWorker: 1
Troubleshooting¶
Common Issues¶
- Worker won't start
  - Check Python path and dependencies
  - Verify socket permissions
  - Review worker script syntax
- High latency
  - Monitor worker CPU usage
  - Check for GIL contention
  - Increase worker count
- Connection refused (see the preflight sketch after this list)
  - Verify socket path exists
  - Check filesystem permissions
  - Ensure worker is running
- Memory leaks
  - Monitor Python process memory
  - Implement worker recycling
  - Profile Python code
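For connection refused in particular, a quick preflight from Go can confirm that the socket file exists and is actually a Unix socket with the expected permissions; a minimal sketch using os.Stat (the path is only an example):

// Preflight check for a worker socket before digging deeper.
info, err := os.Stat("/var/run/pyproc/pyproc-worker-1.sock") // example path
if err != nil {
    log.Fatalf("socket missing: %v", err) // worker not started or wrong socket dir
}
if info.Mode()&os.ModeSocket == 0 {
    log.Fatalf("%s exists but is not a unix socket", info.Name())
}
log.Printf("socket present, permissions %v", info.Mode().Perm())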
Debug Mode¶
Enable debug logging:
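A minimal sketch, reusing the LoggingConfig from the Logging section with the level raised to debug:

logger := pyproc.NewLogger(pyproc.LoggingConfig{
    Level:        "debug", // raised from "info" for more verbose output
    Format:       "json",
    TraceEnabled: true,
})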
Health Check Failures¶
If health checks fail, confirm that the worker's built-in health handler (shown above) is reachable, review worker logs for errors, and watch pyproc_worker_restarts_total for restart loops.
Security Best Practices¶
- Run workers as a least-privilege user
- Set restrictive socket permissions (0600)
- Validate input in Python functions
- Use separate Python virtual environments
- Update dependencies regularly
- Monitor for anomalous behavior
Production Checklist¶
- Configure appropriate worker count
- Set up health checks and monitoring
- Implement graceful shutdown
- Configure restart policies
- Set resource limits
- Enable structured logging
- Set up metrics collection
- Test failure scenarios
- Document worker dependencies
- Create runbooks for common issues