Kubernetes Deployment

Deploy voicetyped on Kubernetes with Helm, GPU scheduling, and autoscaling.

The Kubernetes deployment model is designed for production workloads requiring high availability, autoscaling, and GPU-accelerated speech processing. voicetyped ships as a Helm chart with sensible defaults and extensive customization options.

Prerequisites

  • Kubernetes 1.26+
  • Helm 3.12+
  • PersistentVolume provisioner (for model storage)
  • Optional: NVIDIA GPU Operator (for GPU-accelerated ASR)
  • Optional: cert-manager (for automatic TLS certificate management)

Quick Start

# Add the voicetyped Helm repository
helm repo add voicetyped https://charts.voicetyped.com
helm repo update

# Install with defaults
helm install voice-gateway voicetyped/voice-gateway \
  --namespace voice-gateway \
  --create-namespace

# Install with custom values
helm install voice-gateway voicetyped/voice-gateway \
  --namespace voice-gateway \
  --create-namespace \
  -f values.yaml
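After installation, confirm that all pods are running and the SIP service has been assigned an external address (the release and namespace names below assume the defaults from the commands above):

```shell
# Check that all components came up
kubectl get pods -n voice-gateway

# Confirm the SIP LoadBalancer received an external IP
kubectl get svc -n voice-gateway

# Review the deployed release and its notes
helm status voice-gateway -n voice-gateway
```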

Helm Values

Complete values.yaml

# values.yaml

# Global settings
global:
  image:
    repository: voicetyped/voice-gateway
    tag: "latest"                 # pin a specific release in production
    pullPolicy: IfNotPresent

# ──────────────────────────────────
# Media Gateway
# ──────────────────────────────────
mediaGateway:
  replicas: 2
  resources:
    requests:
      cpu: "500m"
      memory: "512Mi"
    limits:
      cpu: "2"
      memory: "1Gi"

  service:
    type: LoadBalancer              # or NodePort
    sipPort: 5060
    annotations: {}

  config:
    sipTransport: udp
    rtpPortRange: "10000-20000"
    codecs:
      - g711-ulaw
      - g711-alaw
      - opus
    jitterBufferMs: 60

  # Host networking required for RTP
  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet

  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 10
    targetCPUUtilization: 70

# ──────────────────────────────────
# Speech Gateway
# ──────────────────────────────────
speechGateway:
  replicas: 1
  resources:
    requests:
      cpu: "2"
      memory: "4Gi"
    limits:
      cpu: "4"
      memory: "8Gi"
      nvidia.com/gpu: 1             # Request 1 GPU

  config:
    engine: whisper
    model: whisper-medium
    language: en
    maxWorkers: 4
    vad:
      threshold: 0.5
      minSilenceMs: 500

  # Model storage
  modelStorage:
    enabled: true
    storageClass: "standard"        # Your PV storage class
    size: 10Gi
    mountPath: /models

  # GPU scheduling
  gpu:
    enabled: true
    type: nvidia                    # nvidia or amd
    count: 1
  nodeSelector:
    nvidia.com/gpu.present: "true"
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule

  autoscaling:
    enabled: true
    minReplicas: 1
    maxReplicas: 5
    targetGPUUtilization: 80

# ──────────────────────────────────
# Conversation Runtime
# ──────────────────────────────────
runtime:
  replicas: 2
  resources:
    requests:
      cpu: "250m"
      memory: "256Mi"
    limits:
      cpu: "1"
      memory: "512Mi"

  config:
    maxConcurrentCalls: 100
    defaultTimeout: 10s
    bargeIn: true
    stateStore: redis               # Use Redis for HA

  # Load dialog definitions from ConfigMap
  dialogConfigMap: voice-gateway-dialogs

  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 20
    targetCPUUtilization: 60

# ──────────────────────────────────
# Integration Gateway
# ──────────────────────────────────
integration:
  replicas: 2
  resources:
    requests:
      cpu: "250m"
      memory: "256Mi"
    limits:
      cpu: "1"
      memory: "512Mi"

  service:
    apiPort: 8080

  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 10
    targetCPUUtilization: 60

# ──────────────────────────────────
# Redis (for state store)
# ──────────────────────────────────
redis:
  enabled: true
  architecture: replication
  auth:
    enabled: true
    existingSecret: voice-gateway-redis
  replica:
    replicaCount: 3
  persistence:
    enabled: true
    size: 1Gi

# ──────────────────────────────────
# Observability
# ──────────────────────────────────
observability:
  metrics:
    enabled: true
    port: 9100
    serviceMonitor:
      enabled: true                 # Create Prometheus ServiceMonitor
      interval: 15s
  tracing:
    enabled: false
    otlpEndpoint: ""

# ──────────────────────────────────
# Security
# ──────────────────────────────────
security:
  mtls:
    enabled: true
    certManager:
      enabled: true                 # Use cert-manager for certificates
      issuerRef:
        name: voice-gateway-ca
        kind: ClusterIssuer
  networkPolicy:
    enabled: true

# ──────────────────────────────────
# Ingress (for HTTP API)
# ──────────────────────────────────
ingress:
  enabled: false
  className: nginx
  hosts:
    - host: vg-api.internal
      paths:
        - path: /
          pathType: Prefix
  tls: []
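In practice you rarely need the complete file. A minimal override that pins the image, enables GPU inference, and points Redis at an existing secret might look like the sketch below (the tag shown is a hypothetical release; substitute your own):

```yaml
# my-values.yaml -- minimal production overrides (illustrative)
global:
  image:
    tag: "1.4.2"            # hypothetical version; avoid "latest" in production

speechGateway:
  gpu:
    enabled: true
  config:
    model: whisper-medium

redis:
  auth:
    existingSecret: voice-gateway-redis
```

Apply it with `helm upgrade --install voice-gateway voicetyped/voice-gateway -n voice-gateway -f my-values.yaml`; unspecified keys keep their chart defaults.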

Architecture on Kubernetes

Kubernetes Cluster

  ┌────────────────┐   ┌────────────────┐
  │ LoadBalancer   │   │ ServiceMonitor │
  │ :5060 (SIP)    │   │ (Prometheus)   │
  └───────┬────────┘   └────────────────┘
          │
  ┌───────▼────────┐
  │ Media Gateway  │  (hostNetwork, 2-10 pods)
  └───────┬────────┘
          │
  ┌───────▼────────┐
  │ Speech Gateway │  (GPU nodes, 1-5 pods)
  │ + PV (models)  │
  └───────┬────────┘
          │
  ┌───────▼────────┐   ┌───────────┐
  │ Runtime        │───│ Redis     │
  │ (2-20 pods)    │   │ (HA)      │
  └───────┬────────┘   └───────────┘
          │
  ┌───────▼────────┐
  │ Integration GW │  (2-10 pods)
  │ :8080 (REST)   │
  └────────────────┘

Dialog ConfigMap

Store dialog definitions in a ConfigMap:

# Create ConfigMap from dialog files
kubectl create configmap voice-gateway-dialogs \
  --from-file=/path/to/dialogs/ \
  --namespace voice-gateway

Or declaratively:

apiVersion: v1
kind: ConfigMap
metadata:
  name: voice-gateway-dialogs
  namespace: voice-gateway
data:
  helpdesk.yaml: |
    name: helpdesk
    states:
      start:
        on_enter:
          - action: play_tts
            text: "Welcome to IT support."
        transitions:
          - event: speech
            target: process
  # ... more dialogs
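After applying a new or updated ConfigMap, restart the runtime Deployment so its pods pick up the changed dialogs (the Deployment name below assumes the chart's default naming; adjust for your release):

```shell
# Apply the dialog ConfigMap, then roll the runtime pods
kubectl apply -f dialogs-configmap.yaml
kubectl rollout restart deployment/voice-gateway-runtime -n voice-gateway
kubectl rollout status deployment/voice-gateway-runtime -n voice-gateway
```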

GPU Scheduling

NVIDIA GPU Operator

Install the NVIDIA GPU Operator if not already present:

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator \
  --create-namespace

GPU Node Labels

Ensure GPU nodes are labeled:

kubectl label nodes gpu-node-1 nvidia.com/gpu.present=true
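To confirm the operator is advertising GPUs to the scheduler, inspect the allocatable resources on your GPU nodes:

```shell
# Show GPU capacity on a single node
kubectl describe node gpu-node-1 | grep -A2 'nvidia.com/gpu'

# Or list allocatable GPU counts across all nodes
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
```

A node showing `<none>` in the GPU column either lacks a GPU or the operator has not finished rolling out its device plugin there.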

Multiple GPU Types

If you have different GPU types, use node affinity:

speechGateway:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: nvidia.com/gpu.product
                operator: In
                values:
                  - Tesla-T4
                  - NVIDIA-A100-SXM4-40GB

Network Configuration

SIP Load Balancing

SIP over UDP needs careful load balancing: RTP media uses a wide dynamic port range, and SIP endpoints expect replies from the same address they signaled to. Run the Media Gateway pods with hostNetwork: true and front them with a UDP-capable load balancer:

mediaGateway:
  hostNetwork: true
  service:
    type: LoadBalancer
    annotations:
      # AWS NLB
      service.beta.kubernetes.io/aws-load-balancer-type: nlb
      # GCP
      cloud.google.com/l4-rbs: "enabled"
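Once the Service has an external address, you can sanity-check SIP reachability with an OPTIONS ping. The Service name below is an assumption based on the chart's default naming, and sipsak is just one example SIP test client; substitute your own:

```shell
# Look up the external IP assigned to the SIP LoadBalancer (service name assumed)
EXTERNAL_IP=$(kubectl get svc voice-gateway-media -n voice-gateway \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

# Send a SIP OPTIONS request over UDP
sipsak -s sip:ping@$EXTERNAL_IP:5060
```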

Network Policies

When security.networkPolicy.enabled: true, the Helm chart creates NetworkPolicies that restrict traffic:

  • Media Gateway accepts SIP/RTP from external
  • Speech Gateway only accepts from Media Gateway
  • Runtime only accepts from Speech Gateway
  • Integration Gateway only accepts from Runtime and external API clients
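As a sketch of the kind of policy the chart generates, a rule admitting Speech Gateway ingress only from Media Gateway pods might look like this (the label selectors are illustrative; the chart's actual labels may differ):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: speech-gateway-ingress
  namespace: voice-gateway
spec:
  # Applies to Speech Gateway pods (illustrative label)
  podSelector:
    matchLabels:
      app.kubernetes.io/component: speech-gateway
  policyTypes:
    - Ingress
  ingress:
    # Only Media Gateway pods in the same namespace may connect
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/component: media-gateway
```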

Scaling Guidelines

Concurrent Calls   Media GW Pods   Speech GW Pods (GPU)   Runtime Pods   Integration Pods
10                 2               1                      2              2
50                 3               2                      4              3
100                5               3                      8              5
500                10              5                      20             10
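One way to move between these tiers is to raise the autoscaler ceilings per component so the HPA can reach the recommended pod counts; for roughly 100 concurrent calls, for example:

```shell
# Raise autoscaling bounds for the ~100-call tier, keeping all other values
helm upgrade voice-gateway voicetyped/voice-gateway \
  -n voice-gateway --reuse-values \
  --set mediaGateway.autoscaling.maxReplicas=5 \
  --set speechGateway.autoscaling.maxReplicas=3 \
  --set runtime.autoscaling.maxReplicas=8 \
  --set integration.autoscaling.maxReplicas=5
```

Treat the table as a starting point: actual capacity depends heavily on codec choice, ASR model size, and GPU type, so validate with a load test before committing to a tier.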

Next Steps