Conversation Runtime

Deterministic dialog execution engine with finite state machines, turn detection, and optional LLM hooks.

The Conversation Runtime is the core differentiator of voicetyped. It is not a chatbot builder — it is a deterministic runtime for executing voice dialog flows. Dialogs are defined as finite state machines (FSMs) that process events (speech, DTMF, timeouts, backend results) and produce actions (play TTS, transfer, hang up, call hooks).

Why a State Machine?

Most voice automation platforms use either:

  • Scripted flows — rigid, hard to maintain
  • LLM-driven conversations — unpredictable, hard to audit, slow

voicetyped uses a finite state machine because:

  • Deterministic — the same state and event always produce the same transition
  • Auditable — every state transition is logged
  • Fast — no LLM inference latency on the critical path
  • Reliable — no hallucinations, no unexpected behavior
  • Serializable — call state can be persisted to an external store (redis or postgres) and survive restarts
  • LLM-optional — add LLM nodes where you actually need them
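The Go sketch below shows one way a state's transitions could be evaluated. It assumes that the first matching transition wins. The helpdesk example below implies this rule, because it lists conditional hook_result transitions before an unconditional one. The Event, Transition and Next names are hypothetical and are not the runtime's API.

```go
package main

import "fmt"

// Event is a simplified, hypothetical runtime event: its type ("speech",
// "dtmf", "timeout", "hook_result", ...) plus the DTMF digit, if any.
type Event struct {
	Type  string
	Digit string
}

// Transition mirrors one entry in a state's transitions list. Cond is an
// optional guard (standing in for digits: or condition:); nil always matches.
type Transition struct {
	Event  string
	Cond   func(Event) bool
	Target string
}

// Next scans the transitions in declaration order and returns the target of
// the first one whose event type and guard both match. The result depends
// only on the transition list and the event, so replaying the same events
// through the same dialog repeats the same path.
func Next(transitions []Transition, ev Event) (string, bool) {
	for _, t := range transitions {
		if t.Event != ev.Type {
			continue
		}
		if t.Cond == nil || t.Cond(ev) {
			return t.Target, true
		}
	}
	return "", false // no match: this sketch simply stays in the current state
}

func main() {
	// The 'start' state from the helpdesk example, without the timeout value.
	start := []Transition{
		{Event: "speech", Target: "classify_issue"},
		{Event: "dtmf", Cond: func(e Event) bool { return e.Digit == "0" }, Target: "transfer_to_human"},
		{Event: "timeout", Target: "no_input"},
	}
	fmt.Println(Next(start, Event{Type: "dtmf", Digit: "0"})) // transfer_to_human true
}
```

Ordering is what makes a catch-all transition safe. The unconditional hook_result in classify_issue fires only when neither conditional transition above it matched.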

Configuration

# /etc/voice-gateway/config.yaml — runtime section

runtime:
  dialog_dir: /etc/voice-gateway/dialogs/   # Directory containing dialog YAML files
  default_timeout: 10s                       # Default timeout per state
  max_concurrent_calls: 100                  # Maximum simultaneous calls
  max_dialog_depth: 50                       # Maximum state transitions per call
  state_store: memory                        # memory, redis, postgres
  barge_in: true                             # Allow caller to interrupt TTS

Dialog Definition

Dialogs are defined in YAML files in the dialog_dir directory. Each file defines one dialog flow.

Basic Structure

# /etc/voice-gateway/dialogs/helpdesk.yaml

name: helpdesk
description: IT helpdesk intake flow
version: "1.0"

# Variables available throughout the dialog
variables:
  caller_name: ""
  issue_type: ""
  ticket_id: ""

# Routing: which calls use this dialog
routing:
  match:
    - sip_to: "sip:helpdesk@*"
    - sip_to: "sip:+18001234567@*"

states:
  # Initial state — every dialog must have a 'start' state
  start:
    on_enter:
      - action: play_tts
        text: >
          Thank you for calling IT support.
          Please briefly describe your issue.
    transitions:
      - event: speech
        target: classify_issue
      - event: dtmf
        digits: "0"
        target: transfer_to_human
      - event: timeout
        after: 15s
        target: no_input

  classify_issue:
    on_enter:
      - action: call_hook
        service: issue_classifier
        method: Classify
        payload:
          transcript: "{{ .Event.Transcript }}"
    transitions:
      - event: hook_result
        condition: "{{ .Result.Category == 'password_reset' }}"
        target: password_reset
      - event: hook_result
        condition: "{{ .Result.Category == 'hardware' }}"
        target: hardware_issue
      - event: hook_result
        target: general_issue
      - event: hook_error
        target: fallback

  password_reset:
    on_enter:
      - action: set_variable
        name: issue_type
        value: password_reset
      - action: play_tts
        text: >
          I understand you need a password reset.
          Let me create a ticket for you.
      - action: call_hook
        service: ticketing
        method: CreateTicket
        payload:
          type: password_reset
          caller: "{{ .Call.CallerID }}"
    transitions:
      - event: hook_result
        target: ticket_created
      - event: hook_error
        target: fallback

  hardware_issue:
    on_enter:
      - action: set_variable
        name: issue_type
        value: hardware
      - action: play_tts
        text: >
          For hardware issues, I will transfer you
          to our on-site support team.
    transitions:
      - event: tts_complete
        target: transfer_to_hardware

  general_issue:
    on_enter:
      - action: play_tts
        text: >
          I have noted your issue and will create a ticket for you.
          A support engineer will contact you shortly.
      - action: call_hook
        service: ticketing
        method: CreateTicket
        payload:
          type: general
          caller: "{{ .Call.CallerID }}"
          transcript: "{{ .Event.Transcript }}"
    transitions:
      - event: hook_result
        target: ticket_created
      - event: hook_error
        target: fallback

  ticket_created:
    on_enter:
      - action: set_variable
        name: ticket_id
        value: "{{ .Result.TicketID }}"
      - action: play_tts
        text: >
          Your ticket number is {{ .Variables.ticket_id }}.
          Is there anything else I can help you with?
    transitions:
      - event: speech
        condition: "{{ contains .Event.Transcript 'yes' }}"
        target: start
      - event: speech
        target: goodbye
      - event: timeout
        after: 10s
        target: goodbye

  transfer_to_human:
    on_enter:
      - action: play_tts
        text: "Transferring you to a human agent. Please hold."
      - action: transfer
        target: "sip:[email protected]"

  transfer_to_hardware:
    on_enter:
      - action: transfer
        target: "sip:[email protected]"

  no_input:
    on_enter:
      - action: play_tts
        text: "I did not hear anything. Let me try again."
    transitions:
      - event: tts_complete
        target: start

  fallback:
    on_enter:
      - action: play_tts
        text: >
          I am having trouble processing your request.
          Let me transfer you to a human agent.
      - action: transfer
        target: "sip:[email protected]"

  goodbye:
    on_enter:
      - action: play_tts
        text: "Thank you for calling IT support. Goodbye."
      - action: hangup

Events

The runtime processes these event types:

Event             Source                Description
speech            Speech Gateway        Caller said something (transcript available)
dtmf              Media Gateway         Caller pressed a key
timeout           Runtime clock         No event received within the configured time
hook_result       Integration Gateway   Backend service responded
hook_error        Integration Gateway   Backend service failed
tts_complete      Speech Gateway        TTS playback finished
call_started      Media Gateway         Call was connected
call_terminated   Media Gateway         Call was ended (by either party)

Event Data

Each event carries contextual data accessible in templates:

# Speech event
.Event.Transcript    # Full transcript text
.Event.Confidence    # Confidence score (0.0–1.0)
.Event.Language      # Detected language
.Event.DurationMs    # Speech duration in milliseconds

# DTMF event
.Event.Digit         # The digit pressed (0–9, *, #)
.Event.DurationMs    # Key press duration

# Hook result event
.Result              # The full response object from the backend
.Result.FieldName    # Access specific fields

# Call context (always available)
.Call.SessionID      # Unique call identifier
.Call.CallerID       # Caller phone number
.Call.CalledNumber   # Dialed number
.Call.StartTime      # Call start timestamp
.Variables           # User-defined variables
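The placeholders resemble Go's text/template syntax, so the Go sketch below uses that package to render one template. The TemplateContext type is hypothetical and only mirrors the names above. The sketch covers plain field substitution only, not condition expressions. Standard text/template has no == operator (a comparison is written eq .Result.Category "password_reset") and no built-in contains function. The runtime therefore presumably evaluates conditions with its own rules.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// TemplateContext is a hypothetical data shape matching the names listed
// above (.Event, .Call, .Result, .Variables). Maps keep the sketch short.
type TemplateContext struct {
	Event     map[string]any
	Call      map[string]any
	Result    map[string]any
	Variables map[string]any
}

// Render executes one template string against ctx. missingkey=error makes a
// reference to an absent key fail instead of rendering "<no value>".
func Render(text string, ctx TemplateContext) (string, error) {
	tmpl, err := template.New("action").Option("missingkey=error").Parse(text)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, ctx); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	ctx := TemplateContext{
		Call:      map[string]any{"CallerID": "+15550100"},
		Variables: map[string]any{"ticket_id": "INC-1042"}, // illustrative value
	}
	msg, err := Render("Your ticket number is {{ .Variables.ticket_id }}.", ctx)
	fmt.Println(msg, err) // Your ticket number is INC-1042. <nil>
}
```

With missingkey=error, a template that reads .Event.Transcript fails when the current event carries no transcript. Check for this in states entered through hook_result rather than speech.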

Actions

Actions are executed when entering a state or during transitions:

play_tts

Renders text to speech and plays it to the caller:

- action: play_tts
  text: "Hello, how can I help you?"
  voice: en_US-amy-medium     # Optional: override default voice
  speed: 1.0                   # Optional: playback speed
  barge_in: true               # Optional: allow caller to interrupt

call_hook

Calls a customer backend service via the Integration Gateway:

- action: call_hook
  service: ticketing           # Registered service name
  method: CreateTicket         # HTTP endpoint path
  payload:                     # Data to send
    type: "{{ .Variables.issue_type }}"
    caller: "{{ .Call.CallerID }}"
  timeout: 5s                  # Optional: override default timeout

transfer

Transfers the call to another SIP endpoint:

- action: transfer
  target: "sip:[email protected]"
  headers:                     # Optional: custom SIP headers
    X-Transfer-Reason: "escalation"

hangup

Terminates the call:

- action: hangup
  reason: normal               # normal, busy, rejected

set_variable

Sets a dialog variable:

- action: set_variable
  name: issue_type
  value: "{{ .Result.Category }}"

play_audio

Plays a pre-recorded audio file:

- action: play_audio
  file: /var/lib/voice-gateway/audio/hold-music.wav
  loop: true                   # Optional: loop playback

Turn Detection

Turn detection determines when the caller has finished speaking and it is the system’s turn to respond. voicetyped uses a combination of:

  1. Voice Activity Detection (VAD) — detects silence after speech
  2. Endpoint detection — confirms the utterance is complete
  3. Barge-in handling — allows the caller to interrupt TTS playback
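Steps 1 and 2 can be illustrated with a simple hangover rule: the turn ends once enough speech has been heard and is followed by a run of silent frames. The Go sketch below assumes an upstream VAD has already labeled each frame as speech or silence. The frame size and the minSpeech and hangover parameters are hypothetical, because the runtime's actual thresholds are not documented here.

```go
package main

import "fmt"

// EndOfTurn scans per-frame VAD flags (true = speech) and returns the index
// of the frame at which the turn ends: after at least minSpeech speech frames,
// the first point where hangover consecutive silent frames have elapsed.
// Silence before enough speech is heard is not counted. Returns -1 if the
// turn has not ended yet.
func EndOfTurn(speech []bool, minSpeech, hangover int) int {
	spoken, silent := 0, 0
	for i, s := range speech {
		if s {
			spoken++
			silent = 0 // caller resumed talking; restart the silence count
			continue
		}
		if spoken >= minSpeech {
			silent++
			if silent >= hangover {
				return i
			}
		}
	}
	return -1
}

func main() {
	// Speech, a one-frame pause, more speech, then sustained silence.
	frames := []bool{true, true, true, false, true, true, false, false, false, false}
	fmt.Println(EndOfTurn(frames, 3, 4)) // 9
}
```

At 20 ms per frame, for example, minSpeech = 10 and hangover = 35 would require 200 ms of speech followed by 700 ms of silence. A longer hangover makes the system less likely to cut a caller off mid-sentence, but it adds that much delay before every reply.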

Barge-In

When barge_in is enabled (default), the caller can interrupt TTS playback by speaking:

System: "Thank you for calling IT support. Our hours are—"
Caller: "I need a password reset"  ← Barge-in
System: [stops TTS, processes speech]

This is controlled globally or per-action:

runtime:
  barge_in: true               # Global default

# Or per-action:
- action: play_tts
  text: "Important disclaimer..."
  barge_in: false              # Don't allow interruption
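The precedence implied above, where a per-action barge_in overrides the global default, fits in a few lines of Go. PlayTTS and BargeInAllowed are hypothetical names used for illustration.

```go
package main

import "fmt"

// PlayTTS is a hypothetical in-memory form of a play_tts action. BargeIn is
// a pointer so that "not set" (nil) can be told apart from an explicit false.
type PlayTTS struct {
	Text    string
	BargeIn *bool
}

// BargeInAllowed returns whether caller speech should stop this prompt: the
// action's own barge_in wins when present, otherwise runtime.barge_in applies.
func BargeInAllowed(a PlayTTS, global bool) bool {
	if a.BargeIn != nil {
		return *a.BargeIn
	}
	return global
}

func main() {
	off := false
	greeting := PlayTTS{Text: "Thank you for calling IT support."}
	disclaimer := PlayTTS{Text: "Important disclaimer...", BargeIn: &off}
	fmt.Println(BargeInAllowed(greeting, true), BargeInAllowed(disclaimer, true)) // true false
}
```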

DTMF Menus

Build traditional IVR menus with DTMF:

states:
  main_menu:
    on_enter:
      - action: play_tts
        text: >
          Press 1 for billing.
          Press 2 for technical support.
          Press 3 for account information.
          Press 0 to speak with an agent.
    transitions:
      - event: dtmf
        digits: "1"
        target: billing
      - event: dtmf
        digits: "2"
        target: tech_support
      - event: dtmf
        digits: "3"
        target: account_info
      - event: dtmf
        digits: "0"
        target: transfer_agent
      - event: timeout
        after: 10s
        target: main_menu  # Repeat

Multi-digit DTMF

Collect multi-digit input (e.g., account numbers):

states:
  collect_account:
    on_enter:
      - action: play_tts
        text: "Please enter your account number followed by the pound key."
    transitions:
      - event: dtmf
        digits: "*#"          # Terminated by #
        min_digits: 6
        max_digits: 12
        inter_digit_timeout: 3s
        target: verify_account

Optional LLM Nodes

For states that need natural language understanding beyond keyword matching, you can add LLM nodes. In the example below, an LLM node is a call_hook action that targets an LLM-backed service:

states:
  understand_request:
    on_enter:
      - action: call_hook
        service: llm_service
        method: Classify
        payload:
          prompt: >
            Classify the following customer request into one of:
            password_reset, hardware, software, network, other.
            Request: {{ .Event.Transcript }}
          max_tokens: 50
          temperature: 0.0
    transitions:
      - event: hook_result
        condition: "{{ .Result.Category == 'password_reset' }}"
        target: password_reset
      # ... more conditions

Important: LLM nodes add latency (typically 200ms–2s). Use them only where keyword or pattern matching is insufficient. Because an LLM node is a hook call, a failed LLM request produces a hook_error event. Give every LLM state an explicit hook_error transition (for example, to fallback) so that failures are handled gracefully.
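An LLM node keeps the FSM deterministic only if the hook returns one of a closed set of labels. The Go sketch below shows one way a hypothetical llm_service backend could normalize raw model output before answering. The category list is taken from the prompt above; everything else is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// allowedCategories is the label set from the example prompt.
var allowedCategories = []string{"password_reset", "hardware", "software", "network", "other"}

// NormalizeCategory maps raw model text onto one allowed label, tolerating
// case, surrounding whitespace and trailing punctuation or quotes. Anything
// else is an error, which a backend would report as a failed hook call.
func NormalizeCategory(raw string) (string, error) {
	label := strings.ToLower(strings.TrimSpace(raw))
	label = strings.Trim(label, ".\"'")
	for _, c := range allowedCategories {
		if label == c {
			return c, nil
		}
	}
	return "", fmt.Errorf("unrecognized category %q", raw)
}

func main() {
	fmt.Println(NormalizeCategory(" Password_Reset.\n")) // password_reset <nil>
}
```

Returning an error for unrecognized text is a design choice: it sends the dialog down its hook_error path. Mapping such output to other instead would keep the call in the automated flow.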

State Store

For high-availability deployments, call state can be persisted externally:

runtime:
  state_store: redis
  redis:
    address: redis:6379
    password: "${REDIS_PASSWORD}"
    db: 0
    key_prefix: "vg:session:"

This enables:

  • Call state survival across pod restarts
  • Graceful failover between runtime instances
  • Call state inspection via external tools
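The Go sketch below models the per-call snapshot that a store such as Redis might hold, keyed with the configured key_prefix. The Session fields and the JSON encoding are assumptions. The runtime's actual key layout and value format are not documented here. Redis client calls are left out so the sketch stays within the Go standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Session is a hypothetical snapshot of what a runtime needs to resume a
// call on another instance: the dialog, the current state, the transition
// count (so max_dialog_depth still applies) and the dialog variables.
type Session struct {
	SessionID   string            `json:"session_id"`
	Dialog      string            `json:"dialog"`
	State       string            `json:"state"`
	Transitions int               `json:"transitions"`
	Variables   map[string]string `json:"variables"`
}

// Key builds the store key from key_prefix, e.g. "vg:session:" + SessionID.
func Key(prefix, sessionID string) string { return prefix + sessionID }

// Encode serializes a snapshot for storage.
func Encode(s Session) ([]byte, error) { return json.Marshal(s) }

// Decode restores a snapshot read back from the store.
func Decode(b []byte) (Session, error) {
	var s Session
	err := json.Unmarshal(b, &s)
	return s, err
}

func main() {
	snapshot := Session{
		SessionID:   "3f9c2a",
		Dialog:      "helpdesk",
		State:       "ticket_created",
		Transitions: 4,
		Variables:   map[string]string{"issue_type": "hardware"},
	}
	if payload, err := Encode(snapshot); err != nil {
		fmt.Println("encode failed:", err)
	} else {
		fmt.Println(Key("vg:session:", snapshot.SessionID), string(payload))
	}
}
```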

Metrics

Metric                             Type        Description
vg_runtime_active_sessions         Gauge       Currently active call sessions
vg_runtime_state_transitions       Counter     Total state transitions
vg_runtime_dialog_errors           Counter     Dialog execution errors
vg_runtime_hook_latency_seconds    Histogram   Time spent waiting for hook responses
vg_runtime_turn_duration_seconds   Histogram   Time per conversation turn
