"Tutorial: Workflow Durability"

Tutorial: Workflow Durability

Learn how to build resilient, long-running processes using Vox workflows. This tutorial explains the durability story Vox supports today: interpreted workflow step replay, stable activity ids, and idempotent activities.

[!WARNING] Interpreted workflow runtime durability and generated-Rust workflow durability are different things. The durable replay and recovery story shown here uses the interpreted path (vox mens workflow ...), not compiled native async functions.

1. The Challenge of Long-Running Tasks

Traditional async functions lose their state if the server restarts or a network error occurs. Vox workflows are intended to solve that by recording progress in a database.

2. Defining a Workflow

Use the bare activity and workflow keywords to describe long-running orchestration.

Use the bare activity and workflow keywords to describe long-running orchestration.

@query fn get_notes() -> List[Note] {
    ret db.Note.all()
}

@mutation fn create_note(title: str, content: str) -> Result[Id[Note]] {
    let id = db.Note.insert({ title: title, content: content })?
    ret Ok(id)
}

workflow order(id: str) -> Result[Unit] {
    let status = check_inventory(id)
    ret Ok(Unit)
}

The with block provides execution options for the activity:

  • retries: Number of attempts before failing the workflow step
  • timeout: Maximum duration allowed for a single execution
  • initial_backoff: Delay before the first retry attempt

3. How It Works

  1. Step tracking: The interpreted runtime records activity progress in Codex workflow tracking tables.
  2. Recovery: If the workflow is restarted with the same run identity, the runtime skips steps that completed successfully by reading their result from the journal.
  3. Idempotency: Activities should still be safe to retry on timeout or failure. Durable step replay is not the same thing as a universal exactly-once guarantee.

4. Workflows vs. Tasks

FeatureRegular TaskVox Workflow
SurvivalDies on rebootInterpreted workflow runtime resumes steps
RetryManual try/catchwith { retries } support
StateIn-memoryDurable step tracking

5. Best Practices

  • Idempotency: Activities should be idempotent since they might be retried after a failure.
  • Deterministic: Workflow logic must be deterministic. Avoid using rand() directly inside the workflow body; use an activity instead.
  • Stable step ids: Use explicit activity_id values for steps you expect to resume safely across restarts. with { id: "..." } sets this.

Next Steps: