Skip to content

Retries & Resilience

AI pipelines fail in transient ways: a provider rate-limits you, a connection drops, a model returns malformed JSON. Pipelex has a deliberate, honest model for handling those failures — and an equally deliberate line about what it does not do.

The short version: direct execution is for simplicity, durable execution is for resilience. Both paths shrug off brief transport blips. Only the durable path survives a crash and retries under failure. There is no half-measure pretending otherwise.


The two execution paths

Pipelex runs your .mthds methods one of two ways, and they make different promises.

Path What it promises What it does on failure
Direct execution (default) Simplicity, zero infrastructure Retries transient transport failures via the provider SDK, then surfaces the error
Durable execution (opt-in) Durability and resilience The same transport retry, plus activity-level retry and crash survival

Direct execution makes one pipeline-level attempt. If a pipe fails after the transport layer has done its retrying, the failure surfaces — the run does not restart, and it does not survive a process crash. That is the intended contract, not a gap. The value of the direct path is that it needs no cluster, no workers, no setup.

When you need retry-under-failure and durability, you run the same methods on a durable backend — see Durable Execution.


The retry model

Retry in Pipelex lives in two well-defined tiers. Nothing retries outside them.

Tier What it does Where it runs Both paths?
Tier 1 — Transport retry Retries connection errors and HTTP 408 / 409 / 429 / 5xx, honoring Retry-After Inside the inference SDK / HTTP client ✅ Yes
Tier 2 — Durable retry Activity-level retry keyed off the error category, plus workflow durability and redelivery The durable backend's worker ❌ Durable only

On the direct path there is nothing between Tier 1 and Tier 2 — that gap is the difference between the two products.

Direct execution:    [ Tier 1: transport retry ] → surface the error
Durable execution:   [ Tier 1: transport retry ] → [ Tier 2: activity retry + durability ]

This section goes deeper on each piece:

  • Automatic Retries — Tier 1 transport retry, structured-output re-ask, and bounded fan-out — the always-on mechanisms.
  • Failure Classification — how each failure is categorized, and which categories are worth retrying.
  • Durable Execution — Tier 2: crash survival and retry-under-failure on a durable backend.

Configuration reference

Setting Location Default Controls
transport_max_retries [cogt] 2 Tier 1 — transport retry attempts per request
schema_reask_max_attempts [cogt.llm_config] 3 instructor schema re-ask attempts (structured output)
max_concurrency [pipelex.pipeline_execution_config] 8 PipeBatch bounded fan-out (set "unbounded" to disable)
is_enabled [temporal] false Opt into durable execution (Tier 2)

Next steps