
Automatic Retries

Whichever execution path you run, Pipelex applies a fixed set of automatic, bounded mechanisms that absorb transient failures and protect your providers. Only transport retry is a retry in the strict sense. The other two mechanisms, schema re-ask and bounded fan-out, are closely related behaviors that shape output and control load, respectively. Together they form the always-on part of the retry model.

Tier 1 — transport retry

Every inference SDK call retries transient transport failures before giving up. Pipelex makes this an explicit, uniform policy rather than inheriting each provider SDK's silent default.

It is controlled by a single setting in the [cogt] section, which you can override in your project's .pipelex/pipelex.toml:

[cogt]
transport_max_retries = 2

transport_max_retries (default 2) is the number of retries attempted after the initial request, so a value of 2 allows up to 3 attempts in total. What Pipelex sets uniformly is this retry count. The set of failures that triggers a retry is each provider SDK's own transient-failure set: broadly, connection errors, request timeouts (408), conflicts (409), rate limits (429), and server errors (5xx), with minor per-SDK variation (Google's client, for instance, omits 409). Pipelex's own SDK-less path, the raw-HTTP Azure image-generation client, pins this set explicitly to 408, 409, 429, and 5xx. A Retry-After response header is honored wherever the SDK, or that SDK-less path, supports it.
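The retry arithmetic above can be sketched in plain Python. This is a minimal illustration, not Pipelex's or any SDK's implementation: `send_with_retries`, `TransportError`, the `send` callable, and the exponential backoff with a 0.5 s base delay are all hypothetical choices made for the example. It shows that transport_max_retries = 2 means at most three calls, that only the transient set (connection errors, 408, 409, 429, 5xx) is retried, and that a Retry-After header given in seconds replaces the computed delay.

```python
import time
from typing import Callable, Mapping, Tuple

# Transient status codes, mirroring the set the text says the raw-HTTP
# Azure image-generation path pins: 408 / 409 / 429 / 5xx.
RETRYABLE_STATUSES = frozenset({408, 409, 429})


def is_retryable(status: int) -> bool:
    """True for 408, 409, 429, and every 5xx server error."""
    return status in RETRYABLE_STATUSES or 500 <= status <= 599


class TransportError(Exception):
    """Hypothetical error surfaced when a request fails for good."""

    def __init__(self, status: int) -> None:
        super().__init__(f"request failed with HTTP {status}")
        self.status = status


def send_with_retries(
    send: Callable[[], Tuple[int, Mapping[str, str]]],
    transport_max_retries: int = 2,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Issue one request, retrying transient failures only.

    `send` is a hypothetical callable that returns (status, headers) or raises
    ConnectionError. Total attempts = 1 initial request + transport_max_retries.
    """
    last_attempt = transport_max_retries  # attempt indices run 0..last_attempt
    for attempt in range(last_attempt + 1):
        try:
            status, headers = send()
        except ConnectionError:
            if attempt == last_attempt:
                raise  # retries exhausted: the error surfaces to the caller
            sleep(base_delay * 2 ** attempt)
            continue
        if not is_retryable(status):
            if status >= 400:
                raise TransportError(status)  # e.g. 400 or 401: never retried
            return status  # non-error status: done
        if attempt == last_attempt:
            raise TransportError(status)  # retries exhausted
        # Honor Retry-After in its delta-seconds form; otherwise back off exponentially.
        retry_after = headers.get("Retry-After", "")
        delay = float(retry_after) if retry_after.isdigit() else base_delay * 2 ** attempt
        sleep(delay)
    raise AssertionError("unreachable")
```

For example, a fake `send` that yields 503, then 429 with `Retry-After: 1`, then 200 succeeds on the third and final allowed attempt after sleeping 0.5 s and then 1 s. Injecting `sleep` keeps the sketch testable without real waiting.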

The transport_max_retries setting is wired into every inference SDK client that exposes a client-side retry budget (Anthropic, OpenAI / Azure OpenAI, the Pipelex Gateway LLM clients, Mistral, and Google), as well as into the raw-HTTP Azure image-generation path. The result is one deliberate retry policy across all of those clients, rather than a mix of SDK defaults. (The Pipelex Gateway's document-extraction and image-generation calls go through the Portkey SDK, which has no client-side retry budget; transport retries for those calls are handled by the gateway itself.)

Transport retry is not pipeline retry

Tier 1 retries a single HTTP request to a provider. It does not re-run a pipe, re-run a step, or restart a pipeline. If a call still fails after its transport retries are exhausted, the error surfaces.

Structured output — schema re-ask

When a pipe asks an LLM for a structured object, the model sometimes returns JSON that does not match the requested schema. On that specific failure, Pipelex re-asks the model through the instructor library.

This is output shaping, not resilience. The re-ask fires only on a schema-validation failure. A transport error is never re-asked at this level: it is retried only by Tier 1, the sole transport-retry layer, and surfaces once those retries are exhausted. The re-ask count has its own setting:

[cogt.llm_config]
schema_reask_max_attempts = 3   # instructor schema re-ask attempts — distinct from transport_max_retries

Keep the two settings distinct: transport_max_retries handles a flaky network; schema_reask_max_attempts handles a model that produced the wrong shape.
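The split between the two layers can be sketched with the standard library alone. This is not instructor's code: the `SCHEMA` dictionary, `validate`, `ask_structured`, and the `ask_model` callable are hypothetical stand-ins, and the sketch assumes schema_reask_max_attempts caps the total number of model calls. It shows that the loop catches only `SchemaError`. Anything else raised by `ask_model`, such as a `ConnectionError` left over after transport retries, propagates untouched.

```python
import json
from typing import Callable, List

# Hypothetical target schema: field name -> required Python type.
SCHEMA = {"title": str, "year": int}


class SchemaError(ValueError):
    """Raised when model output does not match SCHEMA."""


def validate(raw: str) -> dict:
    """Parse raw model output and check it against SCHEMA."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise SchemaError(f"invalid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise SchemaError("expected a JSON object")
    for field, expected in SCHEMA.items():
        if not isinstance(data.get(field), expected):
            raise SchemaError(f"field {field!r} must be {expected.__name__}")
    return data


def ask_structured(
    ask_model: Callable[[List[str]], str],
    prompt: str,
    schema_reask_max_attempts: int = 3,
) -> dict:
    """Re-ask on schema failures only; every other exception propagates."""
    messages = [prompt]
    for attempt in range(1, schema_reask_max_attempts + 1):
        raw = ask_model(messages)  # a transport error raised here is NOT caught
        try:
            return validate(raw)
        except SchemaError as err:
            if attempt == schema_reask_max_attempts:
                raise  # out of re-asks: surface the validation error
            # Feed the validation error back so the model can correct its output.
            messages.append(f"Your previous answer was invalid ({err}). Return JSON matching the schema.")
    raise AssertionError("unreachable when schema_reask_max_attempts >= 1")
```

A model that first returns `{"title": "Dune"}` and then `{"title": "Dune", "year": 1965}` succeeds on the second call. A model that raises `ConnectionError` fails immediately, with no re-ask.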

Bounded fan-out for batches

When PipeBatch maps a pipe over a large list, it does not spawn every branch at once. Branches run in bounded concurrent chunks, capped by max_concurrency (default 8):

[pipelex.pipeline_execution_config]
max_concurrency = 8

This is admission control, not retry — it stops a batch over thousands of items from triggering a self-inflicted rate-limit storm against your provider. See PipeBatch for details.
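Chunked fan-out can be sketched in a few lines of asyncio. This is a minimal illustration assuming each branch is a coroutine. `run_batch` and `branch` are hypothetical names, not PipeBatch's API. At most max_concurrency branches are in flight at once, the next chunk starts only after the current one finishes, and results keep the input order.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def run_batch(
    items: Sequence[T],
    branch: Callable[[T], Awaitable[R]],
    max_concurrency: int = 8,
) -> List[R]:
    """Run `branch` over `items` in chunks of at most max_concurrency, preserving order."""
    results: List[R] = []
    for start in range(0, len(items), max_concurrency):
        chunk = items[start : start + max_concurrency]
        # Only this chunk's branches are in flight; the loop waits for all of them
        # before admitting the next chunk.
        results.extend(await asyncio.gather(*(branch(item) for item in chunk)))
    return results
```

Over 20 items with max_concurrency = 8, this runs chunks of 8, 8, and 4. A semaphore-based worker pool would keep 8 branches busy continuously instead of waiting for the slowest branch in each chunk. The sketch uses chunks because that is how the text describes PipeBatch.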