Inference Backend Plugins

Every model call in Pipelex — an LLM completion, an image generation, a document extraction, a web search — is served by an inference worker. Which worker handles a given model is decided entirely by data: a model's sdk field selects a backend, and a backend plugin is what teaches Pipelex how to build the worker for that sdk.

Core names no backend by import or by string. The built-in drivers (OpenAI, Gateway, Anthropic, Mistral, Bedrock, Google, FAL, HuggingFace, Docling, …) are plugins too — they ride the exact same seam an out-of-tree plugin would. This page documents that seam, the Inference SPI a plugin compiles against, and how to write one.

The seam in one view

boot (Pipelex.setup)
  ├─ build_registrar(config)            # pure, import-light
  │    ├─ for each plugin in BUILTIN_PLUGINS
  │    └─ for each installed "pipelex.plugins" entry point
  │         └─ plugin.register(registrar)        # side-effect-free
  │              └─ registrar.add_inference_backend(family=…, sdk=…, make_worker=…)
  └─ InferenceBackendRegistry(registrar.inference_backends)   # stored on the hub

run time (e.g. LLMWorkerFactory.make_llm_worker)
  ├─ model_handle = ModelHandle.make_for_inference_model(...)
  ├─ make_worker = get_inference_backend_registry().lookup(family=LLM, sdk=model_handle.sdk)
  └─ worker = make_worker(inference_model=…, backend=…, sdk_clients=…, reporting_delegate=…)

The worker factories (LLMWorkerFactory, ImgGenWorkerFactory, ExtractWorkerFactory, SearchWorkerFactory) contain no branching on SDK strings. Each one builds a ModelHandle, resolves the InferenceBackend config, looks up the backend's make_worker by (family, sdk), and calls it. A lookup miss raises InferenceBackendNotFoundError with a friendly hint ("… Is its plugin installed and enabled?").
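
The lookup described above can be pictured as a dictionary keyed by (family, sdk). The following is a minimal sketch under stated assumptions, not Pipelex's implementation: Family, ToyBackendRegistry, BackendNotFound and make_toy_worker are hypothetical stand-ins for InferenceFamily, InferenceBackendRegistry, the lookup error and a real MakeWorkerFn.

```python
from enum import Enum
from typing import Callable


class Family(Enum):
    """Stand-in for InferenceFamily (only two members shown)."""

    LLM = "llm"
    IMG_GEN = "img_gen"


class BackendNotFound(LookupError):
    """Hypothetical stand-in for the error raised on a lookup miss."""


class ToyBackendRegistry:
    """Immutable-ish mapping from (family, sdk) to a worker-building callable."""

    def __init__(self, backends: dict[tuple[Family, str], Callable[..., object]]) -> None:
        self._backends = dict(backends)

    def lookup(self, *, family: Family, sdk: str) -> Callable[..., object]:
        try:
            return self._backends[(family, sdk)]
        except KeyError:
            raise BackendNotFound(
                f"No backend registered for ({family.name}, {sdk!r}). Is its plugin installed?"
            ) from None


def make_toy_worker(**kwargs: object) -> str:
    # A real MakeWorkerFn returns a worker object; a string keeps the sketch small.
    return f"toy worker for {kwargs['inference_model']}"


registry = ToyBackendRegistry({(Family.LLM, "acme"): make_toy_worker})

# What a worker factory does at run time: look up, then call with all four keywords.
make_worker = registry.lookup(family=Family.LLM, sdk="acme")
worker = make_worker(inference_model="acme-small", backend=None, sdk_clients=None, reporting_delegate=None)
```

Because the factory only ever sees a callable, adding a backend never requires touching factory code.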

The contract: `PipelexPlugin`

A plugin is any object satisfying the @runtime_checkable PipelexPlugin protocol:

from typing import Protocol, runtime_checkable

@runtime_checkable
class PipelexPlugin(Protocol):
    name: str            # unique, lowercase identifier (e.g. "openai")
    targets_api: int     # must equal PLUGIN_API_VERSION
    def register(self, registrar: PluginRegistrar) -> None: ...

register is the only method core calls, and it must be side-effect-free: it may call the registrar's menu methods and nothing else (no hub access, no I/O, no client or SDK construction). This is what makes build_registrar safe to run more than once: it runs at boot, and again at CLI build time to harvest plugin-contributed commands.

targets_api is checked against PLUGIN_API_VERSION. A mismatch fails loud with PluginApiVersionMismatchError — a single coarse integer gate, not semver-range matching.
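
To make the structural check and the version gate concrete, here is a small self-contained sketch. It assumes nothing about how Pipelex performs the check internally: ToyPlugin, TOY_API_VERSION, ApiVersionMismatch and check_plugin are hypothetical names that mirror PipelexPlugin, PLUGIN_API_VERSION and PluginApiVersionMismatchError.

```python
from typing import Protocol, runtime_checkable

TOY_API_VERSION = 1  # assumed value, for illustration only


@runtime_checkable
class ToyPlugin(Protocol):
    name: str
    targets_api: int

    def register(self, registrar: object) -> None: ...


class ApiVersionMismatch(RuntimeError):
    """Hypothetical stand-in for PluginApiVersionMismatchError."""


def check_plugin(candidate: object) -> ToyPlugin:
    # isinstance on a runtime_checkable protocol only verifies that the members exist.
    if not isinstance(candidate, ToyPlugin):
        raise TypeError(f"{candidate!r} does not look like a plugin")
    # Coarse gate: a single integer must match exactly.
    if candidate.targets_api != TOY_API_VERSION:
        raise ApiVersionMismatch(
            f"plugin {candidate.name!r} targets API {candidate.targets_api}, "
            f"core provides {TOY_API_VERSION}"
        )
    return candidate


class GoodPlugin:
    name = "good"
    targets_api = TOY_API_VERSION

    def register(self, registrar: object) -> None:
        pass


class StalePlugin(GoodPlugin):
    name = "stale"
    targets_api = TOY_API_VERSION + 1
```

Note that the plugin classes do not inherit from the protocol; satisfying it structurally is enough.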

Registering an inference backend

A backend plugin's register calls one menu method per (family, sdk) it serves:

registrar.add_inference_backend(
    family=InferenceFamily.LLM,         # LLM | IMG_GEN | EXTRACT | SEARCH
    sdk="acme",                          # the model's `sdk` string
    make_worker=_make_acme_worker,       # a MakeWorkerFn (a plain callable)
)

A registry key is (family, sdk). The same sdk string may appear in two families (e.g. google serves both LLM and IMG_GEN); they are distinct keys. A duplicate (family, sdk) fails loud with DuplicateInferenceBackendError naming both contributing plugins.

One plugin may register across several families from a single register — the built-in gateway plugin serves all four, mistral serves LLM + EXTRACT, linkup serves EXTRACT + SEARCH. This is the cross-family-vendor coordination point: one plugin, many backends.

`MakeWorkerFn` — the locked call shape

A backend is a typed callable, not a one-method object. Its signature is the contract:

def _make_acme_worker(
    *,
    inference_model: InferenceModelSpec,
    backend: InferenceBackend,
    sdk_clients: SdkClientRegistry,
    reporting_delegate: ReportingProtocol | None,
) -> InferenceWorkerAbstract:
    ...

The factory always passes all four keyword arguments. A stateless backend simply ignores the ones it doesn't need (# noqa: ARG001 on the unused parameter — see the built-in pypdfium2 and azure_rest plugins).

Two invariants shape what goes inside the closure:

Import-light. The plugin module must import no backend SDK at module load. Do the SDK import inside the closure (# noqa: PLC0415), so merely discovering the plugin never pulls a heavy/optional dependency. Booting Pipelex with the built-ins registered imports none of anthropic, mistralai, google.genai, boto3, fal_client, huggingface_hub, docling, linkup, … — enforced by a subprocess import-blocker guard.
Fail at use, not at boot. Guard an optional dependency with require_sdk(...) inside the closure. A missing extra then raises MissingDependencyError (naming the package and the pipelex[<extra>] install hint) only when the backend is actually used.
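
Both invariants can be sketched with the standard library alone. The guard below is an assumption about what a require_sdk-style helper might look like (a find_spec probe that raises with an install hint); toy_require_sdk and MissingDep are hypothetical names, the hint wording is invented, and json stands in for an optional SDK so the sketch runs anywhere.

```python
import importlib.util


class MissingDep(ImportError):
    """Hypothetical stand-in for MissingDependencyError."""


def toy_require_sdk(*, spec: str, extra: str, msg: str) -> None:
    # find_spec probes for a top-level module without importing it.
    if importlib.util.find_spec(spec) is None:
        raise MissingDep(f"{msg} Install hint: pipelex[{extra}]")


def make_json_worker(**_ignored: object):
    # Fail at use: the guard runs only when a worker is actually requested.
    toy_require_sdk(spec="json", extra="json", msg="The json module is required.")
    # Import-light: the import lives inside the closure, not at module load.
    import json  # noqa: PLC0415

    return json.dumps


# Merely defining make_json_worker imported nothing; calling it does.
dumps = make_json_worker()
```

Registering make_json_worker costs nothing at boot; the dependency check and the import both happen on first use.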

Client memoization goes through sdk_clients.get_or_create(handle=…, build=lambda: …) — the registry caches one SDK client per ModelHandle, so repeated worker construction reuses the connection.
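
The memoization contract is small enough to sketch in a few lines. ToyClientRegistry below is a hypothetical stand-in for SdkClientRegistry; it assumes only what the paragraph above states, namely that get_or_create takes a handle and a zero-argument build callable and caches one client per handle. A plain string plays the role of the ModelHandle.

```python
from typing import Callable, Hashable


class ToyClientRegistry:
    def __init__(self) -> None:
        self._clients: dict[Hashable, object] = {}

    def get_or_create(self, *, handle: Hashable, build: Callable[[], object]) -> object:
        # build() runs only on the first request for a given handle.
        if handle not in self._clients:
            self._clients[handle] = build()
        return self._clients[handle]


build_calls: list[str] = []


def build_client() -> object:
    build_calls.append("built")
    return object()  # placeholder for a real SDK client


clients = ToyClientRegistry()
first = clients.get_or_create(handle="acme", build=build_client)
second = clients.get_or_create(handle="acme", build=build_client)
```

Passing a callable rather than a ready-made client is what keeps construction lazy: nothing is built unless the cache misses.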

Listing models — an optional capability

Alongside its worker factory, a backend plugin may register a model lister: the callable behind pipelex show models <backend>. It is optional: a backend that cannot enumerate its models simply never calls add_model_lister, and the listing command reports that SDK as unsupported-for-listing. The registrar gains only this one optional method, so a plugin that does nothing but build workers is unaffected.

registrar.add_model_lister(sdk="acme", lister=_list_acme_models)

A ListModelsFn mirrors MakeWorkerFn — import-light to reference, lazy inside. It is always async (the listing loop awaits it) and is keyed by sdk alone (listing is per-SDK, not per-(family, sdk)):

async def _list_acme_models(
    *,
    sdk: str,
    backend_name: str,
    backend: InferenceBackend,
    flat: bool,
    any_listed: bool,
) -> None:
    from my_pkg.acme_list import list_acme_models  # noqa: PLC0415

    await list_acme_models(sdk=sdk, backend_name=backend_name, backend=backend, flat=flat, any_listed=any_listed)

The same import-light / fail-at-use rules apply: a missing optional extra must raise MissingDependencyError only when the backend is actually listed — use the require_sdk(...) helper, or an equivalent find_spec guard inside the function the lister delegates to (the built-in listers do the latter, reusing the guard already in their list_*_models functions). If a registered lister's client variant cannot enumerate models at runtime (e.g. a Bedrock-backed Anthropic client), raise ModelListingUnsupportedError(sdk=…): the loop treats that as the same soft "unsupported for listing" outcome as a missing lister — never a hard failure. A duplicate sdk lister fails loud with DuplicateModelListerError naming both plugins.
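
The soft "unsupported" outcome can be sketched as a loop that treats a missing lister and a raised exception the same way. This is an illustration only: run_listing, ListingUnsupported and the two listers are hypothetical, the listers take just sdk rather than the full ListModelsFn keyword set, and the model names printed are invented.

```python
import asyncio
from typing import Awaitable, Callable


class ListingUnsupported(Exception):
    """Hypothetical stand-in for ModelListingUnsupportedError."""

    def __init__(self, sdk: str) -> None:
        super().__init__(f"listing is not supported for sdk {sdk!r}")
        self.sdk = sdk


async def list_acme(*, sdk: str) -> None:
    print(f"{sdk}: acme-small, acme-large")  # invented model names


async def list_proxied(*, sdk: str) -> None:
    # The lister exists, but this client variant cannot enumerate models.
    raise ListingUnsupported(sdk=sdk)


async def run_listing(
    sdks: list[str], listers: dict[str, Callable[..., Awaitable[None]]]
) -> dict[str, str]:
    outcomes: dict[str, str] = {}
    for sdk in sdks:
        lister = listers.get(sdk)
        if lister is None:
            outcomes[sdk] = "unsupported"  # no lister registered
            continue
        try:
            await lister(sdk=sdk)
        except ListingUnsupported:
            outcomes[sdk] = "unsupported"  # same soft outcome, not a failure
        else:
            outcomes[sdk] = "listed"
    return outcomes


outcomes = asyncio.run(
    run_listing(["acme", "proxied", "nolister"], {"acme": list_acme, "proxied": list_proxied})
)
```

Any other exception is deliberately left uncaught in this sketch, so genuine errors still surface.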

Authoring a backend plugin (minimal example)

A complete LLM backend plugin for a hypothetical acme SDK:

# acme_plugin.py
from pipelex.cogt.inference.inference_worker_abstract import InferenceWorkerAbstract
from pipelex.cogt.model_backends.backend import InferenceBackend
from pipelex.cogt.model_backends.model_spec import InferenceModelSpec
from pipelex.plugins.contract import PLUGIN_API_VERSION
from pipelex.plugins.inference_backend_registry import InferenceFamily, require_sdk
from pipelex.plugins.model_handle import ModelHandle
from pipelex.plugins.registrar import PluginRegistrar
from pipelex.plugins.sdk_client_registry import SdkClientRegistry
from pipelex.reporting.reporting_protocol import ReportingProtocol

_ACME_MISSING_MSG = "The acme SDK is required in order to use Acme models."


def _make_acme_worker(
    *,
    inference_model: InferenceModelSpec,
    backend: InferenceBackend,
    sdk_clients: SdkClientRegistry,
    reporting_delegate: ReportingProtocol | None,
) -> InferenceWorkerAbstract:
    require_sdk(spec="acme", extra="acme", msg=_ACME_MISSING_MSG)

    from acme import AcmeClient  # noqa: PLC0415

    from my_pkg.acme_llm_worker import AcmeLLMWorker  # noqa: PLC0415

    model_handle = ModelHandle.make_for_inference_model(inference_model=inference_model)
    sdk_instance = sdk_clients.get_or_create(
        handle=model_handle,
        build=lambda: AcmeClient(api_key=backend.api_key),
    )
    return AcmeLLMWorker(
        sdk_instance=sdk_instance,
        inference_model=inference_model,
        reporting_delegate=reporting_delegate,
    )


class AcmePlugin:
    name = "acme"
    targets_api = PLUGIN_API_VERSION

    def register(self, registrar: PluginRegistrar) -> None:
        registrar.add_inference_backend(family=InferenceFamily.LLM, sdk="acme", make_worker=_make_acme_worker)
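
Because register is meant to be side-effect-free, one way to check a plugin is to hand it a recording test double and compare runs. The sketch below does not import Pipelex: RecordingRegistrar and MultiFamilyPlugin are hypothetical, strings stand in for InferenceFamily, and targets_api is hard-coded where a real plugin would use PLUGIN_API_VERSION.

```python
class RecordingRegistrar:
    """Hypothetical test double: records calls instead of building a registry."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, str]] = []

    def add_inference_backend(self, *, family: str, sdk: str, make_worker: object) -> None:
        self.calls.append((family, sdk))


def _make_worker(**_kwargs: object) -> object:
    # If register ever called this, the plugin would be doing work at boot.
    raise AssertionError("register must never build a worker")


class MultiFamilyPlugin:
    """Stand-in plugin that serves two families from a single register call."""

    name = "acme"
    targets_api = 1  # assumed value; a real plugin uses PLUGIN_API_VERSION

    def register(self, registrar: RecordingRegistrar) -> None:
        registrar.add_inference_backend(family="LLM", sdk="acme", make_worker=_make_worker)
        registrar.add_inference_backend(family="EXTRACT", sdk="acme", make_worker=_make_worker)


plugin = MultiFamilyPlugin()
first, second = RecordingRegistrar(), RecordingRegistrar()
plugin.register(first)
plugin.register(second)  # running register again must give the same result
```

Identical recordings from both runs are a cheap signal that register holds no hidden state, which is the property that lets build_registrar run more than once.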

Shipping it as an out-of-tree plugin

Declare an entry point in the pipelex.plugins group; discovery finds it automatically once the distribution is installed (no central enable-list — presence is the source of truth):

# pyproject.toml of your plugin package
[project.entry-points."pipelex.plugins"]
acme = "my_pkg.acme_plugin:AcmePlugin"

The entry point may resolve to a plugin instance or a zero-argument factory. A broken entry point is isolated and reported as BrokenPluginError, never a silent skip.
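
The "instance or zero-argument factory" rule and the isolation of broken entry points can be sketched with importlib.metadata. This is an assumption-laden illustration rather than Pipelex's discovery code: resolve_plugin, load_entry_point, discover and BrokenPlugin are hypothetical names, and whether real discovery continues past a broken plugin is not something this sketch claims to reproduce.

```python
from importlib.metadata import entry_points
from typing import Callable


class BrokenPlugin(RuntimeError):
    """Hypothetical stand-in for BrokenPluginError."""


def resolve_plugin(loaded: object) -> object:
    # An instance already exposes register; use it as-is.
    if hasattr(loaded, "register") and not isinstance(loaded, type):
        return loaded
    # A class or a factory function: call it with no arguments.
    if callable(loaded):
        return loaded()
    raise TypeError(f"{loaded!r} is neither a plugin nor a zero-argument factory")


def load_entry_point(name: str, load: Callable[[], object]) -> object:
    try:
        return resolve_plugin(load())
    except Exception as exc:
        # Report the failure with the entry-point name instead of skipping silently.
        raise BrokenPlugin(f"plugin entry point {name!r} failed to load: {exc}") from exc


def discover(group: str = "pipelex.plugins") -> dict[str, object]:
    # entry_points(group=...) is the stdlib API on Python 3.10+.
    return {ep.name: load_entry_point(ep.name, ep.load) for ep in entry_points(group=group)}


class AcmePlugin:
    name = "acme"

    def register(self, registrar: object) -> None:
        pass


from_class = resolve_plugin(AcmePlugin)  # the class acts as a zero-argument factory
instance = AcmePlugin()
from_instance = resolve_plugin(instance)  # an instance is returned unchanged
```

Taking the loader as a callable keeps load_entry_point testable without installing a real distribution.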

Disabling a discovered plugin

build_registrar skips (and logs) any plugin whose name appears in the core-owned plugins.disabled denylist:

# .pipelex/pipelex.toml
[plugins]
disabled = ["acme"]

For external entry points the denylist is matched against the entry-point name before the plugin is loaded, so a broken or dependency-missing installed plugin can still be disabled to recover startup (it would otherwise raise BrokenPluginError at load). A working external plugin that sets a name different from its entry-point name is denylistable by either name.

Disabling a core-unconditional plugin (openai) is a configuration error, not a no-op — it raises CoreUnconditionalPluginDisabledError. There is intentionally no allowlist: a plugin's presence (built-in or installed entry point) is what enables it.
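
The denylist rules above amount to a decision made on names alone, before any plugin code is imported. A minimal sketch follows; split_by_denylist, CoreDisabled and CORE_UNCONDITIONAL are hypothetical, and the set contains only openai because that is the one core-unconditional plugin this page names.

```python
CORE_UNCONDITIONAL = frozenset({"openai"})  # assumption: taken from the text above


class CoreDisabled(ValueError):
    """Hypothetical stand-in for CoreUnconditionalPluginDisabledError."""


def split_by_denylist(
    entry_point_names: list[str], disabled: list[str]
) -> tuple[list[str], list[str]]:
    """Return (names to load, names skipped) without importing any plugin."""
    denied = set(disabled)
    forbidden = sorted(denied & CORE_UNCONDITIONAL)
    if forbidden:
        # A configuration error, never a silent no-op.
        raise CoreDisabled(f"cannot disable core-unconditional plugin(s): {forbidden}")
    to_load = [name for name in entry_point_names if name not in denied]
    skipped = [name for name in entry_point_names if name in denied]
    return to_load, skipped


to_load, skipped = split_by_denylist(["acme", "other"], disabled=["acme"])
```

Matching on the entry-point name first is what lets a broken plugin be disabled: its module is never imported, so it never gets the chance to fail.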

Use pipelex plugins list to see every discovered plugin, what each contributed, and its denylist state.

The Inference SPI

What an out-of-tree backend plugin imports is the contract. The published surface:

| Symbol | Module | Role |
| --- | --- | --- |
| `PipelexPlugin`, `PLUGIN_API_VERSION` | `pipelex.plugins.contract` | the plugin protocol + version gate |
| `PluginRegistrar` | `pipelex.plugins.registrar` | the accumulator `register` writes into |
| `InferenceFamily`, `MakeWorkerFn`, `require_sdk` | `pipelex.plugins.inference_backend_registry` | family enum, callable type, dependency guard |
| `ListModelsFn` | `pipelex.plugins.model_lister_registry` | the optional model-listing callable type |
| `ModelListingUnsupportedError` | `pipelex.cogt.exceptions` | soft signal a lister raises when its client variant cannot list |
| `ModelHandle` | `pipelex.plugins.model_handle` | the backend selector derived from a model spec |
| `SdkClientRegistry` | `pipelex.plugins.sdk_client_registry` | per-handle client memoization |
| `InferenceModelSpec` | `pipelex.cogt.model_backends.model_spec` | the resolved model record |
| `InferenceBackend` | `pipelex.cogt.model_backends.backend` | the backend config record (api key, extras) |
| `InferenceWorkerAbstract` and the `{LLM,ImgGen,Extract,Search}WorkerAbstract` subclasses | `pipelex.cogt.…` | the worker contracts a plugin returns |
| `MissingDependencyError` | `pipelex.exceptions` | raised by `require_sdk` |

The SPI is a documented, versioned module/symbol list gated by PLUGIN_API_VERSION — not an __init__.py re-export shim (the repo bans re-exports; import by full path). Anything a plugin needs to import outside this surface is a design gap to resolve, not an accident to live with.

Fail-loud guarantees

| Condition | Error |
| --- | --- |
| `targets_api` ≠ `PLUGIN_API_VERSION` | `PluginApiVersionMismatchError` |
| duplicate `(family, sdk)` | `DuplicateInferenceBackendError` (names both plugins) |
| duplicate `sdk` model lister | `DuplicateModelListerError` (names both plugins) |
| `name` in `plugins.disabled` but core-unconditional | `CoreUnconditionalPluginDisabledError` |
| entry point raises while loading/registering | `BrokenPluginError` |
| lookup for an unregistered `(family, sdk)` | `InferenceBackendNotFoundError` ("… Is its plugin installed and enabled?") |
| optional SDK missing at use | `MissingDependencyError` (package + `pipelex[<extra>]` hint) |

Pipe Routing & Execution — where worker construction sits in the run path
Error Model — how these errors render and dereference
Architecture Overview