Build-time Elaboration
Some MTHDS directives are shorthands: they describe what the user wants in one line, but the executable form needs several pipes wired together. Rather than handle them in the runtime, Pipelex rewrites them at load time, before any pipe runs. The runtime sees only normal pipes — PipeLLM, PipeStructure, PipeSequence, and so on.
Today, exactly one directive is elaborated: structuring_method = "preliminary_text" on a PipeLLM. This page documents the mechanism so contributors and AI agents can reason about it.
Where it runs
BundleElaborator.elaborate(bundle) runs inside PipelexInterpreter.make_pipelex_bundle_blueprint(), right after the bundle dict has been validated into a PipelexBundleBlueprint. Every code path that loads a bundle from a .mthds file goes through it.
The elaborator has a fast-path short-circuit: if no pipe in the bundle carries structuring_method = "preliminary_text", the input bundle is returned unchanged (identity-preserving). Loading a bundle without preliminary-text pipes pays no overhead.
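The fast path can be pictured with a minimal sketch. It assumes a bundle represented as a plain dict with a "pipe" table, which is a simplification of PipelexBundleBlueprint; needs_elaboration and this elaborate function are illustrative names, not the Pipelex API.

```python
from typing import Any

PRELIMINARY_TEXT = "preliminary_text"


def needs_elaboration(bundle: dict[str, Any]) -> bool:
    """True if any pipe carries structuring_method = "preliminary_text"."""
    pipes = bundle.get("pipe", {})
    return any(p.get("structuring_method") == PRELIMINARY_TEXT for p in pipes.values())


def elaborate(bundle: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical stand-in for BundleElaborator.elaborate."""
    if not needs_elaboration(bundle):
        # Fast path: hand back the very same object, not a copy.
        return bundle
    # Placeholder for the real rewrite: only shows that a new bundle is built.
    return {**bundle, "pipe": dict(bundle["pipe"])}


plain = {"pipe": {"summarize": {"type": "PipeLLM", "output": "Text"}}}
flagged = {"pipe": {"review": {"type": "PipeLLM", "structuring_method": PRELIMINARY_TEXT}}}
print(elaborate(plain) is plain)      # identity preserved on the fast path
print(elaborate(flagged) is flagged)  # a rewritten bundle is a different object
```

Returning the input object itself, rather than an equal copy, is what lets callers use an identity check to tell whether anything was elaborated.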
What preliminary_text becomes
Given a user-authored pipe like:
[pipe.review_restaurant]
type = "PipeLLM"
description = "Write a structured review of a restaurant from a transcript"
inputs = { transcript = "Text" }
output = "RestaurantReview"
structuring_method = "preliminary_text"
prompt = """
Write a thorough review of this restaurant based on the transcript:
@transcript
"""
the elaborator replaces the single entry review_restaurant with three blueprints:
- review_restaurant__draft_text: a PipeLLMBlueprint that inherits the original inputs, system_prompt, prompt, and model. Its output is always literal Text, regardless of the original output's multiplicity. structuring_method is reset to None so the elaborator never recurses.
- review_restaurant__structure: a PipeStructureBlueprint with inputs = { draft_text = "Text" }, output = original.output (preserving multiplicity: Foo, Foo[], Foo[N]), and model = original.model_to_structure. When model_to_structure is None, PipeStructure falls back to the deck's for_object LLM choice at runtime, the same behavior as the legacy text-then-object path.
- review_restaurant (replaced): a PipeSequenceBlueprint wrapping the two synthetic steps, keeping the original inputs, output, and description. The original pipe_code is preserved so anything that calls or references the pipe (other pipes, main_pipe, the run API) keeps working unchanged.
The two synthetic codes are recorded in bundle.elaboration_metadata, a side-table on PipelexBundleBlueprint that maps each synthetic code to its parent_pipe_code and a step_role (DRAFT_TEXT or STRUCTURE). The wrapping sequence is not registered — it is the user-facing pipe.
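As a rough illustration of the rewrite and its side-table, the sketch below works on plain dicts rather than the real blueprint classes. The function name, the dict shapes, and the "steps" key holding a list of pipe codes are assumptions made for the example; only the field inheritance rules and the DRAFT_TEXT / STRUCTURE roles come from the description above.

```python
from typing import Any


def elaborate_preliminary_text(
    pipe_code: str, original: dict[str, Any]
) -> tuple[dict[str, dict[str, Any]], dict[str, dict[str, str]]]:
    """Return (replacement pipes, metadata side-table) for one pipe."""
    draft_code = f"{pipe_code}__draft_text"
    structure_code = f"{pipe_code}__structure"

    draft = {
        "type": "PipeLLM",
        "inputs": dict(original["inputs"]),       # image variables stay here
        "system_prompt": original.get("system_prompt"),
        "prompt": original["prompt"],
        "model": original.get("model"),
        "output": "Text",                         # always a single Text
        "structuring_method": None,               # never recurse
    }
    structure = {
        "type": "PipeStructure",
        "inputs": {"draft_text": "Text"},         # hard-wired input
        "output": original["output"],             # multiplicity preserved
        "model": original.get("model_to_structure"),
    }
    sequence = {
        "type": "PipeSequence",
        "description": original["description"],
        "inputs": dict(original["inputs"]),
        "output": original["output"],
        "steps": [draft_code, structure_code],    # assumed shape
    }
    pipes = {draft_code: draft, structure_code: structure, pipe_code: sequence}
    # Only the two synthetic codes are registered, not the wrapping sequence.
    metadata = {
        draft_code: {"parent_pipe_code": pipe_code, "step_role": "DRAFT_TEXT"},
        structure_code: {"parent_pipe_code": pipe_code, "step_role": "STRUCTURE"},
    }
    return pipes, metadata


original = {
    "type": "PipeLLM",
    "description": "Write a structured review of a restaurant from a transcript",
    "inputs": {"transcript": "Text"},
    "output": "RestaurantReview",
    "structuring_method": "preliminary_text",
    "prompt": "Write a thorough review of this restaurant based on the transcript:\n@transcript",
}
pipes, metadata = elaborate_preliminary_text("review_restaurant", original)
print(sorted(pipes))
print(sorted(metadata))
```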
Why a side-table
elaboration_metadata is declared with Field(exclude=True), so it never serializes back to .mthds, TOML, or JSON. The runtime PipeAbstract has no parent_pipe_code field; the user-facing per-pipe blueprint surface is unpolluted by the elaboration concern.
The side-table is the durable source of truth that downstream tools (graph viewer, CLI listings, distributed traces) can consult to render synthetic pipes specially. Today they don't — synthetic pipes appear as regular pipes in logs and traces — but the metadata is in place so that opt-in can land later without another round of plumbing.
Lifetime
elaboration_metadata is process-local. It survives model_copy (used by the elaborator itself), but any model_dump → model_validate round-trip drops the side-table — that is what exclude=True buys us. The elaborator's own re-validate pass (model_validate(elaborated.model_dump(by_alias=True))) discards the rehydrated bundle and returns the original model_copy-built one with metadata intact, precisely because of this.
Practical consequences:
- Within a single process, after BundleElaborator.elaborate(...), the metadata is available everywhere the bundle goes.
- Across any serialization boundary (kajson dump, library cache, Temporal payload, MTHDS export), the metadata is gone. Downstream consumers that need it today have to re-elaborate.
- The dependency loader handles one specific consequence: when a manifest restricts exports, synthetic helpers of exported parents are still loaded, even though they are never named in the manifest. This is implemented inline in LibraryManager._load_single_dependency.
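The copy-versus-round-trip distinction can be mimicked with the standard library alone. The sketch below is a stand-in, not Pydantic: a dataclass field tagged with an "exclude" marker plays the role of Field(exclude=True), and the hypothetical dump and validate methods play the roles of model_dump and model_validate.

```python
import copy
from dataclasses import dataclass, field, fields, replace


@dataclass
class Bundle:
    """Toy bundle; not the real PipelexBundleBlueprint."""

    pipes: dict
    # Stand-in for a Pydantic field declared with Field(exclude=True).
    elaboration_metadata: dict = field(default_factory=dict, metadata={"exclude": True})

    def dump(self) -> dict:
        """Serialize every field except those marked as excluded."""
        return {
            f.name: copy.deepcopy(getattr(self, f.name))
            for f in fields(self)
            if not f.metadata.get("exclude")
        }

    @classmethod
    def validate(cls, data: dict) -> "Bundle":
        """Rebuild a bundle from serialized data."""
        return cls(**data)


bundle = Bundle(
    pipes={"review__draft_text": {"type": "PipeLLM"}},
    elaboration_metadata={
        "review__draft_text": {"parent_pipe_code": "review", "step_role": "DRAFT_TEXT"}
    },
)

copied = replace(bundle)                         # in-process copy keeps the side-table
round_tripped = Bundle.validate(bundle.dump())   # serialization drops it

print(copied.elaboration_metadata == bundle.elaboration_metadata)
print(round_tripped.elaboration_metadata)
```

This is also why the elaborator's re-validate pass keeps the copy-built bundle and discards the rehydrated one: only the former still carries the metadata.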
When a future consumer (graph viewer, persistent observability store) wants the metadata across boundaries, dropping exclude=True is the deliberate next step — captured as a follow-up in TODOS.md.
Multiplicity rule
Step 1 always produces a single Text, even when the original output was Foo[] or Foo[3]. Step 2 is the one that fans out: PipeStructure inspects its declared output and either calls make_object (single) or make_object_list (list, with optional fixed nb_items). This matches the deleted make_text_then_object_list behavior verbatim — one preliminary text, structured into N objects.
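A small sketch of the step 2 dispatch, assuming the output string takes one of the three shapes Foo, Foo[], or Foo[N]. The regular expression and both helper functions are hypothetical; the real PipeStructure parsing may differ, and only the make_object / make_object_list split comes from the text above.

```python
import re
from typing import Optional

# Concept name, optionally followed by [] or [N].
_OUTPUT_RE = re.compile(r"^(?P<concept>[A-Za-z_][\w.]*)(?:\[(?P<n>\d*)\])?$")


def parse_output(output: str) -> tuple[str, bool, Optional[int]]:
    """Split an output spec into (concept, is_list, nb_items)."""
    match = _OUTPUT_RE.match(output)
    if match is None:
        raise ValueError(f"Unrecognized output spec: {output!r}")
    n = match.group("n")          # None: no brackets; "": empty brackets
    is_list = n is not None
    nb_items = int(n) if n else None
    return match.group("concept"), is_list, nb_items


def structuring_call(output: str) -> str:
    """Name the call step 2 would make for this output spec."""
    _, is_list, nb_items = parse_output(output)
    if not is_list:
        return "make_object"
    if nb_items is None:
        return "make_object_list"
    return f"make_object_list(nb_items={nb_items})"


for spec in ("Foo", "Foo[]", "Foo[3]"):
    print(spec, "->", structuring_call(spec))
```

In all three cases step 1 is unchanged: it produces one Text, and only the structuring call varies.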
Image inputs
The original inputs dict (including any image variables) flows through to step 1. Step 2's input dict is hard-wired to { "draft_text": "Text" }. The structuring template references only {{ text }}. Image variables therefore appear only on step 1, where they belong — the elaborator does not need explicit drop logic.
Pre-checks and validation
Three layers guard against authoring mistakes around the output concept:
- Construction-time (string): PipeLLMBlueprint.validate_preliminary_text_output (a model_validator(mode="after")) rejects a literal Text output ("Text", "native.Text", "Text[]", "Text[N]") combined with structuring_method = "preliminary_text". This fires during model_validate, before the elaborator runs, so the user gets the error at parse time with a normal Pydantic validation failure.
- Defense-in-depth (string): BundleElaborator._elaborate_preliminary_text re-runs the same string check. Only reachable if a caller bypasses validation via model_construct; the test suite exercises it, but nothing in the framework relies on the bypass.
- Library-time (concept): PipeStructure.validate_output_with_library runs once the elaborated PipeStructure resolves its concept against the loaded library. This is the only layer that catches a domain concept that refines = "Text"; those slip past the string-level guards because they don't read as Text in the bundle source.
A clean separation: string-level guards run at parse time; concept-level guards run at library load.
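The string-level guard could look roughly like the sketch below. The regular expression is an assumption built from the four spellings listed above; it also accepts combinations such as native.Text[], which the real validator may or may not treat the same way. The function names are illustrative.

```python
import re

# Matches Text, native.Text, Text[], Text[N] (and, by assumption, native.Text[N]).
_LITERAL_TEXT_RE = re.compile(r"^(?:native\.)?Text(?:\[\d*\])?$")


def is_literal_text_output(output: str) -> bool:
    """True if the output reads as literal Text in the bundle source."""
    return _LITERAL_TEXT_RE.match(output) is not None


def check_preliminary_text_output(output: str, structuring_method: object) -> None:
    """Reject preliminary_text when there is nothing to structure."""
    if structuring_method == "preliminary_text" and is_literal_text_output(output):
        raise ValueError(
            f"structuring_method = 'preliminary_text' is meaningless with output {output!r}"
        )


for spec in ("Text", "native.Text", "Text[]", "Text[3]", "RestaurantReview", "Summary"):
    print(spec, is_literal_text_output(spec))
```

A concept such as a hypothetical Summary that refines Text passes this check, since the string carries no hint of the refinement; that case is left to the library-time layer.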
The elaborator additionally:
- Verifies that the synthetic codes (<pipe_code>__draft_text, <pipe_code>__structure) don't already exist in the bundle.
- Verifies that the synthetic codes pass is_pipe_code_valid (snake_case + length).
- After producing the new bundle, re-runs PipelexBundleBlueprint.model_validate(elaborated.model_dump(...)) so bundle-level validators (concept refs, pipe refs, main_pipe) re-check the synthetic pipes. Any ValidationError is wrapped in BundleElaboratorError with context naming the originating pipe.
- Asserts that no synthesized blueprint itself carries structuring_method = "preliminary_text", as a recursive-elaboration guard. Today's synthesis explicitly sets None; the guard protects against future elaboration kinds copying fields wholesale.
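The first two checks can be sketched as follows. The snake_case pattern and the 64-character limit are placeholders chosen for the example, not the actual rules behind is_pipe_code_valid, and the exception class here is a local stand-in for BundleElaboratorError.

```python
import re

MAX_PIPE_CODE_LENGTH = 64  # hypothetical limit, for illustration only
_SNAKE_CASE_RE = re.compile(r"^[a-z][a-z0-9_]*$")  # hypothetical pattern


class BundleElaboratorError(Exception):
    """Local stand-in for the real exception type."""


def is_pipe_code_valid(code: str) -> bool:
    """Assumed rule: snake_case and within the length limit."""
    return len(code) <= MAX_PIPE_CODE_LENGTH and _SNAKE_CASE_RE.match(code) is not None


def check_synthetic_codes(pipe_code: str, existing_codes: set[str]) -> list[str]:
    """Return the two synthetic codes, or raise if either is unusable."""
    synthetic = [f"{pipe_code}__draft_text", f"{pipe_code}__structure"]
    for code in synthetic:
        if code in existing_codes:
            raise BundleElaboratorError(
                f"Cannot elaborate {pipe_code!r}: pipe {code!r} already exists"
            )
        if not is_pipe_code_valid(code):
            raise BundleElaboratorError(
                f"Cannot elaborate {pipe_code!r}: synthetic code {code!r} is invalid"
            )
    return synthetic


print(check_synthetic_codes("review_restaurant", {"review_restaurant"}))
```

Naming the originating pipe in the error message mirrors the intent described above: the user never wrote the synthetic codes, so the message has to point back at the pipe they did write.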
Cost and runtime semantics
A preliminary_text pipe issues two LLM calls per invocation (one for the draft text, one for the structuring step). The reporting layer counts both — the integration tests assert exactly two LLMTokensUsage records per run.
Runtime, including the Temporal data path, has no knowledge of structuring_method. Temporal sees only the elaborated form (PipeSequence + PipeLLM + PipeStructure); the existing PipeOperator machinery covers it.
When to add a new elaboration kind
BundleElaborator is intentionally narrow today. Adding a second kind means:
- A new directive on a blueprint (e.g. another field on PipeLLMBlueprint or a new field on a different blueprint).
- A construction-time validator that rejects nonsensical combinations of the directive with other fields.
- A new private method on BundleElaborator (_elaborate_<kind>) that synthesizes the replacement pipes and registers metadata.
- Extending the _is_<kind>_pipe TypeGuard set + dispatch in elaborate.
When the second kind lands, the dispatch logic should be promoted to a small registry keyed by directive — but not before. One concrete consumer is enough; two is a pattern.
Related
- PipeLLM › Structuring Method (preliminary_text): the user-facing surface.
- PipeStructure: the operator that backs step 2.
- PipeSequence: the controller that wraps the two synthetic steps.