Dry Run Mock Generation
Dry runs validate pipeline structure without executing inference. This requires generating mock StuffContent objects that satisfy Pydantic field constraints. The DryRunFactory system produces format-compliant mock values for fields with validation rules (e.g., snake_case identifiers, PascalCase concept codes).
Why Mock Generation Matters
Pydantic models in Pipelex often enforce format constraints via field_validator or model_validator:
class BundleHeaderSpec(StructuredContent):
    domain_code: str  # Must be snake_case
    main_pipe: str  # Must be snake_case, must exist in pipe dict

class ConceptSpec(StructuredContent):
    the_concept_code: str  # Must be PascalCase
Standard mock generators (like Polyfactory) produce random strings like "uygNjiAuDMOtZEyibgHw", which fail validation. The dry run system addresses this at two levels:
- Field-level: Generate values matching expected formats (snake_case, PascalCase, concept refs, etc.)
- Model-level: Bypass validators using factory_use_construct=True
When Dry Run Mock Generation Is Used
flowchart TD
A[pipelex validate --all] --> B[dry_run_pipes]
B --> C[WorkingMemoryFactory.make_mock_inputs]
C --> D[DryRunFactory.make_dry_run_factory]
D --> E[Polyfactory with custom providers]
F[PipeLLM dry run] --> G[ContentGeneratorDry.make_object_direct]
G --> H[DryRunFactory.make_dry_run_factory]
H --> E
I[PipeCompose dry run] --> J[StructuredContentComposer.compose]
J --> K[model_validate with resolved values]
| Trigger | Entry Point | Mock Generation |
|---|---|---|
| pipelex validate | dry_run_pipe() | WorkingMemoryFactory.make_mock_inputs() |
| PipeLLM output in dry mode | ContentGeneratorDry.make_object_direct() | DryRunFactory (auto-detects from field definitions) |
| PipeFunc output in dry mode | WorkingMemoryFactory.make_mock_content() | DryRunFactory (with explicit field constraints) |
| PipeCompose in dry mode | StructuredContentComposer.compose() | Uses resolved values from working memory (no mocks) |
Architecture
MockFormat Enum
Located at pipelex/cogt/content_generation/dry_run_factory.py, the MockFormat enum defines all supported mock value formats:
class MockFormat(StrEnum):
    SNAKE_CASE = "snake_case"
    PASCAL_CASE = "pascal_case"
    CONCEPT_REF = "concept_ref"
    IGNORE = "ignore"
    DICT_SNAKE_KEY_PASCAL_VALUE = "dict_snake_key_pascal_value"
    DICT_SINGLE_EXTRACT_INPUT = "dict_single_extract_input"
DryRunFactory
The factory dynamically creates a Polyfactory ModelFactory subclass with field-specific providers:
class DryRunFactory:
    @classmethod
    def generate_snake_case_code(cls) -> str:
        suffix = "".join(random.choices(string.ascii_lowercase, k=4))
        return f"mock_{suffix}"  # e.g., "mock_abcd"

    @classmethod
    def generate_pascal_case_code(cls) -> str:
        suffix = "".join(random.choices(string.ascii_lowercase, k=4))
        return f"Mock{suffix.capitalize()}"  # e.g., "MockAbcd"

    @classmethod
    def generate_concept_ref(cls) -> str:
        # e.g., "mock_abcd.MockCdef"
        return f"{cls.generate_snake_case_code()}.{cls.generate_pascal_case_code()}"

    @classmethod
    def make_dry_run_factory(
        cls,
        object_class: type[BaseModelTypeVar],
        snake_case_field_names: set[str] | None = None,
        pascal_case_field_names: set[str] | None = None,
    ) -> type[ModelFactory[BaseModelTypeVar]]:
        ...
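As a quick sanity check of the formats described above, the sketch below re-implements the three generators as plain functions and tests their output against regular expressions. The two regex patterns and the is_concept_ref helper are assumptions for illustration; the validators Pipelex actually applies are not shown on this page.

```python
import random
import re
import string

# Assumed patterns, for illustration only.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9_]*$")
PASCAL_CASE = re.compile(r"^[A-Z][A-Za-z0-9]*$")


def generate_snake_case_code() -> str:
    suffix = "".join(random.choices(string.ascii_lowercase, k=4))
    return f"mock_{suffix}"


def generate_pascal_case_code() -> str:
    suffix = "".join(random.choices(string.ascii_lowercase, k=4))
    return f"Mock{suffix.capitalize()}"


def generate_concept_ref() -> str:
    return f"{generate_snake_case_code()}.{generate_pascal_case_code()}"


def is_concept_ref(value: str) -> bool:
    """Hypothetical helper: True if value looks like domain.ConceptCode."""
    domain, sep, concept = value.partition(".")
    return bool(sep and SNAKE_CASE.match(domain) and PASCAL_CASE.match(concept))
```

A Polyfactory-style random string has no dot separator, so is_concept_ref rejects it, while every generated reference passes.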
Generated Mock Value Formats
| Format | Generator | Example Output | Used For |
|---|---|---|---|
| SNAKE_CASE | generate_snake_case_code() | mock_abcd | domain_code, pipe_code, the_field_name |
| PASCAL_CASE | generate_pascal_case_code() | MockAbcd | the_concept_code |
| CONCEPT_REF | generate_concept_ref() | mock_abcd.MockCdef | concept_ref field (domain.ConceptCode format) |
| IGNORE | Sets field to Ignore() | None/default | default_value, structure fields |
| DICT_SNAKE_KEY_PASCAL_VALUE | generate_dict_snake_key_pascal_value() | {mock_abcd: MockCdef} | inputs dict in PipeSpec |
| DICT_SINGLE_EXTRACT_INPUT | generate_dict_single_extract_input() | {mock_abcd: "Image"} | inputs dict in PipeExtract |
| Random string | Polyfactory default | uygNjiAuDMOtZEyibgHw | All other string fields |
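The bodies of the two dict generators are not shown on this page. The following is a hypothetical sketch inferred only from the example outputs in the table: one snake_case key mapped either to a PascalCase value or to the fixed string "Image". The helpers _snake and _pascal are illustrative names, and the real implementations may differ.

```python
import random
import string


def _snake() -> str:
    # Illustrative helper: "mock_" plus four lowercase letters.
    return "mock_" + "".join(random.choices(string.ascii_lowercase, k=4))


def _pascal() -> str:
    # Illustrative helper: "Mock" plus a capitalized four-letter suffix.
    return "Mock" + "".join(random.choices(string.ascii_lowercase, k=4)).capitalize()


def generate_dict_snake_key_pascal_value() -> dict[str, str]:
    # One snake_case key mapped to a PascalCase value.
    return {_snake(): _pascal()}


def generate_dict_single_extract_input() -> dict[str, str]:
    # One snake_case key mapped to "Image", following the table's example.
    return {_snake(): "Image"}
```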
Declaring MockFormat on Fields
The DryRunFactory auto-detects format constraints from Pydantic Field definitions using the mock_format key in json_schema_extra:
from pydantic import Field
from pipelex.cogt.content_generation.dry_run_factory import MockFormat

class ConceptStructureSpec(StructuredContent):
    # Snake case field
    the_field_name: str = Field(
        description="Field name. Must be snake_case.",
        json_schema_extra={"mock_format": MockFormat.SNAKE_CASE},
    )
    # Concept reference field (domain.ConceptCode format)
    concept_ref: str | None = Field(
        default=None,
        description="For type='concept', the concept reference.",
        json_schema_extra={"mock_format": MockFormat.CONCEPT_REF},
    )
    # Field to ignore during mock generation (use default/None)
    default_value: Any | None = Field(
        default=None,
        json_schema_extra={"mock_format": MockFormat.IGNORE},
    )
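To make the auto-detection step more concrete, here is a minimal sketch of how declared formats could be grouped by field name. It is an assumption about the shape of the logic, not the factory's actual code: detect_formats is a hypothetical function, the extras mapping stands in for the json_schema_extra values read from a model's fields, and the enum is a cut-down copy that uses (str, Enum) in place of StrEnum.

```python
from enum import Enum
from typing import Any, Optional


class MockFormat(str, Enum):
    # Subset of the enum shown earlier; (str, Enum) stands in for StrEnum.
    SNAKE_CASE = "snake_case"
    CONCEPT_REF = "concept_ref"
    IGNORE = "ignore"


def detect_formats(
    extras_by_field: dict[str, Optional[dict[str, Any]]],
) -> dict[MockFormat, set[str]]:
    """Group field names by the mock_format declared in json_schema_extra."""
    detected: dict[MockFormat, set[str]] = {fmt: set() for fmt in MockFormat}
    for field_name, extra in extras_by_field.items():
        if not isinstance(extra, dict) or "mock_format" not in extra:
            continue  # no declaration: left to the default generator
        detected[MockFormat(extra["mock_format"])].add(field_name)
    return detected


# Stand-in for {name: field.json_schema_extra} collected from a model.
extras = {
    "the_field_name": {"mock_format": MockFormat.SNAKE_CASE},
    "concept_ref": {"mock_format": "concept_ref"},
    "default_value": {"mock_format": MockFormat.IGNORE},
    "description": None,
}
detected = detect_formats(extras)
```

Fields without a mock_format entry fall through to whatever the underlying generator produces by default.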
Using Field Examples for Enum-like Values
For fields that should pick from a set of valid values (like enum members or known strings), use the examples parameter. The factory's __use_examples__: True configuration makes Polyfactory randomly select from provided examples:
class ConceptSpec(StructuredContent):
    # Refines should be one of the native concepts
    refines: str | None = Field(
        default=None,
        examples=["Text", "Image", "Document", "TextAndImages", "Number", "Page"],
    )

class PipeSpec(StructuredContent):
    # Extract talent should be a valid ExtractTalent value
    extract_talent: ExtractTalent | str = Field(
        description="Select extraction model talent",
        examples=list(ExtractTalent),  # Polyfactory picks randomly from these
    )
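The selection behaviour itself is simple. The sketch below mirrors the idea in plain Python and is not Polyfactory's implementation; pick_mock_value is a hypothetical helper name.

```python
import random
from typing import Any, Optional, Sequence


def pick_mock_value(examples: Optional[Sequence[Any]], fallback: Any) -> Any:
    """Return a random declared example, or the fallback when none exist."""
    if examples:
        return random.choice(list(examples))
    return fallback


native_concepts = ["Text", "Image", "Document", "TextAndImages", "Number", "Page"]
refines = pick_mock_value(native_concepts, fallback="uygNjiAuDMOtZEyibgHw")
```

Because the value always comes from the declared list, a validator that only accepts those names would pass without needing a dedicated MockFormat.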
Cross-Field Dependencies with PostGenerated
For fields that depend on values generated for other fields, use Polyfactory's PostGenerated directive. The factory automatically handles the main_pipe field which must reference a key from the generated pipe dict:
# Inside DryRunFactory.make_dry_run_factory():
if "main_pipe" in object_class.model_fields and "pipe" in object_class.model_fields:
    class_attrs["main_pipe"] = PostGenerated(cls._main_pipe_from_pipe_dict)
The callback receives all previously generated values and can compute the dependent field:
@staticmethod
def _main_pipe_from_pipe_dict(_field_name: str, values: dict[str, Any]) -> str:
    pipe_dict: dict[str, Any] | None = values.get("pipe")
    if pipe_dict and len(pipe_dict) > 0:
        pipe_keys: list[str] = list(pipe_dict.keys())
        return random.choice(pipe_keys)
    # Fallback to a mock value if pipe dict is empty/not available
    return "mock_" + "".join(random.choices(string.ascii_lowercase, k=4))
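The ordering that makes this callback work can be illustrated without Polyfactory. The sketch below is a simplified two-pass builder under stated assumptions: build, providers and post_generated are hypothetical names, and the real PostGenerated machinery is more general than this.

```python
import random
import string
from typing import Any, Callable

PostCallback = Callable[[str, dict[str, Any]], Any]


def _snake() -> str:
    return "mock_" + "".join(random.choices(string.ascii_lowercase, k=4))


def main_pipe_from_pipe_dict(_field_name: str, values: dict[str, Any]) -> str:
    pipe_dict = values.get("pipe")
    if pipe_dict:
        return random.choice(list(pipe_dict.keys()))
    return _snake()  # fallback when no pipe dict was generated


def build(
    providers: dict[str, Callable[[], Any]],
    post_generated: dict[str, PostCallback],
) -> dict[str, Any]:
    # Pass 1: fields that do not depend on anything else.
    values = {name: provider() for name, provider in providers.items()}
    # Pass 2: dependent fields receive everything generated so far.
    for name, callback in post_generated.items():
        values[name] = callback(name, values)
    return values


result = build(
    providers={"pipe": lambda: {_snake(): "spec_a", _snake(): "spec_b"}},
    post_generated={"main_pipe": main_pipe_from_pipe_dict},
)
```

Since main_pipe is computed after pipe, it always names an existing key whenever the pipe dict is non-empty.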
Implementation Details
Mock Input Generation
When WorkingMemoryFactory.make_mock_inputs() creates mock working memory, mock content is built by make_mock_content():
@classmethod
def make_mock_content(cls, typed_named_stuff_spec: TypedNamedStuffSpec) -> StuffContent:
    mock_factory = DryRunFactory.make_dry_run_factory(
        object_class=typed_named_stuff_spec.structure_class,
        snake_case_field_names=SNAKE_CASE_FIELD_NAMES,
        pascal_case_field_names=PASCAL_CASE_FIELD_NAMES,
    )
    return mock_factory.build(factory_use_construct=True)
The factory_use_construct=True flag bypasses field_validator and model_validator during object creation, preventing validation errors from random nested values.
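The effect of skipping validation can be shown with a plain-Python analogy. This is not Pydantic or Polyfactory code: Header, its construct() classmethod and the snake_case pattern are all hypothetical, chosen only to show why a constructor that bypasses checks accepts values a validating constructor rejects.

```python
import re

SNAKE_CASE = re.compile(r"^[a-z][a-z0-9_]*$")  # assumed pattern


class Header:
    """Toy model: the normal constructor validates, construct() does not."""

    def __init__(self, domain_code: str) -> None:
        if not SNAKE_CASE.match(domain_code):
            raise ValueError(f"domain_code must be snake_case, got {domain_code!r}")
        self.domain_code = domain_code

    @classmethod
    def construct(cls, domain_code: str) -> "Header":
        # Skip __init__, and with it the validation.
        instance = cls.__new__(cls)
        instance.domain_code = domain_code
        return instance


def validates(value: str) -> bool:
    """Return True if the validating constructor accepts the value."""
    try:
        Header(value)
    except ValueError:
        return False
    return True
```

The trade-off is the same in both settings: the object is always created, but nothing guarantees its fields are well-formed, which is why the field-level formats still matter.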
LLM Output Generation
When ContentGeneratorDry.make_object_direct() generates mock LLM outputs:
object_factory = DryRunFactory.make_dry_run_factory(object_class)
return object_factory.build(factory_use_construct=True)
No explicit field constraints are passed—the factory auto-detects MockFormat from field definitions.
PipeCompose Resolution
StructuredContentComposer.compose() does not generate mocks. Instead, it resolves field values from working memory and validates the result:
async def compose(self) -> StuffContent:
    field_values = await self._resolve_all_fields()
    try:
        return self.output_class.model_validate(field_values)
    except ValidationError as exc:
        formatted_error = format_pydantic_validation_error(exc)
        msg = f"Cannot validate {self.output_class.__name__}: {formatted_error}"
        raise StructuredContentComposerValidationError(msg) from exc
In dry run mode, the working memory already contains properly formatted mock values (generated by WorkingMemoryFactory), so validation typically succeeds.
Behavior Matrix
| Scenario | Format Constraints | Validators Bypassed |
|---|---|---|
| WorkingMemoryFactory.make_mock_content() | Yes (from json_schema_extra + explicit params) | Yes (factory_use_construct) |
| ContentGeneratorDry.make_object_direct() | Yes (from json_schema_extra only) | Yes (factory_use_construct) |
| StructuredContentComposer.compose() | N/A (uses resolved values) | No (validates) |
Extending Field Constraints
Adding a New MockFormat Value
- Add to the MockFormat enum in dry_run_factory.py:

      class MockFormat(StrEnum):
          ...
          KEBAB_CASE = "kebab_case"

- Add a generator method:

      @classmethod
      def generate_kebab_case_code(cls) -> str:
          suffix = "".join(random.choices(string.ascii_lowercase, k=4))
          return f"mock-{suffix}"

- Handle the format in make_dry_run_factory():

      for field_name in detected_formats[MockFormat.KEBAB_CASE]:
          if field_name in object_class.model_fields:
              class_attrs[field_name] = Use(cls.generate_kebab_case_code)
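Before wiring a new generator into the factory, it can help to check that its output satisfies the format it is meant to produce. The sketch below does this for the kebab-case generator as a standalone function; the regex is an assumed definition of kebab-case, not one taken from Pipelex.

```python
import random
import re
import string

# Assumed definition: lowercase alphanumeric segments joined by single hyphens.
KEBAB_CASE = re.compile(r"^[a-z][a-z0-9]*(-[a-z0-9]+)*$")


def generate_kebab_case_code() -> str:
    suffix = "".join(random.choices(string.ascii_lowercase, k=4))
    return f"mock-{suffix}"
```

The same pattern rejects snake_case values such as mock_abcd, so it also catches a field that was wired to the wrong generator.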
Using the New Format on Fields
class MySpec(StructuredContent):
    my_kebab_field: str = Field(
        description="A kebab-case identifier",
        json_schema_extra={"mock_format": MockFormat.KEBAB_CASE},
    )
Field name matching is exact
The field name must exactly match a key in object_class.model_fields. No glob patterns or inheritance traversal.
File Reference
| File | Purpose |
|---|---|
| pipelex/cogt/content_generation/dry_run_factory.py | DryRunFactory class with MockFormat enum and generators |
| pipelex/cogt/content_generation/content_generator_dry.py | ContentGeneratorDry for mock LLM/image outputs |
| pipelex/core/memory/working_memory_factory.py | WorkingMemoryFactory.make_mock_content() with field constraints |
| pipelex/pipe_operators/compose/structured_content_composer.py | Composes StructuredContent from working memory (no mocks) |
| pipelex/pipe_run/dry_run.py | dry_run_pipe() orchestration |
Next Steps
- Architecture Overview — Understand the two-layer design
- Test Profile Configuration — Configure model sets for testing