Pipelex Documentation

Example: DPE Extraction

This example extracts information from French "Diagnostic de Performance Énergétique" (DPE) documents. It uses a three-step pipeline: extract the page contents, convert each page to markdown using vision, then derive the structured DPE details from the combined markdown.

Get the code

What it demonstrates

- Custom structured concept with constrained choices (energy efficiency classes A-G)
- Three-step extraction: pages, markdown per page, then conclude
- Vision-based extraction focused on specific document elements (energy classes, graphs, tables)
- Using shared method packages for page extraction

The Method: `bundle.mthds`

DPE concept

[concept.Dpe]
description = "A diagnostic of the energy performance of a building"

[concept.Dpe.structure]
address                       = { type = "text", description = "The address of the building" }
date_of_issue                 = { type = "date", description = "The date the DPE was issued" }
date_of_expiration            = { type = "date", description = "The expiration date of the DPE" }
energy_efficiency_class       = { type = "text", description = "The energy efficiency class", choices = ["A", "B", "C", "D", "E", "F", "G"] }
per_year_per_m2_consumption   = { type = "number", description = "Energy consumption per year per m2" }
co2_emission_class            = { type = "text", description = "The CO2 emission class", choices = ["A", "B", "C", "D", "E", "F", "G"] }
per_year_per_m2_co2_emissions = { type = "number", description = "CO2 emissions per year per m2" }
yearly_energy_costs           = { type = "number", description = "Yearly energy costs" }
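
To make the effect of the `choices` constraint concrete, here is a minimal plain-Python sketch of the same structure. It is an illustration only: `DpeRecord`, `DPE_CLASSES`, and the sample values are hypothetical names and data introduced here, and this is not the class Pipelex generates or uses internally.

```python
from dataclasses import dataclass
from datetime import date

# Allowed letters, copied from the `choices` lists in the concept definition.
DPE_CLASSES = ("A", "B", "C", "D", "E", "F", "G")


@dataclass(frozen=True)
class DpeRecord:
    """Hypothetical plain-Python mirror of the Dpe concept (not a Pipelex class)."""

    address: str
    date_of_issue: date
    date_of_expiration: date
    energy_efficiency_class: str
    per_year_per_m2_consumption: float
    co2_emission_class: str
    per_year_per_m2_co2_emissions: float
    yearly_energy_costs: float

    def __post_init__(self) -> None:
        # Reject any class letter outside the declared choices.
        for field_name in ("energy_efficiency_class", "co2_emission_class"):
            value = getattr(self, field_name)
            if value not in DPE_CLASSES:
                raise ValueError(
                    f"{field_name} must be one of {DPE_CLASSES}, got {value!r}"
                )


# Made-up values, for illustration only.
example = DpeRecord(
    address="1 rue de l'Exemple, 75000 Paris",
    date_of_issue=date(2023, 1, 15),
    date_of_expiration=date(2033, 1, 14),
    energy_efficiency_class="D",
    per_year_per_m2_consumption=210.0,
    co2_emission_class="C",
    per_year_per_m2_co2_emissions=25.0,
    yearly_energy_costs=1450.0,
)
```

A value such as `"H"` for either class field raises `ValueError` in this sketch, which is the kind of guarantee the constrained choices are meant to give on the extracted output.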

Pipeline

[pipe.power_extractor_dpe]
type = "PipeSequence"
inputs = { document = "Document" }
output = "Dpe"
steps = [
  { pipe = "github.com/Pipelex/methods/documents->documents.extract_page_contents_and_views", result = "page_contents" },
  { pipe = "write_markdown_from_page_content_dpe", batch_over = "page_contents", batch_as = "page_content", result = "dpe_pages" },
  { pipe = "conclude_dpe", result = "dpe" },
]

The final conclude_dpe step takes all the markdown pages and produces a single structured Dpe object.
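
The data flow of the three steps can be sketched in plain Python. This is only an analogy under stated assumptions: `run_sequence` and the stub callables are hypothetical stand-ins for the pipes, no LLM or Pipelex API is involved, and the sketch says nothing about how Pipelex schedules the batch (for example, whether pages are processed concurrently).

```python
from typing import Callable, Dict, List


def run_sequence(
    document: str,
    extract_pages: Callable[[str], List[str]],
    page_to_markdown: Callable[[str], str],
    conclude: Callable[[List[str]], Dict[str, object]],
) -> Dict[str, object]:
    # Step 1: one document in, a list of page contents out ("page_contents").
    page_contents = extract_pages(document)
    # Step 2: batch_over = "page_contents", batch_as = "page_content":
    # the same step runs once per page and the results are collected in order.
    dpe_pages = [page_to_markdown(page_content) for page_content in page_contents]
    # Step 3: all page markdown is handed to a single concluding step.
    return conclude(dpe_pages)


# Stub callables standing in for the real pipes (hypothetical).
result = run_sequence(
    "page one|page two",
    extract_pages=lambda doc: doc.split("|"),
    page_to_markdown=lambda page: f"## {page}",
    conclude=lambda pages: {
        "page_count": len(pages),
        "markdown": "\n\n".join(pages),
    },
)
```

The point of the shape is that the middle step fans out over the pages while the first and last steps each run once, so the final step sees the whole document's markdown at the same time.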

How to run

pipelex run bundle examples/b_basics/document_extract/extract_dpe/bundle.mthds \
  -i examples/b_basics/document_extract/extract_dpe/inputs.json

Related Documentation

- PipeLLM Operator - The core operator for LLM interactions
- Document Extraction - Overview of document extraction capabilities