Kicking off a Knowledge Pipeline Project
Creating Your First Pipeline
A pipeline in Pipelex is a collection of related concepts and pipes. Start by creating a TOML file in the pipelines
directory:
# pipelex_libraries/pipelines/tutorial.toml
domain = "tutorial"
definition = "My first Pipelex library"
system_prompt = "You are a helpful assistant."
[concept]
Question = "A question that needs to be answered"
Answer = "A response to a question"
[pipe]
[pipe.answer_question]
PipeLLM = "Answer a question"
inputs = { question = "tutorial.Question" }
output = "tutorial.Answer"
prompt_template = """
Please answer the following question:
@question
Provide a clear and concise answer.
"""
This creates a simple Q&A pipeline with: - A domain called "tutorial" - Two concepts: Question and Answer - One pipe that transforms a Question into an Answer
The domain
property is the most important part of your pipeline file. It groups all your concepts and pipes into a single, easy to read bundle.
What Are Domains?
A domain in Pipelex represents a topic or area of functionality within your pipeline. Every pipeline file must specify its domain, which helps organize and categorize your pipes and concepts.
Domain in Practice
When you create a pipeline file (.toml
), you always start by declaring its domain:
domain = "finance" # The domain name for this file
definition = "Financial document processing" # Optional description
system_prompt = "You are an expert financial analyst." # Optional system prompt for all PipeLLM in this domain
A domain consists of:
-
Pipeline File (
.toml
)domain = "finance" [concept] Invoice = "A commercial document for a sale of products or services" InvoiceSummary = "A summary of an invoice with key details" [pipe] [pipe.summarize_invoice] PipeLLM = "Summarize an invoice to extract key information" inputs = { invoice = "finance.Invoice" } output = "finance.InvoiceSummary"
-
Python Models (
.py
)from pipelex.core.stuff_content import StructuredContent from pydantic import Field from typing import List from datetime import date class Invoice(StructuredContent): invoice_number: str vendor: str customer: str total_amount: float = Field(ge=0) issue_date: date line_items: List[str] class InvoiceSummary(StructuredContent): vendor: str total_amount: float is_overdue: bool
Best Practices
-
Naming
- Use clear, descriptive domain names.
- Keep names lowercase and simple.
- Use names that reflect the purpose (e.g., "finance", "legal", "content_creation").
-
Organization
- One domain per topic/functionality.
- Match Python file names with domain names (
finance.toml
->finance.py
). - Keep related concepts within the same domain.
-
Documentation
- Always add a description to your domain.
- Document concepts clearly.
- Include examples where helpful.
Using Domains
When using a domain in your code, you refer to concepts with domain.ConceptName
:
from pipelex.core.stuff_factory import StuffFactory
# The concept_code combines domain and concept names
invoice_stuff = StuffFactory.make_stuff(
concept_str="finance.Invoice", # domain.ConceptName
name="invoice_123",
content=invoice_data # dictionary or Invoice object
)
File Naming Conventions
Consistent naming makes your pipeline code discoverable and maintainable:
TOML Files
- Use lowercase with underscores:
legal_contracts.toml
,customer_service.toml
- Match the domain name when possible: domain "legal" →
legal.toml
- For multi-word domains, use underscores: domain "customer_service" →
customer_service.toml
Python Model Files
- It is recommended to match the TOML filename exactly:
legal.toml
→legal.py
- But in any case, Pipelex will load models from all python modules in the
pipelines
directory or its subdirectories.
Project Structure
Every Pipelex project follows a simple directory structure that keeps your knowledge pipelines organized and maintainable:
your-project/
├── pipelex_libraries/ # All your pipeline code lives here
│ ├── pipelines/ # Pipeline definitions and models
│ │ ├── __init__.py
│ │ ├── characters.toml # Domain definitions
│ │ └── characters.py # Python models for concepts
│ ├── templates/ # Reusable prompt templates
│ ├── llm_integrations/ # LLM provider configurations
│ └── llm_deck/ # LLM model presets
├── main.py # Your application code
└── requirements.txt # Python dependencies
The pipelex_libraries
directory is where Pipelex looks for your pipeline definitions. This standardized structure means you can share libraries between projects, version control them separately, and maintain clean separation between your pipeline logic and application code.