Inference Backend Configuration
The Inference Backend Configuration System manages how Pipelex handles AI model providers, model routing, and inference settings across LLMs, OCR, and image generation. This unified system provides a flexible and scalable way to configure multiple inference backends and route different types of AI models to the appropriate providers.
Configuration Approaches
Pipelex supports three flexible approaches for accessing AI models:
Option A: Pipelex Inference (Optional & Free)
Get a single API key that works with all major providers (OpenAI, Anthropic, Google, Mistral, FAL, and more). This is the recommended approach for getting started quickly.
- ✅ Single API key for all providers
- ✅ Simplified configuration
- ✅ Automatic model routing
- ✅ Free on Discord (limited time offer)
Option B: Bring Your Own Keys
Use your own API keys from individual providers for full control and direct billing. Ideal for production deployments with existing provider relationships.
- ✅ Direct provider relationships
- ✅ Full control over billing
- ✅ No intermediary
- ✅ Support for all provider-specific features
See the Inference Backends section below for configuration.
Option C: Mix & Match (Custom Routing)
Configure custom routing profiles to use your own keys for some models and Pipelex Inference for others. This gives you full flexibility to optimize for cost, performance, or rate limits.
- ✅ Hybrid approach
- ✅ Cost optimization
- ✅ Performance tuning
- ✅ Gradual migration between approaches
See the Routing Profiles section below for setup.
Overview
The inference backend system is built around four key concepts:
- Inference Backends: Providers of AI services (OpenAI, Anthropic, Google Vertex AI, FAL, etc.) for LLMs, OCR, and image generation
- Model Specs: Detailed information about specific models available through backends (text generation, text extraction, image generation)
- Routing Profiles: Rules for selecting which backend should handle specific models across all AI capabilities
- Model Deck: Unified collection of configured models, aliases, and presets for LLMs, OCR, and image generation
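To see how these pieces fit together, here is a condensed sketch that follows one model through the four files described below; the values are illustrative and mirror the examples later on this page:
```toml
# backends.toml: where requests are sent and how they authenticate
[anthropic]
enabled = true
api_key = "${ANTHROPIC_API_KEY}"

# backends/anthropic.toml: what the model accepts, produces, and costs
["claude-4.5-sonnet"]
model_id = "claude-4.5-sonnet"
inputs = ["text", "images"]
outputs = ["text", "structured"]

# routing_profiles.toml: which backend serves which model names
[profiles.custom_routing.routes]
"claude-*" = "anthropic"

# deck/base_deck.toml: friendly names and task presets layered on top
[aliases]
base-claude = "claude-4.5-sonnet"
```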
Directory Structure
All inference backend configurations are stored in the .pipelex/inference/ directory:
.pipelex/
└── inference/
├── backends.toml # Backend provider configurations
├── routing_profiles.toml # Model routing rules
├── backends/ # Individual backend model specifications
│ ├── openai.toml # OpenAI models (LLMs, image generation)
│ ├── anthropic.toml # Anthropic models (LLMs)
│ ├── bedrock.toml # Amazon Bedrock models (LLMs)
│ ├── mistral.toml # Mistral models (LLMs, OCR)
│ ├── vertexai.toml # Google Vertex AI models (LLMs)
│ ├── fal.toml # FAL models (image generation)
│ ├── internal.toml # Internal/local models (OCR)
│ └── ...
└── deck/ # Model deck configurations
├── base_deck.toml # Core aliases and presets
└── overrides.toml # Custom overrides
Pipelex Inference (Optional & Free)
Pipelex Inference is a unified inference backend that provides access to all major AI providers through a single API key. This is the recommended approach for getting started quickly with Pipelex, and it's completely optional.
Benefits
- Single API Key: Access OpenAI, Anthropic, Google, Mistral, FAL, and more with one key
- Free to Get Started: Available free on Discord (no credit card required, limited time offer)
- Simplified Configuration: No need to manage multiple provider credentials
- Automatic Routing: All AI models (LLMs, OCR, image generation) are automatically routed to their respective providers
- Unified Interface: Same configuration system for text generation, OCR, and image generation
Setup
1. Get your API key:
   - Visit https://go.pipelex.com/discord to join our Discord
   - Request your free API key in the appropriate channel
   - No credit card required (limited time offer)

2. Configure environment variables:

   # Copy the example environment file
   cp .env.example .env

   # Edit .env and add your Pipelex Inference API key
   # PIPELEX_INFERENCE_API_KEY="your-api-key"

3. Verify backend configuration:
The pipelex_inference backend should already be enabled in .pipelex/inference/backends.toml:
[pipelex_inference]
enabled = true
endpoint = "https://inference.pipelex.com/v1"
api_key = "${PIPELEX_INFERENCE_API_KEY}"
The environment variable ${PIPELEX_INFERENCE_API_KEY} will be automatically loaded from your .env file.
4. Verify routing configuration:
The default routing profile in .pipelex/inference/routing_profiles.toml should be set to pipelex_first:
active = "pipelex_first"
[profiles.pipelex_first]
description = "Use Pipelex Inference backend for all its supported models"
default = "pipelex_inference"
Usage
Once configured, all models are available through the unified backend. Use standard model names in your pipelines:
[pipe.example]
type = "PipeLLM"
model = { model = "claude-4.5-sonnet", temperature = 0.7 }
# Model automatically routed through Pipelex Inference
Model Availability Note
While Pipelex Inference provides access to most AI models through a unified API, certain specialized models require their native backend to be enabled directly:
- FAL image generation models (e.g., Flux models): enable the FAL backend
- OpenAI image generation (gpt-image-1): enable the OpenAI backend (it should also work via Azure OpenAI, but we haven't been able to test this; if you've successfully used it on Azure, please let us know on Discord so we can validate this configuration)
- Mistral OCR models: enable the Mistral backend
These models are not proxied through Pipelex Inference and require direct configuration of their respective backends with appropriate API keys.
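If you rely on any of these, keep pipelex_inference enabled and additionally enable the relevant native backends in backends.toml. A sketch (keys load from .env as usual):
```toml
[pipelex_inference]
enabled = true
endpoint = "https://inference.pipelex.com/v1"
api_key = "${PIPELEX_INFERENCE_API_KEY}"

[fal]
enabled = true   # Flux and other FAL image generation models
api_key = "${FAL_API_KEY}"

[openai]
enabled = true   # gpt-image-1
api_key = "${OPENAI_API_KEY}"

[mistral]
enabled = true   # Mistral OCR models
api_key = "${MISTRAL_API_KEY}"
```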
Inference Backends
Backends represent AI service providers that can offer LLMs, OCR models, or image generation models. Each backend is configured with its endpoint and authentication details.
Backend Configuration
Step 1: Configure Environment Variables
First, set up your API keys in the .env file:
# Copy the example file
cp .env.example .env
# Edit .env and add your provider API keys
Note: Pipelex automatically loads environment variables from .env files using python-dotenv. No need to manually source or export them.
The .env.example file contains all available providers with helpful comments:
# [OPTIONAL] Free Pipelex Inference API key - Get yours on Discord: https://go.pipelex.com/discord
PIPELEX_INFERENCE_API_KEY=
OPENAI_API_KEY=
# Amazon Bedrock - For accessing models via Amazon Bedrock
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_REGION=
ANTHROPIC_API_KEY=
MISTRAL_API_KEY=
# Google AI Studio - Use GOOGLE_API_KEY for direct API access (simpler, rate-limited)
GOOGLE_API_KEY=
# Google Cloud Platform (GCP) - Use these for production Vertex AI access
# Choose GOOGLE_API_KEY OR GCP credentials, not both
GCP_PROJECT_ID=
GCP_LOCATION=
GCP_CREDENTIALS_FILE_PATH=gcp_credentials.json
FAL_API_KEY=
# ... (see .env.example for full list)
Step 2: Enable/Disable Backends
Configure which backends to use in .pipelex/inference/backends.toml:
[openai]
enabled = true # Set to false to disable
api_key = "${OPENAI_API_KEY}"
[anthropic]
enabled = true
api_key = "${ANTHROPIC_API_KEY}"
[mistral]
enabled = true
api_key = "${MISTRAL_API_KEY}"
[fal]
enabled = true
api_key = "${FAL_API_KEY}"
[internal]
enabled = true
# No API key needed for internal/local processing
The ${VARIABLE_NAME} syntax automatically loads values from your .env file. Set enabled = true to activate a backend, or false to disable it.
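A backend entry can also carry an explicit endpoint, as the pipelex_inference block above shows. As a hedged sketch only, and assuming the endpoint key is also honored for other OpenAI-compatible backends (verify against your generated backends.toml before relying on it), pointing a backend at a compatible gateway could look like this:
```toml
[openai]
enabled = true
api_key = "${OPENAI_API_KEY}"
# Assumption: reusing the endpoint key shown for pipelex_inference above;
# the gateway URL is purely illustrative
endpoint = "https://your-gateway.example.com/v1"
```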
Model Specifications
Each backend has its own model specification file in .pipelex/inference/backends/:
# openai.toml
default_sdk = "openai"
default_prompting_target = "openai"
[gpt-4o-mini]
model_id = "gpt-4o-mini"
inputs = ["text", "images"]
outputs = ["text", "structured"]
costs = { input = 0.15, output = 0.6 }
[gpt-4-turbo]
model_id = "gpt-4-turbo"
inputs = ["text"]
outputs = ["text", "structured"]
costs = { input = 10.0, output = 30.0 }
[gpt-image-1]
model_id = "gpt-image-1"
inputs = ["text"]
outputs = ["image"]
costs = { input = 0.04, output = 0.0 }
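To expose an additional model from a backend, add an entry to that backend's spec file using the same schema. A sketch with an extra OpenAI entry; the existing entries appear to be priced per million tokens, but treat the numbers here as illustrative placeholders rather than authoritative pricing:
```toml
# backends/openai.toml: hypothetical additional entry following the schema above
[gpt-4o]
model_id = "gpt-4o"
inputs = ["text", "images"]
outputs = ["text", "structured"]
costs = { input = 2.5, output = 10.0 }   # illustrative values
```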
Routing Profiles
Routing profiles determine which backend handles specific models. This is where you configure the Mix & Match approach (Option C) to optimize your setup. Configure them in .pipelex/inference/routing_profiles.toml:
Profile Examples
All Pipelex Inference (Option A):
Setup:
# In .env
PIPELEX_INFERENCE_API_KEY="your-pipelex-key"
In .pipelex/inference/routing_profiles.toml:
# Which profile to use
active = "pipelex_first"
[profiles.pipelex_first]
description = "Use Pipelex Inference backend for all its supported models"
default = "pipelex_inference"
Native Providers Only (Option B):
Setup:
# In .env - add all provider keys you need
OPENAI_API_KEY="your-openai-key"
ANTHROPIC_API_KEY="your-anthropic-key"
GOOGLE_API_KEY="your-google-key"
FAL_API_KEY="your-fal-key"
In .pipelex/inference/routing_profiles.toml:
active = "custom_routing"
[profiles.custom_routing]
description = "Route models to their native providers"
default = "openai"
[profiles.custom_routing.routes]
"claude-*" = "anthropic"
"gemini-*" = "google"
"mistral-*" = "mistral"
"gpt-*" = "openai"
"gpt-image-*" = "openai"
"flux-*" = "fal"
Mix & Match (Option C):
Setup:
# In .env - combine Pipelex with specific provider keys
PIPELEX_INFERENCE_API_KEY="your-pipelex-key"
OPENAI_API_KEY="your-openai-key" # For GPT models
FAL_API_KEY="your-fal-key" # For image generation
In .pipelex/inference/routing_profiles.toml:
active = "hybrid"
[profiles.hybrid]
description = "Use Pipelex for most models, native providers for specific ones"
default = "pipelex_inference"
[profiles.hybrid.routes]
# Use your own OpenAI key for GPT models (better rate limits)
"gpt-*" = "openai"
# Use your own FAL key for image generation (direct billing)
"flux-*" = "fal"
# All other models use Pipelex Inference (claude, gemini, mistral, etc.)
Routing System Features
The routing system supports:
- Exact matches: "gpt-4o-mini" = "openai"
- Wildcard patterns:
  - Prefix: "gpt-*" = "openai"
  - Suffix: "*-turbo" = "openai"
  - Contains: "*-vision-*" = "openai"
- Default fallback: default = "pipelex_inference"
Use Cases for Mix & Match
Common scenarios for hybrid routing:
- Cost Optimization: Use Pipelex Inference for expensive models, your own keys for cheaper ones
- Rate Limits: Use your own keys for high-volume models to avoid shared rate limits
- Gradual Migration: Start with Pipelex Inference, gradually move to your own keys as usage grows
- Provider Features: Use native providers for models requiring specific features not proxied through Pipelex Inference
Model Deck
The Model Deck is the unified configuration hub for all AI model-related settings, including LLMs, OCR models, and image generation models.
Aliases
Define user-friendly names that map to model names in .pipelex/inference/deck/base_deck.toml:
[aliases]
# LLM aliases
base-claude = "claude-4.5-sonnet"
base-gpt = "gpt-5"
base-gemini = "gemini-2.5-flash"
base-mistral = "mistral-medium"
# Aliases can also define fallback chains (models are tried in order)
smart_llm = [
    "claude-4.5-sonnet",
    "claude-4.1-opus",
    "gpt-5",
    "gemini-2.5-pro",
]
llm_to_engineer = { model = "smart_llm", temperature = 0.2 }
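An alias can then be used anywhere a model name is expected, for example in a pipe, using the same syntax as the usage example earlier on this page (the pipe name here is hypothetical):
```toml
[pipe.summarize_report]
type = "PipeLLM"
model = { model = "base-claude", temperature = 0.7 }   # resolves to claude-4.5-sonnet
```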
LLM Presets
Presets combine model selection with optimized parameters for specific tasks:
[llm.presets]
# General purpose presets
cheap_llm_for_text = { model = "cheap_llm_for_text", temperature = 0.5 }
cheap_llm_for_object = { model = "cheap_llm_for_object", temperature = 0.5 }
# Task-specific presets
llm_for_creative_writing = { model = "claude-4.5-sonnet", temperature = 0.9 }
llm_to_extract_invoice = { model = "claude-4.5-sonnet", temperature = 0.1 }
llm_for_complex_reasoning = { model = "base-claude", temperature = 1 }
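Presets are meant to be referenced by name rather than repeating model and parameters in each pipe. As a sketch only, assuming a preset name can be passed where a model is normally given (check your pipe syntax before relying on this):
```toml
[pipe.extract_invoice]
type = "PipeLLM"
# Assumption: referencing the preset by name; hypothetical pipe shown for illustration
model = "llm_to_extract_invoice"
```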
### OCR Presets
OCR presets combine OCR model selection with optimized parameters:
```toml
[extract.presets]
# General purpose OCR
extract_text_from_visuals = { ocr_handle = "mistral-ocr", max_nb_images = 100, image_min_size = 50 }
extract_text_from_pdf = { model = "pypdfium2-extract-text", max_nb_images = 100, image_min_size = 50 }
```
Image Generation Presets
Image generation presets combine model selection with generation parameters:
[img_gen.presets]
# General purpose image generation
gen_image_basic = { model = "base-img-gen", quality = "medium", guidance_scale = 7.5, is_moderated = true, safety_tolerance = 3 }
gen_image_fast = { model = "fast-img-gen", nb_steps = 4, guidance_scale = 5.0, is_moderated = true, safety_tolerance = 3 }
gen_image_high_quality = { model = "best-img-gen", quality = "high", guidance_scale = 8.0, is_moderated = true, safety_tolerance = 3 }
Default Choices
Set default models for different types of AI operations:
[llm.choice_defaults]
for_text = "cheap_llm_for_text"
for_object = "cheap_llm_for_object"
[extract]
choice_default = "extract_text_from_visuals"
[img_gen]
choice_default = "gen_image_basic"
Customization
Local Overrides
Use .pipelex/inference/deck/overrides.toml for project-specific customizations:
# Override specific presets
[llm.presets]
llm_to_extract_invoice = { model = "gpt-4o-mini", temperature = 0.2 }
[extract.presets]
my_custom_extract = { ocr_handle = "mistral-ocr", max_nb_images = 5 }
[img_gen.presets]
my_custom_img_gen = { model = "flux-dev", quality = "medium" }
# Add custom aliases
[aliases]
my_custom_llm = "claude-3-sonnet"
my_custom_extract = "pypdfium2-extract-text"
my_custom_img_gen = "base-img-gen"
Adding New Backends
To add a new backend:
1. Add the backend configuration to backends.toml
2. Create a model specification file in the backends/ directory
3. Update your routing profile if needed
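For example, wiring up a hypothetical backend called acme (the name, endpoint, environment variable, and model are all illustrative):
```toml
# 1. .pipelex/inference/backends.toml: declare and enable the backend
[acme]
enabled = true
endpoint = "https://api.acme.example.com/v1"
api_key = "${ACME_API_KEY}"

# 2. .pipelex/inference/backends/acme.toml: describe its models
default_sdk = "openai"   # assumption: an OpenAI-compatible SDK, as in openai.toml above

[acme-chat-large]
model_id = "acme-chat-large"
inputs = ["text"]
outputs = ["text", "structured"]
costs = { input = 1.0, output = 3.0 }   # illustrative values

# 3. .pipelex/inference/routing_profiles.toml: route model names to it if needed
[profiles.custom_routing.routes]
"acme-*" = "acme"
```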
Loading Process
The system loads configurations in this order:
1. Load Backends: read backends.toml to get the enabled backends
2. Load Model Specs: for each backend, load its model specifications (LLMs, OCR models, image generation models)
3. Load Routing Profiles: read the routing rules and identify the active profile
4. Build Model Deck:
   - Apply routing rules to determine the backend for each model across all AI capabilities
   - Load aliases and presets from the deck files for LLMs, OCR, and image generation
   - Apply overrides
5. Finalization: validate the complete configuration
Error Handling
Common error types:
- ModelDeckNotFoundError: missing model deck configuration files
- ModelsManagerError: issues with model management
- LLMHandleNotFoundError: referenced model or alias not found
- LLMPresetNotFoundError: referenced preset not found
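Most of these surface as configuration mistakes. For instance, a pipe referencing a handle that no enabled backend provides and no alias or preset defines would typically fail with LLMHandleNotFoundError (the pipe and model name below are hypothetical):
```toml
[pipe.broken_example]
type = "PipeLLM"
model = { model = "claude-99-ultra", temperature = 0.5 }   # no such model, alias, or preset
```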
Best Practices
- Choosing Your Configuration Approach:
  - Starting out? Use Pipelex Inference (Option A) to get running quickly
  - Production deployment? Consider bringing your own keys (Option B) for direct billing control
  - Optimizing costs/performance? Use Mix & Match (Option C) for maximum flexibility
  - You can switch between approaches at any time by changing your routing profile
- Backend Management:
  - Keep API keys in environment variables (never commit them)
  - Enable only the backends you need to reduce configuration complexity
  - Document custom backend configurations for your team
- Model Routing:
  - Use specific routing profiles for different environments (dev, staging, prod); see the sketch after this list
  - Test routing rules before production deployment
  - Consider cost implications when routing models (some providers are cheaper for certain models)
  - Monitor usage patterns to optimize your routing strategy
- Presets and Aliases:
  - Create task-specific presets for consistency across your pipelines
  - Use meaningful alias names that describe the use case (e.g., llm_to_extract_invoice)
  - Document custom presets and their use cases in your team documentation
- Customization:
  - Use overrides.toml for project-specific settings
  - Keep base configurations unchanged to make upgrades easier
  - Version control your custom configurations
  - Share routing profiles and presets across your team
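As a sketch of the environment-specific routing mentioned under Model Routing above, you could keep one profile per environment and switch by changing active (profile names and routes are illustrative):
```toml
# .pipelex/inference/routing_profiles.toml
active = "dev"   # switch to "prod" when deploying

[profiles.dev]
description = "Everything through Pipelex Inference while iterating"
default = "pipelex_inference"

[profiles.prod]
description = "Own keys in production for direct billing and rate limits"
default = "openai"

[profiles.prod.routes]
"claude-*" = "anthropic"
"flux-*" = "fal"
```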