Example: Extract Slides from Presentation
This example extracts structured information for each slide of a presentation document: the title, the content as markdown, and a visual description of the slide layout.
What it demonstrates
- Custom Slide concept with inline structure (title, text, description)
- Vision-based slide analysis using PipeLLM with page views
- Batching over extracted pages to process each slide individually
- PipeCompose for assembling structured output into formatted text
The Method: bundle.mthds
Slide concept
[concept.Slide]
description = "A slide from a presentation"
[concept.Slide.structure]
title = { type = "text", description = "The title of the slide" }
text_markdown = { type = "text", description = "The content of the slide as markdown" }
description = { type = "text", description = "A description of the slide: layout of the text, description of the graphics" }
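As a rough mental model, the inline structure above corresponds to a record with three text fields. The Python sketch below is only an analogy: the `Slide` dataclass and its sample values are hypothetical, and this is not the class Pipelex builds from the concept.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Slide:
    """Hypothetical stand-in mirroring the fields of [concept.Slide.structure]."""

    title: str          # The title of the slide
    text_markdown: str  # The content of the slide as markdown
    description: str    # Layout of the text, description of the graphics


# Made-up sample values, for illustration only.
example = Slide(
    title="Quarterly Results",
    text_markdown="- Revenue up\n- Costs flat",
    description="Title at top, two bullets on the left, bar chart on the right",
)
print(asdict(example))
```

All three fields are plain text, which matches the `type = "text"` declarations in the concept definition.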
Pipeline
The pipeline extracts pages with views, describes each slide using a vision model, then concatenates the descriptions:
[pipe.extract_slides]
type = "PipeSequence"
description = "Extract markdown from a document"
inputs = { document = "Document" }
output = "Text"
steps = [
{ pipe = "extract_markdown_and_views_from_document", result = "pages" },
{ pipe = "describe_slide", batch_over = "pages", batch_as = "page", result = "slides" },
{ pipe = "concatenate_slide_descriptions", result = "slides_description" },
]
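The control flow of the three steps can be sketched in plain Python. This is a conceptual illustration under stated assumptions, not Pipelex code: the functions are hypothetical stubs that only reuse the step names, whereas the real pipes call a document extractor and a vision model.

```python
from typing import Dict, List


def extract_markdown_and_views_from_document(document: str) -> List[Dict[str, str]]:
    """Stub: treat each blank-line-separated chunk as one page with text and a view."""
    chunks = [c.strip() for c in document.split("\n\n") if c.strip()]
    return [
        {"markdown": chunk, "view": f"<image of page {i}>"}
        for i, chunk in enumerate(chunks, start=1)
    ]


def describe_slide(page: Dict[str, str]) -> Dict[str, str]:
    """Stub for the vision step: one page in, one slide record out."""
    first_line = page["markdown"].splitlines()[0]
    return {
        "title": first_line.lstrip("# ").strip(),
        "text_markdown": page["markdown"],
        "description": f"Layout inferred from {page['view']}",
    }


def concatenate_slide_descriptions(slides: List[Dict[str, str]]) -> str:
    """Stub for the composition step: join the per-slide records into one text."""
    return "\n\n".join(f"{s['title']}\n{s['description']}" for s in slides)


def extract_slides(document: str) -> str:
    # Step 1: result = "pages"
    pages = extract_markdown_and_views_from_document(document)
    # Step 2: batch_over = "pages", batch_as = "page", result = "slides"
    slides = [describe_slide(page) for page in pages]
    # Step 3: result = "slides_description"
    return concatenate_slide_descriptions(slides)


print(extract_slides("# Intro\nWelcome\n\n# Agenda\n- Item"))
```

The list comprehension stands in for the batch step: `describe_slide` runs once per element of `pages`, with the current element bound to `page`, and the collected outputs become `slides`.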
How to run
pipelex run bundle examples/b_basics/document_extract/extract_slides/bundle.mthds \
-i examples/b_basics/document_extract/extract_slides/inputs.json
Related Documentation
- PipeLLM Operator - The core operator for LLM interactions
- PipeExtract Operator - Extract text and images from documents
- PipeCompose Operator - Template-based data composition