Example: Gantt Chart Extraction

This example extracts structured information from Gantt chart images: timescale, task names, task details (dates, dependencies), and milestones. It uses a "divide and conquer" approach where each aspect is extracted separately.

Get the code

GitHub

What it demonstrates

  • Divide-and-conquer extraction: timescale, task names, task details, then assemble
  • Batching over task names to extract details for each independently
  • Vision-based extraction using the $vision-diagram model
  • Rich structured output (GanttChart with GanttTaskDetails and Milestone lists)
  • Two alternative approaches: step-by-step vs. direct one-shot extraction
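
To make the structured output more concrete, here is a minimal Python sketch of what such a shape could look like. The class names GanttChart, GanttTaskDetails and Milestone come from the list above; every field name and sample value is an assumption for illustration, and the real definitions are in the example's code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GanttTaskDetails:
    # Hypothetical fields: the example only states that details cover dates and dependencies.
    name: str
    start_date: Optional[str] = None
    end_date: Optional[str] = None
    dependencies: List[str] = field(default_factory=list)


@dataclass
class Milestone:
    # Hypothetical fields.
    name: str
    date: Optional[str] = None


@dataclass
class GanttChart:
    # One list of task details and one list of milestones, as described above.
    tasks: List[GanttTaskDetails] = field(default_factory=list)
    milestones: List[Milestone] = field(default_factory=list)


# Made-up sample values, only to show how the pieces nest.
chart = GanttChart(
    tasks=[GanttTaskDetails(name="Design", start_date="2024-01-08", end_date="2024-01-19")],
    milestones=[Milestone(name="Kickoff", date="2024-01-08")],
)
```

Keeping tasks and milestones as separate typed lists lets each extraction step fill in its own part before the final assembly.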

The Method: bundle.mthds

Pipeline

[pipe.extract_gantt_by_steps]
type = "PipeSequence"
description = "Extract all details from a gantt chart"
inputs = { gantt_chart_image = "GanttChartImage" }
output = "GanttChart"
steps = [
    { pipe = "extract_gantt_timescale", result = "gantt_timescale" },
    { pipe = "extract_gantt_task_names", result = "gantt_task_names" },
    { pipe = "extract_details_of_task", batch_as = "gantt_task_name", result = "details_of_all_tasks" },
    { pipe = "gather_in_a_gantt_chart", result = "gantt_chart" },
]

Each task's details are extracted in a separate call that receives the chart image, the timescale context, and that one task name, so each extraction stays focused on a single task.
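
The control flow of the sequence can be sketched in plain Python. This is not the Pipelex API: the function below and its callable parameters are hypothetical stand-ins named after the pipes in the listing, and the lambdas are stubs replacing the vision-model calls.

```python
from typing import Any, Callable, Dict, List


def extract_gantt_by_steps(
    image: Any,
    extract_timescale: Callable[[Any], Any],
    extract_task_names: Callable[[Any], List[str]],
    extract_details_of_task: Callable[[Any, Any, str], Dict[str, Any]],
    gather: Callable[[Any, List[Dict[str, Any]]], Dict[str, Any]],
) -> Dict[str, Any]:
    # Steps 1 and 2: chart-wide context, extracted once.
    timescale = extract_timescale(image)
    task_names = extract_task_names(image)
    # Step 3: one focused call per task name (the batching step).
    details_of_all_tasks = [
        extract_details_of_task(image, timescale, name) for name in task_names
    ]
    # Step 4: assemble the partial results into one structure.
    return gather(timescale, details_of_all_tasks)


# Stub extractors (assumptions) stand in for the model-backed pipes.
result = extract_gantt_by_steps(
    image="chart.png",
    extract_timescale=lambda img: "weekly",
    extract_task_names=lambda img: ["Design", "Build"],
    extract_details_of_task=lambda img, ts, name: {"name": name, "timescale": ts},
    gather=lambda ts, details: {"timescale": ts, "tasks": details},
)
```

The loop here runs sequentially; the sketch makes no claim about whether the actual runtime processes batch items concurrently.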

The bundle also includes a transcript_gantt_direct pipe for one-shot extraction to markdown, useful for simpler charts.

How to run

pipelex run bundle examples/b_basics/document_extract/extract_gantt/bundle.mthds \
  -i examples/b_basics/document_extract/extract_gantt/inputs.json