
Image Analyzer Agent

Generic scientific image analyzer that downloads remote images, base64-encodes them, and sends batches to a vision-capable language model via responses.parse for structured per-image analysis. Returns one FigureAnalysis per image (type, description, axes, legend, caption, anomaly notes) plus a consolidated Markdown report.

by NASA-IMPACT akd-ext contributors · NASA-IMPACT

figure-analysis · image-analysis · scientific-figures · vision · structured-output · plots · illustrations

tested on

gpt-5.2

framework

openai-agents-sdk

license

Apache-2.0

reasoning

Batch vision parse: type-first classification → exhaustive type-specific description → field extraction (axes, legend, caption, anomalies)

citable url

https://agentarium.science/a/image-analyzer/v/1.0.0
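The summary above describes a download, base64-encode, and batch pipeline. A minimal sketch of the encode and batch steps follows, using only the Python standard library. The function names `to_data_url` and `batched`, the data-URL form, the default MIME type, and the batch size of 2 are illustrative assumptions; the listing does not state how the agent attaches images or how large its batches are. Stand-in bytes replace the real download so the sketch runs offline.

```python
import base64
from typing import Iterator


def to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Base64-encode raw image bytes into a data URL string (assumed format)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def batched(items: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Stand-in bytes instead of downloaded images (hypothetical data).
fake_images = [b"image-one", b"image-two", b"image-three"]
data_urls = [to_data_url(raw) for raw in fake_images]
batches = list(batched(data_urls, batch_size=2))
print(len(batches), [len(batch) for batch in batches])  # prints: 2 [2, 1]
```

Each batch would then form one vision request; the request itself is omitted here because its exact shape is not shown in the listing.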

INSTALL

pick your client — honest about what each supports

lossless · Full agent file: routing, tool scoping, and model. Drops straight into ~/.claude/agents/.

curl -sL https://agentarium.science/a/image-analyzer/v/1.0.0.md \
  -o ~/.claude/agents/image-analyzer.md
 
# the agent file declares its required MCP servers;
# follow the README inside it to wire them up.

Note: The `model:` field in the frontmatter records the author's preferred model. Claude Code substitutes its own model when running the agent; that is expected, and the routing and tool calls still work as advertised.

00 WHAT THIS LISTING IS

registry-verified

✓ format

✓ topic

✓ safety screen

- correctness: not verified

A structured, format-conformant submission, screened for topic and obvious safety issues. The registry verifies format and topic — it does not verify that the agent is correct, that it works, or that the author's disclosures are accurate. Read everything below the way you'd read a preprint: structured enough to trust the shape, not the claims.

01 GUARDRAILS & VALIDATION

author-stated

Guardrails declared

✓ No fabrication: Never invents values, trends, or features not visible in the image; uses 'approximately' when precision is unreadable.
✓ Verbatim slug: The slug field is always copied verbatim from the caption; it is never shortened, invented, or inferred.
✓ URL passthrough: The url field in FigureAnalysis is always left empty by the model and filled programmatically from the download map.
✓ Context non-override: The supplied Context paragraph is used only to resolve ambiguous labels; it never overrides what the image actually shows.
✓ Notes isolation: The notes field contains only genuine anomalies; primary descriptive content must go in description, never in notes.
✓ Unreadable-image stub: Unreadable images return a stub entry with description='image could not be read' and figure_type='unknown' rather than being skipped.
✓ Non-contradiction: description and notes are checked for consistency; they must not contradict each other.
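Several of the guardrails above are mechanical enough to check in code. The sketch below shows one way the URL passthrough, verbatim slug, and unreadable-image stub rules might be enforced after the model returns. The `FigureAnalysis` dataclass mirrors the field names used in this listing, but its exact definition, the `apply_guardrails` and `unreadable_stub` helpers, and the slug-to-URL shape of the download map are assumptions rather than the agent's actual code.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class FigureAnalysis:
    # Field names follow the listing; types and defaults are assumptions.
    slug: str
    url: str = ""
    figure_type: str = "unknown"
    description: str = ""
    x_axis: str = ""
    y_axis: str = ""
    legend: list[str] = field(default_factory=list)
    caption: str = ""
    notes: str = ""


ALLOWED_TYPES = {"plot", "illustration", "unknown"}


def unreadable_stub(slug: str) -> FigureAnalysis:
    """Stub entry for an image the model could not read."""
    return FigureAnalysis(slug=slug, description="image could not be read")


def apply_guardrails(
    entries: list[FigureAnalysis], download_map: dict[str, str]
) -> list[FigureAnalysis]:
    """Validate model output, then fill url from the download map."""
    checked = []
    for entry in entries:
        if entry.slug not in download_map:
            raise ValueError(f"slug not copied verbatim: {entry.slug!r}")
        if entry.url:
            raise ValueError(f"model filled url for {entry.slug!r}")
        if entry.figure_type not in ALLOWED_TYPES:
            raise ValueError(f"invalid figure_type: {entry.figure_type!r}")
        checked.append(replace(entry, url=download_map[entry.slug]))
    return checked


# Hypothetical model output and download map.
model_output = [
    FigureAnalysis(slug="fig1", figure_type="plot", description="line_plot"),
    unreadable_stub("fig2"),
]
urls = {"fig1": "https://example.org/fig1.png", "fig2": "https://example.org/fig2.png"}
for item in apply_guardrails(model_output, urls):
    print(item.slug, item.figure_type, item.url)
```

The no-fabrication, context non-override, and non-contradiction rules depend on image content and are not covered by checks of this kind.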

Validation methodology

tested: [TO BE FILLED BY AUTHOR] e.g. 100 scientific figures from published papers with human-annotated ground-truth (type, key values, axis labels).
data: [TO BE FILLED BY AUTHOR] e.g. Curated set of figures sampled from open-access earth-science and ML papers.
metric: [TO BE FILLED BY AUTHOR] e.g. Exact-match on figure_type; BLEU / human-rating on description quality.
result: [TO BE FILLED BY AUTHOR] e.g. figure_type accuracy 91%; description adequacy rated ≥4/5 by domain reviewers.
validated: 2026-05-26
caveat: [TO BE FILLED BY AUTHOR — 'none' rejected at gate]
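The metric line above suggests exact-match on figure_type as one possible measure. A minimal sketch of that computation follows; the function name `figure_type_accuracy`, the slug-keyed dictionaries, and the toy labels are all hypothetical and do not represent any result reported by the author.

```python
def figure_type_accuracy(predicted: dict[str, str], truth: dict[str, str]) -> float:
    """Fraction of ground-truth slugs whose predicted figure_type matches exactly.

    Slugs absent from `predicted` count as incorrect.
    """
    if not truth:
        raise ValueError("ground truth is empty")
    correct = sum(1 for slug, label in truth.items() if predicted.get(slug) == label)
    return correct / len(truth)


# Toy labels for illustration only (not real evaluation data).
truth = {"fig1": "plot", "fig2": "plot", "fig3": "illustration", "fig4": "unknown"}
predicted = {"fig1": "plot", "fig2": "illustration", "fig3": "illustration", "fig4": "unknown"}
print(figure_type_accuracy(predicted, truth))  # prints: 0.75
```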

04 REPRODUCTIONS

independent runs by other scientists — the Tier 5 trigger

No independent reproductions yet

Ran this agent yourself against the gold dataset? File a reproduction from your own ORCID — one is all it takes to move this listing to Tier 5 · independently reproduced.

06 DISCLOSURES

author-stated

intended use

Designed for automated extraction of structured metadata from scientific figures (plots, illustrations, schematics) in research papers and technical reports. Intended as a preprocessing step for downstream literature analysis, figure indexing, or accessibility workflows where a machine-readable description of each figure is needed.

out of scope

Not a scientific reasoning or interpretation agent — it describes what is visible, not what it means scientifically. Does not assess statistical validity, reproduce numerical results, or draw conclusions beyond what the figure shows. Not designed for real-time or interactive figure annotation; operates in batch mode only. Not suitable for figures with intentionally obscured or encrypted content.

known failure modes

Low-resolution or heavily compressed images may produce imprecise value readings reported as "approximately." Composite multi-panel figures may have panels mis-typed if panel boundaries are unclear. Figures with non-standard color scales or perceptually similar palettes may have legend entries mis-matched. Download failures are silently skipped — a missing FigureAnalysis in the output indicates a failed download, not an absent figure. Axis labels in non-Latin scripts may not be transcribed correctly.
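Because failed downloads are skipped silently, a caller may want to compare the requested URLs against the URLs present in the output. The sketch below shows one way to do that; `missing_downloads` is a hypothetical helper, and it assumes each returned analysis carries its filled-in url as a dictionary key.

```python
def missing_downloads(requested_urls: list[str], analyses: list[dict]) -> list[str]:
    """Return requested URLs that have no matching analysis, in request order."""
    returned = {entry["url"] for entry in analyses}
    return [url for url in requested_urls if url not in returned]


requested = [
    "https://example.org/paper/fig1_loss_curves.png",
    "https://example.org/paper/fig2_attention_heatmap.png",
]
# Hypothetical output in which the second download failed.
analyses = [
    {"slug": "fig1_loss_curves", "url": "https://example.org/paper/fig1_loss_curves.png"},
]
print(missing_downloads(requested, analyses))
# prints: ['https://example.org/paper/fig2_attention_heatmap.png']
```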

06 SYSTEM PROMPT

author-stated

Verbatim prompt · 6,894 chars · 147 lines

You are a meticulous scientific figure analyst. The user message contains a
`Context` paragraph followed by a batch of images. Each image is followed by:

    caption: [Image slug: <slug>]

Return one `FigureAnalysis` per image.

CORE PRINCIPLE — FAITHFULNESS OVER NARRATIVE:
The Context tells you what the authors *intended* or are *studying*. The
image shows what actually happened. When they disagree, report the image.
Use Context only to resolve ambiguous labels and domain terminology — never
to override what the figure actually shows. Do not smooth, simplify, or
narrativize a figure to match the surrounding paper's framing.

STEP 1 — Identify the figure type before describing it.
Common types you will encounter:

  * line_plot: one or more curves over a continuous x-axis (time series,
    loss curves, profiles).
  * scatter_plot: discrete (x, y) points, possibly with categories.
  * bar_chart: categorical comparisons, possibly grouped or stacked.
  * histogram_or_density: distribution of a single variable (histogram,
    KDE, violin, box plot).
  * heatmap_or_matrix: 2D grid of values (confusion matrix, correlation
    matrix, attention map, generic heatmap).
  * map_or_spatial_field: geographic or 2D spatial field (lat-lon plot,
    contour map, satellite image, simulation snapshot).
  * vector_field: quiver, streamline, or flow visualization.
  * surface_or_contour: 3D surface, contour plot, phase portrait.
  * network_or_graph: nodes and edges.
  * image_or_micrograph: photograph, microscopy, experimental imagery.
  * table_image: a rendered table.
  * illustration: schematic, diagram, architecture sketch, conceptual
    figure with no quantitative axes.
  * composite: multi-panel figure mixing types — describe each panel
    according to its own type.
  * unknown: cannot determine.

Set figure_type to one of: "plot", "illustration", or "unknown" (this is
the schema-level field). Within `description`, name the specific subtype
from the list above so downstream readers know what they're getting.

STEP 2 — Write description according to figure type.

`description` has NO length limit. Write as much as the figure warrants.
Be exhaustive but precise. Quote visible values; do not invent precision.
If a number is hard to read, say "approximately." Cover everything a
reader would need to reconstruct the figure's content without seeing it.

Type-specific content requirements:

  * line_plot — for each series: starting value and x-location, ending
    value and x-location, every notable inflection (peaks, troughs,
    plateaus, regime changes, step jumps) with approximate locations and
    values, monotonicity (state explicitly if non-monotonic), noise
    character, overall change. For multi-series: which is higher/lower,
    crossings, divergences.

  * scatter_plot — number of points if estimable, overall correlation
    direction and strength, cluster structure, outliers with approximate
    locations, regression or trend line if shown, point density patterns,
    category separation if color-coded.

  * bar_chart — every bar's category and approximate value, ranking from
    largest to smallest, error bars or significance markers if present,
    grouping or stacking structure, baseline or reference if shown.

  * histogram_or_density — modality (uni/bi/multi-modal), skew, tail
    behavior, central tendency, spread, any outliers or unusual features,
    bin count if histogram, comparison between distributions if multiple
    overlaid.

  * heatmap_or_matrix — value range, where the high and low regions are,
    diagonal vs off-diagonal structure (for square matrices), notable
    rows or columns, any block structure, color scale interpretation. For
    confusion matrices: dominant diagonal entries, notable confusions.

  * map_or_spatial_field — geographic extent, where high and low values
    are located (use compass directions or named regions if identifiable),
    spatial gradients and fronts, symmetries or asymmetries, land/ocean
    or domain boundaries, contour spacing, any localized features
    (vortices, plumes, fronts).

  * vector_field — overall flow direction, convergence and divergence
    zones, vortex or saddle locations, vector magnitude variation across
    the domain.

  * surface_or_contour — topology (peaks, valleys, ridges, saddles), level
    set structure, gradient direction, monotonicity along key axes.

  * network_or_graph — number of nodes and edges if estimable, cluster or
    community structure, hub nodes, isolated components, edge weight or
    direction conventions.

  * image_or_micrograph — visible features and their spatial arrangement,
    scale bar value if present, contrast or staining patterns, regions of
    interest, annotations or arrows.

  * table_image — column headers, row labels, notable values, overall
    structure. Do not transcribe every cell unless the table is small.

  * illustration — components and their spatial arrangement, arrows and
    flow direction, labels verbatim, hierarchical structure, what process
    or system the schematic represents.

  * composite — describe each panel in turn using its own type's
    requirements, then describe cross-panel relationships and any
    apparent narrative connecting them.

STEP 3 — Fill remaining fields.

- slug: copy verbatim from the caption. Never invent or shorten.

- url: "" (filled programmatically).

- figure_type: "plot" if the figure has quantitative axes or encodes
  data values (includes maps, heatmaps, scatter, bars, histograms,
  surfaces, vector fields). "illustration" for schematics and conceptual
  diagrams. "unknown" if undetermined.

- x_axis / y_axis: axis label verbatim and visible numeric range with
  units. Fill for any figure with quantitative axes. Leave empty for
  illustrations, network diagrams, or images without axes. For maps, use
  longitude/latitude ranges.

- legend: legend entries verbatim with their visual encoding, e.g.
  ["baseline — blue solid", "ablation — orange dashed", "ground truth —
  black dotted"]. Include color, line style, or marker shape as visible.
  Leave empty if no legend.

- caption: figure title or visible caption text exactly as shown.

- notes: ONLY genuine anomalies — axis clipping, suspicious scaling,
  missing data, outliers inconsistent with the rest, suspected plotting
  bugs, legend/series mismatches, unit problems, illegible regions. Do
  NOT put primary content here — that goes in description. Empty if
  nothing anomalous.

CONSISTENCY RULE:
description and notes must not contradict. description is the canonical
narrative; notes only flags issues on top of it.

General rules:
- One entry per attached image. No skips, no inventions.
- Quote what is visible. Never fabricate values or trends.
- Unreadable image → description="image could not be read",
  figure_type="unknown", but still return the entry.

This is the exact text the agent runs with. The .openai-agents.py install artifact embeds it verbatim; Cursor / Claude Code install the same content via their respective rule formats.
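The prompt separates a fine-grained subtype, named inside description, from the three-valued schema field figure_type. The sketch below illustrates one way a downstream consumer might check that the two agree. The helper names are hypothetical. The mapping covers only subtypes the prompt assigns explicitly or that plainly have quantitative axes; subtypes the prompt leaves open (network_or_graph, image_or_micrograph, table_image, composite) return `None` rather than a guess. Reading the subtype from the first token of description is an assumption based on the worked example in this listing, not a rule stated in the prompt.

```python
from typing import Optional

# Subtypes that have quantitative axes or encode data values.
PLOT_SUBTYPES = {
    "line_plot", "scatter_plot", "bar_chart", "histogram_or_density",
    "heatmap_or_matrix", "map_or_spatial_field", "vector_field",
    "surface_or_contour",
}


def schema_figure_type(subtype: str) -> Optional[str]:
    """Map a subtype to the schema-level figure_type, or None if unspecified."""
    if subtype in PLOT_SUBTYPES:
        return "plot"
    if subtype in ("illustration", "unknown"):
        return subtype
    return None


def leading_subtype(description: str) -> str:
    """First whitespace-delimited token of the description (assumed convention)."""
    tokens = description.split(maxsplit=1)
    return tokens[0] if tokens else ""


def is_consistent(figure_type: str, description: str) -> bool:
    """True when figure_type agrees with the subtype, or the subtype is unmapped."""
    expected = schema_figure_type(leading_subtype(description))
    return expected is None or expected == figure_type


print(is_consistent("plot", "line_plot \u2014 Two training loss curves"))  # prints: True
print(is_consistent("illustration", "heatmap_or_matrix \u2014 12x12 matrix"))  # prints: False
```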

07 WORKED EXAMPLE

author-stated

input

urls:
  - "https://example.org/paper/fig1_loss_curves.png"
  - "https://example.org/paper/fig2_attention_heatmap.png"
context: "Training ablation study comparing baseline vs. proposed model on CIFAR-10."

output

analyses:
  - slug: "fig1_loss_curves"
    url: "https://example.org/paper/fig1_loss_curves.png"
    figure_type: "plot"
    description: "line_plot — Two training loss curves plotted against epoch (x-axis 0–100).
      Baseline (blue solid) starts at approximately 2.3, decreases steeply to ~0.8 by
      epoch 20, then plateaus around 0.6 with minor oscillations through epoch 100.
      Proposed model (orange dashed) starts identically at ~2.3, decreases more steeply
      reaching ~0.4 by epoch 20, and continues declining to ~0.2 by epoch 100 with no
      visible plateau. The two curves cross at approximately epoch 5; the proposed model
      remains strictly lower thereafter."
    x_axis: "Epoch (0–100)"
    y_axis: "Cross-entropy loss (0–2.5)"
    legend: ["baseline — blue solid", "proposed — orange dashed"]
    caption: "Figure 1. Training loss curves for baseline and proposed model."
    notes: ""
  - slug: "fig2_attention_heatmap"
    url: "https://example.org/paper/fig2_attention_heatmap.png"
    figure_type: "plot"
    description: "heatmap_or_matrix — 12×12 attention weight matrix. High values (deep red,
      ~0.8–1.0) concentrated on the diagonal, indicating strong self-attention. Notable
      off-diagonal cluster in rows 3–5, columns 8–10 with values ~0.4–0.6, suggesting
      cross-token dependencies. Lower-left triangle predominantly near zero (blue)."
    x_axis: "Token position (0–11)"
    y_axis: "Token position (0–11)"
    legend: []
    caption: "Figure 2. Layer-6 self-attention weights."
    notes: "Color scale legend not visible in image; intensity interpreted from colorbar tick marks."
markdown: |
  # Image Analysis Report
  ## Context
  Training ablation study comparing baseline vs. proposed model on CIFAR-10.
  ## Figures (2 total)
  ### 1. `fig1_loss_curves` — _plot_
  ...
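The report headings shown above can be reproduced with a short renderer. This is a sketch only: `render_report` is a hypothetical name, and because the per-figure body is elided in the example, placing the description under each heading is an assumption rather than the agent's actual layout.

```python
def render_report(context: str, analyses: list[dict]) -> str:
    """Build a Markdown report using the heading pattern from the worked example."""
    lines = [
        "# Image Analysis Report",
        "## Context",
        context,
        f"## Figures ({len(analyses)} total)",
    ]
    for index, entry in enumerate(analyses, start=1):
        # Heading format copied from the example; \u2014 is the dash it uses.
        lines.append(f"### {index}. `{entry['slug']}` \u2014 _{entry['figure_type']}_")
        # Assumption: the elided body is represented here by the description.
        lines.append(entry["description"])
    return "\n".join(lines)


report = render_report(
    "Training ablation study comparing baseline vs. proposed model on CIFAR-10.",
    [
        {"slug": "fig1_loss_curves", "figure_type": "plot", "description": "line_plot"},
        {"slug": "fig2_attention_heatmap", "figure_type": "plot", "description": "heatmap_or_matrix"},
    ],
)
print(report)
```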