Image Analyzer Agent
Generic scientific image analyzer that downloads remote images, base64-encodes them, and sends batches to a vision-capable language model via responses.parse for structured per-image analysis. Returns one FigureAnalysis per image (type, description, axes, legend, caption, anomaly notes) plus a consolidated Markdown report.
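The encode-and-batch steps described above might look roughly like the sketch below. It is a minimal illustration, not the agent's actual code: the helper names `to_data_url` and `batched`, the batch size of 2, and the choice of data URLs as the payload format are all assumptions, and both the download step and the model call are left out.

```python
import base64
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Base64-encode raw image bytes into a data URL (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def batched(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Stand-in bytes; in practice these would come from the downloaded files.
downloads = {
    "fig1": b"\x89PNG-one",
    "fig2": b"\x89PNG-two",
    "fig3": b"\x89PNG-three",
}

# Pair each slug with its encoded payload, then group into batches of two.
payloads = [(slug, to_data_url(data)) for slug, data in downloads.items()]
batches = list(batched(payloads, size=2))
```

Each batch would then be sent to the model in a single request; how the agent sizes its batches is not stated in this listing.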
curl -sL https://agentarium.science/a/image-analyzer/v/1.0.0.md \
  -o ~/.claude/agents/image-analyzer.md
# the agent file declares its required MCP servers;
# follow the README inside it to wire them up.
Note: The model: field in the frontmatter records the author's preferred model. Claude Code substitutes its own model when running the agent; that's expected, and the routing and tool calls still work as advertised.
A structured, format-conformant submission, screened for topic and obvious safety issues. The registry verifies format and topic — it does not verify that the agent is correct, that it works, or that the author's disclosures are accurate. Read everything below the way you'd read a preprint: structured enough to trust the shape, not the claims.
- ✓ No fabrication: Never invents values, trends, or features not visible in the image; uses 'approximately' when precision is unreadable.
- ✓ Verbatim slug: The slug field is always copied verbatim from the caption; it is never shortened, invented, or inferred.
- ✓ URL passthrough: The url field in FigureAnalysis is always left empty by the model and filled programmatically from the download map.
- ✓ Context non-override: The supplied Context paragraph is used only to resolve ambiguous labels; it never overrides what the image actually shows.
- ✓ Notes isolation: The notes field contains only genuine anomalies; primary descriptive content must go in description, never in notes.
- ✓ Unreadable-image stub: Unreadable images return a stub entry with description='image could not be read' and figure_type='unknown' rather than being skipped.
- ✓ Non-contradiction: description and notes are checked for consistency; they must not contradict each other.
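Two of the guarantees above, URL passthrough and the unreadable-image stub, are mechanical enough to sketch in code. The example below is an assumption-laden illustration: the field names follow this listing, but the field types, the defaults, and the helpers `unreadable_stub` and `fill_urls` are hypothetical, and a plain dataclass stands in for whatever schema class the agent actually passes to the model.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FigureAnalysis:
    """Field names follow the listing; types and defaults are assumptions."""
    slug: str
    figure_type: str
    description: str
    url: str = ""  # left empty by the model, filled in afterwards
    x_axis: str = ""
    y_axis: str = ""
    legend: tuple[str, ...] = ()
    caption: str = ""
    notes: str = ""


def unreadable_stub(slug: str) -> FigureAnalysis:
    """Stub entry for an image that could not be read (hypothetical helper)."""
    return FigureAnalysis(
        slug=slug,
        figure_type="unknown",
        description="image could not be read",
    )


def fill_urls(analyses: list[FigureAnalysis],
              download_map: dict[str, str]) -> list[FigureAnalysis]:
    """Return copies whose url comes from the download map, never the model."""
    return [replace(a, url=download_map.get(a.slug, "")) for a in analyses]


download_map = {
    "fig1_loss_curves": "https://example.org/paper/fig1_loss_curves.png",
    "fig2_attention_heatmap": "https://example.org/paper/fig2_attention_heatmap.png",
}
model_output = [
    FigureAnalysis(slug="fig1_loss_curves", figure_type="plot",
                   description="Two loss curves."),
    unreadable_stub("fig2_attention_heatmap"),
]
filled = fill_urls(model_output, download_map)
```

Keeping the record frozen and returning copies means the model's raw output stays untouched, which makes it easy to confirm that no URL originated from the model.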
- tested: [TO BE FILLED BY AUTHOR] e.g. 100 scientific figures from published papers with human-annotated ground-truth (type, key values, axis labels).
- data: [TO BE FILLED BY AUTHOR] e.g. Curated set of figures sampled from open-access earth-science and ML papers.
- metric: [TO BE FILLED BY AUTHOR] e.g. Exact-match on figure_type; BLEU / human-rating on description quality.
- result: [TO BE FILLED BY AUTHOR] e.g. figure_type accuracy 91%; description adequacy rated ≥4/5 by domain reviewers.
- validated: 2026-05-26
- caveat: [TO BE FILLED BY AUTHOR — 'none' rejected at gate]
Ran this agent yourself against the gold dataset? File a reproduction from your own ORCID — one is all it takes to move this listing to Tier 5 · independently reproduced.
Designed for automated extraction of structured metadata from scientific figures (plots, illustrations, schematics) in research papers and technical reports. Intended as a preprocessing step for downstream literature analysis, figure indexing, or accessibility workflows where a machine-readable description of each figure is needed.
Not a scientific reasoning or interpretation agent — it describes what is visible, not what it means scientifically. Does not assess statistical validity, reproduce numerical results, or draw conclusions beyond what the figure shows. Not designed for real-time or interactive figure annotation; operates in batch mode only. Not suitable for figures with intentionally obscured or encrypted content.
Low-resolution or heavily compressed images may produce imprecise value readings reported as "approximately." Composite multi-panel figures may have panels mis-typed if panel boundaries are unclear. Figures with non-standard color scales or perceptually similar palettes may have legend entries mis-matched. Download failures are silently skipped — a missing FigureAnalysis in the output indicates a failed download, not an absent figure. Axis labels in non-Latin scripts may not be transcribed correctly.
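Because failed downloads leave no trace in the output, a caller may want to compare what it asked for with what came back. The sketch below shows one way to do that, assuming the caller keeps its own list of expected slugs; the function name `missing_slugs` and the sample slugs are hypothetical.

```python
def missing_slugs(requested: list[str], returned: list[str]) -> list[str]:
    """Slugs that were requested but have no FigureAnalysis in the output."""
    seen = set(returned)
    # Preserve request order so the result is easy to compare with the input.
    return [slug for slug in requested if slug not in seen]


requested = ["fig1_loss_curves", "fig2_attention_heatmap", "fig3_schematic"]
returned = ["fig1_loss_curves", "fig2_attention_heatmap"]

failed = missing_slugs(requested, returned)
```

Here `failed` would contain `fig3_schematic`, which per the limitation above points to a failed download rather than a figure that does not exist.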
This is the exact text the agent runs with. The .openai-agents.py install artifact embeds it verbatim; Cursor / Claude Code install the same content via their respective rule formats.
analyses:
  - slug: "fig1_loss_curves"
    url: "https://example.org/paper/fig1_loss_curves.png"
    figure_type: "plot"
    description: "line_plot — Two training loss curves plotted against epoch (x-axis 0–100).
      Baseline (blue solid) starts at approximately 2.3, decreases steeply to ~0.8 by
      epoch 20, then plateaus around 0.6 with minor oscillations through epoch 100.
      Proposed model (orange dashed) starts identically at ~2.3, decreases more steeply
      reaching ~0.4 by epoch 20, and continues declining to ~0.2 by epoch 100 with no
      visible plateau. The two curves cross at approximately epoch 5; the proposed model
      remains strictly lower thereafter."
    x_axis: "Epoch (0–100)"
    y_axis: "Cross-entropy loss (0–2.5)"
    legend: ["baseline — blue solid", "proposed — orange dashed"]
    caption: "Figure 1. Training loss curves for baseline and proposed model."
    notes: ""
  - slug: "fig2_attention_heatmap"
    url: "https://example.org/paper/fig2_attention_heatmap.png"
    figure_type: "plot"
    description: "heatmap_or_matrix — 12×12 attention weight matrix. High values (deep red,
      ~0.8–1.0) concentrated on the diagonal, indicating strong self-attention. Notable
      off-diagonal cluster in rows 3–5, columns 8–10 with values ~0.4–0.6, suggesting
      cross-token dependencies. Lower-left triangle predominantly near zero (blue)."
    x_axis: "Token position (0–11)"
    y_axis: "Token position (0–11)"
    legend: []
    caption: "Figure 2. Layer-6 self-attention weights."
    notes: "Color scale legend not visible in image; intensity interpreted from colorbar tick marks."
markdown: |
  # Image Analysis Report
  ## Context
  Training ablation study comparing baseline vs. proposed model on CIFAR-10.
  ## Figures (2 total)
  ### 1. `fig1_loss_curves` — _plot_
  ...
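The skeleton of the consolidated report shown in the sample can be produced with a few lines of string assembly. The sketch below reproduces only the headings visible in the sample; the per-figure body is elided in the source and is deliberately not reconstructed here. The function name `render_report` and its signature are assumptions, not the agent's own code.

```python
def render_report(context: str, figures: list[tuple[str, str]]) -> str:
    """Build the report headings from (slug, figure_type) pairs."""
    lines = [
        "# Image Analysis Report",
        "## Context",
        context,
        f"## Figures ({len(figures)} total)",
    ]
    for index, (slug, figure_type) in enumerate(figures, start=1):
        # \u2014 is the dash used in the sample heading.
        lines.append(f"### {index}. `{slug}` \u2014 _{figure_type}_")
        # The per-figure body would follow here; the source elides it.
    return "\n".join(lines)


report = render_report(
    "Training ablation study comparing baseline vs. proposed model on CIFAR-10.",
    [("fig1_loss_curves", "plot"), ("fig2_attention_heatmap", "plot")],
)
```

Deriving the figure count from the list length keeps the "N total" heading in step with the entries that follow it.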