Researchers are tackling a significant hurdle in modern scientific publishing: the time-consuming creation of academic illustrations. Dawei Zhu of Peking University and Google Cloud AI Research, together with Rui Meng and Yale Song of Google Cloud AI Research and colleagues, present PaperBanana, a framework designed to automate the generation of publication-ready figures. The work addresses a key bottleneck in the workflow of AI-powered ‘autonomous scientists’ and could accelerate discovery by freeing researchers from laborious manual illustration. PaperBanana uses vision-language models and image generation models not only to create diagrams but also to critique and refine them. The authors evaluate it on a new benchmark, PaperBananaBench, comprising 292 methodology diagrams from NeurIPS 2025 publications.

PaperBanana aims to alleviate the burden on researchers by automating the production of visually compelling and scientifically accurate diagrams and plots.

Automating Planning, Rendering, and Refinement

The research team developed a system capable of planning content and style, rendering images, and refining them based on internal assessment, all without direct human intervention. Powered by models like Gemini-3-Pro and Nano-Banana-Pro, PaperBanana retrieves pertinent examples, devises detailed plans, and generates images that adhere to the rigorous standards of academic publishing.
Beyond methodology diagrams, the framework also generates high-quality statistical plots, offering a broader solution for scientific visualization. The approach is positioned against traditional code-based methods and recent image generation models, which often struggle with the nuanced requirements of academic illustrations.

Rigorous Testing with PaperBananaBench Benchmark

To evaluate PaperBanana, the scientists introduced PaperBananaBench, a dedicated benchmark comprising 292 methodology diagrams curated from NeurIPS 2025 publications, covering a diverse range of research domains and illustration styles. Experiments show that PaperBanana consistently outperforms existing baselines across four key dimensions: faithfulness, conciseness, readability, and aesthetics, with improvements of +2.8%, +37.2%, +12.9%, and +6.6% respectively, alongside an overall score improvement of +17.0%. The work opens the possibility of fully automating the creation of academic illustrations; according to the authors, the figures in the paper itself were generated entirely with the framework.

Automated Generation and Benchmarking of Scientific Methodology Diagrams Facilitate Reproducible Research

Scientists developed PaperBanana, an agentic framework automating the generation of publication-ready academic illustrations, addressing a labor-intensive bottleneck in research workflows. The study harnessed state-of-the-art Vision-Language Models (VLMs) and image generation models, specifically Gemini-3-Pro and Nano-Banana-Pro, to orchestrate specialized agents.

These agents retrieve relevant references, plan content and style, render images, and iteratively refine outputs through self-critique, enabling the creation of visually compelling and accurate diagrams. To rigorously evaluate the framework, researchers introduced PaperBananaBench, a benchmark comprising 292 test cases and 292 reference cases of methodology diagrams curated from NeurIPS 2025 publications.
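The retrieve, plan, render, and self-critique loop described above can be pictured with a minimal sketch. This is not the authors' implementation: the function name `generate_figure`, the `Draft` container, the four callables, and the `max_rounds` limit are all hypothetical, and in a real system each callable would wrap a model call (the article names Gemini-3-Pro and Nano-Banana-Pro but gives no API details).

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Draft:
    """Hypothetical container for one illustration attempt."""
    plan: str
    image: bytes
    critiques: List[str] = field(default_factory=list)


def generate_figure(
    method_text: str,
    retrieve: Callable[[str], List[str]],    # assumed: returns reference examples
    plan: Callable[[str, List[str]], str],   # assumed: returns a content/style plan
    render: Callable[[str], bytes],          # assumed: returns image bytes
    critique: Callable[[str, bytes], str],   # assumed: returns "" when satisfied
    max_rounds: int = 3,                     # assumed refinement budget
) -> Draft:
    """Run retrieval, planning, rendering, then critique-driven refinement."""
    references = retrieve(method_text)
    draft_plan = plan(method_text, references)
    draft = Draft(plan=draft_plan, image=render(draft_plan))

    for _ in range(max_rounds):
        feedback = critique(method_text, draft.image)
        if not feedback:
            break  # the critic has no further objections
        draft.critiques.append(feedback)
        # Fold the feedback into the plan and render again.
        draft.plan = f"{draft.plan}\nRevision note: {feedback}"
        draft.image = render(draft.plan)

    return draft
```

Passing the agents in as plain callables keeps the control flow separate from any particular model, which also makes it straightforward to switch an agent off for the kind of ablation the paper reports.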

Advanced Scoring via VLM-as-a-Judge Model

This benchmark covers diverse research domains and illustration styles, providing a comprehensive assessment of the system’s capabilities. Experiments employed a VLM-as-a-Judge approach, scoring generated illustrations against human-created references across four key dimensions: faithfulness, conciseness, readability, and aesthetics.
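One way such referenced scoring could be aggregated is sketched below. The article does not say how judge verdicts become numbers, so the win/tie/loss verdict format and the 100/50/0 point mapping are assumptions made purely for illustration, as is the function name `dimension_scores`.

```python
from typing import Dict, Iterable

DIMENSIONS = ("faithfulness", "conciseness", "readability", "aesthetics")

# Assumed mapping: the generated figure "wins", "ties", or "loses"
# against the human-made reference on each dimension.
POINTS = {"win": 100.0, "tie": 50.0, "loss": 0.0}


def dimension_scores(verdicts: Iterable[Dict[str, str]]) -> Dict[str, float]:
    """Average the per-case verdicts into one score per dimension."""
    rows = list(verdicts)
    if not rows:
        raise ValueError("no verdicts to aggregate")
    return {
        dim: sum(POINTS[row[dim]] for row in rows) / len(rows)
        for dim in DIMENSIONS
    }
```

Under this assumed mapping, a score of 50 on a dimension would mean the generated figures are on par with the human references on average.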

Measurable Gains in Key Visualization Metrics

The team measured performance gains against leading baselines, reporting a +2.8% improvement in faithfulness, +37.2% in conciseness, +12.9% in readability, and +6.6% in aesthetics. The reliability of the VLM-based scoring was checked by correlating it with human judgments, which supports the validity of the evaluation metrics. Beyond methodology diagrams, the study demonstrated the framework’s versatility by generating high-quality statistical plots, pointing towards fully automated illustration pipelines for scientific visualization.

PaperBanana generates high-quality scientific figures exceeding baseline performance across key metrics, demonstrating its potential for impactful research visualization

Scientists have developed PaperBanana, a new framework designed to automate the generation of publication-ready academic illustrations. The research addresses a significant bottleneck in the scientific workflow, where creating high-quality figures is often a labor-intensive process. PaperBanana leverages state-of-the-art Vision-Language Models (VLMs) and image generation models to retrieve references, plan content and style, render images, and refine results through self-critique.

To rigorously test the framework, researchers introduced PaperBananaBench, a dataset comprising 292 methodology diagrams curated from NeurIPS 2025 publications. Experiments demonstrate that PaperBanana consistently outperforms baseline methods in four key areas: faithfulness, conciseness, readability, and aesthetics.

Specifically, using Nano-Banana-Pro as the image generation model, PaperBanana achieved a score of 45.8 for Faithfulness, 80.7 for Conciseness, 51.4 for Readability, and 72.1 for Aesthetics, resulting in an overall score of 60.2. The team measured improvements over the Vanilla Nano-Banana-Pro baseline, achieving gains of +2.8% in Faithfulness, +37.2% in Conciseness, +12.9% in Readability, and +6.6% in Aesthetics, culminating in a +17.0% increase in the Overall score.
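The figures above allow a quick sanity check. The sketch below assumes the reported gains are absolute percentage-point differences rather than relative changes (the article does not say which), and back-computes the baseline scores that reading implies. It also compares the Overall score with the plain average of the four dimensions.

```python
# Scores and gains exactly as reported in the text.
scores = {
    "Faithfulness": 45.8,
    "Conciseness": 80.7,
    "Readability": 51.4,
    "Aesthetics": 72.1,
    "Overall": 60.2,
}
gains = {
    "Faithfulness": 2.8,
    "Conciseness": 37.2,
    "Readability": 12.9,
    "Aesthetics": 6.6,
    "Overall": 17.0,
}

# Assumption: gain = PaperBanana score - baseline score (percentage points).
implied_baseline = {name: round(scores[name] - gains[name], 1) for name in scores}

# Plain average of the four dimension scores, for comparison with "Overall".
dimension_names = [name for name in scores if name != "Overall"]
simple_mean = round(sum(scores[name] for name in dimension_names) / len(dimension_names), 1)

print(implied_baseline)
print(simple_mean, scores["Overall"])
```

The plain average of the four dimensions is 62.5, not the reported 60.2, so the Overall score is evidently not a simple mean; the article does not describe how it is aggregated.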

Broken down by benchmark category, Agent & Reasoning achieved the highest overall score at 69.9%, while Vision & Perception scored lowest at 52.1%. In a blind human evaluation on a subset of 50 cases, PaperBanana won 72.7% of comparisons against vanilla Nano-Banana-Pro, tied in 20.7%, and lost in 6.6%.

An ablation study revealed that providing general structural and stylistic patterns via the Retriever agent was more critical than precise content matching. Removing the Stylist and Critic agents significantly reduced performance in Conciseness, Readability, and Aesthetics, highlighting their importance in generating polished, effective illustrations. These results collectively demonstrate PaperBanana’s potential to automate the creation of publication-ready figures, streamlining the research process and accelerating scientific communication.

Automated generation and evaluation of scientific figures using PaperBanana is a promising new approach

Scientists have developed PaperBanana, a new framework designed to automate the creation of publication-ready academic illustrations. The system uses a combination of visual and language models, orchestrated through specialized agents, to retrieve relevant information, plan illustration content and style, render images, and refine them through self-critique.

PaperBanana demonstrates effectiveness in generating both methodology diagrams and statistical plots, addressing a significant bottleneck in the research process. Researchers rigorously evaluated PaperBanana using PaperBananaBench, a benchmark comprising 292 methodology diagrams from NeurIPS 2025 publications.

Comprehensive experiments showed that PaperBanana consistently outperforms existing methods in key areas such as faithfulness to the source material, conciseness of presentation, readability, and overall aesthetics. The authors acknowledge a primary limitation in the raster format of the generated images, hindering easy editing and scalability, but propose potential solutions including image editing models and reconstruction pipelines.

Future work could involve developing a GUI agent capable of directly generating editable vector graphics using professional design software. This research establishes a pathway towards fully automated generation of high-quality visuals for scientific communication. By reducing the time and effort required to create illustrations, PaperBanana could accelerate scientific discovery and make research findings more accessible.

👉 More information
🗞 PaperBanana: Automating Academic Illustration for AI Scientists
🧠 ArXiv: https://arxiv.org/abs/2601.23265

Muhammad Rohail T.
