LookBench Advances Fashion Image Retrieval with Live, Challenging Benchmarks and Timestamps

Researchers are tackling the persistent challenge of accurately retrieving fashion items from images, a crucial component of modern e-commerce. Chao Gao, Siqiao Xue, and Yimin Peng from Gensmo.ai, alongside Jiwen Fu, Tingyi Gu, Shanshan Li et al., introduce LookBench, a novel, continuously updated benchmark designed to reflect real-world fashion shopping scenarios. Unlike static datasets, LookBench combines current product images scraped from live websites with synthetically generated fashion images, offering a holistic and challenging test for image retrieval models. Because the data is time-stamped and models declare their training cutoffs, the benchmark supports contamination-aware evaluation and provides a durable measure of progress in the field. Early results demonstrate significant shortcomings in existing models and point the way toward genuinely improved performance.
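To make the idea of contamination-aware evaluation concrete, the following is a minimal sketch, not the authors' implementation. It assumes each benchmark image carries a collection date and each model declares a training cutoff; the `Sample` class, the `post_cutoff_samples` helper, and the example dates are hypothetical names and values introduced only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Sample:
    """Hypothetical record for one time-stamped benchmark image."""
    image_id: str
    collected_on: date


def post_cutoff_samples(samples, training_cutoff):
    """Keep only samples collected strictly after the declared training cutoff."""
    return [s for s in samples if s.collected_on > training_cutoff]


# Illustrative data only; these dates are made up for the example.
samples = [
    Sample("a", date(2024, 11, 3)),
    Sample("b", date(2025, 6, 18)),
    Sample("c", date(2025, 9, 1)),
]

clean = post_cutoff_samples(samples, training_cutoff=date(2025, 3, 31))
print([s.image_id for s in clean])  # ['b', 'c']
```

Evaluating a model only on images newer than its cutoff is one simple way to reduce the chance that test images were seen during training.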

LookBench comprises four distinct evaluation subsets: RealStudioFlat, AIGen-Studio, RealStreetLook, and AIGen-StreetLook, each offering varying levels of difficulty and retrieval intent. RealStudioFlat focuses on clean, flat-lay product images for single-item retrieval, while AIGen-Studio extends this to AI-generated lifestyle studio images, increasing the complexity.

RealStreetLook and AIGen-StreetLook present the most challenging scenarios, involving real-world and AI-generated street-style outfits, demanding multi-item retrieval capabilities. Furthermore, the study introduces a comprehensive fashion attribute taxonomy, encompassing over 100 visually grounded properties, and leverages LLM-based annotation to provide reliable weak supervision for region-aware evaluation. Analysis of various vision-language and vision-only models reveals that generic open VLMs often underperform on LookBench, particularly with complex street-style outfits, while fashion-specific fine-tuning improves results but leaves room for further progress.

Live Fashion Image Retrieval Benchmark Construction

The study employed a unified, category-attribute-driven pipeline to construct four distinct retrieval subsets: RealStudioFlat, RealStreetLook, AIGen-Studio, and AIGen-StreetLook. Researchers began by sampling (category, attribute, year) tuples and instantiating structured templates to generate web image search queries or image-generation prompts, ensuring a diverse and representative dataset. For the real-image subsets, commercial image search engines were utilised to retrieve time-stamped studio packshots and street-style photos, which underwent rigorous de-duplication and filtering to remove low-resolution images, watermarks, and irrelevant content. The team then harnessed YOLOv11, a state-of-the-art object detection model, to localise fashion items within each retained image and obtain category-labeled crops, serving as visual queries.
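The first step of this pipeline, sampling (category, attribute, year) tuples and filling structured templates, might look roughly like the sketch below. The vocabularies, the template wording, and the function names `sample_tuples` and `instantiate` are assumptions made for illustration; the article does not specify the actual templates or taxonomy entries.

```python
import random

# Hypothetical vocabularies; the real taxonomy has over 100 attributes.
CATEGORIES = ["dress", "jacket", "sneakers"]
ATTRIBUTES = ["floral print", "oversized fit", "leather"]
YEARS = [2024, 2025]

# Hypothetical templates: one for web image search, one for image generation.
SEARCH_TEMPLATE = "{attribute} {category} {year} street style"
PROMPT_TEMPLATE = (
    "Studio photo of a model wearing a {attribute} {category}, {year} collection"
)


def sample_tuples(n, seed=0):
    """Draw n (category, attribute, year) tuples uniformly at random."""
    rng = random.Random(seed)
    return [
        (rng.choice(CATEGORIES), rng.choice(ATTRIBUTES), rng.choice(YEARS))
        for _ in range(n)
    ]


def instantiate(template, category, attribute, year):
    """Fill a structured template with one sampled tuple."""
    return template.format(category=category, attribute=attribute, year=year)


for category, attribute, year in sample_tuples(3):
    print(instantiate(SEARCH_TEMPLATE, category, attribute, year))
    print(instantiate(PROMPT_TEMPLATE, category, attribute, year))
```

The same tuple drives both the search query and the generation prompt, which is what would let the real and AI-generated subsets cover comparable categories and attributes.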

Scientists enriched these query crops with fine-grained attributes using a pre-annotation pipeline, enabling precise and nuanced retrieval evaluations. To construct candidate galleries for each query, the research team retained only images sharing the same category and main attribute, ranking them based on the number of shared additional attributes with the query image. The top-k ranked results were then designated as positive matches, forming a robust and challenging evaluation set. This meticulous process ensures that LookBench accurately reflects the complexities of real-world fashion search, demanding sophisticated retrieval capabilities from participating models.
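The gallery-construction rule described above can be sketched as follows: filter candidates to the query's category and main attribute, rank them by the number of additional attributes shared with the query, and take the top-k as positives. This is an illustrative sketch under stated assumptions; the `Item` class, the `select_positives` function, and the example attributes are hypothetical, and tie-breaking by pool order is a choice made here rather than something the article specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical annotated image: category, main attribute, extra attributes."""
    item_id: str
    category: str
    main_attribute: str
    extra_attributes: frozenset


def select_positives(query, pool, k):
    """Return the top-k candidates that match the query's category and main
    attribute, ranked by how many additional attributes they share with it."""
    candidates = [
        item for item in pool
        if item.item_id != query.item_id
        and item.category == query.category
        and item.main_attribute == query.main_attribute
    ]
    # sorted() is stable, so ties keep their original pool order.
    ranked = sorted(
        candidates,
        key=lambda item: len(item.extra_attributes & query.extra_attributes),
        reverse=True,
    )
    return ranked[:k]


query = Item("q", "dress", "floral", frozenset({"midi", "v-neck", "short sleeve"}))
pool = [
    Item("g1", "dress", "floral", frozenset({"midi"})),             # 1 shared
    Item("g2", "dress", "floral", frozenset({"midi", "v-neck"})),   # 2 shared
    Item("g3", "dress", "striped", frozenset({"midi", "v-neck"})),  # wrong main attribute
    Item("g4", "jacket", "floral", frozenset({"midi"})),            # wrong category
    Item("g5", "dress", "floral", frozenset({"maxi"})),             # 0 shared
]

positives = select_positives(query, pool, k=2)
print([item.item_id for item in positives])  # ['g2', 'g1']
```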

👉 More information
🗞 LookBench: A Live and Holistic Open Benchmark for Fashion Image Retrieval
🧠 ArXiv: https://arxiv.org/abs/2601.14706

Rohail T.
