Diffusion Inpainting Achieves Robust Ear Recognition Despite Accessory Occlusions

Ear occlusions, commonly caused by accessories such as earrings and earphones, present a significant challenge to the reliability of ear-based biometric recognition, particularly in real-world scenarios. Researchers Deeksha Arun, Kevin W. Bowyer, and Patrick Flynn, all from the Department of Computer Science and Engineering at the University of Notre Dame, demonstrate a diffusion-based inpainting technique to address this problem. Their work reconstructs obscured ear regions with anatomical accuracy, effectively 'removing' occlusions while preserving crucial geometric details such as the helix and lobule. This pre-processing step substantially improves the performance of ear recognition systems, offering a robust solution for biometric identification even with partially obscured ears, a crucial advancement for security and surveillance applications.

The proposed pipeline removes ear accessories (e.g., earrings, earbuds, earphones) via two stages: (i) accessory mask generation and (ii) masked restoration (inpainting). Detector bounding boxes are converted into high-quality pixel masks using SAM 2, followed by light morphological refinement. Finally, the masked region is inpainted to reconstruct an accessory-free ear image for downstream recognition. An overview of the complete pipeline is illustrated in Fig. 0.2. a) Supervised detector (YOLOv10): the authors fine-tune YOLOv10 on a manually annotated accessory dataset, labelling bounding boxes for common ear accessories (e.g., ear accessory, earbud) in ear images.

At inference time, YOLOv10 produces candidate accessory boxes. b) Zero-shot detector (Grounding DINO): to improve robustness to unseen accessory types and domain shift, they additionally use Grounding DINO as a zero-shot detector, querying it with a curated text prompt comprising accessory phrases (e.g., 'earring', 'wireless earbud', 'hearing aid'), formatted as lowercase, period-separated terms to match the model's expected interface. The detector output is then post-processed.
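The prompt-formatting convention described above can be sketched as a small helper. The function name is illustrative, but the lowercase, period-separated format is the text interface Grounding DINO expects:

```python
def format_grounding_prompt(phrases):
    """Join accessory phrases into a Grounding DINO text query.

    Grounding DINO expects a single lowercase string whose candidate
    phrases are separated by ' . ' and terminated with a period,
    e.g. 'earring . wireless earbud . hearing aid .'.
    """
    return " . ".join(p.strip().lower() for p in phrases) + " ."

prompt = format_grounding_prompt(["Earring", "Wireless Earbud", "Hearing Aid"])
```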

Automated Ear Masking and Diffusion Inpainting

Initially, an accessory mask is automatically derived for each input ear image, guiding the inpainting process to focus on the occluded areas. The study introduces a fully automated mask generation pipeline, combining YOLOv10, Grounding DINO, and SAM 2 to achieve robust ear accessory segmentation, a critical step for accurate inpainting. This combination allows the system to precisely identify and delineate accessory regions, even under varying lighting conditions and accessory styles. Experiments employed Vision Transformers for downstream recognition, with the central objective of evaluating the impact of the pre-processing step on biometric utility.
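The paper does not spell out how the two detectors' outputs are fused, so the IoU-based deduplication below is an assumption: a minimal sketch in which zero-shot boxes are kept only when they do not duplicate a supervised detection.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def merge_detections(yolo_boxes, dino_boxes, iou_thr=0.5):
    """Union the supervised and zero-shot boxes, dropping zero-shot
    boxes that overlap an already-kept detection above the threshold."""
    merged = list(yolo_boxes)
    for box in dino_boxes:
        if all(iou(box, kept) < iou_thr for kept in merged):
            merged.append(box)
    return merged

merged = merge_detections([(0, 0, 10, 10)], [(1, 1, 9, 9), (20, 20, 30, 30)])
```

Keeping the supervised boxes first reflects the intuition that the fine-tuned detector is more reliable on known accessory types, with the zero-shot detector contributing only genuinely new regions.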

Researchers aligned and cropped ear images before applying the inpainting model, ensuring consistent input for both original and reconstructed images. This testing protocol compared recognition performance with and without the diffusion-based inpainting pre-processing step, across several model architectures and tokenization strategies. The approach achieves notable robustness in producing high-fidelity, semantically coherent content, refining local details while maintaining global consistency, which is essential for realistic ear reconstruction. Furthermore, the work details how diffusion models' iterative denoising process enables sharp, structure-consistent reconstructions, minimising identity drift and avoiding artifacts that could compromise biometric accuracy. The team's key contributions include the first ear accessory-aware diffusion-based inpainting pipeline and a comprehensive evaluation demonstrating improved verification performance across multiple benchmarks.
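The with/without comparison ultimately reduces to a thresholded similarity between embedding pairs. A minimal sketch of that verification decision, assuming cosine similarity over the ViT embeddings and an arbitrary illustrative threshold (the paper's actual scoring details are not reproduced here):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(probe_emb, gallery_emb, threshold=0.5):
    """Accept the identity claim when similarity clears the threshold.

    The study scores pairs like this twice, once on original crops and
    once on inpainted crops, to measure the gain from restoration.
    """
    return cosine(probe_emb, gallery_emb) >= threshold

probe = np.array([0.6, 0.8])
genuine = np.array([0.6, 0.8])    # same identity: similarity 1.0
impostor = np.array([-0.8, 0.6])  # orthogonal: similarity 0.0
```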

Diffusion Inpainting Improves Ear Biometric Accuracy by Reconstructing Occluded Regions

This innovative approach addresses a key challenge in ear biometrics, where occlusions from accessories can significantly degrade accuracy. Results demonstrate a clear improvement in verification accuracy when the restoration technique is applied as a pre-processing step. Data shows the pipeline effectively alleviates accessory-induced occlusions, leading to enhanced overall recognition performance. Specifically, the study employed a hybrid detector, combining a supervised YOLOv10 model and a zero-shot Grounding DINO detector, to accurately locate accessories within ear images. Following accessory localisation, high-quality pixel masks were generated using SAM 2, refined with morphological operations, and then used to guide the diffusion inpainting process.
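The light morphological refinement mentioned above can be illustrated with a plain-NumPy binary dilation; the 3x3 cross kernel and single iteration are illustrative choices, not the paper's values, and production code would typically use OpenCV or scipy.ndimage instead:

```python
import numpy as np

def dilate(mask, iterations=1):
    """Binary dilation with a 3x3 cross structuring element.

    A light pass like this pads the segmentation mask slightly so the
    inpainting region fully covers accessory boundaries.
    """
    m = mask.astype(bool)
    for _ in range(iterations):
        up = np.zeros_like(m);    up[:-1] = m[1:]
        down = np.zeros_like(m);  down[1:] = m[:-1]
        left = np.zeros_like(m);  left[:, :-1] = m[:, 1:]
        right = np.zeros_like(m); right[:, 1:] = m[:, :-1]
        m = m | up | down | left | right
    return m

seed = np.zeros((5, 5), dtype=bool)
seed[2, 2] = True
grown = dilate(seed)  # centre pixel plus its 4-neighbourhood
```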

The work connects diffusion-based image restoration with ear biometric recognition, explicitly reconstructing occluded ear regions using conditional diffusion inpainting. The pipeline, illustrated in Fig. 0.2, first generates accessory masks and then inpaints the masked region to reconstruct an accessory-free ear image for subsequent recognition. This breakthrough delivers a fully automated, accessory-aware solution with robust mask generation and cross-benchmark evaluation, paving the way for more reliable ear-based biometric systems.
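As a crude stand-in for the mask-generation stage (SAM 2 itself carves a tight, pixel-accurate segment inside each detector box; that model is not reproduced here), rasterising a box into a binary inpainting mask can be sketched as follows. `box_to_mask` is an illustrative name, not the authors' API:

```python
import numpy as np

def box_to_mask(height, width, box):
    """Fill an (x1, y1, x2, y2) box region in a binary mask.

    In the actual pipeline SAM 2 segments the accessory inside the box;
    here the whole box is simply marked for inpainting.
    """
    x1, y1, x2, y2 = box
    mask = np.zeros((height, width), dtype=bool)
    mask[y1:y2, x1:x2] = True
    return mask

mask = box_to_mask(8, 8, (2, 2, 5, 6))  # 3 columns wide, 4 rows tall
```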

Diffusion Restores Ears and Boosts Biometric Accuracy, Researchers Find

Experiments across multiple benchmark datasets demonstrate that this diffusion-based restoration is particularly beneficial under challenging conditions, such as large or high-contrast occlusions covering crucial ear features. Consistent improvements in recognition accuracy were observed, notably on the EarVN1.0 dataset, indicating increased robustness against corrupted input images. However, the authors acknowledge that on cleaner datasets and with finer patch settings, the inpainting process can occasionally reduce performance, suggesting a trade-off between artifact removal and preservation of subtle identity cues. The study highlights the importance of balancing anatomical reconstruction with identity preservation when employing diffusion models for biometric preprocessing. Future research will focus on incorporating explicit identity-preserving constraints, such as feature-level consistency and perceptual losses, to prevent over-smoothing and maintain fine-grained morphological details. Additionally, extending the model to handle more complex occlusions, like hair or shadows, and exploring tighter integration between restoration and recognition modules represent promising avenues for further development and improved performance in real-world scenarios.

👉 More information
🗞 Diffusion for De-Occlusion: Accessory-Aware Diffusion Inpainting for Robust Ear Biometric Recognition
🧠 ArXiv: https://arxiv.org/abs/2601.19795

Rohail T.

I am a quantum scientist exploring the frontiers of physics and technology. My work focuses on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.

Latest Posts by Rohail T.:

Reflect Achieves Constitutional Alignment for Large Language Models Without Training Data
January 29, 2026

Quantum Algorithms Achieve Lower Resource Needs for ATP/metaphosphate Hydrolysis
January 29, 2026

Information Backflow Diagrams Unify Entanglement Revivals and Entropy Overshoots in Models
January 29, 2026