Detecting mirrors in video is a significant challenge for computer vision systems, and current methods struggle with both accuracy and reliability. Rui Song, Jiaying Lin, and Rynson W. H. Lau, all from City University of Hong Kong, address these limitations with a novel approach called MirrorMamba. Their research introduces a system that combines multiple visual cues, including perceived depth and correspondence, to adapt to varying conditions and accurately identify mirror surfaces. Crucially, MirrorMamba incorporates a new architecture based on the emerging Mamba state space model, which offers both a global view of the scene and efficient processing, and represents the first successful application of this technology to mirror detection. Extensive testing shows that MirrorMamba outperforms existing methods on standard benchmarks, establishing a new state of the art and demonstrating robustness and generalizability in challenging scenarios.
The work is motivated by applications in augmented reality, robotics, scene understanding, and video editing, all of which depend on telling real surfaces apart from reflected ones. Scientists are developing methods to reliably identify mirrors within visual data, enabling more sophisticated interactions with the real world. Early approaches to mirror detection relied on traditional computer vision techniques, analyzing features like edges and colors.
However, recent advancements focus on deep learning, specifically convolutional neural networks, to achieve improved performance. These networks are employed for semantic segmentation and object detection, and researchers are exploring attention mechanisms, pyramid scene parsing networks, and self-supervised learning to leverage unlabeled data. This body of work introduces several key innovations, including methods for learning from video sequences to utilize temporal information, exploiting the relationship between reflected objects and their real-world counterparts, and leveraging three-dimensional scene understanding to differentiate between real and reflected surfaces. The results demonstrate state-of-the-art performance on standard mirror detection datasets, showcasing the effectiveness of these new approaches.
Mamba Architecture for Robust Mirror Detection
Researchers have developed MirrorMamba, a new approach to video mirror detection that overcomes limitations in existing methods. This system integrates multiple visual cues, including perceived depth, correspondence, and optical flow, to achieve robust and scalable performance. A key innovation is the application of a Mamba-based architecture to mirror detection, offering advantages over traditional convolutional neural networks and transformers. Central to MirrorMamba is the Mamba-based Multidirection Correspondence Extractor, designed to capture relationships between points inside and outside the potential mirror surface with a global receptive field and complexity that grows linearly with the input length.
This extractor processes information from multiple directions, establishing robust correspondences. The team also designed a Mamba-based layer-wise boundary enforcement decoder to refine unclear boundaries often caused by blurred depth maps, improving the accuracy of mirror localization. Comprehensive evaluation shows that MirrorMamba outperforms existing approaches on video mirror detection benchmarks and also achieves state-of-the-art performance on a challenging image-based mirror detection dataset. The system remains robust across diverse conditions: where an individual cue is insufficient, it combines depth, correspondence, and optical flow to identify mirrors accurately even in complex environments.
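To make the idea of multi-directional scanning with linear cost more concrete, here is a minimal sketch in plain Python. It is not the paper's extractor: it assumes a fixed scalar recurrence (h_t = a*h_{t-1} + b*x_t, y_t = c*h_t) rather than Mamba's input-dependent, learned parameters, and the function names `linear_scan` and `multi_direction_scan` are hypothetical. It only illustrates how scanning a flattened feature grid in four orders lets every position influence every other position, while each scan is a single pass over the data.

```python
from typing import List

Grid = List[List[float]]


def linear_scan(seq: List[float], a: float = 0.5, b: float = 1.0, c: float = 1.0) -> List[float]:
    """Run h_t = a*h_{t-1} + b*x_t, y_t = c*h_t in one pass (cost linear in len(seq))."""
    h = 0.0
    out = []
    for x in seq:
        h = a * h + b * x
        out.append(c * h)
    return out


def multi_direction_scan(grid: Grid, a: float = 0.5) -> Grid:
    """Average four directional scans over a 2-D grid of scalar features."""
    rows, cols = len(grid), len(grid[0])
    # Two flattening orders, each traversed forwards and backwards.
    row_major = [(r, k) for r in range(rows) for k in range(cols)]
    col_major = [(r, k) for k in range(cols) for r in range(rows)]
    orders = [row_major, row_major[::-1], col_major, col_major[::-1]]

    total = [[0.0] * cols for _ in range(rows)]
    for order in orders:
        ys = linear_scan([grid[r][k] for r, k in order], a=a)
        # Write each scan output back to the grid position it came from.
        for (r, k), y in zip(order, ys):
            total[r][k] += y / len(orders)
    return total


if __name__ == "__main__":
    # A single "impulse" in the top-left corner spreads to every cell.
    impulse = [[1.0, 0.0], [0.0, 0.0]]
    for row in multi_direction_scan(impulse):
        print(row)
```

Because the forward scans carry information from earlier to later positions and the reversed scans carry it the other way, a cell on one side of a candidate mirror boundary can influence cells on the other side without any pairwise comparison of all positions.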
Mirror Detection via State Space Modelling
Scientists have developed MirrorMamba, a novel framework for video mirror detection that advances the field by making effective use of multiple visual cues. The research team also achieved state-of-the-art performance on a challenging image-based mirror detection dataset, demonstrating the robustness and generalizability of their approach. The result is a method capable of accurately identifying mirrors in diverse and complex video scenes. The core of MirrorMamba is the Mamba state space model, a recently developed architecture that is applied here to mirror detection for the first time.
Researchers integrated a Mamba-based Multidirection Correspondence Extractor (MMCE) to analyze implicit correspondence between spaces inside and outside the mirror, enhancing detection accuracy by considering symmetrical semantics. This module dynamically fuses features from color, depth, and optical flow images, extracting crucial clues for mirror identification. To further refine the detection process, the team designed a Mamba-based Layer-wise Boundary Enforcement Decoder (BED). This decoder progressively refines features by combining high-level semantic information with low-level detail, resulting in a high-quality mirror segmentation map with precise boundary details. Experiments demonstrate that the system effectively leverages both static and dynamic cues, utilizing relative depth maps and inter-frame optical flow maps. This work marks a significant step forward in video analysis and computer vision, offering a powerful new tool for applications ranging from augmented reality to robotics.
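The dynamic fusion of color, depth, and optical flow cues can be pictured as a weighted combination whose weights change with the input. The sketch below is an assumption-laden illustration, not the paper's module: the per-cue confidence `scores` are passed in by hand (in a trained network they would be predicted from the features), and the names `softmax` and `fuse_cues` are hypothetical.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def fuse_cues(features: Dict[str, List[float]], scores: Dict[str, float]) -> List[float]:
    """Blend per-cue feature vectors using softmax weights from per-cue scores."""
    names = sorted(features)  # fixed order so weights line up with features
    weights = softmax([scores[n] for n in names])
    dim = len(features[names[0]])
    fused = [0.0] * dim
    for name, w in zip(names, weights):
        for i, v in enumerate(features[name]):
            fused[i] += w * v
    return fused


if __name__ == "__main__":
    feats = {"rgb": [1.0, 0.0], "depth": [0.0, 1.0], "flow": [1.0, 1.0]}
    # Equal confidence: the result is the plain average of the three cues.
    print(fuse_cues(feats, {"rgb": 0.0, "depth": 0.0, "flow": 0.0}))
    # An unreliable depth cue (very low score) is effectively ignored.
    print(fuse_cues(feats, {"rgb": 0.0, "depth": -1000.0, "flow": 0.0}))
```

The second call shows the behaviour the article describes: when one cue is uninformative, for example a blurred depth map, its weight collapses and the remaining cues carry the decision.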
Mamba Architecture Achieves Mirror Detection Breakthrough
This research presents MirrorMamba, a novel framework for detecting mirrors in both images and videos, achieving state-of-the-art performance on benchmark datasets. The team addressed limitations in existing methods by incorporating multiple cues (perceived depth, correspondence, and motion dynamics) to improve robustness and accuracy. Two components are central: the Mamba-based Multidirection Correspondence Extractor, which efficiently captures global symmetry relationships, and the Layer-wise Boundary Enforcement Decoder, designed to refine boundary details in challenging scenarios. Notably, this work marks the first successful application of the Mamba architecture to mirror detection. Experiments confirm that the proposed method outperforms existing approaches on both image and video datasets. Future work could explore the application of Mamba to other computer vision tasks that require capturing long-range dependencies and processing sequential data efficiently.
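The layer-wise refinement performed by the decoder, which combines coarse semantic predictions with finer detail one level at a time, can be sketched as a coarse-to-fine loop. This is a simplified stand-in under stated assumptions: it uses nearest-neighbour upsampling and a fixed blend factor `alpha` in place of the learned Mamba-based layers, and the names `upsample2x`, `refine`, and `decode` are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def upsample2x(grid: Grid) -> Grid:
    """Nearest-neighbour upsampling by a factor of two in each dimension."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def refine(coarse: Grid, fine: Grid, alpha: float = 0.5) -> Grid:
    """Blend the upsampled coarse map with the finer map at the same resolution."""
    up = upsample2x(coarse)
    return [
        [alpha * u + (1.0 - alpha) * f for u, f in zip(up_row, fine_row)]
        for up_row, fine_row in zip(up, fine)
    ]


def decode(pyramid: List[Grid], alpha: float = 0.5) -> Grid:
    """Refine level by level; pyramid is ordered coarsest first, doubling in size."""
    current = pyramid[0]
    for finer in pyramid[1:]:
        current = refine(current, finer, alpha)
    return current


if __name__ == "__main__":
    coarse = [[1.0]]                  # "a mirror is somewhere in this region"
    fine = [[0.0, 1.0], [1.0, 0.0]]   # low-level detail about where edges lie
    print(decode([coarse, fine]))
```

Each pass keeps the coarse level's judgment about where the mirror is while letting the finer level sharpen the boundary, which is the role the article attributes to the decoder.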
👉 More information
🗞 MirrorMamba: Towards Scalable and Robust Mirror Detection in Videos
🧠 ArXiv: https://arxiv.org/abs/2511.06716
