The growing demand for computer vision applications, which rely heavily on image and video data, creates a need for efficient compression techniques specifically tailored to these tasks. Hyomin Choi from InterDigital, Heeji Han from Hanbat National University, Chris Rosewarne from Canon, and Fabien Racapé from InterDigital address this challenge by introducing CompressAI-Vision, a new open-source software platform designed to rigorously evaluate compression methods for computer vision. This platform provides a standardised environment for testing how well different compression tools preserve the accuracy of vision tasks, considering both local and remote processing scenarios. By offering a common ground for comparison, CompressAI-Vision aims to accelerate the development of optimised compression technologies. It has already been adopted by the Moving Picture Experts Group (MPEG) for the development of a new Feature Coding for Machines (FCM) standard.

Given the range of vision tasks, associated neural network models, and datasets, a consolidated platform is needed as a common ground to implement and evaluate compression methods optimised for downstream vision tasks. CompressAI-Vision is introduced as a comprehensive evaluation platform where new coding tools compete to efficiently compress the input of vision networks while retaining task accuracy in the context of two different inference scenarios: “remote” and “split” inferencing.
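To make the two scenarios concrete, here is a minimal sketch in Python. It does not use the CompressAI-Vision API: the two-stage "network" (`front`, `back`) and the quantise-plus-zlib `codec` are hypothetical stand-ins chosen only so the example runs with the standard library. The point is where the codec sits: on the input in remote inference, and on the intermediate features in split inference.

```python
import zlib

def quantise(values, step):
    """Lossy step: snap each value to a multiple of `step`, clamped to one byte."""
    return bytes(min(255, round(v / step) * step) for v in values)

def codec(values, step):
    """Stand-in lossy codec (an assumption, not a real video codec).
    Returns the decoded values and the compressed size in bits."""
    payload = zlib.compress(quantise(values, step))
    return list(zlib.decompress(payload)), 8 * len(payload)

def front(pixels):
    """Hypothetical first half of a vision network: 2-to-1 average pooling."""
    return [(pixels[i] + pixels[i + 1]) // 2 for i in range(0, len(pixels) - 1, 2)]

def back(features):
    """Hypothetical second half: label the input by its mean activation."""
    return "bright" if sum(features) / len(features) > 128 else "dark"

def remote_inference(pixels, step):
    decoded, bits = codec(pixels, step)    # the input itself is compressed
    return back(front(decoded)), bits      # the whole network runs after decoding

def split_inference(pixels, step):
    features = front(pixels)               # first half runs before compression
    decoded, bits = codec(features, step)  # intermediate features are compressed
    return back(decoded), bits             # second half runs after decoding

if __name__ == "__main__":
    image = [200] * 64
    print(remote_inference(image, step=16))
    print(split_inference(image, step=16))
```

In both cases the quantities of interest are the same pair: the number of bits sent and whether the task output survives compression.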

Machine Learning Focused Video Compression Evaluation

This research introduces CompressAI-Vision, an open-source framework designed to evaluate video compression techniques specifically for machine learning applications, often referred to as video coding for machines. It addresses the need to compress video data efficiently while preserving the information essential for artificial intelligence tasks such as object detection, pose estimation, and tracking. Traditional video compression metrics do not accurately reflect performance in these AI models, necessitating a new evaluation approach. Key contributions and features include:

* Focus on Machine Learning Performance: CompressAI-Vision moves beyond traditional metrics by directly measuring the impact of compression on the accuracy of AI models.
* Comprehensive Dataset Support: The framework supports a variety of datasets commonly used in machine learning, including OpenImages, FLIR Thermal Datasets, SFU-HW-Objects, Tencent Video Dataset (TVD), and Human in Events.
* Integration with AI Frameworks: It integrates with popular AI frameworks such as Detectron2, MMPose, YOLO, and others, allowing end-to-end evaluation of task performance after compression and decompression.
* Open-Source and Extensible: Being open-source, CompressAI-Vision encourages community contributions and allows for easy customization and extension to support new datasets, AI models, and compression techniques.
* Support for Modern Video Codecs: The framework can be used to evaluate various video codecs, including H.264, H.265, H.266, and potentially others.
* Common Test Conditions: The research establishes common test conditions for video coding for machines, ensuring fair and reproducible evaluation results.

The method involves compressing video data with a chosen codec, decompressing it, and then feeding it into an AI model to measure task performance. The framework supports the development and evaluation of video compression techniques optimised for machine learning applications. It bridges the gap between traditional video compression and the requirements of AI, enabling more efficient video analysis in areas like autonomous driving, robotics, and surveillance. For researchers and developers, CompressAI-Vision provides a standardised and comprehensive platform for evaluating and comparing different compression techniques.
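The compress, decompress, and infer loop described above can be sketched as follows. This is an illustration under stated assumptions, not the platform's own code: `encode`, `decode`, `detect`, and the four-sample `DATASET` are all hypothetical, with floor quantisation plus zlib standing in for a real codec and a peak-threshold test standing in for a vision model. Each quantisation step yields one (bits, accuracy) point of the kind a rate versus accuracy curve is built from.

```python
import zlib

def encode(pixels, step):
    """Hypothetical lossy encoder: floor-quantise to multiples of `step`, then zlib."""
    return zlib.compress(bytes((p // step) * step for p in pixels))

def decode(bitstream):
    """Inverse of the lossless part of `encode`; quantisation loss remains."""
    return list(zlib.decompress(bitstream))

def detect(pixels, threshold=100):
    """Toy vision task: report an object if any pixel reaches the threshold."""
    return int(max(pixels) >= threshold)

def make_sample(peak):
    """Textured background (values 10..32) with one pixel set to `peak`."""
    pixels = [10 + (i * 37) % 23 for i in range(256)]
    pixels[100] = peak
    return pixels, int(peak >= 100)  # ground-truth label

# Made-up data: three samples containing an object, one without.
DATASET = [make_sample(peak) for peak in (110, 200, 90, 101)]

def evaluate(step):
    """Return (average bits per sample, task accuracy) for one codec setting."""
    total_bits, correct = 0, 0
    for pixels, label in DATASET:
        bitstream = encode(pixels, step)
        total_bits += 8 * len(bitstream)
        correct += detect(decode(bitstream)) == label
    return total_bits / len(DATASET), correct / len(DATASET)

for step in (1, 4, 16, 64):
    bits, accuracy = evaluate(step)
    print(f"step={step:2d}  bits/sample={bits:7.1f}  accuracy={accuracy:.2f}")
```

In this toy setup, fine quantisation (steps 1 and 4) leaves every detection intact, while step 16 pushes two weak peaks below the threshold and halves the accuracy, which is the kind of task-level damage that pixel-fidelity metrics alone would not reveal.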

CompressAI-Vision Evaluates Video Coding For Computer Vision

Scientists have developed CompressAI-Vision, a comprehensive evaluation platform designed to assess video compression methods specifically for computer vision tasks. The work showcases the platform’s capabilities through extensive testing with standard codecs and various datasets. Experiments demonstrate significant compression gains for the FCTM v6.1 codec over the VCM-RS v0.12 codec across multiple datasets. For the SFU-HW-Obj dataset, FCTM achieved bitrate savings of 79.35% and 69.02% for Class C and Class D, respectively, while maintaining equivalent task accuracy.

On average, FCTM reduced bitrate by 58.33%, 41.43%, and 72.70% under the Random Access, Low Delay, and All-Intra configurations, respectively, compared with VCM-RS. Conversely, when VCM-RS was evaluated under the FCM CTTC, the team found that both it and FCTM substantially outperformed other methods on the TVD dataset, reaching near-lossless accuracy at higher bitrates.
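Percentages like these compare the bitrate two codecs need to reach the same task accuracy, and by convention a negative change is a saving. The article does not say how the figures were derived; codec comparisons often average over the whole curve in the style of a Bjøntegaard delta. The sketch below is a simpler single-point version, offered only as an illustration: it interpolates each curve in the log-bitrate domain at one target accuracy. The function names and both curves are made up and are not the paper's data.

```python
import math

def rate_at_accuracy(curve, target):
    """Bitrate needed to reach `target` accuracy, by linear interpolation of
    log-bitrate between the two neighbouring points.
    `curve` is a list of (bitrate_kbps, accuracy) pairs sorted by bitrate."""
    for (r0, a0), (r1, a1) in zip(curve, curve[1:]):
        if a0 != a1 and min(a0, a1) <= target <= max(a0, a1):
            t = (target - a0) / (a1 - a0)
            return math.exp(math.log(r0) + t * (math.log(r1) - math.log(r0)))
    raise ValueError("target accuracy lies outside the curve")

def bitrate_change_percent(anchor, test, target):
    """Relative bitrate change of `test` versus `anchor` at equal accuracy.
    Negative values mean the test codec needs fewer bits."""
    r_anchor = rate_at_accuracy(anchor, target)
    r_test = rate_at_accuracy(test, target)
    return 100.0 * (r_test - r_anchor) / r_anchor

# Made-up rate versus accuracy points, for illustration only.
anchor_curve = [(100, 0.30), (200, 0.40), (400, 0.50)]
test_curve = [(25, 0.30), (50, 0.40), (100, 0.50)]

change = bitrate_change_percent(anchor_curve, test_curve, target=0.40)
print(f"{change:.1f}%")  # prints -75.0%
```

Here the test codec reaches an accuracy of 0.40 at 50 kbps where the anchor needs 200 kbps, hence a change of about -75%, that is, a 75% saving.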

Further analysis revealed that using VTM-23.3 as the inner codec for FCTM v6.1 delivered superior performance compared to using JM-19.1 or HM-18.0. These results demonstrate CompressAI-Vision’s ability to consistently evaluate coding performance across different inner codec configurations and heterogeneous inference pipelines. The platform is poised to support cutting-edge vision transformer architectures and multi-task networks, enabling exploration of compression noise impact on embedding spaces and optimization of coding methods for diverse tasks.

Compression Evaluation for Computer Vision Tasks

CompressAI-Vision gives researchers a common platform for comparing coding tools by how well they preserve accuracy in downstream vision tasks, under both remote and split inference scenarios. The platform supports detailed examination of bit-rate versus task accuracy across various datasets, exposing the trade-offs between compression efficiency and task performance. Because it is open source, the wider research community can extend it and contribute to its ongoing development. The authors acknowledge that the platform currently focuses on convolutional neural networks and plan to expand support for vision transformer architectures, allowing for investigation of compression noise impact on embedding spaces. Future work also intends to explore multi-task networks to optimise coding methods for parallel processing of various machine vision tasks.

👉 More information
🗞 CompressAI-Vision: Open-source software to evaluate compression methods for computer vision tasks
🧠 ArXiv: https://arxiv.org/abs/2509.20777




Rohail T.
