The increasing reliance on artificial intelligence in safety-critical systems, such as aircraft and self-driving cars, demands new methods for ensuring reliability and preventing failures. Anastasia Mavridou, Divya Gopinath, and Corina S. Păsăreanu, affiliated with KBR Inc. and NASA Ames, tackle this challenge by making artificial intelligence itself part of the solution. Their research introduces a pipeline that uses large language models and vision-language models to bridge the gap between human-written requirements and the complex inner workings of AI systems. The approach enables earlier and more effective verification of AI components. It goes beyond traditional methods that struggle with the opacity of modern neural networks, and it promises more trustworthy and dependable AI-enabled technologies.
AI Improves Safety of Autonomous Systems
This research introduces a framework for enhancing the safety and certification of AI-enabled autonomous systems, particularly those used in critical applications. The core idea is to apply foundation models, such as large language models and vision-language models, at every stage of development, from defining requirements to monitoring performance during operation. The framework has two key components, REACT and SemaLens, which work together to make the development process more robust and reliable. REACT, or Requirements Engineering with AI for Certification, improves requirement quality by combining formal analysis with the accessibility of natural language.
REACT identifies ambiguities and inconsistencies early in the design process, which reduces costly fixes later on. SemaLens uses vision-language models to analyze, explain, and debug the behavior of neural networks. It focuses on what the AI perceives and why it makes certain decisions, which improves transparency and reliability. Together, the two components aim to improve safety, reduce development costs, increase transparency, improve scalability, and ease compliance with industry standards.
Formalizing Ambiguous Requirements with AI Assistance
The research team built a comprehensive pipeline that combines REACT and SemaLens to address the challenges of assuring safety-critical AI systems used in aerospace and autonomous vehicles. The work proposes a method for translating ambiguous natural language requirements into formal specifications and then validating those specifications against system implementations. The REACT component, an AI-based Requirements Assistant, uses large language models to systematically transform imprecise English requirements into a structured form of natural language termed Restricted English. Rather than generating a single translation, the system deliberately produces multiple candidate interpretations. Users can then explicitly select the intended meaning, which preserves semantic precision during authoring.
To ensure correctness, the REACT Validate module uses formal validation techniques to expose subtle semantic differences between these candidate requirements. It presents the results in an engineer-friendly form, such as execution traces or concrete scenarios. Validated requirements are then translated into specifications in formal logics. This step also accommodates requirement types specific to AI components, including those involving vision-language models. The REACT Analyze module performs automated formal analysis across the whole requirement set and detects inconsistencies and conflicts before implementation begins. The team also developed a module that automatically generates candidate test cases with coverage guarantees. SemaLens can then use these test cases to generate videos for checking the semantic robustness of perception models.
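A minimal Python sketch helps show what multiple candidate interpretations might look like. It assumes a requirement template loosely modeled on the FRETish language of NASA's FRET tool, with an optional condition, a component, a timing, and a response. Several parts of it are illustrative assumptions rather than details from the source:

- whether REACT's Restricted English uses exactly these fields;
- the class name `StructuredRequirement`;
- the simplified temporal-logic semantics.

Two candidate readings of one ambiguous English sentence produce visibly different formulas. This is the kind of choice an engineer would be asked to make.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical field layout, loosely inspired by FRET's FRETish template:
# [condition] the [component] shall [timing] satisfy [response].
@dataclass(frozen=True)
class StructuredRequirement:
    component: str
    response: str
    condition: Optional[str] = None
    timing: str = "immediately"  # "immediately" | "eventually" | "always" | "within"
    bound: Optional[int] = None  # number of ticks, used only when timing == "within"

    def to_english(self) -> str:
        """Render the requirement back into restricted English for review."""
        prefix = f"upon {self.condition} " if self.condition else ""
        timing = f"within {self.bound} ticks" if self.timing == "within" else self.timing
        return f"{prefix}the {self.component} shall {timing} satisfy {self.response}"

    def to_ltl(self) -> str:
        """Translate to future-time LTL using simplified semantics chosen for this sketch."""
        if self.timing == "immediately":
            body = self.response
        elif self.timing == "eventually":
            body = f"F ({self.response})"
        elif self.timing == "always":
            body = f"G ({self.response})"
        elif self.timing == "within":
            if self.bound is None:
                raise ValueError("timing 'within' requires a bound")
            body = f"F[0,{self.bound}] ({self.response})"
        else:
            raise ValueError(f"unknown timing: {self.timing!r}")
        # A condition acts as a trigger checked at every step; without one,
        # the timing is evaluated from the start of the trace.
        return f"G ({self.condition} -> {body})" if self.condition else body


# Two candidate readings of "The car shall brake when an obstacle is detected."
candidates = [
    StructuredRequirement("car", "braking", condition="obstacle_detected"),
    StructuredRequirement("car", "braking", condition="obstacle_detected", timing="within", bound=3),
]
for c in candidates:
    print(c.to_english(), "=>", c.to_ltl())
```

In an actual assistant, a language model would propose the candidates. Here they are written by hand.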
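The validation and analysis steps can be illustrated with a brute-force stand-in for formal tools. This sketch is not the paper's method. It enumerates every Boolean trace up to a small length and uses that set in two ways:

- **Validation:** it returns the shortest trace on which two candidate formulas disagree.
- **Analysis:** it searches for any trace that satisfies a whole requirement set and reports a possible conflict when none exists.

The propositions (`obstacle`, `brake`, `accelerate`), the helper names, and the finite-trace reading of the bounded response are hypothetical choices made for illustration. Practical tools would rely on model checkers or SMT solvers, because exhaustive enumeration grows exponentially with trace length.

```python
from itertools import product
from typing import Callable, Dict, Iterator, List, Optional, Sequence

State = Dict[str, bool]
Trace = List[State]
Formula = Callable[[Trace], bool]


def response_within(trigger: str, response: str, bound: int) -> Formula:
    """G(trigger -> F[0,bound] response), read over finite traces: a trigger
    near the end of the trace must be answered before the trace ends."""
    def holds(trace: Trace) -> bool:
        for i, state in enumerate(trace):
            if state[trigger] and not any(s[response] for s in trace[i:i + bound + 1]):
                return False
        return True
    return holds


def never_both(p: str, q: str) -> Formula:
    """G !(p & q): p and q never hold at the same step."""
    return lambda trace: not any(s[p] and s[q] for s in trace)


def eventually(p: str) -> Formula:
    """F p: p holds at some step of the trace."""
    return lambda trace: any(s[p] for s in trace)


def all_traces(props: Sequence[str], length: int) -> Iterator[Trace]:
    """Enumerate every Boolean trace of the given length over the propositions."""
    states = [dict(zip(props, values)) for values in product([False, True], repeat=len(props))]
    for combo in product(states, repeat=length):
        yield list(combo)


def distinguishing_trace(f: Formula, g: Formula, props: Sequence[str], max_len: int) -> Optional[Trace]:
    """Shortest trace (up to max_len) on which f and g disagree, or None."""
    for length in range(1, max_len + 1):
        for trace in all_traces(props, length):
            if f(trace) != g(trace):
                return trace
    return None


def find_witness(reqs: Sequence[Formula], props: Sequence[str], max_len: int) -> Optional[Trace]:
    """A trace satisfying every requirement, or None if none exists up to max_len."""
    for length in range(1, max_len + 1):
        for trace in all_traces(props, length):
            if all(r(trace) for r in reqs):
                return trace
    return None


# Validation: two candidate readings of "brake when an obstacle is detected".
immediate = response_within("obstacle", "brake", 0)
delayed = response_within("obstacle", "brake", 2)
print(distinguishing_trace(immediate, delayed, ["obstacle", "brake"], max_len=4))
# [{'obstacle': True, 'brake': False}, {'obstacle': False, 'brake': True}]

# Analysis: a hypothetical evasive-manoeuvre rule demands acceleration on an
# obstacle, while another rule forbids braking and accelerating together.
conflicting = [
    immediate,
    response_within("obstacle", "accelerate", 0),
    never_both("brake", "accelerate"),
    eventually("obstacle"),  # environment assumption: an obstacle does appear
]
print(find_witness(conflicting, ["obstacle", "brake", "accelerate"], max_len=4))  # None
```

The distinguishing trace is the kind of concrete scenario an engineer can inspect. An obstacle at step 0 with braking only at step 1 violates the "immediately" reading but satisfies the "within 2 steps" reading.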
AI Assures Safety of Complex Systems
This work presents a novel framework that integrates artificial intelligence to address challenges in assuring the safety of complex systems, particularly those employing deep neural networks. The core of the approach lies in two complementary components, REACT and SemaLens, which together establish a pipeline from informal requirements to validated implementations. REACT uses large language models to translate natural language requirements into formal specifications. This enables early verification and validation while remaining usable for engineers without formal methods expertise. The process identifies ambiguities and inconsistencies at the earliest stages of design, helping to prevent costly downstream failures and reducing manual validation effort.
SemaLens extends this pipeline by using vision-language models to reason about and test perception systems in terms of human-understandable concepts. SemaLens Monitor evaluates temporal logic formulas against image sequences. It decides whether a sequence conforms to the requirements by comparing image embeddings with textual captions. In the reported experiments, the monitor accurately identified the sequences that satisfy the specified properties. SemaLens Img Generate uses text-conditioned diffusion models to create diverse test images that conform to natural language requirements, which strengthens semantic robustness testing. SemaLens also introduces coverage metrics based on semantic features, which quantify how well a set of images covers the relevant concepts within the operational design domain.
These coverage metrics support both black-box testing, by analyzing unlabeled datasets, and white-box testing, by mapping the perception component's embeddings. SemaLens AED analyzes the logic of vision models by aligning their embeddings with the embedding space of the CLIP model, which explains their behavior and identifies potential bugs. Statistical analysis produces semantic heatmaps that pinpoint non-robust features and flag potentially unsafe inputs at runtime. Overall, the framework offers a rigorous yet accessible approach to requirement quality. It enables early verification and validation, reduces manual effort, and uses AI to achieve scalability.
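The monitoring idea can be sketched under two assumptions. First, images and captions have already been embedded in a shared space by a CLIP-style model. Second, an atomic proposition holds in a frame when the cosine similarity between the frame embedding and the caption embedding reaches a fixed threshold. The 3-dimensional vectors, the 0.8 threshold, and the function names are illustrative assumptions. Real CLIP image-text similarities are much lower and would need calibration, and SemaLens Monitor may use a different decision rule.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]
Frame = Dict[str, bool]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def label_frames(frames: List[Vector], captions: Dict[str, Vector], threshold: float) -> List[Frame]:
    """Proposition `name` holds in a frame if the frame embedding is close
    enough to the embedding of that proposition's caption."""
    return [{name: cosine(f, emb) >= threshold for name, emb in captions.items()} for f in frames]


def always(trace: List[Frame], p: str) -> bool:
    """G p over a finite sequence of frames."""
    return all(s[p] for s in trace)


def eventually(trace: List[Frame], p: str) -> bool:
    """F p over a finite sequence of frames."""
    return any(s[p] for s in trace)


def until(trace: List[Frame], p: str, q: str) -> bool:
    """p U q: q holds at some frame and p holds at every earlier frame."""
    for s in trace:
        if s[q]:
            return True
        if not s[p]:
            return False
    return False


# Toy vectors standing in for CLIP-style text and image embeddings.
captions = {"clear_road": [1.0, 0.0, 0.0], "pedestrian": [0.0, 1.0, 0.0]}
frames = [[0.9, 0.1, 0.1], [0.8, 0.3, 0.1], [0.2, 0.9, 0.1]]
trace = label_frames(frames, captions, threshold=0.8)

print(until(trace, "clear_road", "pedestrian"))  # True
print(always(trace, "clear_road"))               # False
```

Grounding propositions through embeddings means that captions, not hand-labelled ground truth, define what "a pedestrian is present" means. This is why the threshold choice matters.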
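A black-box version of a semantic coverage metric can be sketched in the same style. The sketch assumes that each unlabeled image and each operational-design-domain concept has a CLIP-style embedding. It counts a concept as covered when at least one image matches it above a threshold. It also reports which pairs of concepts never appear together in a single image. The concept names, toy vectors, threshold, and the pairwise metric itself are hypothetical, and the metrics defined in SemaLens may differ.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def detected_concepts(images: List[Vector], concepts: Dict[str, Vector], threshold: float) -> List[Set[str]]:
    """For each unlabeled image, the set of concepts its embedding matches."""
    return [{name for name, emb in concepts.items() if cosine(img, emb) >= threshold} for img in images]


def concept_coverage(hits: List[Set[str]], concepts: Sequence[str]) -> float:
    """Fraction of concepts matched by at least one image."""
    return sum(any(c in h for h in hits) for c in concepts) / len(concepts)


def pairwise_coverage(hits: List[Set[str]], concepts: Sequence[str]) -> Tuple[float, List[Tuple[str, str]]]:
    """Fraction of concept pairs seen together in some image, plus the missing pairs."""
    pairs = list(combinations(sorted(concepts), 2))
    missing = [(a, b) for a, b in pairs if not any(a in h and b in h for h in hits)]
    ratio = (len(pairs) - len(missing)) / len(pairs) if pairs else 1.0
    return ratio, missing


# Toy embeddings for three concepts and two unlabeled images.
concepts = {"night": [1.0, 0.0, 0.0], "rain": [0.0, 1.0, 0.0], "pedestrian": [0.0, 0.0, 1.0]}
images = [[0.7, 0.7, 0.1], [0.1, 0.1, 0.9]]  # "rainy night", "pedestrian"

hits = detected_concepts(images, concepts, threshold=0.6)
print(concept_coverage(hits, list(concepts)))   # 1.0
print(pairwise_coverage(hits, list(concepts)))  # ratio 1/3, missing two pairs
```

The missing pairs point to combinations of conditions that the dataset does not yet exercise, such as a pedestrian at night.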
AI Safety Via Formal Verification and Perception Testing
This research presents a novel approach to assuring the safety of autonomous systems that integrate artificial intelligence, particularly deep neural networks. Recognizing the difficulties in verifying these complex systems using traditional methods, scientists developed a workflow incorporating two synergistic components, REACT and SemaLens. REACT employs large language models to translate informal, natural language requirements into formal specifications, enabling earlier verification and validation processes. Complementing this, SemaLens utilizes vision-language models to reason about and test the perception systems of these autonomous systems using concepts readily understood by humans.
Together, these components establish a comprehensive pipeline from initial requirements to validated implementation, addressing challenges that currently hinder the certification of learning-enabled components in safety-critical applications. The team demonstrated the potential of AI-enabled spatio-temporal reasoning and of generating diverse test inputs, even in data-sparse environments, to improve robustness and safety. While acknowledging that further development is needed, the scientists highlight the potential of this "fighting AI with AI" strategy to overcome limitations in current verification techniques. Future work will likely focus on refining these models and extending their application to a wider range of autonomous systems and safety-critical domains.
👉 More information
🗞 Fighting AI with AI: Leveraging Foundation Models for Assuring AI-Enabled Safety-Critical Systems
🧠 ArXiv: https://arxiv.org/abs/2511.20627
