Microscopic pixel-level changes, imperceptible to the human eye, are enough to bypass the safeguards of some artificial intelligence systems, according to new research from Florida International University. Researchers Hadi Amini, associate professor at FIU’s Knight Foundation School of Computing and Information Sciences, and graduate assistant Md Jueal Mia found that an altered image, even a picture of a panda bear, can trick AI into generating harmful or policy-violating outputs. The findings, presented at the International Conference on Machine Learning and Applications, show that AI models “don’t see images the same way humans do,” Amini explains; instead, they interpret images as patterns of numbers and pixels. “In order to protect AI systems from attacks, we try to break them ourselves, identify potential vulnerabilities and design defense mechanisms,” Amini said, framing the work as a proactive effort to bolster future AI security.
Pixel-Level Perturbations Bypass AI Safeguards
The research focuses on exploiting the way AI systems process visual information at a fundamental level, rather than on building complex adversarial attacks. To that end, Amini and Mia developed JaiLIP (Jailbreaking with Loss-guided Image Perturbation), an algorithm designed to determine the optimal degree of pixel-level manipulation needed to bypass AI safeguards. Testing JaiLIP on BLIP-2, a multimodal AI model, revealed a significant increase in the likelihood of the system generating harmful or unsafe responses when presented with altered images. In one example, a JaiLIP-modified image of a stoplight tricked the model into providing detailed instructions on how to disregard traffic signals without incurring a penalty.
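To make the idea of a loss-guided, pixel-level change concrete, here is a minimal sketch in Python. It is not the JaiLIP algorithm, whose loss function and update rule are not described here, and it does not involve BLIP-2. It assumes a toy stand-in "model" (a logistic score over raw pixel values) and uses a generic sign-of-gradient update with a fixed per-pixel budget. Every name in it (`score`, `perturb`, `EPSILON`, `STEPS`) is hypothetical and chosen only for illustration.

```python
import math
import random

# Assumed, illustrative settings; none of these values come from the research.
EPSILON = 2 / 255      # largest allowed change per pixel on a 0..1 scale
STEPS = 4              # number of small update steps
NUM_PIXELS = 64 * 64   # a toy 64x64 grayscale "image"


def score(weights, bias, pixels):
    """Toy stand-in for a model: a logistic score in (0, 1) over raw pixels."""
    z = bias + sum(w * p for w, p in zip(weights, pixels))
    return 1.0 / (1.0 + math.exp(-z))


def perturb(weights, bias, pixels, epsilon=EPSILON, steps=STEPS):
    """Nudge pixels to lower the loss -log(score), staying within +/- epsilon."""
    step = epsilon / steps
    adv = list(pixels)
    for _ in range(steps):
        s = score(weights, bias, adv)
        for i, w in enumerate(weights):
            # Gradient of the loss -log(s) with respect to pixel i.
            grad = -(1.0 - s) * w
            if grad > 0:
                adv[i] -= step
            elif grad < 0:
                adv[i] += step
            # Project back into the epsilon budget and the valid pixel range.
            low = max(0.0, pixels[i] - epsilon)
            high = min(1.0, pixels[i] + epsilon)
            adv[i] = min(high, max(low, adv[i]))
    return adv


rng = random.Random(0)  # fixed seed so the run is repeatable
weights = [rng.gauss(0.0, 1.0) for _ in range(NUM_PIXELS)]
pixels = [rng.uniform(0.2, 0.8) for _ in range(NUM_PIXELS)]
# Choose the bias so the unmodified image receives a low score.
bias = -3.0 - sum(w * p for w, p in zip(weights, pixels))

adversarial = perturb(weights, bias, pixels)
before = score(weights, bias, pixels)
after = score(weights, bias, adversarial)
max_change = max(abs(a - p) for a, p in zip(adversarial, pixels))

print(f"score before: {before:.3f}")
print(f"score after:  {after:.3f}")
print(f"largest per-pixel change: {max_change:.5f}")
```

No pixel moves by more than 2/255 of the intensity range, yet the toy score swings from low to high because thousands of tiny, coordinated changes add up in the model's arithmetic. Real multimodal models are far more complex, so this shows only the general principle.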
The researchers found that using JaiLIP images nearly doubled the number of harmful responses generated by the AI models tested, extending the risk beyond simple requests for illegal activities. Amini emphasizes that small businesses and companies integrating AI must be aware of these vulnerabilities and deploy sufficient guardrails to ensure the safety and integrity of their AI tools. The challenge lies in ensuring AI can recognize threats hidden in plain sight, even when humans cannot.
“AI models don’t see images the same way humans do.”
Hadi Amini, associate professor at Florida International University’s Knight Foundation School of Computing and Information Sciences
JaiLIP Algorithm Increases Harmful AI Response Rates
Florida International University researchers are actively probing the defenses of artificial intelligence systems, deliberately attacking them in order to bolster future security. This approach centers on identifying vulnerabilities before malicious actors can exploit them. The team’s work shows that even microscopic pixel-level changes are sufficient to circumvent AI safeguards, highlighting the fragility of current security measures. Amini recommends proactive steps: limiting sensitive data input, restricting system access, and thoroughly evaluating built-in security features before deploying AI tools.
Source: https://news.fiu.edu/2026/fiu-researchers-reveal-how-altered-images-can-bypass-ai-safeguards
