OpenAI is enhancing ChatGPT’s ability to discern harmful intent not by examining messages in isolation, but by analyzing the evolving context of conversations. The system now processes hundreds of millions of interactions daily to identify subtle cues of distress. These updates, informed by more than two years of collaboration with mental health and safety experts, allow ChatGPT to recognize when risk emerges over time, distinguishing between everyday queries and potentially dangerous situations. “In sensitive conversations, context can matter as much as a single message,” OpenAI states, explaining that the system is trained to recognize harmful intent from surrounding dialogue and to de-escalate or redirect users toward support. This focus on acute scenarios, including suicide, self-harm, and harm to others, has produced measurable gains, including a 16 percent improvement in harm-to-others cases, and builds on existing safety measures designed to respond with caution where possible and refuse unsafe requests.
Contextual Risk Detection Improves ChatGPT Safety Responses
ChatGPT now assesses evolving conversational cues to proactively mitigate potential harm, a capability demonstrated through substantial gains in internal safety evaluations. A key advancement lies in ChatGPT’s ability to recognize harmful intent emerging over the course of a conversation, rather than simply flagging explicit requests. In long, single-conversation tests, safe-response performance increased by 50 percent for suicide and self-harm and by 16 percent for harm to others. These gains are further substantiated by results on GPT‑5.5 Instant, where safe-response performance improved by 52 percent in harm-to-others cases and 39 percent in suicide and self-harm cases. To enable this contextual understanding, OpenAI developed brief, factual notes about earlier interactions that might be relevant to current safety concerns.
These summaries, created by a specialized model, are temporary and narrowly focused, receiving an average safety relevance score of 4.34 out of 5 and a factuality score of 4.34 out of 5 in evaluations. The company emphasizes that these improvements haven’t compromised the quality of everyday conversations, with testing showing no meaningful user preference between responses with or without the added safety context.
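OpenAI has not published implementation details, but the flow it describes, in which a safety-reasoning model condenses earlier turns into a brief factual note that is attached only when a safety concern is live, can be sketched in miniature. Everything below (the SafetyNote class, summarize_for_safety, and the keyword cue list standing in for model-based reasoning) is a hypothetical illustration, not OpenAI’s implementation:

```python
from dataclasses import dataclass

# Toy cue list; the real system uses a model trained for safety
# reasoning, not keyword matching.
DISTRESS_CUES = ("hopeless", "no way out", "hurt myself")

@dataclass(frozen=True)
class SafetyNote:
    """Brief, factual note about earlier interactions (hypothetical)."""
    text: str

def summarize_for_safety(earlier_turns: list[str]) -> SafetyNote | None:
    """Condense only the safety-relevant parts of the prior conversation."""
    relevant = [t for t in earlier_turns
                if any(cue in t.lower() for cue in DISTRESS_CUES)]
    return SafetyNote(" / ".join(relevant)) if relevant else None

def build_model_input(message: str, earlier_turns: list[str]) -> str:
    """Attach the note only when one exists, leaving everyday
    conversations untouched."""
    note = summarize_for_safety(earlier_turns)
    return message if note is None else f"[safety context: {note.text}]\n{message}"

print(build_model_input(
    "What's the tallest bridge nearby?",
    ["Lately everything feels hopeless.", "I like hiking."],
))
```

Gating the note on its existence keeps the common path unchanged, which is consistent with OpenAI’s claim that everyday conversation quality is unaffected.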
Safety Summaries Capture Evolving Signals Across Conversations
OpenAI is addressing a critical nuance in AI safety: recognizing escalating risk within extended conversations. A key element of this approach is the introduction of brief factual notes, generated by a specialized model, that capture relevant context from earlier parts of a conversation. These summaries aren’t intended for personalization or long-term memory, but serve as temporary tools that help ChatGPT distinguish between benign requests and those signaling potential harm. “These summaries are created by a model trained for safety reasoning tasks and are narrowly scoped, kept only for a limited time, and used only when relevant to a serious safety concern,” explains OpenAI. Testing on GPT‑5.5 Instant underscores the value of this added context: a 52 percent improvement in harm-to-others safe responses and a 39 percent improvement in suicide and self-harm cases.
The safety summaries themselves received high marks for relevance (4.34 out of 5) and factuality (4.34 out of 5) in over 4,000 evaluations, indicating accuracy and focus. OpenAI acknowledges this is a long-term challenge, with plans to explore applying similar methods to areas like biology and cybersecurity, while continually strengthening safeguards as models evolve and understanding deepens.
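The reported averages are plain rubric means; assuming each of the 4,000-plus evaluations grades a summary from 1 to 5 on each axis, the aggregation reduces to a few lines (the sample records here are invented):

```python
from statistics import mean

# Invented grading records; OpenAI's actual rubric is not public.
evals = [
    {"relevance": 5, "factuality": 4},
    {"relevance": 4, "factuality": 5},
    {"relevance": 4, "factuality": 4},
    # ... roughly 4,000 such records in the internal evaluation
]

print(f"relevance:  {mean(e['relevance'] for e in evals):.2f} / 5")
print(f"factuality: {mean(e['factuality'] for e in evals):.2f} / 5")
```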
50 Percent Performance Gain in Suicide & Self-Harm Scenarios
OpenAI is prioritizing nuanced safety protocols within ChatGPT, moving beyond simple keyword detection to assess risk as it evolves across extended user interactions. The company reports substantial improvements in the system’s ability to identify subtle cues indicative of distress or harmful intent, particularly in sensitive areas like suicide and self-harm. This work, shaped by more than two years of collaboration with mental health and safety experts, aims to ensure responses are informed by real-world expertise and appropriately sensitive to the complexities of human emotion. A key measure of this progress lies in internal evaluations designed to test performance in challenging, high-risk scenarios. Specifically, tests revealed a 50 percent improvement in safe-response performance within long, single-conversation scenarios involving suicide or self-harm, demonstrating the model’s increased capacity to recognize patterns of escalating risk. Further bolstering these gains, GPT‑5.5 Instant, the current default model, showed a 39 percent improvement in suicide and self-harm cases. The safety summaries that support these judgments, meanwhile, averaged 4.34 out of 5 for relevance and factuality, indicating both accuracy and focus.
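OpenAI reports these figures as relative gains without disclosing baseline rates. Assuming “X percent improvement” means a relative increase in the safe-response rate, the arithmetic looks like this, with the 60 percent baseline purely illustrative:

```python
def improved_rate(baseline: float, relative_gain: float) -> float:
    """Safe-response rate after a relative improvement, capped at 100%."""
    return min(baseline * (1.0 + relative_gain), 1.0)

# With a purely illustrative 60% baseline of safe responses:
print(improved_rate(0.60, 0.50))  # long suicide/self-harm chats  -> 0.90
print(improved_rate(0.60, 0.39))  # GPT-5.5 Instant, same category -> 0.834
```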
Mental Health Experts Guide Policy and Training Updates
This proactive approach addresses a critical need; with hundreds of millions of interactions processed daily, the system now aims to discern potentially harmful intent as it emerges, rather than reacting to isolated statements. The emphasis on contextual understanding represents a shift from solely addressing acute crises to anticipating risk before it fully manifests, allowing for de-escalation or redirection toward support resources. This wasn’t simply a technical upgrade, but a deliberate integration of clinical expertise into the model’s policies and training. Experts from OpenAI’s Global Physicians Network, specializing in areas like forensic psychology and suicide prevention, guided decisions regarding the scope and duration of relevant contextual information. “These experts helped inform decisions around when safety summaries should be created, how much prior context may be relevant, and how long the model should consider that context when responding,” the company notes, underscoring its commitment to grounding AI safety in real-world understanding.
The improvements are measurable: internal evaluations revealed a 50 percent increase in safe responses within extended single conversations involving suicide or self-harm, and a 16 percent improvement in harm-to-others cases, while testing on GPT‑5.5 Instant showed a 39 percent gain in suicide and self-harm cases and a 52 percent gain in harm-to-others cases. OpenAI intends to extend this contextual approach to other high-risk areas, including biology and cybersecurity, while maintaining rigorous safeguards.
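The expert-informed decisions quoted earlier, when to create a summary, how much prior context to consider, and how long to retain it, amount to policy parameters. A hypothetical configuration capturing them might look like the sketch below; every value is an assumption, not a disclosed setting:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SafetyContextPolicy:
    """Hypothetical knobs for the decisions OpenAI says experts from its
    Global Physicians Network helped inform. All values are illustrative."""
    trigger_topics: tuple[str, ...] = ("suicide_self_harm", "harm_to_others")
    max_prior_turns: int = 40      # how much earlier context may be relevant
    retention_hours: float = 24.0  # how long the note stays in consideration

print(SafetyContextPolicy())
```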
The underlying principle is simple: a request that appears ordinary or ambiguous on its own may carry a very different meaning when viewed alongside earlier signs of distress or possible harmful intent, as the toy check below illustrates.
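As a concrete illustration of that principle (toy keyword logic only; the production system relies on model-based safety reasoning), the same question flips from ordinary to elevated once earlier distress is in view:

```python
def toy_risk(message: str, prior_context: list[str]) -> str:
    """Toy contextual check: an ambiguous dose question reads as ordinary
    alone, but elevated next to earlier distress signals."""
    distressed = any("hopeless" in turn.lower() for turn in prior_context)
    ambiguous = "dose" in message.lower()
    return "elevated" if distressed and ambiguous else "ordinary"

question = "What's the maximum safe dose of this medication?"
print(toy_risk(question, []))                                     # ordinary
print(toy_risk(question, ["Everything feels hopeless lately."]))  # elevated
```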
