Cyberattacks increasingly threaten critical infrastructure and digital security, yet current defences largely react to incidents after they happen. Mohammad Hammas Saeed and Howie Huang, from George Washington University, alongside their colleagues, address this challenge with SENTINEL, a framework for proactively identifying emerging cyber threats. The team demonstrates that discussions on social media platforms, such as Telegram, contain valuable early warning signs of malicious activity, as both attackers and cybersecurity professionals share information and coordinate online. SENTINEL combines analysis of language patterns with identification of coordination markers within these discussions, achieving an F1 score of 0.89 in linking online conversations to real-world attacks, and offering a step towards predictive cybersecurity. This research highlights the potential of harnessing open-source intelligence to anticipate and mitigate cyber risks before they fully materialise.
Predicting Cyberattacks With Online Data and AI
The research draws on data from online sources, including social media and the dark web, and increasingly on artificial intelligence, to predict and understand cyberattacks before they occur. This represents a shift from reacting to attacks towards anticipating them, with a strong emphasis on using natural language processing and machine learning to extract meaningful insights from text. The work highlights the value of platforms like Twitter, Reddit, and dark web forums as sources of early warning signals, because they often carry discussions and indicators of compromise before attacks are launched. Machine learning models are used to predict attack likelihood or categorize threats. The work also acknowledges the emerging threat of AI-orchestrated espionage and the need to use AI for defense, for example in automated threat hunting and generative AI for analysis. Graph neural networks are highlighted as a promising technique for modeling relationships between attackers, targets, and attack vectors. In essence, the research argues that combining data mining, natural language processing, machine learning, and graph analysis on online sources can significantly improve the ability to predict, understand, and defend against cyberattacks, in a landscape where AI is becoming both a threat and a potential solution.
SENTINEL, Telegram Threat Detection via Multimodal Analysis
The research team developed SENTINEL, a framework for proactively detecting cyberattacks by analyzing signals from Telegram. The study takes a multi-modal approach, integrating language modeling with network analysis to identify emerging threats before they fully materialize. Researchers collected a dataset of 365,000 messages from 16 public Telegram channels focused on cybersecurity and open-source intelligence, which forms the basis for SENTINEL’s predictive capabilities. SENTINEL uses large language models to interpret the semantic content of messages, identifying discussions related to cyber threats, vulnerabilities, and attack tools.
Complementing this linguistic analysis, the team employed graph neural networks to map and analyze coordination markers within the Telegram channels, revealing patterns of communication and information sharing among users. This network analysis captures how discussions evolve and spread, highlighting potential coordinated malicious activity or the emergence of new attack strategies. Experiments demonstrate that SENTINEL effectively aligns social media discussions with real-world cyber attacks, achieving a high F1 score of 0.89. The method’s strength lies in its ability to combine textual understanding with network dynamics, providing a more comprehensive and nuanced assessment of potential cyber threats than traditional detection methods.
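For readers unfamiliar with the metric, the following is a minimal sketch of how an F1 score is derived from a classifier's confusion counts. The counts used here are hypothetical placeholders chosen for illustration; they are not figures from the SENTINEL paper.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    """Return (precision, recall, F1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all predictions (attack and no-attack days) that are correct."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


# Hypothetical counts for illustration only (not from the paper):
# 8 attack days flagged correctly, 2 false alarms, 2 missed attacks, 8 quiet days.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
acc = accuracy(tp=8, fp=2, fn=2, tn=8)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f} accuracy={acc:.2f}")
```

Because F1 is a harmonic mean, it stays high only when precision and recall are both high, which is why it is commonly reported for event detection tasks where attack days are rarer than quiet days.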
Telegram Data Reveals Proactive Cyber Threat Detection
Scientists have developed SENTINEL, a new framework that proactively detects cyberattacks by analyzing discussions on Telegram. The team collected a dataset of 365,000 messages from 16 public Telegram channels focused on cybersecurity and open-source intelligence; the data show active dialogue about cyber threats and observable shifts in language that coincide with real-world cyber incidents. At its core, SENTINEL encodes each day’s aggregated discussions into semantic embeddings that capture the meaning and context of the conversations. These embeddings are used to construct a temporal-semantic graph that represents how discussions evolve over time and captures structural dependencies between days.
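The idea of a temporal-semantic graph over days can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the article does not say how edges are defined, so the sketch assumes one node per day, a temporal edge between consecutive days in the data, and a semantic edge between any two days whose embeddings exceed a cosine-similarity threshold. A toy bag-of-words vector stands in for the LLM-based embedding, and the messages, function names, and threshold are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for an LLM sentence embedding."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_graph(messages, sim_threshold=0.4):
    """messages: iterable of (iso_date, text). Returns (days, vectors, edges)."""
    by_day = defaultdict(list)
    for day, text in messages:
        by_day[day].append(text)
    days = sorted(by_day)  # ISO dates sort chronologically as strings
    vectors = [embed(" ".join(by_day[d])) for d in days]  # one node per day
    edges = set()
    for i in range(len(days)):
        if i + 1 < len(days):
            edges.add((i, i + 1))  # temporal edge (assumed): consecutive days
        for j in range(i + 1, len(days)):
            if cosine(vectors[i], vectors[j]) >= sim_threshold:
                edges.add((i, j))  # semantic edge (assumed): similar content
    return days, vectors, sorted(edges)


# Hypothetical messages for illustration only.
messages = [
    ("2024-01-01", "new ransomware sample shared"),
    ("2024-01-01", "ransomware targets hospitals"),
    ("2024-01-02", "weekly newsletter is out"),
    ("2024-01-03", "ransomware sample analysis posted"),
]
days, vectors, edges = build_graph(messages)
print(days)
print(edges)
```

In this toy data the first and third days both discuss ransomware samples, so they are linked by a semantic edge even though they are not adjacent in time, while the unrelated second day is connected only through the temporal chain.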
Applying the GraphSAGE algorithm, the team generated graph embeddings, which were combined with the text-based embeddings and fed into a classifier designed to predict cyber events. Results show SENTINEL achieves an F1 score of 0.89 and an accuracy of 0.91; because the F1 score is the harmonic mean of precision and recall, this indicates that both are high. The research identified numerous discussions related to malware, vulnerabilities, and ransomware within the Telegram data, suggesting that the framework can enhance situational awareness and provide early warnings to cybersecurity practitioners.
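As a rough illustration of this step, the sketch below implements a single GraphSAGE layer with mean aggregation in plain Python and then concatenates each node's graph embedding with its text embedding. It makes several assumptions: the weight matrix is fixed by hand rather than learned, the node features and adjacency are tiny hypothetical values, and the article does not state which aggregator, how many layers, or which classifier the authors used.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def graphsage_mean_layer(features, neighbors, W):
    """One GraphSAGE layer with mean aggregation.

    For each node v: h_v' = normalize(ReLU(W . concat(h_v, mean of neighbor h_u))).
    """
    out = []
    for v, h_v in enumerate(features):
        nbrs = neighbors.get(v, [])
        if nbrs:
            agg = [sum(features[u][k] for u in nbrs) / len(nbrs)
                   for k in range(len(h_v))]
        else:
            agg = [0.0] * len(h_v)  # isolated node: no neighbor signal
        z = [max(0.0, s) for s in matvec(W, h_v + agg)]  # ReLU
        norm = math.sqrt(sum(s * s for s in z))
        out.append([s / norm for s in z] if norm else z)  # L2-normalize
    return out


def fuse(text_emb, graph_emb):
    """Concatenate per-node text and graph embeddings for a downstream classifier."""
    return [t + g for t, g in zip(text_emb, graph_emb)]


# Hypothetical 2-dimensional text embeddings for three daily nodes.
text_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = {0: [1], 1: [0, 2], 2: [1]}  # undirected chain 0 - 1 - 2
# Hand-picked weights (learned in practice): add self features to the neighbor mean.
W = [[1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 1.0]]

graph_emb = graphsage_mean_layer(text_emb, neighbors, W)
fused = fuse(text_emb, graph_emb)
print(fused[0])
```

Each fused vector carries both what was said on a given day and how that day relates to its neighbors in the graph; a classifier trained on such vectors could then label days as associated with a cyber event or not.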
Early Cyberattack Detection via Social Media Analysis
Researchers have developed SENTINEL, a new framework for the early detection of cyberattacks by analyzing discussions on social media platforms. The system combines insights from large language models, which interpret the meaning of text, with graph neural networks that map relationships and temporal patterns within the data. By examining 365,000 messages from cybersecurity-focused Telegram channels, the team aligned online discussions with real-world cyber incidents, achieving an F1 score of 0.89. This work demonstrates the potential of integrating linguistic analysis with network-based reasoning to anticipate cyber threats before they fully materialize, offering a proactive approach to cybersecurity. The findings underscore the importance of considering not only what is said online, but also how and when it is communicated, to identify emerging risks. Future research will focus on expanding the system to incorporate data from diverse sources, improving the modeling of time-dependent events, and refining the language processing capabilities.
👉 More information
🗞 SENTINEL: A Multi-Modal Early Detection Framework for Emerging Cyber Threats using Telegram
🧠 ArXiv: https://arxiv.org/abs/2512.21380
