LLMs Enhance Phishing Detection, But Explainability Remains a Key Challenge.

The escalating sophistication of phishing attacks continues to challenge conventional cybersecurity defences, demanding increasingly nuanced approaches to detection. Current systems often struggle to replicate the human reasoning needed to identify malicious intent, particularly as attackers refine their techniques. Researchers are now investigating whether large language models (LLMs), a class of artificial intelligence systems, can not only classify suspicious emails with greater accuracy but also provide transparent and reliable explanations for their judgements. A study by Shova Kuikel, Aritran Piplai, and Palvi Aggarwal, all from the University of Texas at El Paso, evaluates several transformer-based LLMs, including BERT, Llama, and Wizard, fine-tuned for phishing detection. Their work, entitled ‘Evaluating Large Language Models for Phishing Detection, Self-Consistency, Faithfulness, and Explainability’, employs techniques such as binary sequence classification, contrastive learning, and direct preference optimisation to assess both predictive performance and the internal consistency of the models’ reasoning. A Shapley-value-based consistency metric, CC-SHAP, is used to measure how well each model’s explanations align with its predictions.

Recent research explores the application of large language models (LLMs) to phishing detection, a continuing challenge in cybersecurity, with a focus on both the accuracy of identifying malicious emails and the reliability of the explanations these models provide. The study adapts transformer-based models, including BERT, Llama, and Wizard, using binary sequence classification, contrastive learning, and direct preference optimisation to sharpen their ability to recognise subtle indicators of phishing attempts, thereby strengthening automated threat detection systems.
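
To make the binary sequence classification setup concrete, the sketch below fine-tunes a BERT-style classifier on labelled emails with the Hugging Face transformers library. The dataset contents, column names, and hyperparameters are illustrative assumptions rather than details taken from the paper.

```python
# Minimal sketch: fine-tuning a BERT-style model for binary phishing classification.
# Dataset contents and hyperparameters are assumptions, not the paper's actual setup.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

# Hypothetical labelled emails: label 1 = phishing, 0 = legitimate.
data = Dataset.from_dict({
    "text": ["Your account is locked, verify now at http://example.bad",
             "Agenda attached for Monday's project meeting."],
    "label": [1, 0],
})

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

data = data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="phish-bert", num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=data,
)
trainer.train()
```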

The research assesses model performance not solely on classification accuracy, but also on the faithfulness and internal consistency of the explanations generated. It employs CC-SHAP, a consistency measure built on Shapley values, a concept from cooperative game theory that machine learning borrows to attribute a model’s individual predictions to its inputs. CC-SHAP quantifies the degree to which the model’s explanations align with the specific tokens (individual words or parts of words) that drive those predictions, revealing the reasoning behind the model’s decisions and offering valuable insights for security analysts.
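
The sketch below illustrates the general idea of such a consistency check; it is not the authors’ implementation. It uses a crude leave-one-out proxy in place of full Shapley values and compares prediction-side and explanation-side token attributions with cosine similarity, with the scoring functions left as hypothetical placeholders.

```python
# Sketch of a CC-SHAP-style consistency check (not the authors' implementation).
# Token attributions use a crude leave-one-out proxy instead of full Shapley values.
import numpy as np

def loo_attributions(tokens, score_fn):
    """Attribution of each token = drop in score when that token is masked."""
    base = score_fn(tokens)
    attrs = []
    for i in range(len(tokens)):
        masked = tokens[:i] + ["[MASK]"] + tokens[i + 1:]
        attrs.append(base - score_fn(masked))
    return np.array(attrs)

def cc_consistency(pred_attrs, expl_attrs):
    """Cosine similarity between prediction-side and explanation-side attributions."""
    a = pred_attrs / (np.linalg.norm(pred_attrs) + 1e-9)
    b = expl_attrs / (np.linalg.norm(expl_attrs) + 1e-9)
    return float(a @ b)

# Usage with hypothetical scoring functions (placeholders, not real APIs):
# pred_attrs = loo_attributions(tokens, phishing_probability)    # what drives the label
# expl_attrs = loo_attributions(tokens, explanation_likelihood)  # what drives the explanation
# print(cc_consistency(pred_attrs, expl_attrs))  # near 1.0 = faithful, near 0 = inconsistent
```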

Results demonstrate a trade-off between predictive accuracy and explanation faithfulness across the tested models, underscoring the need to evaluate both properties together. Llama models consistently exhibit stronger alignment between predictions and explanations, as quantified by the CC-SHAP metric, despite achieving lower overall classification accuracy. Conversely, the Wizard model attains superior predictive accuracy but at the expense of reduced explanation faithfulness, suggesting a potential lack of transparency in its reasoning process. This implies the model may be identifying patterns without necessarily understanding why those patterns indicate a phishing attempt.

Expanding the evaluation to encompass a broader spectrum of phishing tactics and email characteristics remains crucial, ensuring the robustness and generalisability of LLM-based detection systems. The resilience of these models against adversarial attacks, inputs crafted to mislead the classification process, also requires thorough assessment, safeguarding against malicious attempts to circumvent detection. Such attacks might involve subtle alterations to email content intended to exploit vulnerabilities in the model’s reasoning.
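
As a rough illustration of what such a robustness probe might look like, the sketch below applies small homoglyph-style character substitutions to phishing emails and measures how often the classifier’s verdict flips. The perturbation scheme and the classifier interface are assumptions for illustration, not part of the study.

```python
# Sketch of a simple robustness probe: apply small text perturbations a phisher might use
# and check whether the classifier's verdict flips. Perturbations and classifier are illustrative.
import random

def perturb(text, rate=0.1):
    """Swap a fraction of characters for visually similar ones (homoglyph-style edits)."""
    lookalikes = {"a": "а", "e": "е", "o": "о", "i": "і", "l": "1"}  # Cyrillic/digit substitutes
    chars = list(text)
    for i, c in enumerate(chars):
        if c in lookalikes and random.random() < rate:
            chars[i] = lookalikes[c]
    return "".join(chars)

def robustness_check(classify, emails, trials=20):
    """Fraction of emails whose label flips under repeated random perturbation."""
    flips = 0
    for email in emails:
        original = classify(email)
        if any(classify(perturb(email)) != original for _ in range(trials)):
            flips += 1
    return flips / len(emails)

# Usage (hypothetical): robustness_check(my_classifier, held_out_phishing_emails)
```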

Future work should investigate methods to simultaneously optimise both accuracy and faithfulness, advancing the field of explainable AI in cybersecurity. This could involve exploring novel training objectives that explicitly penalise inconsistencies between predictions and explanations, or developing techniques to refine explanations post hoc – after the initial prediction – to enhance their coherence and trustworthiness. Further investigation into the interplay between model architecture, fine-tuning techniques, and explanation faithfulness also warrants attention.
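
One way such a training objective could be framed, purely as a sketch and not a method proposed in the paper, is to add a penalty for disagreement between prediction-side and explanation-side attributions to the usual classification loss; the penalty form and weighting below are assumptions.

```python
# Sketch of a combined objective: cross-entropy plus a consistency penalty.
# The penalty term and its weight are assumptions, not taken from the paper.
import torch
import torch.nn.functional as F

def combined_loss(logits, labels, pred_attrs, expl_attrs, lam=0.5):
    """Cross-entropy plus (1 - cosine similarity) between prediction-side
    and explanation-side token attribution vectors."""
    ce = F.cross_entropy(logits, labels)
    consistency = F.cosine_similarity(pred_attrs, expl_attrs, dim=-1).mean()
    return ce + lam * (1.0 - consistency)

# Usage (shapes are illustrative): logits [batch, 2], labels [batch],
# pred_attrs and expl_attrs [batch, seq_len] attribution vectors.
```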

Ultimately, the goal is to develop LLMs that not only detect phishing attempts with high accuracy but also provide transparent and reliable explanations, fostering a more secure and trustworthy digital environment. The findings demonstrate a need for a balanced approach to model evaluation and deployment, and contribute to a growing body of work focused on building artificial intelligence systems that are both effective and understandable.

👉 More information
🗞 Evaluating Large Language Models for Phishing Detection, Self-Consistency, Faithfulness, and Explainability
🧠 DOI: https://doi.org/10.48550/arXiv.2506.13746
