Securing web applications depends on reliable web attack detection, and researchers continue to look for better ways to identify malicious activity. Kangqiang Luo (Guangzhou Institute of Technology, Xidian University), Yi Xie, Shiqian Zhao (Nanyang Technological University), and colleagues present WADBERT, an approach designed to address limitations in current deep learning techniques. The model embeds irregular HTTP requests effectively and identifies the specific malicious parameters within them, something previous methods often failed to achieve. Using Hybrid Granularity Embedding (HGE) and BERT models, WADBERT achieves state-of-the-art performance on benchmark datasets, with F1-scores of 99.63% on CSIC2010 and 99.50% on SR-BH2020.
WADBERT detects web attacks via semantic embeddings
Scientists have developed a new web attack detection model, named WADBERT, which identifies malicious requests with high accuracy and pinpoints the specific parameters involved in attacks. The research addresses limitations in existing deep learning methods for web security, particularly their difficulty in handling irregular HTTP requests and modelling unordered parameters. WADBERT first embeds URLs and payload parameters using Hybrid Granularity Embedding (HGE). URLBERT and SecBERT are then used to extract semantic features from these embeddings, capturing the structure of the request in more detail.
The team fuses the parameter-level features extracted by SecBERT with a multi-head attention mechanism to obtain a single feature representation of the payload. This fusion allows the model to consider the relationships between parameters regardless of their order within the HTTP request. Finally, the URL and payload features are concatenated and passed to a linear classifier, which produces the detection result. Experiments on the CSIC2010 and SR-BH2020 datasets validate the efficacy of WADBERT, which achieves F1-scores of 99.63% and 99.50% respectively and surpasses current state-of-the-art methods.
WADBERT moves web attack detection beyond simple pattern matching towards an understanding of request semantics. The HGE technique allows the model to process the symbol-dense and often obfuscated strings found in URLs and payloads, overcoming a key limitation of previous embedding methods. By modelling payload parameters as an unordered set, the approach reflects the fact that requests with the same parameters in a different order are functionally equivalent, which improves robustness and detection rates. The multi-head attention mechanism also provides a degree of interpretability: its attention weights help identify the specific malicious parameters, improving attack traceability.
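To make the idea of an unordered parameter set concrete, here is a minimal Python sketch. It is not the authors' code: the helper name `parameter_set` and the sample query strings are hypothetical, and it assumes a form-encoded payload that the standard library's `urllib.parse.parse_qsl` can split.

```python
from urllib.parse import parse_qsl


def parameter_set(payload: str) -> frozenset:
    """Parse a form-encoded payload into an unordered set of (name, value) pairs.

    Hypothetical helper for illustration only; the article does not say how
    WADBERT splits payloads into parameters.
    """
    # keep_blank_values=True keeps parameters such as "debug=" in the set.
    return frozenset(parse_qsl(payload, keep_blank_values=True))


# Two requests that differ only in parameter order map to the same set.
first = parameter_set("id=3&name=alice&sort=asc")
second = parameter_set("sort=asc&id=3&name=alice")
print(first == second)
```

A plain set discards repeated identical pairs; if duplicates matter, a `collections.Counter` over the same pairs would keep their counts while remaining order-independent.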
The study reports F1-score improvements of 1.23% on SR-BH2020 and 0.64% on CSIC2010 over existing deep learning approaches. Ablation studies confirm the importance of both HGE and the multi-head attention mechanism in achieving these results. The work opens possibilities for more proactive and targeted security responses: security teams can not only detect attacks but also quickly identify and address their root causes, strengthening the defence of web applications against evolving threats.
Method
WADBERT is a web attack detection model designed to identify malicious requests within HTTP traffic. The URL and payload of each request were first embedded with HGE. URLBERT and SecBERT were then utilised to extract semantic features from the URL and payload components respectively. The parameter-level features extracted by SecBERT were fused via a multi-head attention mechanism, resulting in a single feature representation of the payload.
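The fusion step can be sketched in plain Python to show why attention without positional information does not depend on parameter order. This is an illustration under stated assumptions, not the paper's layer: the function names are hypothetical, the query, key and value projections are left as identity instead of learned matrices, and the mean pooling over parameters is an assumed way to reduce the attended vectors to one payload vector.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention with identity projections (Q = K = V = x)."""
    dim = len(x[0])
    out = []
    for query in x:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim) for key in x]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(dim)])
    return out


def fuse_parameters(features: List[Vector], num_heads: int = 2) -> Vector:
    """Fuse per-parameter vectors into one payload vector (assumed mean pooling)."""
    if not features:
        raise ValueError("at least one parameter vector is required")
    dim = len(features[0])
    if dim % num_heads != 0:
        raise ValueError("feature size must be divisible by num_heads")
    head_dim = dim // num_heads
    attended: List[Vector] = [[] for _ in features]
    for head in range(num_heads):
        lo, hi = head * head_dim, (head + 1) * head_dim
        # Each head attends over its own slice of the feature dimensions.
        head_out = self_attention([f[lo:hi] for f in features])
        for row, head_row in zip(attended, head_out):
            row.extend(head_row)
    count = len(features)
    return [sum(row[j] for row in attended) / count for j in range(dim)]


# Hypothetical 4-dimensional features for three payload parameters.
params = [[1.0, 0.0, 0.5, 0.2], [0.0, 1.0, 0.1, 0.9], [0.3, 0.3, 0.7, 0.7]]
payload_vector = fuse_parameters(params)
reordered_vector = fuse_parameters(params[::-1])
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(payload_vector, reordered_vector)))
```

Because no position encoding is added and the pooling is symmetric, reordering the parameters changes the result only by floating-point rounding.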
This design addresses two limitations of existing methods: embedding irregular HTTP requests and modelling unordered parameters. Conventional embedding techniques built on subword tokenisers such as BPE and WordPiece adapt poorly to the non-standard tokens and symbol-dense strings common in URLs and payloads, and HGE is intended to overcome this. The researchers also tackled the fact that parameter order is irrelevant in HTTP requests: functionally equivalent requests can have their parameters arranged in different sequences. WADBERT therefore considers the combinatorial relationships between payload parameters rather than treating them as an ordered sequence, which improves detection accuracy.
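The article does not describe how HGE is built, so the following Python sketch is only an illustration of what mixing granularities can look like, not the authors' method. It assumes a hypothetical rule: runs of ASCII letters and digits stay together as word-level tokens, while every other character (quotes, operators, whitespace) becomes its own character-level token, so symbol-dense attack strings are not merged into arbitrary subwords.

```python
import re
from typing import List

# Alphanumeric runs are kept whole; any other single character is its own token.
_TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+|.", re.DOTALL)


def hybrid_tokens(text: str) -> List[str]:
    """Split text into word-level and character-level tokens (hypothetical rule)."""
    return _TOKEN_PATTERN.findall(text)


# A typical SQL injection fragment keeps every quote and operator visible.
print(hybrid_tokens("id=1' OR '1'='1"))
```

Joining the tokens reproduces the input exactly, so no character of the request is lost before embedding.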
Finally, the concatenated URL and payload features were fed into a linear classifier to generate the detection result. Experiments on the CSIC2010 and SR-BH2020 datasets validated WADBERT’s efficacy: the model reached F1-scores of 99.63% on CSIC2010 and 99.50% on SR-BH2020, outperforming state-of-the-art methods. Beyond high detection accuracy, the method can pinpoint the malicious parameters behind an attack, which improves traceability and makes it more practical for security response.
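The final classification step, concatenating the two feature vectors and applying a linear layer, can be sketched as follows. The function name, the weights and the sigmoid output are assumptions made for illustration; the article only states that a linear classifier is used.

```python
import math
from typing import Sequence


def classify(
    url_features: Sequence[float],
    payload_features: Sequence[float],
    weights: Sequence[float],
    bias: float,
) -> float:
    """Return an attack probability from concatenated URL and payload features."""
    fused = list(url_features) + list(payload_features)  # dual-channel concatenation
    if len(fused) != len(weights):
        raise ValueError("weights must match the concatenated feature size")
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    # Sigmoid written in a form that avoids overflow for large negative logits.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    exp_logit = math.exp(logit)
    return exp_logit / (1.0 + exp_logit)


# Hypothetical two-dimensional features per channel and hand-picked weights.
score = classify([0.2, 0.9], [0.7, 0.1], weights=[0.5, 1.5, 2.0, -0.5], bias=-1.0)
print(f"attack probability: {score:.3f}")
```

In a trained model the weights and bias would be learned jointly with the two BERT channels; here they are fixed numbers so the arithmetic can be followed by hand.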
WADBERT achieves high accuracy detecting web attacks
WADBERT is designed to identify malicious requests within HTTP traffic, and addresses limitations of existing methods in embedding irregular HTTP requests and modelling unordered parameters. Experiments on the CSIC2010 dataset yielded an F1-score of 99.63%, and further validation on the SR-BH2020 dataset resulted in an F1-score of 99.50%, indicating that the model performs consistently across different datasets. Within the model, parameter-level features extracted by SecBERT are fused using a multi-head attention mechanism to create a single payload feature representation. This allows WADBERT to model the combinatorial relationships between payload parameters rather than treating them as a simple ordered sequence. The results indicate that WADBERT not only achieves high detection accuracy but also supports precise identification of malicious parameters within HTTP requests.
The concatenated URL and payload features were fed into a linear classifier, producing the final detection result and enabling accurate attack traceability. Ablation studies revealed that removing key components, such as HGE and the multi-head attention mechanism, resulted in performance degradation, highlighting their critical roles in the model’s success. Attention weight visualisation further demonstrates WADBERT’s ability to effectively pinpoint malicious parameters, enhancing the interpretability of its predictions. The work achieves an accuracy of 99.70% on the CSIC2010 dataset and 99.32% on the SR-BH2020 dataset, significantly outperforming existing state-of-the-art methods in terms of accuracy, recall, precision, and F1-score. These findings establish WADBERT as a highly useful tool for detecting web attacks with both high accuracy and strong interpretability.
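For readers unfamiliar with the reported metrics, the sketch below shows how accuracy, precision, recall and F1-score follow from confusion-matrix counts. The function name and the example counts are hypothetical and are not taken from the paper's experiments.

```python
from typing import Dict


def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute standard binary classification metrics from confusion counts.

    tp: attacks flagged as attacks, fp: benign requests flagged as attacks,
    tn: benign requests passed as benign, fn: attacks that were missed.
    """
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up counts for 100 requests, 10 of which are attacks.
print(detection_metrics(tp=8, fp=2, tn=88, fn=2))
```

F1 is the harmonic mean of precision and recall, so it stays low if either missed attacks or false alarms are frequent, which is why it is a common headline metric when attack traffic is a minority of requests.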
WADBERT accurately detects web attacks via attention mechanisms
Researchers have developed WADBERT, a novel dual-channel web attack detection model designed to improve both accuracy and interpretability. A key innovation is the incorporation of a multi-head attention mechanism to capture relationships between unordered parameters within HTTP requests, enabling precise identification of malicious components. Experimental results on the CSIC2010 and SR-BH2020 datasets demonstrate WADBERT’s effectiveness, achieving F1-scores of 99.63% and 99.50% respectively, surpassing the performance of existing state-of-the-art methods.
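One way attention weights could be turned into a traceability hint is to rank parameters by the share of attention they receive. The sketch below assumes a single non-negative weight per parameter (for example, averaged over heads); that aggregation, the function name and the sample values are assumptions for illustration, not details from the paper.

```python
from typing import List, Tuple


def rank_parameters(names: List[str], attention: List[float]) -> List[Tuple[str, float]]:
    """Rank payload parameters by their normalised share of attention weight."""
    if len(names) != len(attention):
        raise ValueError("one attention weight is required per parameter")
    total = sum(attention)
    if total <= 0:
        raise ValueError("attention weights must sum to a positive value")
    shares = [weight / total for weight in attention]
    return sorted(zip(names, shares), key=lambda pair: pair[1], reverse=True)


# Hypothetical weights for a request whose "comment" field carries the attack string.
ranking = rank_parameters(["id", "name", "comment"], [0.1, 0.1, 0.8])
for name, share in ranking:
    print(f"{name}: {share:.2f}")
```

A high share marks a parameter as a candidate for inspection rather than as proof of malice, so the ranking is best read as a starting point for an analyst.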
Ablation studies confirmed the efficacy of the designed components, including the hybrid granularity embedding, multi-head attention mechanism, and dual-channel fusion. The authors acknowledge limitations in the size and diversity of the training data, which could affect generalisation to unseen attacks. Future work will focus on expanding the dataset and employing adversarial training to further enhance the model’s robustness against sophisticated threats.
👉 More information
🗞 WADBERT: Dual-channel Web Attack Detection Based on BERT Models
🧠 ArXiv: https://arxiv.org/abs/2601.21893
