ChatGPT effectively identifies inappropriate online comments, with accuracy improving through iterative refinement up to Version 6. Detection of targeted content, however, remains variable and shows a higher false-positive rate than expert evaluation, indicating a need for continued development and greater contextual awareness.
The escalating volume of online user-generated content presents a significant challenge for platforms seeking to maintain civil discourse and protect users from harmful interactions. Automated content moderation, utilising artificial intelligence, offers a potential solution, but requires careful evaluation of accuracy and reliability. Researchers at Vrije Universiteit Amsterdam – Baran Barbarestani, Isa Maks, and Piek Vossen – address this need in their study, ‘Assessing and Refining ChatGPT’s Performance in Identifying Targeting and Inappropriate Language: A Comparative Study’, which details a rigorous assessment of ChatGPT’s capabilities in detecting both abusive language and targeted harassment within online commentary, comparing its performance against human annotation and expert judgement.
Social media platforms process an immense daily influx of user-generated content, rendering manual moderation both impractical and inefficient. This drives demand for scalable, automated solutions. The study investigates ChatGPT’s accuracy in identifying inappropriate content and, crucially, its ability to recognise targeted abuse directed at specific individuals – a task requiring nuanced linguistic understanding and intent recognition. Researchers compared ChatGPT’s performance against both human crowd-sourced annotations and expert evaluations to comprehensively assess its accuracy, scope of detection, and overall consistency in identifying harmful online behaviour.
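As a rough illustration of how such a classification step can be set up, here is a minimal sketch using the OpenAI Python SDK. The model name, prompt wording, and label set are assumptions for illustration only, not the study's actual configuration:

```python
# Minimal sketch of LLM-based comment classification, in the spirit of the
# study's setup. Model name, prompt, and labels are illustrative assumptions;
# the paper's actual prompts and ChatGPT versions differ.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Classify the following online comment.\n"
    "Answer with exactly one label:\n"
    "  INAPPROPRIATE_TARGETED  - abusive and aimed at a specific person or group\n"
    "  INAPPROPRIATE_GENERAL   - abusive but not aimed at anyone specific\n"
    "  APPROPRIATE             - neither\n\n"
    "Comment: {comment}"
)

def classify_comment(comment: str) -> str:
    """Ask the model for a single moderation label for one comment."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model, not the one used in the study
        messages=[{"role": "user", "content": PROMPT.format(comment=comment)}],
        temperature=0,  # near-deterministic output for repeatable labelling
    )
    return response.choices[0].message.content.strip()

if __name__ == "__main__":
    print(classify_comment("You people are all idiots."))
```

Pinning the temperature to zero makes repeated runs more comparable, which matters when consistency against human annotation is itself one of the quantities being measured.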
The investigation reveals that ChatGPT detects generally inappropriate content well, with accuracy improving markedly across iterative refinements, particularly in Version 6. This suggests that large language models (LLMs) – artificial intelligence systems trained on vast text corpora – can contribute substantially to automated content moderation. Performance varies considerably, however, when identifying targeted abuse: the model produces higher false-positive rates than human experts, indicating a current limitation in its ability to discern nuanced forms of harassment or abuse directed at specific individuals.
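To make the false-positive comparison concrete, the sketch below scores hypothetical model labels against expert gold labels for the binary question "is this comment targeted abuse?". The toy data is invented and carries none of the study's actual figures:

```python
# Scoring model labels against expert gold labels for the binary task
# "is this comment targeted abuse?". The label lists are invented toy data;
# the study's real datasets and counts are in the paper.

gold  = [1, 0, 0, 1, 0, 0, 1, 0]  # expert judgement (1 = targeted abuse)
model = [1, 1, 0, 1, 0, 1, 0, 0]  # hypothetical ChatGPT output

tp = sum(1 for g, m in zip(gold, model) if g == 1 and m == 1)
fp = sum(1 for g, m in zip(gold, model) if g == 0 and m == 1)
tn = sum(1 for g, m in zip(gold, model) if g == 0 and m == 0)
fn = sum(1 for g, m in zip(gold, model) if g == 1 and m == 0)

accuracy = (tp + tn) / len(gold)
fpr = fp / (fp + tn)      # share of benign comments wrongly flagged
recall = tp / (tp + fn)   # share of targeted abuse actually caught

print(f"accuracy={accuracy:.2f}  false-positive rate={fpr:.2f}  recall={recall:.2f}")
```

The false-positive rate is the quantity at issue in the comparison with experts: it counts how often benign comments are wrongly flagged as targeted abuse, which is exactly the over-flagging behaviour the study reports.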
Researchers emphasise the importance of contextual understanding for accurate moderation, noting that while ChatGPT effectively flags overtly offensive language, it struggles with identifying subtle forms of abuse or content that relies on implicit meaning. The model’s tendency towards false positives raises concerns about potential censorship of legitimate expression, necessitating careful calibration and oversight to ensure fairness and protect free speech. Reliance solely on automated systems without human review could lead to unintended consequences and erode trust in online platforms.
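The call for calibration and oversight is often operationalised as a human-in-the-loop routing rule: automated action only for high-confidence cases, with the uncertain middle band queued for human moderators. The sketch below, with invented thresholds, shows the general shape of such a rule rather than anything prescribed by the study:

```python
# Hypothetical human-in-the-loop routing: automated removal only for
# high-confidence flags, human review for the uncertain middle band.
# Both thresholds are invented for illustration, not from the paper.

REMOVE_THRESHOLD = 0.95   # auto-remove above this model confidence
REVIEW_THRESHOLD = 0.60   # send to human moderators above this

def route(comment_id: str, flag_confidence: float) -> str:
    """Decide what happens to a comment flagged by the model."""
    if flag_confidence >= REMOVE_THRESHOLD:
        return f"{comment_id}: auto-remove"
    if flag_confidence >= REVIEW_THRESHOLD:
        return f"{comment_id}: queue for human review"
    return f"{comment_id}: keep (log for auditing)"

for cid, conf in [("c1", 0.99), ("c2", 0.72), ("c3", 0.30)]:
    print(route(cid, conf))
```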
This work contributes to the growing body of knowledge surrounding AI-driven content moderation, demonstrating the potential of LLMs to augment existing moderation practices while also identifying crucial areas for improvement. Continuous model refinement, coupled with a focus on contextual awareness and human oversight, remains essential for developing robust and reliable automated systems capable of mitigating harmful online behaviour and fostering safer online communities.
👉 More information
🗞 Assessing and Refining ChatGPT’s Performance in Identifying Targeting and Inappropriate Language: A Comparative Study
🧠 DOI: https://doi.org/10.48550/arXiv.2505.21710
