The rise of artificial intelligence (AI) has led to an explosion of AI-generated content, including news texts. But how trustworthy are these texts? A recent study by Alberto Muñoz-Ortiz, Carlos Gómez-Rodríguez, and David Vilares compared human-written English news text with comparable output from six different large language models (LLMs). The results reveal significant differences between human-written and AI-generated texts in linguistic patterns, biases, and emotional expression. Understanding these differences is a first step toward judging whether AI-generated content is trustworthy and accurate.
Can AI-Generated News Texts Be Trusted?
The article “Contrasting Linguistic Patterns in Human and LLM-Generated News Text” by Alberto Muñoz-Ortiz, Carlos Gómez-Rodríguez, and David Vilares presents a quantitative analysis that compares human-written English news text with comparable output from six different large language models (LLMs). The study examines several measurable linguistic dimensions: morphological, syntactic, psychometric, and sociolinguistic.
Human Text vs. AI-Generated Text
The results reveal significant differences between human-written texts and AI-generated texts. Human texts exhibit more scattered sentence length distributions, a greater variety of vocabulary, and a distinct use of dependency and constituent types. In contrast, AI-generated texts tend to have shorter constituents and more optimized dependency distances. Human texts also tend to express stronger negative emotions, such as fear and disgust, and less joy than AI-generated texts.
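Dependency distance is the number of word positions between a word and the word it grammatically depends on (its head). The following is a minimal Python sketch of how a mean dependency distance could be computed, assuming the parse is already available. The function name, the `heads` encoding, and the hand-annotated example sentence are illustrative assumptions, not taken from the study.

```python
def mean_dependency_distance(heads):
    """Average distance, in word positions, between each word and its head.

    heads[i] is the 1-based position of the head of word i + 1.
    A value of 0 marks the root, which has no head and is skipped.
    (This encoding is an assumption made for the sketch.)
    """
    distances = [
        abs(head - position)
        for position, head in enumerate(heads, start=1)
        if head != 0
    ]
    return sum(distances) / len(distances) if distances else 0.0


# Hand-annotated parse of "The dog chased the cat":
# The -> dog (2), dog -> chased (3), chased is the root (0),
# the -> cat (5), cat -> chased (3)
example_heads = [2, 3, 0, 5, 3]
print(mean_dependency_distance(example_heads))  # 1.25
```

A lower average means words sit closer to their heads, which is one way to read the "more optimized" distances reported for AI-generated text.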
Linguistic Biases in AI-Generated Text
The study also finds that the sexist bias prevalent in human text is expressed by the LLMs as well, and is magnified in all of them except one. Separately, it reports that the toxicity of the models' output increases with model size.
Objective Language vs. Human Text
AI-generated texts tend to use more numbers, symbols, and auxiliaries than human texts, which suggests a leaning toward objective language. They also use more pronouns.
Conclusion
The study highlights the importance of understanding the linguistic patterns and biases present in AI-generated text. As AI-generated content becomes increasingly prevalent, it is essential to recognize the differences between human-written text and AI-generated text to ensure that AI-generated content is trustworthy and accurate.
What Makes Human Text Unique?
Human-written texts exhibit a range of characteristics that distinguish them from AI-generated texts. One key difference is the sentence length distribution, which is more scattered in human-written texts. In other words, human writers mix short and long sentences more freely, while AI-generated sentences stay within a narrower range of lengths.
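One simple way to put a number on how scattered sentence lengths are is the standard deviation of words per sentence. The Python sketch below uses only the standard library and a naive sentence splitter. The function names and sample texts are made up for illustration, and the study's own measurement may differ.

```python
import re
import statistics


def sentence_lengths(text):
    """Split on '.', '!' or '?' followed by whitespace; count words per sentence."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def length_spread(text):
    """Population standard deviation of sentence lengths, in words."""
    return statistics.pstdev(sentence_lengths(text))


# Illustrative samples: one mixes short and long sentences, one does not.
varied = (
    "Rain fell. The river rose overnight and flooded three streets "
    "near the old market. Nobody slept."
)
uniform = (
    "The rain fell all night. The river rose very fast. "
    "The streets were soon flooded."
)

print(sentence_lengths(varied))   # [2, 12, 2]
print(sentence_lengths(uniform))  # [5, 5, 5]
print(length_spread(varied) > length_spread(uniform))  # True
```

A larger spread corresponds to the more scattered distribution described for human-written text. A real analysis would need a proper sentence tokenizer, since this splitter mishandles abbreviations and quotations.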
Vocabulary Variety
Another significant difference is vocabulary variety. Human-written texts draw on a wider range of words and phrases, whereas AI-generated texts rely more heavily on a smaller set that they reuse more often.
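Vocabulary variety is often approximated with the type-token ratio: the number of distinct words divided by the total number of words. The sketch below is a minimal Python version under simplifying assumptions (lowercasing, a crude letters-only tokenizer). The function name is hypothetical, and this is not necessarily the measure the study used.

```python
import re


def type_token_ratio(text):
    """Distinct words divided by total words; 0.0 for empty input."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Six words, five of them distinct ("the" appears twice).
print(type_token_ratio("The cat sat on the mat"))

# One distinct word repeated four times.
print(type_token_ratio("the the the the"))  # 0.25
```

The ratio falls as texts get longer, so it is only meaningful when comparing samples of similar length.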
Dependency and Constituent Types
The study also finds that human texts use dependency and constituent types differently from AI-generated texts, which tend toward shorter constituents and more optimized dependency distances. The two kinds of text therefore differ in syntactic style, not only in word choice.
Emotional Expression
Human-written texts tend to express stronger negative emotions, such as fear and disgust, and less joy than AI-generated texts. AI-generated texts, by contrast, show more joy and lean toward objective language.
Can AI-Generated Texts Be Used for News Reporting?
The study raises important questions about the use of AI-generated text in news reporting. While AI-generated text can be useful for generating large amounts of content quickly, it is essential to recognize the limitations and biases present in these texts.
Linguistic Biases
The sexist bias found in human text is magnified in all but one of the LLMs studied. AI-generated text can therefore perpetuate harmful stereotypes and biases if it is published without careful review.
Objective Language
The heavier use of numbers, symbols, and auxiliaries in AI-generated texts points to more objective language. This can be useful for reporting factual information, but it may be less suited to conveying complex emotions or nuanced ideas.
Conclusion
The study highlights the importance of understanding the linguistic patterns and biases present in AI-generated text. While AI-generated text can be useful for generating large amounts of content quickly, it is essential to recognize its limitations and biases before using it for news reporting.
What Can We Learn from This Study?
This study provides valuable insights into the differences between human-written texts and AI-generated texts. By understanding these differences, we can better appreciate the strengths and limitations of each type of text.
Human Text
The findings point to the variety and emotional range of human writing, which show up in sentence length, vocabulary, syntax, and the emotions expressed.
AI-Generated Text
The study also emphasizes the importance of understanding the limitations and biases present in AI-generated text. While AI-generated text can be useful for generating large amounts of content quickly, it is essential to recognize its limitations and biases before using it for news reporting or other purposes.
Conclusion
Taken together, the findings show where human-written and AI-generated news texts diverge. Knowing these differences helps readers and publishers weigh the strengths and limitations of each type of text and use both effectively.
Publication details: “Contrasting Linguistic Patterns in Human and LLM-Generated News Text”
Publication Date: 2024-08-23
Authors: Alberto Muñoz-Ortiz, Carlos Gómez-Rodríguez, and David Vilares
Source: Artificial Intelligence Review
DOI: https://doi.org/10.1007/s10462-024-10903-2
