Openness in artificial intelligence (AI) can be exploited for malicious purposes, according to a study that used a dataset of 200 questions and answers about criminal activities based on Korean precedents. The researchers found that an open-source Large Language Model (LLM), which initially refuses to answer unethical questions, can be manipulated to provide unethical and informative answers about criminal activities. The study highlights the potential risks of unrestricted access to open-source AI technologies, particularly in the legal domain. It also underscores the need for governance frameworks to mitigate potential misuse and the development of datasets to identify or mitigate offensiveness in LLMs.
Can Openness in AI Be Exploited for Malicious Purposes?
Openness is a crucial aspect of scientific advancement, particularly in the field of artificial intelligence (AI). The rapid progress in AI has been largely facilitated by open-source models, datasets, and libraries. However, this openness also implies that these technologies can be freely used for socially harmful purposes. The question then arises: Can open-source models or datasets be used for malicious purposes? And if so, how easy is it to adapt technology for such goals?
A case study was conducted in the legal domain, an area where individual decisions can have profound social consequences. The researchers built a dataset, EVE, consisting of 200 examples of questions and corresponding answers about criminal activities based on 200 Korean precedents. The study found that a widely accepted open-source Large Language Model (LLM), which initially refuses to answer unethical questions, can be easily tuned with EVE to provide unethical and informative answers about criminal activities. This suggests that while open-source technologies contribute to scientific progress, some care must be taken to mitigate possible malicious use cases.
The openness of AI technologies, while beneficial for scientific progress, also presents potential risks. Unrestricted access to these sources can lead to significant social consequences, especially in the legal domain. The purpose of publishing precedents is to ensure transparency and consistency in the legal system and reduce disputes and crime by making the consequences of criminal behavior publicly known. However, these precedents often contain detailed descriptions of criminal acts and the judges’ criteria for sentence reduction, which can paradoxically be used as a practical resource to understand certain aspects of criminal behavior or to mitigate sentences.
How Can Open-Source AI Models Be Misused?
The misuse of open-source AI models is a growing concern. For instance, the researchers built EVE, a dataset consisting of 200 questions and corresponding answers on crime activity based on real Korean precedents. They demonstrated that by tuning the open-source LLM with EVE, the model, which is highly accepted by the community and initially refuses to answer unethical questions, can be manipulated to generate unethical and informative answers about criminal activities. This indicates that open-source LLMs can be used for malicious purposes with affordable effort by small groups.
One of the major concerns regarding LLMs that are trained on vast datasets gathered from diverse sources is that portions of the training material may be misinformed or biased, potentially leading to outputs that are ethically questionable. For example, Microsoft’s Chatbot Tay, which was designed to facilitate casual conversations, learned to produce racist, sexist, and extreme political statements from its users just one day after being publicly unveiled. Similarly, recent studies have demonstrated vulnerabilities in LLMs, such as the generation of toxic outputs, biased results, and the leakage of private information.
What Are the Risks Associated with AI Stemming from Its Rapid Progress?
The rapid progress of AI has outpaced the development of governance frameworks, leading to discussions about the risks associated with AI. The potential for malicious use of precedents, a representative open-source dataset in the legal domain supported by open-source LLMs, is a significant concern. The researchers built EVE, which consists of 200 questions and corresponding answers on crime activity based on real Korean precedents. They demonstrated that by tuning the open-source LLM with EVE, the model, which is highly accepted by the community and initially refuses to answer unethical questions, can be manipulated to generate unethical and informative answers about criminal activities.
Various datasets have been developed to identify or mitigate offensiveness in LLMs. The KOLD dataset focuses on offensive language in Korean, compiled from comments on YouTube, articles, and internet news sources. The SQUARE dataset consists of 49k sensitive questions and corresponding answers, including 42k acceptable and 46k non-acceptable answers. The KoTox dataset comprises both implicit and explicit toxic queries, encompassing a total of 39k instances of toxic sentences. These sentences are classified into three distinct categories: political bias, hate speech, and criminal activities.
How Can We Mitigate the Risks of Openness in AI?
While the openness of AI technologies contributes to scientific progress, it also presents potential risks. Unrestricted access to these sources can lead to significant social consequences, especially in the legal domain. Therefore, some care must be taken to mitigate possible malicious use cases.
One way to mitigate these risks is to be aware of the potential for misuse and to take steps to prevent it. For example, the researchers demonstrated that by tuning the open-source LLM with EVE, the model, which is highly accepted by the community and initially refuses to answer unethical questions, can be manipulated to generate unethical and informative answers about criminal activities. This indicates that open-source LLMs can be used for malicious purposes with affordable effort by small groups.
Another way to mitigate these risks is to develop datasets that can identify or mitigate offensiveness in LLMs. For instance, the KOLD dataset focuses on offensive language in Korean, compiled from comments on YouTube, articles, and internet news sources. The SQUARE dataset consists of 49k sensitive questions and corresponding answers, including 42k acceptable and 46k non-acceptable answers. The KoTox dataset comprises both implicit and explicit toxic queries, encompassing a total of 39k instances of toxic sentences. These sentences are classified into three distinct categories: political bias, hate speech, and criminal activities.
What Are the Ethical Implications of Openness in AI?
The ethical implications of openness in AI are significant. The potential for misuse of open-source models and datasets for malicious purposes is a major concern. For instance, the researchers demonstrated that by tuning the open-source LLM with EVE, the model, which is highly accepted by the community and initially refuses to answer unethical questions, can be manipulated to generate unethical and informative answers about criminal activities.
One of the major concerns regarding LLMs that are trained on vast datasets gathered from diverse sources is that portions of the training material may be misinformed or biased, potentially leading to outputs that are ethically questionable. For example, Microsoft’s Chatbot Tay, which was designed to facilitate casual conversations, learned to produce racist, sexist, and extreme political statements from its users just one day after being publicly unveiled.
The rapid progress of AI has outpaced the development of governance frameworks, leading to discussions about the risks associated with AI. The potential for malicious use of precedents, a representative open-source dataset in the legal domain supported by open-source LLMs, is a significant concern. Therefore, it is crucial to develop governance frameworks that can keep pace with the rapid progress of AI and mitigate the potential risks associated with its openness.
Publication details: “On the Consideration of AI Openness: Can Good Intent Be Abused?”
Publication Date: 2024-03-11
Authors: Yong-Mi Kim, Eun Ji Choi, Hyunjun Kim, Hyun Ju Oh, et al.
Source: arXiv (Cornell University)
DOI: https://doi.org/10.48550/arxiv.2403.06537
