A recent study has explored the potential of artificial intelligence (AI) to augment human judgement in forecasting, with promising results. Researchers from top institutions investigated whether large language models (LLMs) can improve human forecasting accuracy by comparing three groups: one using a high-quality superforecasting LLM assistant, another using an overconfident and noisy LLM assistant, and a control group receiving a less advanced model.
The findings showed that interacting with either frontier LLM assistant significantly improved prediction accuracy, by 24 to 28% relative to the control group, and that with a pronounced outlier item excluded, the superforecasting assistant improved accuracy by 41%. These results suggest that access to a frontier LLM assistant can be a helpful decision aid in cognitively demanding tasks, though further research is needed to establish how robust this pattern is and what it implies for decision-making.
Researchers from the London School of Economics and Political Science, the Massachusetts Institute of Technology, the Federal Reserve Bank of Chicago, the University of California San Diego, and the University of Pennsylvania investigated whether large language models (LLMs) can improve human forecasting accuracy.
The study involved 991 participants who were asked to answer six forecasting questions with the option to consult their assigned LLM assistant throughout. The researchers compared the performance of participants using two types of LLM assistants: one designed to provide high-quality “superforecasting” advice and another that was overconfident and provided noisy forecasting advice.
The results showed that interacting with either of these frontier LLM assistants significantly improved prediction accuracy, by between 24 and 28%, compared with a control group that received a less advanced model. Exploratory analyses revealed a pronounced outlier effect in one forecasting item; with that item excluded, the superforecasting assistant increased accuracy by 41%, compared with 29% for the noisy assistant.
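To make the headline numbers concrete, here is a minimal sketch of how such a treatment effect and a leave-one-item-out outlier check could be computed. The paper's exact scoring rule and data layout are not reproduced here; the squared-error (Brier-style) metric, the synthetic data, and every name below are illustrative assumptions.

```python
import numpy as np

# Synthetic stand-in data: one row per (participant, question) forecast.
# Every name and the squared-error metric are assumptions, not the
# paper's actual pipeline.
rng = np.random.default_rng(0)
n = 600
conditions = rng.choice(["super", "noisy", "control"], size=n)
questions = rng.integers(0, 6, size=n)               # six forecasting items
outcomes = rng.integers(0, 2, size=n).astype(float)  # resolved 0/1 outcomes
# In this toy setup, the assistants reduce forecast noise by different amounts.
noise = np.where(conditions == "super", 0.25,
                 np.where(conditions == "noisy", 0.35, 0.45))
forecasts = np.clip(outcomes + rng.normal(0.0, 1.0, n) * noise, 0.0, 1.0)

def mean_error(mask):
    """Mean squared (Brier-style) error over the selected forecasts."""
    return np.mean((forecasts[mask] - outcomes[mask]) ** 2)

def improvement(condition, exclude_item=None):
    """Percent error reduction vs. control, optionally dropping one item."""
    keep = np.ones(n, dtype=bool) if exclude_item is None else questions != exclude_item
    treated = mean_error((conditions == condition) & keep)
    control = mean_error((conditions == "control") & keep)
    return 100.0 * (control - treated) / control

for cond in ("super", "noisy"):
    print(f"{cond}: {improvement(cond):.1f}% error reduction vs. control")

# Leave-one-item-out check for a single question driving the result.
for item in range(6):
    print(f"super without item {item}: {improvement('super', item):.1f}%")
```

Framing "accuracy improvement" as a percentage reduction in average error relative to control is one plausible reading of the reported figures, and the leave-one-item-out loop mirrors the kind of exploratory check that can surface a single question driving the result.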
These findings suggest that access to a frontier LLM assistant, even a noisy one, can be a helpful decision aid in cognitively demanding tasks. However, further research is needed to establish how robust this pattern is, particularly given its sensitivity to outlying items.
Recent advances in artificial intelligence (AI), and in large language models (LLMs) in particular, have produced systems with impressive capabilities across a wide range of complex and economically valuable tasks. These developments challenge long-held assumptions about the necessity of human cognition for many of these tasks.
The study highlights that LLMs can match or even exceed human performance in various domains, raising concerns about potential negative effects on the labor market across large parts of the knowledge economy. Understanding how well current LLMs can support economically central tasks therefore requires broad empirical study across domains.
Most knowledge work requires substantial reasoning over data and patterns to make predictions or decisions. These tasks often involve complex cognitive processes that are difficult to replicate with AI alone, and the researchers accordingly emphasize the need for further research into the potential of LLMs to augment human judgement in forecasting and other cognitively demanding tasks.
The study also examined whether LLM forecasting augmentation disproportionately benefits less skilled forecasters, whether it degrades the wisdom of the crowd by reducing prediction diversity, and whether the effectiveness of LLM assistants varies with question difficulty.
However, the data did not consistently support these hypotheses. The results suggest that access to a frontier LLM assistant can be beneficial for human forecasters, regardless of their skill level or the difficulty of the questions.
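To illustrate the wisdom-of-the-crowd concern the researchers tested, here is a hedged sketch; the anchoring model, the median aggregate, and the dispersion measure are assumptions, not the paper's method. The idea is that if assistance pulls individual forecasts toward one shared model output, prediction diversity shrinks, which can degrade the aggregated crowd forecast even while individuals improve:

```python
import numpy as np

rng = np.random.default_rng(1)
truth = 0.7                                            # resolved probability of one event
crowd = np.clip(rng.normal(truth, 0.25, 500), 0, 1)    # diverse, unaided forecasts

# Hypothetical assistance: everyone anchors halfway toward one LLM forecast.
llm_forecast = 0.55
anchored = 0.5 * crowd + 0.5 * llm_forecast

for label, preds in [("unaided", crowd), ("LLM-anchored", anchored)]:
    aggregate = np.median(preds)   # simple crowd aggregate
    diversity = preds.std()       # dispersion as a diversity proxy
    print(f"{label}: median={aggregate:.3f}, sd={diversity:.3f}, "
          f"aggregate error={abs(aggregate - truth):.3f}")
```

As reported above, the study's data did not consistently show this degradation; the sketch only makes the hypothesized mechanism explicit.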
The study concludes that a frontier LLM assistant, even a noisy one, can serve as a useful decision aid in cognitively demanding tasks, and calls for further work to probe the robustness of this pattern and to develop more effective human-AI collaboration tools.
As AI advances and becomes increasingly integrated into various domains, it is essential to explore its potential to augment human judgement and decision-making. The study provides valuable insight into the effectiveness of LLM assistants in forecasting and highlights the need for further research into their applications in other cognitively demanding tasks.
Publication details: “AI-Augmented Predictions: LLM Assistants Improve Human Forecasting Accuracy”
Publication Date: 2024-12-13
Authors: Philipp Schoenegger, Peter S. Park, Ezra Karger, Sean Trott, et al.
Source: ACM Transactions on Interactive Intelligent Systems
DOI: https://doi.org/10.1145/3707649
