The increasing reliance on artificial intelligence for everyday tasks, such as data analysis, presents a significant challenge as these systems are not always reliable. Yuvraj Virk and Dongyu Liu, from the University of California, Davis, along with their colleagues, investigate whether business professionals can accurately assess code generated by AI for marketing data analysis. Their research demonstrates that even when explicitly warned about potential errors and provided with clear explanations, participants frequently fail to identify critical flaws that could lead to poor decision-making. This finding highlights a crucial gap in ensuring responsible AI adoption, suggesting that non-programmers require additional support to effectively verify AI-generated results and avoid potentially unsafe or low-quality outcomes.
Researchers surveyed marketing and sales professionals to assess their ability to critically evaluate data analyses generated by Large Language Models (LLMs). Participants were presented with natural language explanations of the AI’s reasoning, repeatedly informed of the potential for errors, and explicitly prompted to identify them. Despite these measures, participants frequently failed to detect critical flaws that could compromise decision-making, many of which required no technical knowledge to recognize. To investigate the reasons behind this difficulty, the researchers reformatted the AI’s responses into clearly delineated steps and provided alternative approaches for each decision to support critical evaluation. While these changes had a positive effect, participants often struggled to recognize the flaws even with the improved presentation.
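As a hypothetical illustration (not an example taken from the study), consider the kind of flaw described above: one that careful reading alone can catch, with no programming knowledge required. Here, an AI asked for "average order value" averages the regional averages instead of the orders themselves, silently giving a small region the same weight as a large one. The data and question are invented for this sketch.

```python
# Hypothetical sketch: a flawed AI-generated analysis whose error is
# recognizable from the plain-language explanation alone.
# Question: "What is the average order value across all customers?"
import statistics

# Toy sales data: (region, order_value)
orders = [
    ("North", 100), ("North", 110), ("North", 90),  # many small orders
    ("South", 500),                                  # one large order
]

# Flawed approach, as the AI might explain it: "compute the average for
# each region, then average those" - this weights regions, not orders.
regions = {}
for region, value in orders:
    regions.setdefault(region, []).append(value)
regional_means = [statistics.mean(v) for v in regions.values()]
flawed = statistics.mean(regional_means)

# Correct approach: average over all orders directly.
correct = statistics.mean(v for _, v in orders)

print(f"flawed: {flawed:.2f}, correct: {correct:.2f}")
# → flawed: 300.00, correct: 200.00
```

The 50% overstatement requires no code literacy to spot once the averaging choice is stated plainly, which is exactly the sort of decision the study's participants were prompted, and often failed, to question.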
Business User Understanding of AI Code
This research investigates how well business users, those without extensive programming expertise, can understand, verify, and trust code produced by Large Language Models (LLMs). The study highlights the challenges and opportunities in making AI-powered code generation accessible and beneficial for non-programmers in business contexts. Key findings demonstrate that usability is paramount for successful adoption: simply generating code isn’t enough; it must be comprehensible to the intended user. The research reveals that business users struggle to verify the correctness of AI-generated code, even for seemingly simple tasks, because they lack the programming expertise to fully understand the logic and identify potential errors.
Without the ability to verify, users are hesitant to rely on the generated code, hindering its practical application. The paper explores the balance between appropriate trust and healthy skepticism, suggesting that providing explanations alongside the code, describing the logic, assumptions, and potential limitations, can improve understanding and trust. The research advocates for the development of tools that facilitate code inspection, testing, and debugging for non-programmers, including features like natural language explanations, visual representations of code logic, and simplified testing interfaces. The usability of AI-generated code is heavily influenced by the complexity of the task and the user’s domain expertise; simple tasks within a familiar domain are easier to understand and verify.
The paper employed a user study involving business users who were presented with AI-generated code for various tasks, such as data analysis and automation. Researchers measured comprehension, error detection rates, trust levels, and usability ratings. The research highlights the need for a shift in focus from simply generating code to generating usable code. Future research should explore effective explanation techniques, user-friendly verification tools, adaptive explanations tailored to user expertise, integration with business workflows, and the role of testing and debugging tools for non-programmers. In essence, this paper argues that the success of AI-powered code generation for business users depends not just on the power of the AI, but on its ability to create code that is accessible, understandable, and trustworthy for those who will ultimately use it.
AI Analysis Errors Go Largely Undetected
Increasingly, professionals without programming experience are turning to artificial intelligence to perform complex data analysis, but research reveals a concerning inability to reliably detect errors in the AI’s output. A recent study investigated whether marketing and sales professionals could accurately assess analyses generated by AI, even when explicitly warned about potential mistakes and provided with clear explanations of the AI’s reasoning. The findings demonstrate that, despite possessing domain expertise and critical thinking skills, these professionals frequently fail to identify critical flaws that could lead to poor decision-making. Participants were repeatedly reminded that the AI is prone to errors and actively prompted to find them, yet still struggled to identify significant issues.
Many of these flaws required no specialized technical knowledge to recognize, highlighting a fundamental challenge in trusting AI-generated insights. The study revealed that while participants could identify valid flaws in some instances, they could not be consistently relied upon to ensure the accuracy of the AI’s analyses. Further investigation explored whether restructuring the AI’s responses, by presenting the analysis as a series of clearly defined steps and offering alternative approaches, could improve error detection. While this approach showed some positive effects, participants still struggled to reason through the AI’s logic and evaluate the proposed alternatives, demonstrating a limited ability to deeply engage with technical approaches.
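The restructured presentation described above, with each analysis decision shown as a discrete step alongside alternative approaches, can be sketched as a simple data structure. The field names and example steps here are illustrative assumptions, not the authors' actual design.

```python
# Minimal sketch (assumed structure, not the study's implementation) of a
# step-delineated explanation: each decision the AI made is paired with a
# plain-language rationale and alternative approaches for the reader to weigh.
from dataclasses import dataclass, field

@dataclass
class AnalysisStep:
    decision: str                                   # what the AI chose to do
    rationale: str                                  # why, in plain language
    alternatives: list[str] = field(default_factory=list)  # other valid choices

# Hypothetical example steps for a sales analysis.
steps = [
    AnalysisStep(
        decision="Drop rows with any missing field",
        rationale="Ensures every calculation uses complete records",
        alternatives=["Fill missing revenue with the column median",
                      "Keep rows, excluding them only from affected metrics"],
    ),
    AnalysisStep(
        decision="Average the regional averages",
        rationale="Gives each region equal weight",
        alternatives=["Average over all orders directly"],
    ),
]

for i, step in enumerate(steps, 1):
    print(f"Step {i}: {step.decision}")
    print(f"  Why: {step.rationale}")
    for alt in step.alternatives:
        print(f"  Alternative: {alt}")
```

Laying decisions out this way invites comparison rather than passive acceptance, yet the study found that even with such scaffolding, participants struggled to reason through the alternatives.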
The research suggests that simply providing more information or alternative viewpoints is not enough to overcome the challenge of verifying AI-generated data analysis. These findings have significant implications for the growing reliance on AI in professional settings. The study underscores the risk of unsafe or low-quality decisions when non-programmers adopt code-generating AI without sufficient oversight. It suggests that ensuring the reliability of AI is not solely a technical problem, but also a human-computer interaction challenge, requiring new approaches to present information and support critical evaluation. The research team concludes that business professionals cannot currently reliably verify AI-generated data analyses on their own, highlighting the need for further research into methods to improve safety and build trust in these powerful new tools.
AI Analysis Verification Skills Remain Limited
This study investigates the ability of marketing and sales professionals to critically evaluate data analyses generated by artificial intelligence. The research demonstrates that even when participants are informed of the potential for errors and provided with explanations of the AI’s reasoning, they frequently fail to identify critical flaws that could lead to poor decision-making. These flaws, importantly, often do not require technical expertise to recognize. While presenting AI responses in a more structured format, alongside alternative approaches, did improve participants’ ability to detect errors, significant challenges remain.
The findings suggest that business professionals currently lack the skills and engagement necessary to reliably verify AI-generated data analyses independently. This highlights the need for both more reliable AI systems to reduce the burden on end-users, and improved explanations that actively encourage critical thinking through clear visualizations and accessible alternatives. The authors acknowledge limitations including small sample sizes and the possibility that results may not extend to all non-technical user groups or tasks. Future work could explore methods for building user skills to empower more effective engagement with AI outputs.
👉 More information
🗞 Non-programmers Assessing AI-Generated Code: A Case Study of Business Users Analyzing Data
🧠 ArXiv: https://arxiv.org/abs/2508.06484
