The rise of artificial intelligence now extends to data collection, with researchers exploring the potential of AI to conduct voice-based interviews; questions remain, however, about the reliability and effectiveness of such systems. Shreyas Tirumala, Nishant Jain, and Danny D. Leybzon from VKL Research, Inc., along with Trent D. Buskirk from Old Dominion University, investigate the capabilities of these “AI interviewers” and compare them to traditional Interactive Voice Response (IVR) systems. Their work evaluates how well AI handles the nuances of spoken conversation, including accurately recording answers, recognizing emotions, and adapting to complex questioning. It reveals that while AI interviewers currently outperform IVR systems, limitations in transcription accuracy and conversational reasoning mean their suitability for in-depth qualitative research remains context-dependent. This analysis matters as researchers increasingly turn to AI tools to streamline data gathering and scale their studies.
The emergence of “AI interviewers” presents a novel way to administer voice-based surveys to respondents in real time. This position paper reviews emerging evidence to understand when such AI interviewing systems are fit for purpose for data collection within both quantitative and qualitative research contexts. Field studies suggest that AI interviewers already exceed IVR capabilities in both settings.
AI Adapts Interviews and Surveys Effectively
The central topic is the increasing use of Artificial Intelligence (AI), specifically Large Language Models (LLMs), to automate and improve the conduct of surveys and interviews. The paper moves beyond simple automation to explore how AI can create adaptive interviewing experiences, offering scalability and potentially improved data quality through techniques such as audio checks and virtual humans that encourage more honest responses. A significant advantage lies in adaptive interviewing, where LLMs tailor questions and follow-up probes to individual respondent answers, creating a more personalized and engaging experience; a minimal sketch of such a probe loop follows this paragraph. This contrasts with static questionnaires and offers the potential to reduce social desirability bias, as respondents may be more willing to disclose sensitive information to a non-human interviewer.
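To make adaptive probing concrete, here is a minimal sketch assuming an OpenAI-style chat-completions client; the model name, system prompt, and `follow_up_probe` helper are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of adaptive interviewing: an LLM drafts one follow-up
# probe conditioned on the scripted question and the respondent's answer.
# Assumes the openai Python client; model name and prompt are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are a survey interviewer. Ask exactly one neutral, non-leading "
    "follow-up question about the respondent's last answer. Do not suggest "
    "answers or offer opinions."
)

def follow_up_probe(question: str, answer: str) -> str:
    """Generate a single follow-up probe for an open-ended survey item."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative; any chat-capable model works
        temperature=0.3,      # keep probing behavior consistent across respondents
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Question: {question}\nAnswer: {answer}"},
        ],
    )
    return response.choices[0].message.content

print(follow_up_probe(
    "How do you usually commute to work?",
    "Mostly I drive, but lately that's been changing.",
))  # e.g. "What has been changing about your commute recently?"
```

The key design point is that the probe is generated per respondent rather than drawn from a fixed script, which is exactly what static questionnaires and IVR menus cannot do.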
Furthermore, AI systems can collect multimodal data, combining text, speech, and visual cues with audio analysis to gain richer insights and to improve accessibility for individuals with disabilities, including through audio surveys; a hedged transcription example follows this paragraph. The document also acknowledges challenges, including potential bias in AI algorithms, the risk of hallucinations and factual inaccuracies, and the need to protect respondent data and ensure privacy.
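As one hedged illustration of the audio side, the open-source openai-whisper package can transcribe a recorded answer and expose per-segment confidence signals that a simple audio check could key on; the file name and thresholds below are placeholder assumptions, not values from the paper.

```python
# Sketch of a transcription-plus-audio-check step using openai-whisper.
# The file name and thresholds are placeholder assumptions.
import whisper

model = whisper.load_model("base")
result = model.transcribe("respondent_answer.wav")

print("Transcript:", result["text"])

# Flag segments the model is unsure about so a human (or a post-processing
# pass) can review them instead of trusting the transcript blindly.
for seg in result["segments"]:
    if seg["avg_logprob"] < -1.0 or seg["no_speech_prob"] > 0.5:
        print(f"Check {seg['start']:.1f}s-{seg['end']:.1f}s: {seg['text']!r}")
```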
Usability and accessibility are also crucial considerations, as AI-powered systems must be usable by respondents of all abilities and backgrounds. Building rapport and trust with respondents can be more difficult with a non-human interviewer, and ethical considerations regarding transparency, accountability, and informed consent are paramount. Ensuring the AI system can effectively recall and use information from previous responses also presents a challenge (see the memory sketch after this paragraph). Key findings demonstrate that interview mode significantly affects response quality and disclosure, and that virtual humans can increase disclosure rates. AI can potentially reduce biases and improve the accuracy of collected data, and adaptive interviewing is a promising approach.
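A minimal sketch of one way to give the interviewer recall, assuming the same OpenAI-style client as above: the full exchange is carried in a growing message list, so later turns can reference earlier answers (a long interview would need a rolling summary instead). The `exchange` helper and model name are illustrative.

```python
# Sketch of interview "memory": the whole exchange rides along in a
# growing message list so later turns can use earlier answers.
# Same assumed OpenAI-style client as above; model name is illustrative.
from openai import OpenAI

client = OpenAI()
history = [{"role": "system",
            "content": "You are a survey interviewer. Be brief and neutral."}]

def exchange(question: str, answer: str) -> str:
    """Record one question/answer pair and get a context-aware reply."""
    history.append({"role": "assistant", "content": question})
    history.append({"role": "user", "content": answer})
    reply = client.chat.completions.create(
        model="gpt-4o-mini",   # illustrative
        messages=history,      # full history is resent on every call
    )
    text = reply.choices[0].message.content
    history.append({"role": "assistant", "content": text})
    return text

exchange("Do you own a car?", "Yes, an old diesel.")
# A later turn can now probe consistently, e.g. referencing the diesel.
print(exchange("How do you feel about low-emission zones?", "Mixed, honestly."))
```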
Prompt engineering is critical for generating relevant and accurate responses (the system prompts in the sketches above are simple examples), and multimodal interaction can enhance engagement. The document suggests several areas for future research, including developing more robust and reliable LLMs, addressing bias in AI algorithms, and improving the usability and accessibility of AI-powered systems. Overall, it paints a picture of a rapidly evolving field with significant potential to transform how surveys and interviews are conducted. Turning to the evaluation itself, the study assesses performance across two key areas: input/output performance and verbal reasoning. Findings show that AI interviewers already surpass IVR systems in both quantitative and qualitative data collection, though limitations remain for in-depth qualitative research. Traditional IVR systems rely heavily on pre-recorded audio prompts and touch-tone input, requiring substantial alterations to survey questions.
IVR systems lack the dynamic capabilities needed for effective qualitative data collection: they struggle with open-ended questions and make focus groups or multi-party interviews impossible. In contrast, AI interviewers offer greater flexibility, though challenges persist. While AI systems can handle open-ended questions, their ability to identify emotional context and tailor responses accordingly is still developing. Furthermore, the capacity to probe with unscripted follow-up questions, a crucial element of qualitative research, remains uneven, limiting the depth and quality of responses. The technology appears particularly well-suited to situations where human interviewer availability is limited, complex probing is unnecessary, post-processing of data is feasible, and emotional nuance is not central to the research goals. Notably, AI interviewers may also mitigate social desirability bias when addressing sensitive topics.

Further research is needed to understand how these systems perform relative to human interviewers and to assess respondent representativeness beyond student populations. Key areas for future investigation include respondent preferences for AI versus human interviewers, the impact of conversational latency, and the technology’s effectiveness in both outbound and inbound campaign settings. The authors emphasize that while promising, AI interviewers are still in the early stages of development and require continued evaluation to fully realize their potential.
👉 More information
🗞 Mic Drop or Data Flop? Evaluating the Fitness for Purpose of AI Voice Interviewers for Data Collection within Quantitative & Qualitative Research Contexts
🧠 ArXiv: https://arxiv.org/abs/2509.01814
