Researchers are tackling the challenge of limited spoken English practice in Indian schools, a significant barrier to economic opportunity for many young people. Sneha Shashidhara from Ashoka University, alongside Vivienne Bihe Chi, Abhay P Singh, Lyle Ungar and Sharath Chandra Guntuku, all from the University of Pennsylvania, present compelling findings from a six-day field study in Delhi schools investigating the use of voice-based chatbots for English conversation practice. This multi-stakeholder research , incorporating the views of students, teachers and principals , reveals strong demand for such tools and demonstrable improvements in student confidence. Importantly, the study highlights a crucial disconnect between student desires for free-flowing conversation and administrative priorities around assessment, offering vital design recommendations for effective and sustainable educational technology in multilingual, low-resource settings.
Chatbot boosts English speaking, Delhi schools
Scientists have demonstrated a voice-based chatbot effectively boosts spoken English proficiency amongst low-income Indian youth, addressing a critical gap in educational resources. Researchers investigated the deployment of this chatbot across four low-resource schools in Delhi, meticulously capturing the perspectives of students, teachers, and principals through a six-day field study combining detailed observations and insightful interviews. The study confirms a high demand for such tools across all stakeholder groups, with particularly notable gains observed in student speaking confidence, a crucial factor for economic mobility. This breakthrough reveals a tension in long-term adoption strategies, as students favoured open-ended conversational practice for fluency, while school administrators prioritised curriculum-aligned assessment and measurable outcomes.
The research team achieved a comprehensive multi-stakeholder analysis, identifying key design considerations for voice-enabled chatbots operating within low-resource, multilingual contexts. Specifically, the study highlights the critical need for more intelligible speech output tailored for non-native learners, ensuring clear comprehension and effective practice. Furthermore, the team advocates for one-tap interactions and simplified interfaces to maximise accessibility and ease of use, particularly given potential infrastructural limitations. Actionable analytics for educators are also paramount, providing data-driven insights to support and enhance teaching practices.
Experiments show that this approach moves beyond simple language learning, offering valuable lessons for the co-design of future AI-based educational technologies. The work establishes a framework for socially sustainable technologies within the complex ecosystem of low-resource schools, acknowledging the unique challenges and opportunities present in these environments. This study employed an interpretivist, multiple-case qualitative design, structured into two phases: initial real-time observation with immediate feedback, followed by extended use with delayed feedback, allowing for a nuanced understanding of evolving perceptions. Researchers focused on three key research questions: how students, teachers, and principals experience the chatbot; what technical and pedagogical factors influence engagement; and what design adaptations could optimise usability and educational value. The findings detail confidence trajectories and turn-taking dynamics unique to multilingual classrooms, with a sample size of 23 students, 6 teachers, and 5 principals. This work opens pathways for integrating similar technologies with national educational platforms, potentially scaling impact and addressing systemic inequalities in access to quality English language education.
ChatFriend Prototype, Real-time Speech and GPT-4o Interaction
Scientists engineered ChatFriend, a voice-based chatbot prototype to facilitate conversational English practice for students in low-resource settings. The web-based application, constructed with React, was hosted on AWS S3 and delivered via CloudFront, ensuring rapid performance globally. Students engaged in spoken dialogues centred around everyday topics and school materials, such as favourite sports or best friends, initiating interaction by pressing a “hold-to-talk” button to record their responses. Their audio was transcribed in real-time using Whisper-1, a speech recognition system, before being processed by a GPT-4o-mini model for generating appropriate conversational turns.
The research team implemented a custom prompt, tailored to the conversation topic and individual student profile, to generate streaming responses, which were then vetted by OpenAI’s Moderation API to ensure content safety. Subsequently, the vetted text was converted into speech using Google Text-to-Speech and streamed back to the user, providing auditory reinforcement of the target language. To aid comprehension, the interface displayed a real-time transcript of the entire conversation, offering a visual aid alongside the spoken exchange. Recognizing the multilingual context, the system incorporated a bilingual support feature, enabling students to pose clarifying questions in Hindi, receiving English translations or explanations in return.
Before each session, teachers completed a brief sign-up process, inputting student names and grade levels, streamlining the initial setup. Upon receiving the research team-provided Android tablet, students were directed to the main interface, where the “Best Friend” topic was preselected, initiating the dialogue with a spoken prompt and accompanying text bubble. Students responded by holding the microphone button during speech and releasing it upon completion, with transcriptions appearing on-screen in either English or, if Hindi was used, as an English translation. The chatbot consistently delivered responses in English via synthesized speech and text, maintaining consistent exposure to the target language while accommodating clarification requests in the student’s native tongue.
This six-day field study was conducted across four affordable private schools and one additional school for administrative insights, located in the Badarpur and Shakti Vihar neighbourhoods of Delhi, India. Researchers employed purposive maximum-variation sampling to select schools differing in management type and digital infrastructure, ensuring diverse contexts for the study. The innovative methodology enabled the team to capture perspectives from students, teachers, and principals through observations and interviews, revealing high demand for the chatbot and notable gains in student speaking confidence.
ChatFriend reception and barriers in Delhi schools require
Scientists investigated the deployment of a voice-based chatbot, ChatFriend, for English conversation practice across four low-resource schools in Delhi, revealing high demand from students, teachers, and principals. Through a six-day field study, researchers captured perspectives via observations and interviews, confirming a significant need for accessible spoken English practice opportunities. The team measured initial reception of ChatFriend, finding enthusiasm in schools valuing conversational fluency as a critical language skill, with administrators appreciating its potential as a pedagogical aid and suggesting its use in discussing science and other lessons. However, other schools expressed caution, citing structural barriers like insufficient internet infrastructure, limited classroom time, and increased demands on teachers.
Researchers recorded that 36% of initial student utterances on Day 1 consisted of only three tokens or less, often direct responses like “Yes” or short answers, demonstrating initial hesitancy in engaging with the chatbot. Data shows that 95% of students perceived English as essential for communication, job interviews, or future mobility, while 60% also cited international travel as a motivating factor. Furthermore, the study identified a divergence in long-term adoption vision: students favoured open-ended conversational practice, while administrators prioritised curriculum alignment and assessment. Teachers also raised concerns about attention spans and appropriate phone use, with one noting classroom integration was “not possible due to the packed syllabus and issues like absenteeism”.
The study sample comprised 23 students in Grades 7 and 8, whose primary language is Hindi, and for whom opportunities for spoken English practice are largely confined to school settings. Scientists discovered that only 22% of students felt comfortable speaking English, with 78% conveying anxiety and nervousness, citing fear of mistakes and limited vocabulary, one student stating, “I hesitate when speaking English because I fear getting it wrong”. Facilitators noted a shift from cautious initial use to greater comfort over time, offering insight into how early experiences with conversational agents evolve in low-resource learning contexts. Measurements confirm that teachers recognised ChatFriend’s potential to offer a safe, non-judgmental environment for students hesitant to practice, potentially enabling them to discuss topics they might avoid in traditional classroom settings.
Observations revealed that students “need personal and private space for using it without hesitation”, highlighting the importance of privacy and technical support for comfortable participation. The research team also noted concerns regarding parental oversight of home use, with one teacher stating, “Parents wouldn’t have any issue… but guidance from parents might be a challenge”. These findings inform the co-design of future educational technologies within the complex ecosystem of low-resource schools, extending beyond language learning applications.
Chatbot boosts confidence, reveals implementation challenges, and potential solutions
Scientists investigated the use of a voice-based chatbot to improve spoken English practice for students in low-resource schools in Delhi. Through a six-day field study, researchers observed and interviewed students, teachers, and principals to understand their experiences with the technology. The findings demonstrate considerable demand for the chatbot across all stakeholder groups, with students notably reporting increased confidence in their speaking abilities. This multi-stakeholder analysis revealed a divergence in priorities regarding long-term implementation; students valued the chatbot’s capacity for open-ended conversation, while administrators favoured alignment with existing curricula and assessment methods.
Researchers identified key design recommendations, including the need for clearer speech output for non-native speakers, simplified user interfaces with one-tap interactions, and actionable data analytics for teachers. Beyond language learning, this work informs the co-design of future educational technologies suitable for complex, under-resourced school environments. The authors acknowledge that disentangling the chatbot’s effects from those of traditional teaching and social interaction requires further investigation, specifically comparing in-school use with autonomous home practice. They also highlight opportunities for technical improvements to optimise chatbot performance and user experience within school settings. Future research should prioritise co-design with stakeholders, utilising the identified design considerations to guide iterative development and ensure socially sustainable, effective educational tools.
👉 More information
🗞 Voice-Based Chatbots for English Speaking Practice in Multilingual Low-Resource Indian Schools: A Multi-Stakeholder Study
🧠 ArXiv: https://arxiv.org/abs/2601.19304
