Developer-LLM Conversations: Study of 82,845 Interactions Reveals 14:1 Token-Length Ratio and Its Impact on Code Quality

The increasing integration of large language models into software development presents both opportunities and challenges for modern programmers. Suzhen Zhong, Ying Zou, and Bram Adams, from Queen’s University, investigate how developers actually interact with these models and how those interactions affect the quality of the resulting code. Their research leverages a substantial dataset of over 82,000 real-world conversations between developers and language models, revealing significant patterns in how these tools are used. The team demonstrates that language model responses are considerably longer than developer prompts and identifies common issues in generated code across multiple programming languages, including undefined variables and missing documentation. Importantly, the study also shows that iterative conversations, where developers correct and refine the model’s output, can lead to measurable improvements in code quality, particularly in areas like documentation and import handling.

Limited empirical understanding exists regarding how developers interact with large language models (LLMs) in practice and how these conversational dynamics influence task outcomes, code quality, and software engineering workflows. To address this gap, researchers systematically analyse developer-LLM conversation structures, developer behaviour patterns, and LLM-generated code quality, uncovering key insights into LLM-assisted software development. This analysis leverages CodeChat, a large dataset comprising 82,845 real-world developer-LLM conversations, containing 368,506 code snippets generated across over 20 programming languages.

Sentence Embeddings and Dialogue Data Collection

This research employed advanced techniques for analysing natural language and code, including the use of sentence embeddings to represent the meaning of text and code snippets. These embeddings, generated using models like Sentence-BERT, allow for quantitative comparison of semantic similarity between developer prompts and LLM responses. The team collected a comprehensive dataset of developer-LLM interactions, named CodeChat, to facilitate this analysis, capturing a wide range of coding tasks and programming languages, providing a realistic representation of developer workflows.
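To illustrate the kind of semantic comparison the paper describes, the sketch below computes cosine similarity between two embedding vectors. It is a minimal, hedged example: the toy 4-dimensional vectors stand in for real sentence embeddings, which in the study would be produced by a model such as Sentence-BERT.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# In the study, vectors like these would come from a sentence-embedding model
# (e.g. Sentence-BERT); these toy 4-d vectors are placeholders for illustration.
prompt_vec = np.array([0.2, 0.8, 0.1, 0.3])      # embedding of a developer prompt
response_vec = np.array([0.25, 0.75, 0.05, 0.35])  # embedding of the LLM response

similarity = cosine_similarity(prompt_vec, response_vec)
print(f"prompt/response similarity: {similarity:.3f}")
```

A high similarity score suggests the response stays on-topic with the prompt; comparing scores across conversational turns is one way to quantify how a dialogue drifts or converges.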

Developer Interactions Show Verbose LLM Responses

This research presents a detailed analysis of developer interactions with large language models (LLMs), captured in the CodeChat dataset, comprising 82,845 conversations and 368,506 code snippets. The study reveals significant characteristics of these interactions, demonstrating that LLMs often provide verbose responses, with a median Token Ratio (TR) of 14:1, potentially impacting interaction efficiency. Analysis of conversational structure revealed that 68% of interactions are multi-turn, indicating frequent follow-up questions or revisions. To understand the causes of these extended conversations, researchers defined and measured Prompt Design Gap Frequency (PDG-Freq), identifying deficiencies in initial prompts that necessitate clarification.
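The Token Ratio metric above can be sketched as follows. This is an illustrative simplification, not the paper's exact implementation: whitespace splitting stands in for whatever tokenizer the study uses, and the sample conversations are hypothetical.

```python
from statistics import median

def token_ratio(prompt: str, response: str) -> float:
    """Response-to-prompt token ratio for one conversational turn.
    Whitespace tokenisation is a simplification of a real tokenizer."""
    return len(response.split()) / max(len(prompt.split()), 1)

# Hypothetical (prompt, response) pairs; a verbose response drives the ratio up.
conversations = [
    ("sort this list of dicts by key", "You can use sorted with a lambda " + "word " * 60),
    ("why does this loop hang", "The loop condition never updates because " + "word " * 40),
]

ratios = [token_ratio(p, r) for p, r in conversations]
print(f"median token ratio: {median(ratios):.1f}")
```

Aggregating the per-turn ratio with a median, as here, matches how the study reports its 14:1 figure and damps the influence of a few extremely long responses.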

The team also quantified programming language generation patterns, discovering that Python and JavaScript are the most frequently generated languages, accounting for 9.6% and 8.7% of conversations respectively, aligning with current programming trends. Further analysis focused on code snippet characteristics, measuring Lines of Code (LOC) to assess complexity and size, and introducing the Multi-language Co-occurrence Rate (MLC-Rate) to detect instances of code integrating multiple languages.
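A metric like MLC-Rate could be approximated by inspecting the language tags on fenced code blocks in LLM responses. The sketch below is an assumption about the mechanics, not the paper's actual pipeline, and the sample responses are fabricated for illustration.

```python
import re

# Matches the language tag after a triple-backtick fence, e.g. ```python
FENCE = re.compile(r"`{3}(\w+)")
TICKS = "`" * 3  # literal fence marker, built here to keep it out of this block's own fence

def fence_languages(response_markdown: str) -> set[str]:
    """Languages tagged on fenced code blocks within one LLM response."""
    return {m.group(1).lower() for m in FENCE.finditer(response_markdown)}

def mlc_rate(responses: list[str]) -> float:
    """Fraction of responses whose snippets span more than one language --
    a rough sketch of the paper's Multi-language Co-occurrence Rate."""
    multi = sum(1 for r in responses if len(fence_languages(r)) > 1)
    return multi / len(responses)

# Hypothetical responses: one mixes HTML and JavaScript, one is Python-only.
web_reply = f"{TICKS}html\n<button id='go'>Go</button>\n{TICKS}\n{TICKS}javascript\nconsole.log('hi')\n{TICKS}"
py_reply = f"{TICKS}python\nprint('hi')\n{TICKS}"

print(mlc_rate([web_reply, py_reply]))
```

Multi-language responses are common in web-design tasks, where a single answer routinely interleaves HTML, CSS, and JavaScript snippets.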

Developers’ Use and Challenges with LLMs

This study addresses a critical gap in understanding how developers interact with Large Language Models (LLMs) in real-world software engineering workflows. Researchers analysed a large dataset of developer-LLM conversations, named CodeChat, comprising over 82,000 interactions and 368,000 code snippets across more than 20 programming languages. The findings reveal that developers frequently use LLMs for tasks including web design and machine learning model training, often receiving detailed, multi-language code in response to their prompts. While LLMs prove helpful, the analysis also identifies common issues in the generated code, such as undefined variables, missing comments, and unresolved namespaces, which persist across multiple conversational turns. However, the research demonstrates that providing feedback and explicitly requesting fixes improves code quality, particularly in areas like documentation and import handling.
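One of the issues named above, undefined variables, can be caught statically. The sketch below is a deliberately naive, flow-insensitive checker built on Python's `ast` module; it is not the study's tooling (real linters such as pyflakes handle scoping far more carefully), but it shows the flavour of the check.

```python
import ast
import builtins

def undefined_names(source: str) -> set[str]:
    """Naive static check: names that are read but never assigned, imported,
    defined, or built in. Flow-insensitive, so a name assigned anywhere in
    the snippet counts as defined even if the use appears earlier."""
    tree = ast.parse(source)
    defined = set(dir(builtins))
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            else:
                used.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                defined.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.arg):
            defined.add(node.arg)
    return used - defined

# Hypothetical LLM-generated snippet that reads 'x' without ever defining it.
snippet = "import math\nresult = math.sqrt(x)\n"
print(undefined_names(snippet))  # flags {'x'}
```

Running such a check on each conversational turn, as the study does at scale, makes it possible to see whether feedback in later turns actually resolves issues like missing imports or undefined variables.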

👉 More information
🗞 Developer-LLM Conversations: An Empirical Study of Interactions and Generated Code Quality
🧠 ArXiv: https://arxiv.org/abs/2509.10402
Dr. Donovan

