Developer-LLM Conversations: Study of 82,845 Interactions Reveals 14:1 Token-Length Ratio and Its Impact on Code Quality

The increasing integration of large language models into software development presents both opportunities and challenges for modern programmers. Suzhen Zhong, Ying Zou, and Bram Adams, from Queen’s University, investigate how developers actually interact with these models and how those interactions affect the quality of the resulting code. Their research leverages a substantial dataset of over 82,000 real-world conversations between developers and language models, revealing significant patterns in how these tools are used. The team demonstrates that language model responses are considerably longer than developer prompts and identifies common issues in generated code across multiple programming languages, including undefined variables and missing documentation. Importantly, the study also shows that iterative conversations, where developers correct and refine the model’s output, can lead to measurable improvements in code quality, particularly in areas like documentation and import handling.

Limited empirical understanding exists regarding how developers interact with large language models (LLMs) in practice and how these conversational dynamics influence task outcomes, code quality, and software engineering workflows. To address this gap, researchers systematically analyse developer-LLM conversation structures, developer behaviour patterns, and LLM-generated code quality, uncovering key insights into LLM-assisted software development. This analysis leverages CodeChat, a large dataset comprising 82,845 real-world developer-LLM conversations, containing 368,506 code snippets generated across over 20 programming languages.

Sentence Embeddings and Dialogue Data Collection

This research employed advanced techniques for analysing natural language and code, including the use of sentence embeddings to represent the meaning of text and code snippets. These embeddings, generated using models like Sentence-BERT, allow for quantitative comparison of semantic similarity between developer prompts and LLM responses. The team collected a comprehensive dataset of developer-LLM interactions, named CodeChat, to facilitate this analysis, capturing a wide range of coding tasks and programming languages, providing a realistic representation of developer workflows.
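To illustrate the kind of semantic comparison the paper describes, the sketch below computes cosine similarity between two embedding vectors. It is a minimal, hedged example: the toy 4-dimensional vectors stand in for real sentence embeddings, which in the study would be produced by a model such as Sentence-BERT.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# In the study, vectors like these would come from a sentence-embedding model
# (e.g. Sentence-BERT); these toy 4-d vectors are placeholders for illustration.
prompt_vec = np.array([0.2, 0.8, 0.1, 0.3])      # embedding of a developer prompt
response_vec = np.array([0.25, 0.75, 0.05, 0.35])  # embedding of the LLM response

similarity = cosine_similarity(prompt_vec, response_vec)
print(f"prompt/response similarity: {similarity:.3f}")
```

A high similarity score suggests the response stays on-topic with the prompt; comparing scores across conversational turns is one way to quantify how a dialogue drifts or converges.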

Developer Interactions Show Verbose LLM Responses

This research presents a detailed analysis of developer interactions with large language models (LLMs), captured in the CodeChat dataset, comprising 82,845 conversations and 368,506 code snippets. The study reveals significant characteristics of these interactions, demonstrating that LLMs often provide verbose responses, with a median Token Ratio (TR) of 14:1, potentially impacting interaction efficiency. Analysis of conversational structure revealed that 68% of interactions are multi-turn, indicating frequent follow-up questions or revisions. To understand the causes of these extended conversations, researchers defined and measured Prompt Design Gap Frequency (PDG-Freq), identifying deficiencies in initial prompts that necessitate clarification.
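The Token Ratio metric above can be sketched as follows. This is an illustrative simplification, not the paper's exact implementation: whitespace splitting stands in for whatever tokenizer the study uses, and the sample conversations are hypothetical.

```python
from statistics import median

def token_ratio(prompt: str, response: str) -> float:
    """Response-to-prompt token ratio for one conversational turn.
    Whitespace tokenisation is a simplification of a real tokenizer."""
    return len(response.split()) / max(len(prompt.split()), 1)

# Hypothetical (prompt, response) pairs; a verbose response drives the ratio up.
conversations = [
    ("sort this list of dicts by key", "You can use sorted with a lambda " + "word " * 60),
    ("why does this loop hang", "The loop condition never updates because " + "word " * 40),
]

ratios = [token_ratio(p, r) for p, r in conversations]
print(f"median token ratio: {median(ratios):.1f}")
```

Aggregating the per-turn ratio with a median, as here, matches how the study reports its 14:1 figure and damps the influence of a few extremely long responses.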

The team also quantified programming language generation patterns, discovering that Python and JavaScript are the most frequently generated languages, accounting for 9.6% and 8.7% of conversations respectively, aligning with current programming trends. Further analysis focused on code snippet characteristics, measuring Lines of Code (LOC) to assess complexity and size, and introducing the Multi-language Co-occurrence Rate (MLC-Rate) to detect instances of code integrating multiple languages.
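A metric like MLC-Rate could be approximated by inspecting the language tags on fenced code blocks in LLM responses. The sketch below is an assumption about the mechanics, not the paper's actual pipeline, and the sample responses are fabricated for illustration.

```python
import re

# Matches the language tag after a triple-backtick fence, e.g. ```python
FENCE = re.compile(r"`{3}(\w+)")
TICKS = "`" * 3  # literal fence marker, built here to keep it out of this block's own fence

def fence_languages(response_markdown: str) -> set[str]:
    """Languages tagged on fenced code blocks within one LLM response."""
    return {m.group(1).lower() for m in FENCE.finditer(response_markdown)}

def mlc_rate(responses: list[str]) -> float:
    """Fraction of responses whose snippets span more than one language --
    a rough sketch of the paper's Multi-language Co-occurrence Rate."""
    multi = sum(1 for r in responses if len(fence_languages(r)) > 1)
    return multi / len(responses)

# Hypothetical responses: one mixes HTML and JavaScript, one is Python-only.
web_reply = f"{TICKS}html\n<button id='go'>Go</button>\n{TICKS}\n{TICKS}javascript\nconsole.log('hi')\n{TICKS}"
py_reply = f"{TICKS}python\nprint('hi')\n{TICKS}"

print(mlc_rate([web_reply, py_reply]))
```

Multi-language responses are common in web-design tasks, where a single answer routinely interleaves HTML, CSS, and JavaScript snippets.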

Developers’ Use and Challenges with LLMs

This study addresses a critical gap in understanding how developers interact with Large Language Models (LLMs) in real-world software engineering workflows. Researchers analysed a large dataset of developer-LLM conversations, named CodeChat, comprising over 82,000 interactions and 368,000 code snippets across more than 20 programming languages. The findings reveal that developers frequently use LLMs for tasks including web design and machine learning model training, often receiving detailed, multi-language code in response to their prompts. While LLMs prove helpful, the analysis also identifies common issues in the generated code, such as undefined variables, missing comments, and unresolved namespaces, which persist across multiple conversational turns. However, the research demonstrates that providing feedback and explicitly requesting fixes improves code quality, particularly in areas like documentation and import handling.
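One of the issues named above, undefined variables, can be caught statically. The sketch below is a deliberately naive, flow-insensitive checker built on Python's `ast` module; it is not the study's tooling (real linters such as pyflakes handle scoping far more carefully), but it shows the flavour of the check.

```python
import ast
import builtins

def undefined_names(source: str) -> set[str]:
    """Naive static check: names that are read but never assigned, imported,
    defined, or built in. Flow-insensitive, so a name assigned anywhere in
    the snippet counts as defined even if the use appears earlier."""
    tree = ast.parse(source)
    defined = set(dir(builtins))
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            else:
                used.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                defined.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.arg):
            defined.add(node.arg)
    return used - defined

# Hypothetical LLM-generated snippet that reads 'x' without ever defining it.
snippet = "import math\nresult = math.sqrt(x)\n"
print(undefined_names(snippet))  # flags {'x'}
```

Running such a check on each conversational turn, as the study does at scale, makes it possible to see whether feedback in later turns actually resolves issues like missing imports or undefined variables.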

👉 More information
🗞 Developer-LLM Conversations: An Empirical Study of Interactions and Generated Code Quality
🧠 ArXiv: https://arxiv.org/abs/2509.10402
Dr. Donovan

