The challenge of tailoring large language models (LLMs) to individual user preferences presents a significant computational hurdle, particularly when balancing personalisation with data privacy and resource limitations. Current methods often necessitate either extensive cloud-based fine-tuning or static alignment, failing to adapt to evolving user contexts in real time on personal devices. Researchers Hang Lv, Sheng Liang, and colleagues detail a novel collaborative framework, entitled ‘CoSteer: Collaborative Decoding-Time Personalization via Local Delta Steering’, which addresses this dichotomy by employing localised ‘delta steering’ signals. The approach leverages the difference in logits between the context-aware and context-agnostic outputs of a small, on-device model, using that difference to dynamically adjust a more general, cloud-based LLM’s output during text generation without transmitting sensitive user data. The result is a system capable of personalised content creation that preserves privacy and operates efficiently on resource-constrained devices.
Personalised text generation is becoming increasingly capable, fuelled by advances in large language models and natural language processing. This progress motivates a shift towards systems that adapt content to individual user characteristics, moving beyond generic responses to align with specific linguistic patterns, interaction histories, and contextual preferences. Current approaches broadly fall into two categories: training-based and tuning-free methods, each presenting distinct advantages and challenges in balancing customisation with computational cost and data privacy. Training-based techniques modify model parameters using user data, while tuning-free methods adapt to user context during text generation without altering the underlying model. Within tuning-free methods, prompt engineering and inference-time optimisation represent distinct strategies for steering LLMs towards outputs relevant to the user’s needs.
A central challenge lies in reconciling the benefits of cloud-based LLMs with the need for localised, user-specific information. Transmitting raw user data to the cloud raises significant privacy concerns, creating a trade-off between output quality and data security. Consequently, research focuses on methods that enable personalisation without extensive model training or data transmission, seeking to leverage the strengths of both cloud and on-device resources. This necessitates innovative techniques for steering LLMs using limited on-device resources, paving the way for more adaptive and privacy-preserving AI systems. The pursuit of genuinely personalised text generation demands systems that adapt to the nuanced and evolving preferences of individual users across cultural, temporal, and contextual boundaries, addressing the limitations of both centralised fine-tuning and static preference alignment.
Researchers are now addressing this with frameworks like CoSteer, a collaborative system designed to enable decoding-time personalisation through a process termed ‘localised delta steering’. The core principle exploits the difference in logits – the raw, pre-softmax prediction scores over the vocabulary – between outputs generated with and without the user’s personal context. A smaller, local model processes user profiles and writing history to produce one set of logits reflecting that context and, in a parallel pass, a second set without it. The difference between these sets of logits – the ‘delta vectors’ – serves as a steering signal, dynamically adjusting the output logits of a more powerful, cloud-based LLM at the token level and subtly shifting its predictions to better align with the user’s established preferences, all without giving the cloud model direct access to the user’s personal data.
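A minimal sketch of this token-level delta steering idea, assuming greedy decoding over a toy four-token vocabulary. The function names, the scaling factor `alpha`, and the numeric values are illustrative stand-ins, not the paper’s implementation:

```python
import numpy as np

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    z = logits - logits.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def steer_logits(cloud_logits, local_logits_with_ctx, local_logits_no_ctx, alpha=1.0):
    """Add the delta between context-aware and context-agnostic local
    logits to the cloud model's logits for the current decoding step."""
    delta = local_logits_with_ctx - local_logits_no_ctx
    return cloud_logits + alpha * delta

# Toy vocabulary of 4 tokens: the personal context boosts token 2.
cloud = np.array([2.0, 1.0, 0.5, 0.1])        # cloud LLM's raw scores
local_ctx = np.array([1.0, 0.5, 2.5, 0.2])    # local model, with user context
local_plain = np.array([1.0, 0.5, 0.5, 0.2])  # local model, without context

steered = steer_logits(cloud, local_ctx, local_plain)
probs = softmax(steered)
print(int(np.argmax(probs)))  # → 2: steering flips the prediction to token 2
```

Note that the delta is zero wherever the personal context changes nothing, so the cloud model’s general behaviour is left untouched for those tokens.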
This approach formulates the personalisation process as an online learning problem, in which the local delta vectors continuously refine the cloud LLM’s output within the constraints of the user’s device, making it viable for resource-constrained hardware. Crucially, CoSteer prioritises user privacy by transmitting only the final, steered tokens – the generated text – rather than any raw data or intermediate vectors, ensuring that sensitive user information remains securely stored on the device. The system effectively leverages the strengths of both local and cloud computing, combining the generative power of large language models with the personalised insights derived from on-device data processing.
CoSteer tackles the inherent limitations of both cloud-based and on-device LLM approaches: cloud models lack access to localised user data, while smaller on-device models struggle to match the generative quality of their larger counterparts. It bridges this gap with the ‘delta steering’ mechanism, using the difference in logits – the raw, unnormalised prediction scores – as steering signals that allow dynamic refinement of the cloud LLM’s output without full fine-tuning or the transmission of sensitive user data.
Experiments demonstrate CoSteer’s effectiveness across various personalised generation tasks, successfully leveraging locally stored user profiles and histories to guide the LLM towards content tailored to individual preferences. Researchers evaluated performance using datasets including CoGenesis, LongLaMP, and a preference-alignment task, assessing both overall quality and the degree of personalisation achieved. The research emphasises the importance of detailed dataset examples and well-defined evaluation metrics for reproducibility and comparison with other personalisation techniques, offering a practical approach to adapting LLMs to diverse user contexts.
By formulating token-level optimisation as an online learning problem, the system dynamically adjusts the cloud LLM’s logits within the on-device environment, drawing on locally stored user profiles and writing histories so that generated content reflects individual preferences while preserving the general capabilities of the cloud LLM. The framework protects user privacy by transmitting only the final, steered tokens, never raw data or intermediate vectors, minimising the risk of exposing sensitive information – a crucial consideration for personal devices. Experiments across diverse personalised generation tasks confirm that CoSteer effectively assists LLMs in generating content tailored to individual users, demonstrating a viable pathway towards truly personalised AI assistants.
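The collaborative decoding loop can be sketched as follows. This is an illustrative simulation, not the paper’s system: `cloud_step` stands in for the cloud LLM (returning pseudo-random logits for each step), and the fixed `profile_bias` vector stands in for the delta computed from the user’s private context. The point of the sketch is the data flow: only logits come down from the cloud, and only the chosen tokens go back up:

```python
import numpy as np

VOCAB = 6  # toy vocabulary size

def cloud_step(token_history):
    """Stand-in for the cloud LLM: returns logits for the next token.
    A hash of the history seeds a deterministic pseudo-model."""
    seed = hash(tuple(token_history)) % (2**32)
    return np.random.default_rng(seed).normal(size=VOCAB)

def local_delta(token_history, profile_bias):
    """Stand-in for the on-device pair of passes (with vs. without the
    user's context); here the delta is simply a fixed bias vector."""
    return profile_bias

def costeer_decode(steps=5, alpha=1.0):
    profile_bias = np.zeros(VOCAB)
    profile_bias[3] = 2.0  # the user's (private) preference favours token 3
    history = []
    for _ in range(steps):
        logits = cloud_step(history)                          # logits come down
        steered = logits + alpha * local_delta(history, profile_bias)
        token = int(np.argmax(steered))                       # greedy decoding
        history.append(token)                                 # only the token goes back up
    return history

print(costeer_decode())
```

Raising `alpha` strengthens the steering: with a large `alpha`, the user-preferred token dominates every step, while `alpha=0` recovers the cloud model’s unsteered behaviour.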
Further research should focus on quantifying the impact of varying dataset sizes on CoSteer’s performance, alongside a more detailed error analysis to identify the response types that consistently receive lower ratings. Incorporating quantitative metrics such as BLEU or ROUGE scores alongside human evaluation would provide a more objective assessment of the system’s effectiveness, and testing CoSteer with different baseline LLMs would add valuable context to the results. Future work could also adapt the delta steering mechanism to more complex user preferences, such as stylistic choices or specific topic interests, while extending the framework to multimodal personalisation, incorporating data from sources beyond text, represents another promising avenue. Finally, assessing the computational overhead of CoSteer on a wider range of personal devices would be crucial for ensuring its practical viability.
👉 More information
🗞 CoSteer: Collaborative Decoding-Time Personalization via Local Delta Steering
🧠 DOI: https://doi.org/10.48550/arXiv.2507.04756
