A new dataset facilitates evaluation of small language models for user profiling based on multi-session smart home interactions. The dataset comprises structured user profiles defining behavioural routines, used to generate realistic dialogue sessions. Benchmarking compact models against larger counterparts reveals a significant performance gap in accurately reconstructing user behaviour from interaction history. This highlights a challenge for on-device AI, where small models offer privacy, low latency and personalisation, but currently lag in behavioural modelling accuracy. The resource provides a testbed for advancing intelligent, privacy-respecting AI systems.
Understanding and modelling human behaviour within the home is becoming increasingly important as smart home technology proliferates. Accurately capturing user routines and preferences from natural language interactions allows for genuinely personalised and responsive systems, but presents a significant challenge for deploying artificial intelligence on resource-constrained devices. Researchers are now addressing this through the creation of dedicated datasets for evaluation. Patryk Bartkowiak and Michal Podstawski, from TCL Research Europe, alongside colleagues, detail their work in the paper ‘EdgeWisePersona: A Dataset for On-Device User Profiling from Natural Language Interactions’, introducing a new benchmark designed to assess small models capable of running directly on edge devices, and comparing their performance against larger, cloud-based systems.
Structuring Smart Home Intelligence: A Dataset for Evaluating Compact Language Models in User Profiling
Researchers have introduced a novel dataset and benchmark designed to rigorously evaluate small language models (SLMs) within the growing field of smart home automation. The focus is on the challenging task of user profiling through multi-session interactions, addressing a critical gap in current evaluation methodologies. Existing benchmarks often prioritise performance on large-scale tasks without adequately considering the constraints of deploying intelligence directly on user-owned devices – known as ‘edge deployment’. This dataset establishes a structured framework for assessing a model’s ability to infer a user’s habits and preferences from interactions with a smart home system, moving beyond simple voice command recognition towards understanding behavioural patterns. By concentrating on SLMs, the research directly tackles the challenge of achieving privacy, low latency, and personalised experiences on resource-constrained devices.
The core innovation lies in the generation of realistic dialogues grounded in predefined user routines. These routines represent repeatable behavioural patterns triggered by contextual factors such as time of day, weather conditions, and user activity. Researchers meticulously define these routines using a machine-readable schema, detailing the sequence of actions and associated triggers, allowing for automated parsing and analysis. Simulated sessions then represent interactions with the smart home system, where the user executes parts of the routine, providing a complete interaction history of user utterances and system responses. This structured format facilitates seamless integration with existing machine learning pipelines, enabling researchers to readily assess and compare model performance.
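The article does not reproduce the dataset's actual schema, so the following is only a minimal sketch of what a machine-readable routine and session could look like. Every class and field name here (Trigger, Action, Routine, Turn, Session, time_of_day and so on) is an assumption made for illustration and is not taken from EdgeWisePersona itself.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Trigger:
    # Hypothetical contextual factors; None means "any value".
    time_of_day: Optional[str] = None
    weather: Optional[str] = None
    activity: Optional[str] = None

    def matches(self, context: "Trigger") -> bool:
        # A trigger fires when every condition it specifies equals the context.
        for name in ("time_of_day", "weather", "activity"):
            wanted = getattr(self, name)
            if wanted is not None and wanted != getattr(context, name):
                return False
        return True


@dataclass(frozen=True)
class Action:
    # Hypothetical device action, e.g. set a light's brightness.
    device: str
    setting: str
    value: str


@dataclass
class Routine:
    trigger: Trigger
    actions: List[Action] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict recurses into nested dataclasses and lists.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "Routine":
        raw = json.loads(text)
        return cls(
            trigger=Trigger(**raw["trigger"]),
            actions=[Action(**a) for a in raw["actions"]],
        )


@dataclass
class Turn:
    speaker: str  # "user" or "system"
    text: str


@dataclass
class Session:
    session_id: int
    context: Trigger
    turns: List[Turn] = field(default_factory=list)


evening = Routine(
    trigger=Trigger(time_of_day="evening", weather="rainy"),
    actions=[
        Action("living_room_light", "brightness", "40"),
        Action("thermostat", "temperature", "21"),
    ],
)
session = Session(
    session_id=1,
    context=Trigger(time_of_day="evening", weather="rainy", activity="reading"),
    turns=[
        Turn("user", "Dim the living room light to 40 percent."),
        Turn("system", "Living room light set to 40 percent."),
    ],
)
restored = Routine.from_json(evening.to_json())
```

In this sketch the session carries only part of the routine (the light, not the thermostat), which mirrors the article's point that a user executes parts of a routine in any one session, so a model has to combine several sessions to recover the whole pattern.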
Results demonstrate a significant performance disparity between SLMs and larger foundation models in accurately reconstructing user routines from interaction histories. This highlights a critical challenge in deploying intelligent systems directly on user devices. While SLMs exhibit some capacity for profile reconstruction, they consistently underperform larger models in capturing the complexity of user behaviour, particularly with routines involving multiple steps or conditional actions. This gap underscores the need for further research into techniques for improving SLM capabilities, enabling performance comparable to larger models while retaining the benefits of edge deployment.
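The article does not state which metric the benchmark uses to compare reconstructed routines with the ground truth. As one plausible illustration only, the sketch below scores a predicted profile by precision, recall and F1 over (trigger, action) pairs. The function names and the dictionary layout are assumptions, not the paper's method.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical layout: a routine is {"trigger": {...}, "actions": [{...}, ...]}
# with string keys and string values throughout.
Routine = Dict[str, object]
Pair = Tuple[tuple, tuple]


def routine_pairs(routines: List[Routine]) -> Set[Pair]:
    """Flatten routines into a set of hashable (trigger, action) pairs."""
    pairs: Set[Pair] = set()
    for routine in routines:
        trigger = tuple(sorted(routine["trigger"].items()))
        for action in routine["actions"]:
            pairs.add((trigger, tuple(sorted(action.items()))))
    return pairs


def score_profile(predicted: List[Routine], gold: List[Routine]) -> Dict[str, float]:
    """Set-based precision, recall and F1 between predicted and gold pairs."""
    pred_pairs = routine_pairs(predicted)
    gold_pairs = routine_pairs(gold)
    true_positives = len(pred_pairs & gold_pairs)
    precision = true_positives / len(pred_pairs) if pred_pairs else 0.0
    recall = true_positives / len(gold_pairs) if gold_pairs else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


gold = [{
    "trigger": {"time_of_day": "evening", "weather": "rainy"},
    "actions": [
        {"device": "living_room_light", "brightness": "40"},
        {"device": "thermostat", "temperature": "21"},
    ],
}]
predicted = [{
    "trigger": {"time_of_day": "evening", "weather": "rainy"},
    "actions": [
        {"device": "living_room_light", "brightness": "40"},
        {"device": "thermostat", "temperature": "25"},  # wrong value
    ],
}]
result = score_profile(predicted, gold)
```

A pair-level score like this gives partial credit for a multi-step routine that is only partly recovered, which is the kind of case where the article reports small models falling behind larger ones.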
Researchers plan to extend the evaluation benchmark to a wider range of tasks, such as proactive assistance and personalised recommendations, broadening the dataset’s utility. Testing how robust models are to variations in user behaviour and environmental conditions is also essential if systems are to adapt to changing circumstances and perform reliably. Ultimately, this work contributes to the development of privacy-preserving, low-latency, personalised AI systems that learn and adapt directly on user-owned devices.
The dataset’s design prioritises transparency and reproducibility, promoting wider adoption and adaptation of the benchmark within the research community. By providing a clear and well-documented framework for evaluating SLMs, researchers aim to facilitate collaboration and accelerate progress in the field. The machine-readable schema for both routines and sessions ensures the data can be easily parsed and utilised within existing software pipelines, streamlining evaluation and reducing the barrier to entry for new researchers.
Researchers recognise that effective SLM development for smart home automation requires a holistic approach, encompassing model architecture, training techniques, data collection, and evaluation methodologies. The dataset presented represents a significant step towards addressing this challenge, providing a valuable resource for researchers and developers. By focusing on the specific constraints of edge deployment, this work aims to pave the way for a new generation of intelligent smart home systems that are both powerful and privacy-preserving.
The research team emphasises the importance of addressing the ethical considerations surrounding AI use in smart home environments, particularly regarding data privacy and security. The dataset is designed to facilitate the development of models that can learn from user interactions without compromising sensitive information, prioritising user privacy and control. Researchers are committed to developing responsible AI technologies that benefit society as a whole, ensuring the benefits of smart home automation are accessible to all.
Looking ahead, the research team plans to explore federated learning techniques – which allow models to be trained on decentralised data sources without data sharing – further enhancing user privacy and security. They also plan to investigate reinforcement learning techniques, allowing models to learn from interactions with the environment, enabling adaptation to changing user preferences and optimised performance over time. This ongoing research effort is dedicated to pushing the boundaries of smart home intelligence and creating a more seamless and intuitive user experience.
Taken together, the dataset and benchmark give researchers and developers a concrete way to measure how well small language models profile users on edge devices, a gap that existing evaluation methodologies leave open. The authors’ emphasis on transparency, reproducibility and ethical considerations is intended to make the benchmark easy for others to adopt and build on.
👉 More information
🗞 EdgeWisePersona: A Dataset for On-Device User Profiling from Natural Language Interactions
🧠 DOI: https://doi.org/10.48550/arXiv.2505.11417
