Generative Reasoning Recommendation via LLMs Achieves Unified Understanding-Reasoning-Prediction for Recommendation Tasks

Recommender systems increasingly rely on understanding not just what users like, but why they like it, a challenge for even the most advanced language models. Minjie Hong from Zhejiang University, Zetong Zhou from Shanghai Jiao Tong University, Zirun Guo from Zhejiang University, and colleagues address this problem by developing a new framework, GREAM, that enables large language models to perform generative reasoning recommendation. The team demonstrates how to adapt pre-trained language models to bridge the gap between understanding user preferences and providing relevant recommendations, achieving a unified approach to reasoning and prediction. This work introduces methods for aligning textual information with user behavior, building training datasets that encourage logical reasoning, and optimizing the system with sparse feedback, ultimately creating a practical path towards transparent and verifiable recommendation systems powered by artificial intelligence.

Rather than simply matching items to past behavior, the system reasons over a user's purchase history to understand the underlying needs and then recommends items accordingly. The framework uses large language models to perform this reasoning, explicitly laying out steps for extracting behavioral evidence, modeling latent preferences, inferring user intent, and formulating the recommendation goal. The system actively considers what a user needs, rather than relying solely on collaborative or content-based filtering. By integrating large language models, it achieves a more nuanced understanding of user behavior and provides transparent reasoning for its recommendations. Case studies across diverse domains, including musical instruments, beauty products, and sports equipment, show the system generating an explicit thought process before suggesting items, promising more intelligent, personalized recommendations and potentially greater user satisfaction.
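
To make this reasoning flow concrete, below is a minimal sketch of how a prompt could walk a language model through those steps. The prompt wording, function name, and example items are illustrative assumptions, not the authors' actual template.

```python
# Illustrative only: prompt text, function name, and items are assumptions,
# not GREAM's actual implementation.
def build_reasoning_prompt(purchase_history, candidate_items):
    """Compose a prompt that asks an LLM to reason before recommending."""
    history_block = "\n".join(f"- {item}" for item in purchase_history)
    candidate_block = "\n".join(f"- {item}" for item in candidate_items)
    return (
        "You are a recommendation assistant. Think step by step:\n"
        "1. Extract behavioral evidence from the purchase history.\n"
        "2. Model the user's latent preferences.\n"
        "3. Infer the user's current intent and goal.\n"
        "4. Recommend the best-matching item and explain why.\n\n"
        f"Purchase history:\n{history_block}\n\n"
        f"Candidate items:\n{candidate_block}\n"
    )

print(build_reasoning_prompt(
    ["acoustic guitar strings", "clip-on guitar tuner"],
    ["guitar capo", "tennis racket grip tape", "matte lipstick"],
))
```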

Generative Reasoning for Enhanced Recommendation Accuracy

Researchers have pioneered a new framework, GREAM, which enhances recommendation accuracy and interpretability through generative reasoning. GREAM integrates collaborative-semantic alignment, reasoning curriculum activation, and sparse-regularized policy optimization. The team established high-fidelity collaborative-semantic alignment by fusing diverse textual sources, including item titles, descriptions, and user reviews, into consistent item representations that also incorporate temporal behavioral patterns and user preferences. To imbue the model with reasoning capabilities, the researchers constructed a synthetic dataset with explicit Chain-of-Thought supervision and implemented a curriculum learning strategy.
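
As a rough illustration of what collaborative-semantic alignment can involve, the sketch below computes an InfoNCE-style contrastive loss that pulls the text-derived and interaction-derived embeddings of the same item together. The loss form, dimensions, and variable names are assumptions made for this example; the paper's actual alignment objective may differ.

```python
# Minimal sketch of aligning text-based and behavior-based item embeddings
# with an InfoNCE-style contrastive loss. GREAM's actual objective and
# architecture may differ; this only illustrates the general idea.
import numpy as np

def info_nce_alignment(text_emb, collab_emb, temperature=0.07):
    """text_emb, collab_emb: (n_items, dim) arrays for the same items."""
    # L2-normalize both views so the dot product is a cosine similarity.
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    c = collab_emb / np.linalg.norm(collab_emb, axis=1, keepdims=True)
    logits = t @ c.T / temperature  # (n_items, n_items) similarity matrix
    # Matching item pairs sit on the diagonal; treat them as positives.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
text_emb = rng.normal(size=(8, 16))    # e.g. embeddings of fused titles/reviews
collab_emb = rng.normal(size=(8, 16))  # e.g. embeddings from interaction data
print(info_nce_alignment(text_emb, collab_emb))
```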

The curriculum progressively increased reasoning difficulty, guiding the model through a five-stage logical chain: behavioral evidence extraction, latent preference modeling, intent inference, recommendation formulation, and denoised sequence rewriting. This transforms the recommendation task from simple pattern matching into a process of causal reasoning. The team also developed a Sparse-Regularized Group Policy Optimization algorithm to address the instability caused by limited feedback, enhancing gradient diversity and ensuring stable policy updates. Experiments across multiple benchmarks demonstrate consistent gains over existing methods, validating the effectiveness and robustness of the approach.
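
The precise form of Sparse-Regularized Group Policy Optimization is not reproduced here, but the group-relative idea behind such methods can be sketched: sample several candidate generations for the same user, score them with a sparse reward, and normalize the rewards within the group. The epsilon floor that keeps the normalization well-defined when most rewards are zero is an assumption of this sketch, not necessarily the paper's regularizer.

```python
# Toy sketch of a group-relative advantage computation in the spirit of
# GRPO-style methods. The paper's Sparse-Regularized Group Policy Optimization
# likely differs in its exact regularization; the epsilon floor here is an
# assumption to keep advantages well-defined when most rewards are zero.
import numpy as np

def group_relative_advantages(rewards, eps=1e-6):
    """rewards: sparse rewards for a group of candidate generations
    sampled for the same user/prompt."""
    rewards = np.asarray(rewards, dtype=float)
    baseline = rewards.mean()
    spread = rewards.std()
    # With sparse feedback the spread can collapse to zero, so floor it.
    return (rewards - baseline) / max(spread, eps)

# Example: only one of four sampled recommendation lists hits the target item.
print(group_relative_advantages([1.0, 0.0, 0.0, 0.0]))
```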

Generative Reasoning Improves Recommendation Performance

The research team presents a novel framework, GREAM, which builds generative reasoning recommendation systems by adapting pre-trained large language models. This approach addresses the challenge of aligning textual understanding with collaborative filtering data, enabling both efficient direct recommendations and interpretable reasoning-based generation. GREAM integrates three key components: Collaborative-Semantic Alignment to connect language with user interactions, Reasoning Curriculum Activation to train the model with explicit reasoning steps, and Sparse-Regularized Group Policy Optimization to stabilize learning from limited feedback. Experiments across multiple datasets demonstrate that GREAM consistently improves recommendation performance compared to existing methods. Notably, training the model to focus on generating reasoning chains significantly boosts performance, suggesting that verifiable reinforcement signals are particularly effective at strengthening the reasoning process. This work advances the field of verifiable-RL-driven recommendation and provides a pathway towards transparent, scalable, and causally grounded recommender systems.
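
To illustrate what a verifiable reinforcement signal might look like in this setting, the toy function below grants reward only when the generated item identifier exactly matches the held-out next interaction; the reward actually used in the paper may be richer, for example ranking-based.

```python
# Hedged sketch of a "verifiable" reward for recommendation: the reward is 1.0
# only if the generated item identifier matches the held-out next interaction.
def verifiable_reward(generated_item_id: str, ground_truth_item_id: str) -> float:
    """Binary, automatically checkable reward signal."""
    return 1.0 if generated_item_id.strip() == ground_truth_item_id.strip() else 0.0

print(verifiable_reward("item_1042", "item_1042"))  # 1.0
print(verifiable_reward("item_7", "item_1042"))     # 0.0
```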

👉 More information
🗞 Generative Reasoning Recommendation via LLMs
🧠 arXiv: https://arxiv.org/abs/2510.20815

Rohail T.

I am a quantum scientist exploring the frontiers of physics and technology. My work focuses on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.
