RETuning Upgrades Large Language Models for Stock Movement Prediction, Addressing Reasoning Flaws in Three-class Classification

Predicting stock movements represents a significant challenge for even the most sophisticated analytical tools, and recent advances in large language models (LLMs) have yet to fully address this complex financial task. Xueyuan Lin, Cehao Yang, and colleagues from The Hong Kong University of Science and Technology (Guangzhou), alongside Ye Ma, Ming Li, and Rongjunchen Zhang from Hithink RoyalFlush Information Network Co., Ltd, and Yang Ni, now demonstrate a method to substantially improve LLM performance in this area. Their work reveals that existing LLMs often mimic analyst opinions rather than applying independent logical reasoning, and struggle to weigh conflicting evidence effectively, hindering accurate prediction. To overcome these limitations, the team introduces Reflective Evidence Tuning (RETuning), a novel technique that encourages LLMs to construct a robust analytical framework, systematically evaluate evidence, and ultimately derive predictions based on logical reasoning rather than contextual biases. This approach, validated on a newly created large-scale dataset encompassing all of 2024’s data for over five thousand stocks, unlocks the reasoning potential of LLMs in finance and maintains reliable performance even under evolving market conditions and on unfamiliar stocks.

Financial Prediction with Reasoning LLMs

This study presents a compelling comparison of two large language models, DeepSeek-14B and DeepSeek-14B-SFT, as they tackle the complex task of financial prediction. The models were challenged to predict the opening price change for a specific stock, requiring them to understand market dynamics, technical analysis, and current events. Crucially, the models were also expected to clearly articulate their reasoning process, detailing the data analysis and evidence scoring that led to their predictions. The research reveals key differences in how these models approach the task, with the SFT version prioritizing clarity and conciseness.
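As a rough illustration of the task format (the prompt wording and label names below are assumptions for this sketch, not taken from the paper), the model receives a long market context and must return both its reasoning and a final three-class movement label:

```python
import re

# Hypothetical three-class label set for next-open movement prediction
# (the paper's exact class names are not given here; these are assumptions).
LABELS = ("up", "flat", "down")

def build_prompt(ticker: str, context: str) -> str:
    """Assemble an instruction asking for evidence-based reasoning plus a final label."""
    return (
        f"You are a financial analyst. Using the context below, predict the next "
        f"opening price movement of {ticker} as one of {', '.join(LABELS)}.\n"
        f"Explain your data analysis and evidence scoring before the final answer.\n\n"
        f"Context:\n{context}\n\n"
        f"Final answer (up/flat/down):"
    )

def parse_label(response: str) -> str:
    """Extract the last occurrence of a valid label from the model's free-form reply."""
    matches = re.findall(r"\b(up|flat|down)\b", response.lower())
    return matches[-1] if matches else "flat"  # fall back to the neutral class

if __name__ == "__main__":
    demo_reply = "Momentum is weak but the news flow is positive... Final answer: up"
    print(parse_label(demo_reply))  # -> "up"
```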

Both models provide detailed analyses covering macroeconomic factors, company fundamentals, technical indicators, news events, evidence scoring, and risk assessment. However, DeepSeek-14B-SFT delivers a more concise and focused prediction, streamlining the information for easier understanding and action. The study addresses the tendency of these models to mimic analyst opinions rather than analyze information independently, and their difficulty in critically weighing conflicting evidence. RETuning counters this by encouraging the dynamic construction of an analytical framework: the model organizes and scores evidence for potential price increases or decreases, then derives its prediction through reflective analysis. To facilitate this research, the team constructed Fin-2024, a large-scale dataset covering all of 2024 for 5,123 stocks and totaling 209,063 samples with long contexts of up to 32,000 tokens.
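To make the evidence-scoring idea concrete, here is a minimal sketch under assumed data structures (the paper does not publish this interface): each piece of evidence receives a signed score, and the three-class call falls out of the aggregated balance.

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    source: str    # e.g. "technical", "news", "analyst opinion"
    summary: str   # short natural-language statement of the evidence
    score: float   # signed weight: positive supports a rise, negative a fall

def decide(evidence: list[Evidence], threshold: float = 0.5) -> str:
    """Aggregate signed evidence scores into a three-class movement call."""
    total = sum(e.score for e in evidence)
    if total > threshold:
        return "up"
    if total < -threshold:
        return "down"
    return "flat"

if __name__ == "__main__":
    framework = [
        Evidence("technical", "price broke above its 20-day average", +1.0),
        Evidence("news", "regulator opened an inquiry into the sector", -0.8),
        Evidence("analyst opinion", "consensus target implies modest upside", +0.4),
    ]
    print(decide(framework))  # net score +0.6 clears the threshold -> "up"
```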

Fin-2024 integrates six key information sources, overcoming the limited diversity of prior datasets. Experiments demonstrate that RETuning unlocks the reasoning ability of the language model in the financial domain and serves as an effective cold-start method: rather than relying on contextual biases, the model is guided to construct an analytical framework and to dynamically organize and score evidence for potential price increases or decreases.
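To picture what one sample might look like, the record below bundles the information sources into a single long-context input. The field names are assumptions for illustration only: the article names price history, news, analyst opinions, and macroeconomic indicators, while the fundamentals and technical-indicator fields are guesses at the remaining two sources, not the released schema.

```python
from dataclasses import dataclass

@dataclass
class FinSample:
    """Illustrative schema for one Fin-2024 sample (field names are assumptions)."""
    ticker: str
    trade_date: str            # a trading day in 2024
    price_history: str         # recent OHLCV series rendered as text
    news: str                  # headlines and articles preceding trade_date
    analyst_opinions: str      # recent analyst commentary and ratings
    macro_indicators: str      # macroeconomic context (rates, indices, ...)
    fundamentals: str          # assumed fifth source: company financials
    technical_indicators: str  # assumed sixth source: derived indicators
    label: str                 # next-open movement: "up" / "flat" / "down"

def to_context(sample: FinSample) -> str:
    """Concatenate the sources into a single long context (up to roughly 32k tokens)."""
    parts = [
        ("Price history", sample.price_history),
        ("Technical indicators", sample.technical_indicators),
        ("Fundamentals", sample.fundamentals),
        ("News", sample.news),
        ("Analyst opinions", sample.analyst_opinions),
        ("Macroeconomic indicators", sample.macro_indicators),
    ]
    return "\n\n".join(f"## {name}\n{text}" for name, text in parts)
```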

Experiments demonstrate that RETuning effectively unlocks prediction ability, improving performance over strong baseline models. Furthermore, the research shows that RETuning enables significant inference-time scaling, with the models maintaining their performance even when computational resources are limited. The method also generalizes beyond stock movement prediction, yielding improvements on other financial tasks and demonstrating robust performance on out-of-distribution stocks. These results lay the groundwork for deploying trustworthy, reasoning-driven LLMs in financial applications.
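Inference-time scaling for a task like this is commonly realized by sampling several independent reasoning chains and aggregating their final labels. The sketch below shows a generic majority-vote variant; it is offered as an assumption about how such scaling can work, not as the authors' exact procedure.

```python
from collections import Counter
from typing import Callable

def scaled_predict(generate: Callable[[str], str],
                   parse_label: Callable[[str], str],
                   prompt: str,
                   n_samples: int = 8) -> str:
    """Sample several independent reasoning chains and majority-vote the final label."""
    votes = Counter(parse_label(generate(prompt)) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    import random

    def fake_llm(prompt: str) -> str:
        # Stub standing in for an actual LLM call (assumption for the demo).
        return random.choice(["... final answer: up",
                              "... final answer: up",
                              "... final answer: down"])

    def last_word(response: str) -> str:
        return response.split()[-1]

    print(scaled_predict(fake_llm, last_word, "market context ..."))
```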

Reflective Tuning Improves Stock Prediction Accuracy

This research demonstrates a significant advancement in applying large language models to financial forecasting. Scientists discovered that existing models tend to mimic analysts’ opinions rather than independently analyzing information, and struggle to weigh conflicting evidence effectively. The researchers also constructed a comprehensive dataset encompassing all of 2024 for over five thousand stocks, incorporating diverse data sources including price history, news, analyst opinions, and macroeconomic indicators. While acknowledging that financial prediction remains challenging, the team highlights the model’s ability to assess sample difficulty, suggesting potential for more efficient training strategies.
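One simple way to read "sample difficulty" off a model, offered here purely as an illustrative assumption rather than the paper's procedure, is the disagreement among repeated sampled predictions: samples where the model's votes split are treated as harder and could be prioritized or re-weighted during training.

```python
from collections import Counter

def sample_difficulty(predicted_labels: list[str]) -> float:
    """Difficulty proxy in [0, 1]: 0 when all sampled predictions agree, higher when they split."""
    counts = Counter(predicted_labels)
    top = counts.most_common(1)[0][1]
    return 1.0 - top / len(predicted_labels)

print(sample_difficulty(["up", "up", "up", "up"]))      # 0.0 -> easy
print(sample_difficulty(["up", "down", "flat", "up"]))  # 0.5 -> hard
```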

👉 More information
🗞 RETuning: Upgrading Inference-Time Scaling for Stock Movement Prediction with Large Language Models
🧠 arXiv: https://arxiv.org/abs/2510.21604

Rohail T.

I am a quantum scientist exploring the frontiers of physics and technology. My work focuses on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.
