Portfolio optimisation routinely seeks to identify the key factors driving asset pricing, but current methods using Conditional Autoencoders often struggle when the number of these factors is large. Ryan Engel from Stony Brook University, Yu Chen, and Pawel Polak, alongside Ioana Boier from NVIDIA Corporation, address this limitation by developing a new scalable framework that effectively handles high-dimensional data. Their approach combines a high-dimensional autoencoder with a novel uncertainty-aware factor selection procedure, allowing them to pinpoint the most predictable factors and significantly improve portfolio performance. The team demonstrates that this pruning strategy delivers substantial gains in risk-adjusted returns across multiple forecasting models, and that combining these models into a weighted ensemble consistently outperforms any single approach, achieving higher Sharpe, Sortino, and Omega ratios.
Latent Factors Balance Bias and Variance
This study addresses a key challenge in asset pricing: balancing model complexity with the risk of overfitting. Researchers investigated how to effectively utilize a large number of latent factors, hidden variables influencing asset returns, by selectively prioritizing those with the most reliable forecasts and downplaying those with high uncertainty. This approach aims to improve portfolio construction and achieve better risk-adjusted returns by carefully managing the trade-off between bias and variance. The team developed a novel method for uncertainty-aware factor selection, employing a scalable framework using Conditional Autoencoders (CAEs) and accelerated data processing with RAPIDS.
By combining multiple forecasting models, including IID-BS, Q-Boost, and ZS-Chronos, they created a robust and diversified approach to predicting asset returns, linking forecast uncertainty to estimation risk and optimal portfolio construction. Empirical results demonstrate that this method consistently achieves higher Sharpe, Sortino, and Omega ratios compared to using all latent factors or a fixed number, proving robust across different market conditions and investment horizons. The ZS-Chronos model consistently outperforms other forecasting methods, particularly after 2018, and combining multiple forecasting models further enhances performance. This research highlights the crucial role of uncertainty in selecting informative latent factors, controlling dimensionality, and improving portfolio diversification.
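For readers less familiar with the three ratios named above, the following is a minimal sketch of how they are commonly computed from a series of periodic portfolio returns. The function names, the zero target and threshold, and the choice of 12 periods per year are assumptions made for illustration; the article does not state the conventions used in the paper.

```python
import numpy as np

def sharpe_ratio(returns, periods_per_year=12):
    """Annualized mean return divided by annualized volatility.

    Assumes `returns` are already excess returns (illustrative convention).
    """
    r = np.asarray(returns, dtype=float)
    return np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1)

def sortino_ratio(returns, target=0.0, periods_per_year=12):
    """Like Sharpe, but only returns below `target` count as risk."""
    r = np.asarray(returns, dtype=float)
    downside = np.minimum(r - target, 0.0)          # zero where the return beats the target
    downside_dev = np.sqrt(np.mean(downside ** 2))  # root mean squared shortfall
    return np.sqrt(periods_per_year) * (r.mean() - target) / downside_dev

def omega_ratio(returns, threshold=0.0):
    """Sum of gains above the threshold divided by sum of losses below it."""
    r = np.asarray(returns, dtype=float)
    gains = np.maximum(r - threshold, 0.0).sum()
    losses = np.maximum(threshold - r, 0.0).sum()
    return gains / losses
```

The Sharpe ratio penalizes all volatility, the Sortino ratio penalizes only shortfalls below the target, and the Omega ratio compares the total size of gains and losses, which is why the three can rank strategies differently.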
High Dimensional Portfolio Optimization with Conditional Autoencoders
Researchers pioneered a scalable framework that combines a high-dimensional Conditional Autoencoder (CAE) with an uncertainty-aware factor selection procedure to improve portfolio optimization. They trained a CAE model using firm-level characteristics, such as market equity, asset growth, and past return momentum, alongside asset returns, enabling a more expressive representation of the return-generating process and enhancing the signal-to-noise ratio of latent factor returns. To address limitations of restricting latent factor dimensionality, the team explored up to 50 latent factors and implemented a post-hoc filtering procedure to select the most predictable factors. They employed three distinct forecasting models, zero-shot Chronos, XGBoost, and a simple sample mean model, to generate predictive distributions and quantify forecast uncertainty, ranking factors based on predictive stability and constructing a performance-weighted ensemble. This uncertainty-aware selection minimizes expected out-of-sample utility loss and aligns factor selection with the goal of portfolio optimization, consistently outperforming conventional low-dimensional factor models with notably higher Sharpe, Sortino, and Omega ratios while maintaining maximum drawdowns below 10%. The study demonstrates that ensembles of diverse forecasts yield robust, market-neutral portfolios with superior and stable out-of-sample performance.
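The selection step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it measures each factor's uncertainty as the width of an 80% predictive interval, keeps the k factors with the narrowest intervals, and weights forecasting models in proportion to a non-negative past performance score. The function names, the interval level, and the weighting rule are all hypothetical.

```python
import numpy as np

def select_predictable_factors(pred_samples, k):
    """Return indices of the k factors with the narrowest predictive interval.

    pred_samples: array of shape (n_samples, n_factors) holding draws from
    each factor's predictive distribution for the next period.
    """
    samples = np.asarray(pred_samples, dtype=float)
    q_lo, q_hi = np.quantile(samples, [0.1, 0.9], axis=0)
    uncertainty = q_hi - q_lo                       # width of the 80% interval (assumed score)
    order = np.argsort(uncertainty, kind="stable")  # most certain factor first
    return order[:k], uncertainty

def performance_weights(past_scores):
    """Ensemble weights proportional to past performance, floored at zero."""
    s = np.clip(np.asarray(past_scores, dtype=float), 0.0, None)
    if s.sum() == 0.0:
        return np.full(s.shape, 1.0 / s.size)       # fall back to equal weights
    return s / s.sum()

# Usage with synthetic predictive draws for five hypothetical factors.
rng = np.random.default_rng(0)
scales = np.array([0.5, 0.1, 2.0, 0.05, 1.0])
samples = rng.normal(0.0, scales, size=(2000, 5))
selected, widths = select_predictable_factors(samples, k=2)
weights = performance_weights([1.5, 0.5, 2.0])
```

Any of the three forecasting models could supply `pred_samples`, as long as it produces a predictive distribution rather than a point forecast.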
Predictable Factors Improve Portfolio Risk-Adjusted Performance
This research demonstrates a scalable framework for estimating latent asset-pricing factors from firm characteristics, achieving substantial gains in risk-adjusted performance across multiple forecasting models. Scientists coupled a high-dimensional Conditional Autoencoder (CAE) with an uncertainty-aware factor selection procedure, successfully addressing limitations in prior work that restricted latent factor dimensions. Experiments revealed that selectively choosing predictable factors enhances portfolio efficiency by concentrating exposure on stable drivers of return while reducing exposure to high-variance components. The team measured performance across various latent dimensionalities, ranging from 5 to 50, identifying configurations with fewer than 50 factors that frequently outperformed the full CAE benchmark.
Models with between 20 and 40 factors often offered the best return-volatility trade-off, with one ensemble delivering a Sortino ratio of 4.010, a Sharpe ratio of 2.204, and an Omega ratio of 5.952. The study introduced an adaptive selection procedure for the number of latent factors, relying solely on information available at decision points.
This adaptation employs an expanding-window framework and a temporally regularized objective based on the Sortino ratio. It preserves the uncertainty-aware factor selection methodology while producing realistic, implementable forecasts, and it yielded the highest total return and annualized growth, with an annualized return of 16.62% and annual volatility of 8.88%. The framework’s ability to consistently outperform benchmarks highlights its potential for improving portfolio construction and risk management.
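One way to picture the adaptive choice of the number of factors is the sketch below. It assumes that, for each candidate k, a history of realized returns from the strategy using k factors is available. At each decision point it scores every candidate by its Sortino ratio over all earlier periods, then subtracts a penalty proportional to the change from the previously chosen k. The penalty form, its size, and every name here are assumptions for illustration; the article does not specify the paper's regularizer.

```python
import numpy as np

def sortino(returns):
    """Unannualized Sortino ratio with a zero target."""
    r = np.asarray(returns, dtype=float)
    downside_dev = np.sqrt(np.mean(np.minimum(r, 0.0) ** 2))
    return r.mean() / downside_dev if downside_dev > 0 else np.inf

def choose_k_expanding(returns_by_k, min_history=12, penalty=0.01):
    """Pick the number of factors at each period using past data only.

    returns_by_k: dict mapping a candidate k to a 1-D array of realized
    returns of the strategy that uses k factors.
    Returns one chosen k per period t = min_history, ..., n_periods - 1.
    """
    ks = sorted(returns_by_k)
    n_periods = len(returns_by_k[ks[0]])
    chosen, prev_k = [], None
    for t in range(min_history, n_periods):
        best_k, best_score = None, -np.inf
        for k in ks:
            # Expanding window: only periods 0..t-1 are visible at time t.
            score = sortino(returns_by_k[k][:t])
            if prev_k is not None:
                score -= penalty * abs(k - prev_k)  # discourage jumps in k (assumed form)
            if score > best_score:
                best_k, best_score = k, score
        chosen.append(best_k)
        prev_k = best_k
    return chosen
```

Because the window only ever grows and never includes period t itself, the selection uses no information from the future, which is the property the text calls implementable.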
Uncertainty Guided Factor Selection Improves Asset Pricing
This research presents a scalable framework that couples high-dimensional conditional autoencoders with an uncertainty-aware factor selection procedure, successfully addressing the challenge of performance degradation often associated with increasing the number of latent factors in asset pricing models. The team demonstrates that selecting latent factors based on forecast uncertainty consistently improves risk-adjusted returns across several forecasting models, notably achieving substantial gains by utilizing only a carefully chosen subset of available factors. Integrating three distinct forecasting methods, IID-BS, Q-Boost, and ZS-Chronos, yielded largely uncorrelated predictive signals, enabling ensemble strategies that significantly outperformed individual models, achieving a Sharpe ratio of 2.2, a Sortino ratio of 4.01, and an Omega ratio of 5.95.
Furthermore, the study establishes a theoretical foundation linking predictive uncertainty to the expected degradation of portfolio utility under model misspecification, reinforcing the principle that forecast dispersion governs expected utility loss and confidence weighting in optimal portfolios. Through rigorous analysis, the researchers ruled out concerns about data leakage affecting the performance of the pretrained ZS-Chronos model, confirming its robust generalization capabilities.
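The link between forecast dispersion and utility loss can be illustrated with the textbook mean-variance case. This is not the paper's result, only a standard calculation under assumed notation: $\mu$ is the true mean vector of factor returns, $\hat{\mu}$ its forecast, $\Sigma$ a known covariance matrix, and $\gamma$ the investor's risk aversion.

```latex
\begin{align*}
U(w) &= w^\top \mu - \tfrac{\gamma}{2}\, w^\top \Sigma\, w,
\qquad
w^\star = \tfrac{1}{\gamma}\,\Sigma^{-1}\mu,
\qquad
\hat{w} = \tfrac{1}{\gamma}\,\Sigma^{-1}\hat{\mu}, \\[4pt]
U(w^\star) - U(\hat{w})
  &= \tfrac{1}{2\gamma}\,(\hat{\mu}-\mu)^\top \Sigma^{-1} (\hat{\mu}-\mu), \\[4pt]
\mathbb{E}\!\left[\,U(w^\star) - U(\hat{w})\,\right]
  &= \tfrac{1}{2\gamma}\,\operatorname{tr}\!\left(\Sigma^{-1}\,\mathrm{Cov}(\hat{\mu})\right)
  \quad \text{when } \mathbb{E}[\hat{\mu}] = \mu .
\end{align*}
```

With a diagonal $\Sigma$, the expected loss reduces to $\tfrac{1}{2\gamma}\sum_i \mathrm{Var}(\hat{\mu}_i)/\sigma_i^2$, so each factor contributes in proportion to the variance of its forecast. In this simplified setting, dropping a factor whose forecast is highly uncertain removes a large term from the expected loss, at the cost of whatever return the factor would have earned.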
👉 More information
🗞 Scaling Conditional Autoencoders for Portfolio Optimization via Uncertainty-Aware Factor Selection
🧠 ArXiv: https://arxiv.org/abs/2511.17462
