LLMs Achieve 104.7% Improvement in Risk Extraction from 10-K Filings

Scientists are tackling the challenge of automatically identifying and categorising corporate risks disclosed in lengthy 10-K filings, a crucial task for investors and regulators. Rian Dolphin, Joe Dursun, and Jarrett Blankenship of Massive.com, together with Katie Adams and Quinton Pike, detail a three-stage pipeline that uses large language models (LLMs) to extract structured risk factors, map them to a predefined taxonomy, and improve itself over time. The work extracts over 10,000 risk factors from S&P 500 filings, shows that same-industry companies have 63% higher risk profile similarity than cross-industry pairs, and introduces an autonomous taxonomy maintenance system that achieved a 104.7% improvement in embedding separation in a case study, promising continuous enhancement as it processes more data.

LLM Pipeline Extracts Risk From 10-K Filings

Scientists have developed a methodology for extracting structured risk factors from corporate 10-K filings while keeping them aligned with a predefined hierarchical taxonomy. The research team built a three-stage pipeline that combines large language model (LLM) extraction with supporting quotes, embedding-based semantic mapping to taxonomy categories, and an LLM-as-a-judge validation step that filters spurious assignments. This approach turns unstructured disclosure text into actionable, comparable insights for investors, analysts, and risk managers, overcoming the limits of manual analysis at scale. The study extracted 10,688 risk factors from S&P 500 companies and examined risk profile similarity across industry clusters.
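The embedding-based mapping stage can be sketched as nearest-neighbour search by cosine similarity between a risk quote's embedding and each taxonomy category's embedding. The function names, the 0.5 threshold, and the toy three-dimensional vectors below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def map_to_taxonomy(risk_vec, category_vecs, category_names, threshold=0.5):
    """Assign a risk-factor embedding to the nearest taxonomy category by
    cosine similarity; return None when the best match is too weak (such
    cases would be deferred to the LLM-as-judge stage)."""
    risk_vec = risk_vec / np.linalg.norm(risk_vec)
    cats = category_vecs / np.linalg.norm(category_vecs, axis=1, keepdims=True)
    sims = cats @ risk_vec  # cosine similarity to every category
    best = int(np.argmax(sims))
    if sims[best] < threshold:
        return None, float(sims[best])
    return category_names[best], float(sims[best])

# Toy 3-dimensional stand-ins for real instruction-tuned embeddings
names = ["Interest Rate Risk", "Cyber Risk", "Regulatory Risk"]
cat_vecs = np.array([[1.0, 0.1, 0.0],
                     [0.0, 1.0, 0.1],
                     [0.1, 0.0, 1.0]])
risk = np.array([0.9, 0.2, 0.05])  # embedding of an extracted quote
print(map_to_taxonomy(risk, cat_vecs, names))
```

In practice the vectors would come from an embedding model and the threshold would be tuned against the judge's feedback.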
Beyond extraction, the researchers introduced an autonomous taxonomy maintenance system in which an AI agent analyzes evaluation feedback to pinpoint problematic categories, diagnose failure patterns, and propose targeted refinements, yielding a 104.7% improvement in embedding separation in a focused case study. This autonomous refinement is a step towards self-improving systems that continue to raise quality as they process more documents. External validation confirms that the taxonomy captures economically meaningful structure: companies operating in the same industry exhibited 63% higher risk profile similarity than those in different sectors (Cohen's d = 1.06, AUC = 0.82, p < 0.001). This statistically significant finding shows that the extracted categories reflect genuine economic risk dimensions without any prior knowledge of industry classifications.
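The reported effect sizes can be computed from any set of pairwise profile similarities. The sketch below, using made-up similarity values, shows how Cohen's d (standardized mean difference) and AUC (probability a random same-industry pair outscores a random cross-industry pair) compare the two groups:

```python
import numpy as np

def cohens_d(a, b):
    """Standardized mean difference with pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * np.var(a, ddof=1) +
                      (nb - 1) * np.var(b, ddof=1)) / (na + nb - 2))
    return (np.mean(a) - np.mean(b)) / pooled

def auc(pos, neg):
    """Probability that a random same-industry similarity exceeds a
    random cross-industry one (ties count as half a win)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical cosine similarities between company risk profiles
same_industry = [0.82, 0.74, 0.69, 0.77, 0.71]
cross_industry = [0.55, 0.48, 0.61, 0.52, 0.44]
print(f"d = {cohens_d(same_industry, cross_industry):.2f}, "
      f"AUC = {auc(same_industry, cross_industry):.2f}")
```

With the paper's real data the same two statistics come out at d = 1.06 and AUC = 0.82.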

The research establishes a generalizable framework applicable to any domain requiring taxonomy-aligned extraction from unstructured text, with the autonomous improvement mechanism promising sustained quality maintenance and enhancement over time. The result is a systematic way to analyze corporate risk disclosures, enabling identification of sector-wide risk patterns, emerging threats, and company-specific vulnerabilities at scale. The work opens new avenues for quantitative risk analysis, portfolio management, and regulatory oversight, potentially changing how financial institutions and investors assess and mitigate risk.

Taxonomy Reveals Structured Corporate Risk Profiles

Scientists have developed a novel methodology for extracting structured risk factors from corporate 10-K filings, meticulously aligning them with a predefined hierarchical taxonomy. The research team successfully extracted 10,688 risk factors from S&P 500 companies, demonstrating the scalability of their approach and providing a substantial dataset for analysis. Experiments revealed a 63% higher risk profile similarity between companies within the same industry compared to those across different sectors, as quantified by Cohen’s d=1.06, AUC 0.82, and a p-value of less than 0.001, confirming the taxonomy’s ability to capture economically meaningful structures. This breakthrough delivers a robust system for understanding and categorising corporate risk with high precision.

The team measured a 104.7% improvement in embedding separation in a case study of autonomous taxonomy maintenance. The enhancement came from analysing evaluation feedback to identify problematic categories, diagnose failure patterns, and propose refinements to the taxonomy itself. The data also show that finer industry granularity strengthens the signal, with the Area Under the Curve (AUC) improving from 0.733 for broad sectors to 0.822 for narrow industry definitions. Sector-specific analysis further revealed that 83% of banks were tagged with interest rate risk, compared with just 22% of all companies, highlighting the taxonomy's sensitivity to industry-specific nuances.
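One plausible reading of the "embedding separation" metric (an assumption here; the paper may define it differently) is the gap between quotes' mean similarity to their own category description and to the nearest competing category, with the improvement reported as the relative increase after refinement:

```python
import numpy as np

def separation(intra_sims, inter_sims):
    """Gap between mean similarity of quotes to their own category
    description and to the nearest competing category."""
    return float(np.mean(intra_sims) - np.mean(inter_sims))

def pct_improvement(before, after):
    """Relative change of the separation metric, in percent."""
    return 100.0 * (after - before) / before

# Hypothetical similarities before and after a taxonomy refinement
before = separation(intra_sims=[0.62, 0.58, 0.60],
                    inter_sims=[0.50, 0.52, 0.48])
after = separation(intra_sims=[0.71, 0.69, 0.70],
                   inter_sims=[0.49, 0.50, 0.51])
print(f"{pct_improvement(before, after):.1f}% improvement")
```

A refinement that sharpens a category's description raises intra-category similarity without raising similarity to rival categories, which is exactly what widens this gap.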

Researchers deployed a production system at Massive.com, processing companies across five years of historical filings and delivering results via an Application Programming Interface (API). The methodology isn't limited to risk factors or 10-K filings; it provides a general framework for extracting structured information from unstructured text while adhering to predefined categorical systems. The results suggest the system generalizes to any domain requiring taxonomy-aligned extraction, with continuous quality maintenance and enhancement as it processes more documents. The study further demonstrated, via validation against industry codes, that the taxonomy captures genuine economic risk dimensions.

The three-tier risk taxonomy, comprising seven primary categories, balances broad coverage with practical granularity, addressing limitations of existing taxonomies. Scientists leveraged instruction-tuned embeddings to represent both taxonomy category descriptions and the supporting quotes of extracted risks in a shared semantic space, improving the efficiency and accuracy of the mapping step. An LLM-as-judge evaluation component filters low-quality mappings in production, ensuring high precision and reliability.
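The LLM-as-judge stage amounts to a yes/no validation call per candidate mapping. The sketch below assumes a hypothetical `llm` callable for the real model; a crude keyword stub stands in so the example runs without an API key, and is not the paper's prompt or model:

```python
def judge_mapping(quote: str, category: str, llm=None) -> bool:
    """Ask an LLM whether a supporting quote justifies a taxonomy
    category; fall back to a keyword stub when no model is supplied."""
    prompt = (
        f"Taxonomy category: {category}\n"
        f"Supporting quote: {quote}\n"
        "Does the quote genuinely support this category? Answer yes or no."
    )
    if llm is None:
        # Stub standing in for a real model call: require the category's
        # leading keyword to appear somewhere in the quote.
        return category.split()[0].lower() in quote.lower()
    return llm(prompt).strip().lower().startswith("yes")

# Keep only the mappings the judge accepts
mappings = [
    ("Rising interest rates could increase our cost of capital.",
     "Interest Rate Risk"),
    ("Our headquarters lease expires in 2027.", "Cybersecurity Risk"),
]
validated = [(q, c) for q, c in mappings if judge_mapping(q, c)]
```

Here the second mapping is rejected: the quote about a lease does not support a cybersecurity category, which is precisely the kind of spurious assignment this stage exists to filter.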

LLM Pipeline Validates Financial Risk Extraction Accurately

Scientists have developed a methodology for extracting structured risk factors from corporate 10-K filings, ensuring alignment with a predefined hierarchical taxonomy. The approach employs a three-stage pipeline, integrating large language model (LLM) extraction with supporting quotations, embedding-based semantic mapping, and LLM-as-a-judge validation to filter inaccurate assignments. This hybrid system leverages the strengths of each component, combining nuanced language understanding with efficient semantic similarity and a validation layer to minimise errors. Researchers successfully extracted 10,688 validated risk factors from S&P 500 companies’ filings using this methodology, demonstrating its effectiveness in processing complex financial disclosures.

Furthermore, they introduced autonomous taxonomy maintenance, where an AI agent analyses evaluation feedback to identify problematic categories, diagnose failure patterns, and propose refinements, resulting in a 104.7% improvement in embedding separation within a pharmaceutical approval category. External validation confirmed the taxonomy’s ability to capture economically meaningful structure, as companies within the same industry exhibited 63% higher risk profile similarity compared to those in different industries. The authors acknowledge limitations related to the specific taxonomy used and the potential for bias in the LLM models. Future work could explore the application of this methodology to other domains beyond financial risk, as well as investigate methods for automatically generating and refining taxonomies. This research offers a valuable tool for analysing corporate risk, enabling more efficient and accurate extraction of crucial information from unstructured text, and providing a pathway towards continuous quality improvement in taxonomy-aligned information processing.

👉 More information
🗞 Taxonomy-Aligned Risk Extraction from 10-K Filings with Autonomous Improvement Using LLMs
🧠 ArXiv: https://arxiv.org/abs/2601.15247

Rohail T.

I am a quantum scientist exploring the frontiers of physics and technology. My work focuses on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.