Automated feature discovery represents a significant hurdle in machine learning, particularly when analysing tabular data where identifying impactful features from vast possibilities typically requires considerable expertise. Researchers Keith Burghardt, Jienan Liu, Sadman Sakib, Yuning Hao, and Bo Li, all from Amazon.com, Inc., have developed FAMOSE (Feature AugMentation and Optimal Selection agEnt), a novel framework utilising the ReAct paradigm to autonomously explore, generate and refine features. This work marks the first application of an agentic ReAct framework to automated feature engineering for both regression and classification tasks, demonstrating state-of-the-art performance on regression tasks with a 2.0% average reduction in RMSE and competitive results on classification, achieving an average ROC-AUC increase of 0.23% on datasets exceeding 10,000 instances. The team’s findings suggest that ReAct’s iterative approach effectively leverages the LLM’s context window, guiding the invention of innovative features and highlighting the potential of AI agents to tackle complex, inventive problem-solving tasks.

A 2.0% reduction in root mean squared error demonstrates a clear advance in predictive accuracy for complex datasets. This automated system bypasses the need for specialist knowledge when preparing data for machine learning. Scientists continually seek methods to improve the performance of machine learning models, yet a significant obstacle remains: feature engineering.

Traditionally, identifying the most informative features from a vast and complex feature space requires considerable expertise and time. Now, researchers have developed FAMOSE (Feature AugMentation and Optimal Selection agEnt), a novel framework designed to automate this process. Unlike existing automated feature engineering techniques, FAMOSE employs a ReAct agent, an artificial intelligence agent that can reason and act, to autonomously explore, generate, and refine features.

This agent iteratively proposes new features, evaluates their impact, and adapts its strategy based on performance, mirroring the approach of a skilled data scientist. Effective feature selection is as important as feature generation, and FAMOSE distinguishes itself by integrating feature selection tools directly within the agent’s architecture, allowing for an active and informed search.

To date, this represents the first application of a ReAct framework specifically tailored for automated feature engineering across both regression and classification problems. Initial experiments reveal measurable gains, particularly on larger datasets. For instance, classification performance, measured by ROC-AUC, improved by an average of 0.23% on datasets containing over 10,000 instances.

In regression tasks, FAMOSE reduced the root mean squared error (RMSE) by 2.0% on average, demonstrating its ability to create more precise predictive models. The framework also exhibits greater robustness to errors compared to other algorithms. At the heart of FAMOSE’s success lies the ReAct model, which allows the underlying language model to retain a record of successful and unsuccessful feature attempts in its context window.

This iterative learning process, akin to few-shot prompting, guides the agent towards generating more inventive and effective features. For many years, automated machine learning (AutoML) has sought to reduce the manual effort required to build effective models. Traditional AutoML systems often include feature discovery or iterative modification steps, but these methods can be limited by their inability to learn from past mistakes.

Recent advances have explored the use of large language models (LLMs) for feature generation, but these approaches often lack the capacity for iterative refinement. FAMOSE addresses this gap by creating a feedback-driven search process, where the agent learns from each experiment and adapts its strategy accordingly. Once the agent ceases to discover features that improve performance, a minimal-redundancy maximal-relevance (mRMR) feature selection step is applied to create a concise final feature set.

FAMOSE demonstrates improved ROC-AUC performance and scalability on large datasets

Across classification tasks, FAMOSE achieves a mean ROC-AUC improvement of 0.23% when applied to datasets containing over 10,000 instances, demonstrating an advantage on larger, more complex datasets. For multi-class problems, performance was evaluated using the unweighted average of one-versus-one ROC-AUC. Classical feature engineering methods, specifically AutoFeat and OpenFE, encountered difficulties with datasets exceeding approximately 10,000 rows due to the exponential scaling of feature analysis and subsequent memory requirements.
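To make the multi-class metric concrete, here is a minimal sketch of an unweighted one-versus-one ROC-AUC in plain Python. The function names and the toy probabilities are illustrative assumptions, not code from the paper. For each pair of classes it computes the AUC in both directions on the rows belonging to that pair, averages the two, and then takes the plain mean over all pairs. To the best of our understanding this matches what scikit-learn computes with roc_auc_score(..., multi_class="ovo", average="macro").

```python
from itertools import combinations


def binary_auc(scores_pos, scores_neg):
    """Probability that a positive row outranks a negative row (ties count half)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def ovo_macro_auc(y_true, proba, classes):
    """Unweighted mean of pairwise AUCs; proba[i][j] is the score for classes[j]."""
    pair_aucs = []
    for a, b in combinations(range(len(classes)), 2):
        rows_a = [p for y, p in zip(y_true, proba) if y == classes[a]]
        rows_b = [p for y, p in zip(y_true, proba) if y == classes[b]]
        # AUC with class a as positive, then with class b as positive.
        auc_a = binary_auc([p[a] for p in rows_a], [p[a] for p in rows_b])
        auc_b = binary_auc([p[b] for p in rows_b], [p[b] for p in rows_a])
        pair_aucs.append((auc_a + auc_b) / 2)
    return sum(pair_aucs) / len(pair_aucs)


# Made-up predictions for three classes, purely for illustration.
y_true = ["a", "a", "b", "b", "c", "c"]
proba = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.3, 0.6, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
    [0.2, 0.3, 0.5],
]
score = ovo_macro_auc(y_true, proba, classes=["a", "b", "c"])
print(f"one-vs-one macro ROC-AUC: {score:.4f}")
```

Because every pair carries equal weight, rare classes influence the score as much as common ones, which is presumably why an unweighted average suits benchmarks with imbalanced labels.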

Even with adjustments, these classical approaches still faltered on many realistic datasets, often requiring terabytes of memory and leading to frequent failures. By contrast, FAMOSE and CAAFE consistently processed all tasks without such limitations. FeatLLM exhibited issues with multi-class rule creation and task completion. On regression tasks, FAMOSE reduced the root mean squared error (RMSE) by an average of 2.0%, signifying an improvement in predictive accuracy.

FAMOSE maintains greater robustness to errors compared to other algorithms tested. Classical methods sometimes encountered out-of-memory or out-of-time errors, whereas FAMOSE completed all five folds of even the largest task, covtype, with 580,000 instances and 55 features, in approximately six hours. In the experiments, XGBoost served as the primary prediction model, utilising default parameters and a fixed random seed of 42 for consistency.
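The five-fold protocol mentioned above can be sketched in a few lines of standard-library Python. This is an illustrative assumption about how such folds might be drawn: the article only states that five folds were used and that the model seed was 42, so reusing 42 for the shuffle, and the helper names k_fold_indices and cross_validated_mean, are our own choices rather than details from the paper.

```python
import random


def k_fold_indices(n_rows, k=5, seed=42):
    """Yield (train_idx, test_idx) pairs that partition range(n_rows) into k folds."""
    order = list(range(n_rows))
    random.Random(seed).shuffle(order)  # fixed seed keeps the split repeatable
    folds = [order[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


def cross_validated_mean(n_rows, score_fold, k=5, seed=42):
    """Average a user-supplied score_fold(train_idx, test_idx) over all folds."""
    scores = [score_fold(tr, te) for tr, te in k_fold_indices(n_rows, k, seed)]
    return sum(scores) / len(scores)


splits = list(k_fold_indices(23))
print([len(test) for _, test in splits])  # fold sizes differ by at most one
```

Scoring the model with and without the engineered features on identical folds keeps the comparison fair, since any difference then comes from the features rather than from the split.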

The engineered features were also applied to Random Forest and AutoGluon to assess their generalizability across different models. For LLM-based feature engineering, both Sonnet 3.5 V2 and DeepSeek-R1 were employed, with the LLM temperature fixed at 0.8, yielding similar feature quality.

ReAct Agent Driven Iterative Feature Engineering for Tabular Data

FAMOSE, a novel framework for automated feature engineering, centres around a ReAct (Reasoning and Acting) agent designed to autonomously explore and refine potential features. Initially, the agent interacts with tabular datasets to compute descriptive statistics and inspect data distributions, establishing a data-grounded foundation for feature proposals.

Unlike methods relying solely on prior knowledge, this direct interaction allows the agent to formulate hypotheses about feature utility based on actual data characteristics. The agent then generates new features by applying transformations and combinations of existing attributes, and rigorously evaluates their impact on model performance.

The process doesn’t end with initial evaluation; FAMOSE distinguishes itself through iterative refinement. At each round, the agent assesses the performance of newly created features alongside previously saved features, building upon past successes and correcting ineffective approaches. The LLM also explains its reasoning for each feature choice, enhancing interpretability and providing insight into the feature engineering process.
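The propose, evaluate and keep cycle described above can be illustrated with a small, self-contained sketch. It is not the FAMOSE implementation: a fixed list of hypothetical candidate features stands in for the LLM's proposals, ordinary least squares stands in for the downstream model (the article names XGBoost), and the data are synthetic. All function and feature names are assumptions made for illustration.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def validation_rmse(train, val, features, target):
    """Fit least squares on train and return RMSE on val (toy stand-in model)."""
    def design(rows):
        return [[1.0] + [f(r) for f in features] for r in rows]

    x, y = design(train), [r[target] for r in train]
    k = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(x, y)) for i in range(k)]
    coef = solve(xtx, xty)
    errors = [sum(c * v for c, v in zip(coef, row)) - r[target]
              for row, r in zip(design(val), val)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def discover(train, val, base, candidates, target):
    """Keep each candidate feature only if it lowers validation RMSE."""
    kept = dict(base)
    best = validation_rmse(train, val, list(kept.values()), target)
    history = []  # an agent would feed this record back to the LLM; here it is only logged
    for name, func in candidates:
        score = validation_rmse(train, val, list(kept.values()) + [func], target)
        improved = score < best
        history.append((name, round(score, 3), improved))
        if improved:
            kept[name], best = func, score
    return kept, best, history


# Synthetic data: the target depends on the product of two raw columns.
rng = random.Random(0)
rows = []
for _ in range(300):
    w, h = rng.uniform(1, 10), rng.uniform(1, 10)
    rows.append({"width": w, "height": h, "price": w * h + rng.gauss(0, 0.5)})
train, val = rows[:200], rows[200:]

base = {"width": lambda r: r["width"], "height": lambda r: r["height"]}
# Hypothetical proposals; in an agentic system an LLM would generate these.
candidates = [
    ("width_squared", lambda r: r["width"] ** 2),
    ("width_times_height", lambda r: r["width"] * r["height"]),
    ("width_over_height", lambda r: r["width"] / r["height"]),
]

baseline = validation_rmse(train, val, list(base.values()), "price")
kept, final, history = discover(train, val, base, candidates, "price")
print(f"baseline RMSE {baseline:.3f} -> final RMSE {final:.3f}")
for entry in history:
    print(entry)
```

The history list is the part that matters for the ReAct idea: it records both the attempts that helped and those that did not, which is the kind of record the article says the agent uses to steer later proposals.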

Once performance gains plateau during feature discovery, a minimal-redundancy maximal-relevance (mRMR) feature selection step is implemented to ensure a concise and effective final feature set. This choice reflects the expectation that a dedicated algorithm may be more accurate at feature selection than the LLM itself, in contrast to approaches that rely on the LLM for this step.
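As a rough illustration of the mRMR idea, the sketch below greedily picks the feature whose relevance to the target, minus its average redundancy with the features already chosen, is highest. It uses absolute Pearson correlation for both terms, which is a simplifying assumption: mRMR is classically defined with mutual information, and the article does not say which variant FAMOSE uses. The feature names and values are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def mrmr(features, target, k):
    """Greedy selection maximising relevance minus mean redundancy."""
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    remaining = list(features)
    selected = []
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(abs(pearson(features[name], features[s]))
                             for s in selected) / len(selected)
            return relevance[name] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


features = {
    "a": [1, 2, 3, 4, 5, 6, 7, 8],
    "a_near_copy": [1, 2, 3, 4, 5, 6, 8, 7],  # almost the same signal as "a"
    "b": [1, -1, -1, 1, 1, -1, -1, 1],        # uncorrelated with "a"
}
target = [3, 0, 1, 6, 7, 4, 5, 10]            # equals a + 2 * b

chosen = mrmr(features, target, k=2)
print(chosen)  # → ['a', 'b']
```

Ranking by relevance alone would pick "a" and "a_near_copy", which carry nearly the same information. The redundancy penalty steers the second pick to "b" instead, and that is the behaviour that keeps a final feature set concise.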

FAMOSE was tested across diverse classification and regression datasets, alongside multiple tabular models and an alternative LLM, to assess its robustness. By simulating the iterative process of a data scientist, FAMOSE aims to overcome the limitations of static feature generation methods and navigate the expansive feature space more efficiently.

Large language models autonomously enhance data feature selection and refinement

Automated machine learning has promised to democratise data science, yet a stubborn obstacle remains: feature engineering. While algorithms can learn patterns, they often struggle to discern which raw data points are truly meaningful, requiring human experts to painstakingly craft informative features. The new work presents FAMOSE, a system that attempts to bypass this bottleneck by using a large language model acting as an agent to autonomously explore, generate, and refine features.

It’s not simply about automating the process of feature engineering, but about replicating the inventive thinking a skilled practitioner brings to the task. The achievement extends beyond simply matching existing performance levels: the system notably reduces error rates in regression problems, a domain where automated feature engineering has historically lagged.

Gains, while measurable, are not enormous, averaging around 2% on regression error. This suggests we are not on the cusp of complete automation of the process, but rather at a significant step towards augmenting human capabilities. Feature engineering is increasingly understood as a creative challenge. Unlike earlier approaches that relied on pre-defined transformations, FAMOSE’s agentic approach, built on the ReAct model, allows it to learn from its mistakes and iteratively improve its feature creation process.

The system’s ability to ‘remember’ what didn’t work appears to be key, guiding it towards more promising avenues. The reliance on large language models introduces familiar limitations, including computational cost and potential biases. The focus will likely shift towards scaling these agentic systems and applying them to more complex datasets. We can anticipate a broader trend towards AI agents tackling problems that demand ingenuity.

The next step might involve combining FAMOSE with other automated machine learning tools, creating a more holistic system. A fundamental question remains: can we truly automate creativity, or will human insight always be necessary to unlock the full potential of data?

👉 More information
🗞 FAMOSE: A ReAct Approach to Automated Feature Discovery
🧠 ArXiv: https://arxiv.org/abs/2602.17641

AI Builds Better Data Features Automatically Now

Rohail T.
