Knowledge Distillation Improves Rationale Extraction Performance under Limited Supervision

Researchers are tackling the critical challenge of understanding how neural networks reach their decisions, particularly in sensitive applications. Jiayi Dai and Randy Goebel, both from the University of Alberta, alongside their collaborators, present a novel approach to rationale extraction, a technique designed to create inherently interpretable artificial intelligence systems. Their work introduces REKD (Rationale Extraction with Knowledge Distillation), a method in which a ‘student’ network learns not only from task predictions but also from the rationales (the key features driving those predictions) provided by a more capable ‘teacher’ network. This knowledge transfer significantly boosts the performance of less powerful models, offering a pathway to deploy interpretable AI even with limited computational resources and representing a step towards AI systems that learn more like humans.

The study addresses a critical challenge in the field of explainable artificial intelligence: improving the trustworthiness of deep neural networks in high-stakes applications such as healthcare and finance.

Researchers tackled the difficulty of simultaneously training a feature selector (generator) and a predictor within rationale extraction, a process complicated when base neural networks lack sufficient capacity. Their approach, inspired by human learning, has a ‘student’ RE model learn not only through its own exploration but also by leveraging the rationales and predictions of a more powerful ‘teacher’ network.
The team achieved this by employing knowledge distillation, transferring interpretable knowledge from the teacher to the student, thereby overcoming the “chicken and egg” problem inherent in traditional rationale extraction methods. This structural adjustment aligns with how humans effectively learn from verifiable knowledge, allowing the student to benefit from the teacher’s pre-verified feature selections.

The research establishes a neural-model agnostic method, meaning any black-box neural network can be integrated as the foundation for the REKD framework. Experiments conducted across language and vision classification datasets, including IMDB movie reviews, CIFAR 10, and CIFAR 100, demonstrate that REKD substantially improves the predictive performance of student RE models.

Specifically, the team utilised variants of BERT and vision transformer (ViT) models to validate the viability of their approach. Furthermore, the study introduces a progressive complexity curriculum by synchronising the knowledge distillation temperature with a Gumbel-Softmax annealing scheduler. This allows the student to initially absorb broad knowledge from the teacher, then gradually refine its feature selections as the temperature decreases, enforcing precision during the discretisation phase. The work opens avenues for building more transparent and reliable AI systems, fostering greater trust in their decision-making processes and enabling wider adoption in critical domains.

Teacher-student learning via rationales and predictions improves rationale extraction performance

Scientists developed Rationale Extraction with Knowledge Distillation (REKD) to enhance the performance of rationale extraction models, particularly when utilising less capable neural networks. The study addressed the challenge of simultaneously training a generator, responsible for feature selection, and a predictor, a process complicated by limited supervision signals.

Researchers drew inspiration from human learning, positing that a teacher model with established, verifiable rationales could effectively guide a student model’s learning process. To implement REKD, the team engineered a system where a student RE model learns not only through its own rationale extraction optimisation but also by leveraging the rationales and predictions of a pre-trained teacher, termed a ‘rationalist’.
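The article does not give the exact training objective, but the idea of combining the student's own task loss with distillation from the teacher's predictions and rationales can be sketched as follows. All weights, temperature values, and the binary-cross-entropy form of the rationale term are illustrative assumptions, not the authors' published formulation.

```python
import math

def softmax(zs, t=1.0):
    """Temperature-scaled softmax over a list of logits."""
    m = max(zs)
    es = [math.exp((z - m) / t) for z in zs]
    s = sum(es)
    return [e / s for e in es]

def kl(p, q):
    """KL divergence KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def rekd_loss(student_logits, teacher_logits,
              student_mask_probs, teacher_mask,
              y_true, t=2.0, alpha=0.5, beta=0.5):
    """Hypothetical combined objective: task cross-entropy
    + prediction distillation (KL at temperature t, scaled by t^2 as in
      standard knowledge distillation)
    + rationale distillation (binary cross-entropy of the student's
      feature-selection probabilities against the teacher's mask)."""
    task = -math.log(softmax(student_logits)[y_true])
    kd_pred = t * t * kl(softmax(teacher_logits, t), softmax(student_logits, t))
    eps = 1e-9
    kd_rat = -sum(mt * math.log(ms + eps) + (1 - mt) * math.log(1 - ms + eps)
                  for ms, mt in zip(student_mask_probs, teacher_mask))
    kd_rat /= len(teacher_mask)
    return task + alpha * kd_pred + beta * kd_rat
```

With this shape, the student receives gradient signal even when its own generator-predictor loop is too weak to bootstrap, which is the "chicken and egg" problem the article describes.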

This approach bypasses the ‘chicken and egg’ problem inherent in standard RE training, where the generator and predictor are mutually dependent. Experiments employed the Straight-Through Gumbel-Softmax estimator to enable differentiable feature selection, a crucial step for gradient-based optimisation.
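The Straight-Through Gumbel-Softmax trick named above can be sketched in a few lines: sample a relaxed (soft) categorical choice, then snap it to a hard one-hot selection on the forward pass. This is a minimal stdlib sketch of the general technique, not the paper's implementation; in an autodiff framework the gradient flows through the soft sample.

```python
import math
import random

def gumbel_softmax(logits, temperature, rng=random):
    """One relaxed sample from the categorical defined by `logits`.
    Gumbel noise -log(-log U) is added, then a temperature-scaled softmax
    is applied; lower temperature -> closer to a hard one-hot choice."""
    noisy = [(l - math.log(-math.log(rng.uniform(1e-12, 1.0)))) / temperature
             for l in logits]
    m = max(noisy)
    es = [math.exp(v - m) for v in noisy]
    s = sum(es)
    return [e / s for e in es]

def straight_through(soft):
    """Forward pass uses the hard one-hot argmax; in an autodiff framework
    the gradient would pass through the soft sample unchanged
    (hard + soft - stop_gradient(soft))."""
    k = max(range(len(soft)), key=soft.__getitem__)
    return [1.0 if i == k else 0.0 for i in range(len(soft))]
```

The hard output is what makes the selected rationale genuinely discrete at inference time, while the relaxation keeps training end-to-end differentiable.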

The research pioneered a progressive complexity curriculum by synchronising knowledge distillation temperature with an annealing scheduler. This technique initially allows the student to absorb broad knowledge from the teacher’s soft targets, gradually increasing the focus on precise feature selection.
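The synchronisation described above amounts to driving both the distillation softmax and the Gumbel-Softmax from a single annealed temperature. A minimal sketch, with illustrative start/end values (the paper's actual schedule is not given in the article):

```python
def synced_temperature(step, total_steps, t_start=5.0, t_end=0.5):
    """One exponentially annealed temperature shared by the KD softmax and
    the Gumbel-Softmax estimator: high early, so the student absorbs the
    teacher's broad soft targets; low late, so feature selection becomes
    nearly discrete and precision is enforced."""
    frac = step / max(total_steps - 1, 1)
    return t_start * (t_end / t_start) ** frac
```

Because the same value feeds both components, the curriculum's "broad first, precise later" progression stays consistent between what the student imitates and how sharply it selects features.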

The team validated REKD’s viability across diverse datasets, including IMDB movie reviews, CIFAR 10, and CIFAR 100, utilising variants of both BERT and vision transformer (ViT) models as backbone architectures. Results demonstrated that REKD significantly improves the predictive performance of student RE models across language and vision classification tasks, confirming the effectiveness of knowledge distillation in this context.

Knowledge distillation and rationale length optimisation deliver significant rationale extraction performance gains

Scientists achieved significant improvements in the predictive performance of rationale extraction (RE) models using a new technique called REKD, or Rationale Extraction with Knowledge Distillation. The research focused on enabling less capable neural networks, termed ‘students’, to learn effectively from more powerful ‘teacher’ networks by leveraging both rationales and predictions.

Experiments revealed that REKD substantially enhances accuracy across language and vision classification tasks, including IMDB movie reviews, CIFAR 10, and CIFAR 100 datasets. Results demonstrate a clear positive correlation between predictive performance and rationale length, as observed with ViT models on CIFAR 10.

The team found that varying the rationale percentage target (p_target) directly impacted accuracy, with smaller targets imposing stronger constraints and generally reducing performance. Crucially, REKD significantly improved the student RE models with reduced variance, as detailed in Tables 1, 2, and 3.
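A rationale percentage target like this is typically enforced as a soft penalty on the fraction of features the generator selects. The squared-deviation form below is one common choice and an assumption on our part; the paper's exact constraint may differ.

```python
def rationale_length_penalty(mask, p_target):
    """Penalise deviation of the selected-feature fraction from the target
    ratio p_target. `mask` is a hard 0/1 selection (or soft probabilities)
    over the input features; smaller p_target forces sparser rationales."""
    p_actual = sum(mask) / len(mask)
    return (p_actual - p_target) ** 2
```

Added to the task loss, this term lets p_target trade interpretability (short rationales) against accuracy, matching the trend the experiments report.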

For instance, on CIFAR 10, REKD boosted the accuracy of ViT Small from 0.889 to 0.968 and ViT Tiny from 0.797 to 0.936, both at a rationale ratio of 15%. Measurements confirm that the accuracy drop from classification to rationale extraction is more pronounced when the base neural model has limited capacity.

Knowledge distillation improves rationale learning through synchronised annealing of teacher and student models

Scientists have developed a new method, Rationale Extraction with Knowledge Distillation (REKD), to enhance the performance of rationale extraction models, which aim to create more interpretable artificial neural networks. REKD improves predictive performance by enabling a ‘student’ neural network to learn not only through its own rationale exploration but also from the rationales and predictions of a more capable ‘teacher’ network.

This approach mirrors how humans learn effectively from interpretable and verifiable knowledge, addressing a common challenge in training smaller or less capable networks. The research demonstrates significant improvements in predictive accuracy across both language and vision tasks, utilising datasets such as IMDB movie reviews and CIFAR 10/100, with various BERT and vision transformer models.

By synchronising knowledge distillation temperature with Gumbel-Softmax annealing, REKD facilitates a curriculum for the student network, allowing it to learn more efficiently. The authors acknowledge a limitation in their current experiments, which focused on distillation between networks of the same architecture, and suggest future work could explore distillation between different architectures like ViT and ResNet.

Additionally, they propose dynamic weight scheduling for the rationale extraction and knowledge distillation loss terms to potentially further optimise performance. Future applications could include deployment in resource-constrained environments, such as mobile healthcare devices, and extending the method to distill other discrete latent structures like relational graphs.

👉 More information
🗞 Learn from A Rationalist: Distilling Intermediate Interpretable Rationales
🧠 ArXiv: https://arxiv.org/abs/2601.22531

Rohail T.

I am a quantum scientist exploring the frontiers of physics and technology. My work focuses on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.
