CASP Achieves Few-Shot Class-Incremental Learning with Limited Samples and Prompts

Few-shot class-incremental learning remains a significant hurdle in artificial intelligence, demanding that systems learn new concepts from minimal data without forgetting previously acquired knowledge. Shuai Huang, Xuhan Lin, and Yuwu Lu from the School of Artificial Intelligence, South China Normal University, address this challenge in their new research by introducing CLS Token Attention Steering Prompts (CASP), a novel approach that steers attention through the CLS token to improve feature representation and generalization. By modulating self-attention weights with trainable bias parameters and employing a Manifold Token Mixup strategy, CASP effectively leverages pre-trained knowledge and expands representation capacity, which is crucial when working with extremely limited data. Their experiments on challenging datasets including CUB200, CIFAR100, and ImageNet-R demonstrate that CASP surpasses existing state-of-the-art methods in both standard and fine-grained few-shot class-incremental learning, all without parameter fine-tuning during incremental learning.

This breakthrough research tackles the problem of rapidly adapting models to new classes with extremely limited data while simultaneously preventing catastrophic forgetting of previously learned information. The team achieved this by drawing inspiration from the human cognitive process of selective attention, specifically how the CLS token functions to filter irrelevant information. CASP introduces class-shared trainable bias parameters directly into the query, key, and value projections of the CLS token, effectively modulating self-attention weights and enhancing feature representation.
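To make the mechanism concrete, here is a minimal PyTorch sketch of self-attention with CLS-token bias prompts. It is an illustration under assumptions, not the authors' reference implementation: the module name, the decision to freeze the pretrained projections, and the exact placement of the biases are ours.

```python
import torch
import torch.nn as nn

class CLSSteeredAttention(nn.Module):
    """Self-attention whose CLS-token query/key/value vectors are shifted
    by class-shared trainable biases (illustrative sketch; names and the
    frozen/trainable split are assumptions, not the paper's code)."""

    def __init__(self, dim: int, num_heads: int = 12):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        # Pretrained ViT projections, kept frozen in this sketch.
        self.qkv = nn.Linear(dim, dim * 3).requires_grad_(False)
        self.proj = nn.Linear(dim, dim).requires_grad_(False)
        # The only new parameters: class-shared biases for the CLS token.
        self.q_bias = nn.Parameter(torch.zeros(dim))
        self.k_bias = nn.Parameter(torch.zeros(dim))
        self.v_bias = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape                        # token 0 is the CLS token
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Symmetric prompt injection: shift q, k, and v of the CLS token only.
        q = torch.cat([q[:, :1] + self.q_bias, q[:, 1:]], dim=1)
        k = torch.cat([k[:, :1] + self.k_bias, k[:, 1:]], dim=1)
        v = torch.cat([v[:, :1] + self.v_bias, v[:, 1:]], dim=1)
        # Standard multi-head attention from here on.
        def heads(t):
            return t.view(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        q, k, v = heads(q), heads(k), heads(v)
        attn = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        attn = attn.softmax(dim=-1)              # biased CLS row reweights patches
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```

Because only the three bias vectors train, the added parameter count is a few thousand values per block, consistent with the article's emphasis on low overhead.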

This innovative approach moves beyond conventional prompt-based methods by leveraging pretrained knowledge to learn feature representations shared with future categories during the initial base session, and experiments conducted on the CUB200, CIFAR100, and ImageNet-R datasets demonstrate that CASP consistently outperforms state-of-the-art methods in both standard and fine-grained FSCIL scenarios. Central to this is the Manifold Token Mixup strategy, which synthesizes potential new-class features, bolstering generalization and reserving representation capacity for subsequent tasks.
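The article names the Manifold Token Mixup strategy but does not spell out its formula. In generic manifold mixup, intermediate features and labels of two samples are linearly interpolated with a Beta-distributed coefficient; the sketch below applies that standard recipe to ViT token features, with the function name, the layer at which mixing happens, and the alpha value all assumed for illustration.

```python
import torch

def manifold_token_mixup(tokens_a, tokens_b, label_a, label_b, alpha=0.2):
    """Interpolate hidden-layer token features of two samples to synthesize
    virtual 'new class' features (generic manifold-mixup recipe; the
    paper's exact formulation may differ).

    tokens_*: (N, C) token features from an intermediate transformer layer.
    label_*:  (num_classes,) one-hot label vectors.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample()
    mixed_tokens = lam * tokens_a + (1.0 - lam) * tokens_b
    mixed_label = lam * label_a + (1.0 - lam) * label_b
    return mixed_tokens, mixed_label
```

Training on such interpolated features is one way to reserve representation capacity for classes that have not yet been seen.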
Notably, CASP achieves these improvements without requiring any fine-tuning during the incremental phases, significantly reducing parameter overhead and computational demands. This work opens avenues for artificial intelligence systems that learn and adapt more like humans, acquiring new knowledge from minimal examples while effectively retaining previously learned information.

Researchers introduced class-shared trainable bias parameters into the query, key, and value projections of the CLS token, effectively recalibrating the self-attention mechanism. This symmetric prompt injection modulates all three projections concurrently, enabling more comprehensive feature extraction and amplification from the limited few-shot instances. To further boost generalization, the team designed an attention perturbation strategy that randomly perturbs the attention weights of the prompted CLS token; acting as a regularizer, this simulated noise encourages generalized, stable decision boundaries and prevents overfitting to superficial patterns in the scarce data.
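The exact form of the perturbation is not given in the article. One plausible reading, adding Gaussian noise to the post-softmax attention row of the prompted CLS token during training and renormalizing, is sketched below; the noise distribution and the scale sigma are assumptions.

```python
import torch

def perturb_cls_attention(attn: torch.Tensor, sigma: float = 0.1,
                          training: bool = True) -> torch.Tensor:
    """Randomly perturb the CLS token's attention weights as a regularizer
    (hypothetical sketch; the paper's exact scheme is not specified here).

    attn: (B, heads, N, N) post-softmax attention; row 0 belongs to CLS.
    """
    if not training:
        return attn                               # no noise at inference
    cls_row = attn[:, :, 0:1, :]                  # (B, heads, 1, N)
    noisy = (cls_row + torch.randn_like(cls_row) * sigma).clamp(min=0.0)
    # Renormalize so the CLS row still sums to one over the key axis.
    noisy = noisy / noisy.sum(dim=-1, keepdim=True).clamp(min=1e-6)
    return torch.cat([noisy, attn[:, :, 1:, :]], dim=2)
```

Like dropout, the noise would be active only during training, so inference remains deterministic.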

Experiments employed the CUB200, CIFAR100, and ImageNet-R datasets to rigorously evaluate CASP’s performance. By expanding the feature space, the technique enhances the model’s ability to accommodate new information without compromising existing knowledge, and the system delivers superior performance in both standard and fine-grained FSCIL settings, consistently outperforming state-of-the-art methods.

Notably, CASP achieves these improvements without requiring fine-tuning during incremental phases and with significantly reduced parameter overhead; it improves efficiency as well as accuracy, making it a promising solution for real-world continual learning applications. The code implementing CASP is publicly available, facilitating further research and development in this crucial area of artificial intelligence, accessible at https://github. The research focuses on enhancing feature adaptation from limited samples and mitigating catastrophic forgetting: symmetric prompt injection across the query, key, and value projections demonstrably improves feature adaptation, and performance measured across the CUB200, CIFAR100, and ImageNet-R datasets consistently reaches state-of-the-art results in both standard and fine-grained FSCIL settings.

Results demonstrate that CASP outperforms existing methods without requiring fine-tuning during incremental phases, a significant technical achievement. The approach refines the CLS token during the base session to learn domain-generalizable representations: similar to human cognitive filtering, the CLS token progressively filters out task-irrelevant information, and CASP modulates its self-attention weights via trainable bias parameters in the query, key, and value projections. Empirical validation established new state-of-the-art results across multiple benchmarks while requiring fewer parameters.
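How CASP classifies classes added in later sessions without any fine-tuning is not detailed in this article. A common pattern in prompt-based FSCIL is nearest-class-mean classification over frozen embeddings, and the sketch below shows that pattern purely as an assumed illustration; the function names and the cosine-similarity choice are ours.

```python
import torch
import torch.nn.functional as F

def build_prototypes(features: torch.Tensor, labels: torch.Tensor,
                     num_classes: int) -> torch.Tensor:
    """Average the frozen CLS embeddings of each class into a prototype
    (assumed nearest-class-mean scheme, not confirmed by the article)."""
    protos = features.new_zeros(num_classes, features.shape[1])
    for c in range(num_classes):
        protos[c] = features[labels == c].mean(dim=0)
    return F.normalize(protos, dim=-1)

def classify(queries: torch.Tensor, protos: torch.Tensor) -> torch.Tensor:
    """Assign each query embedding to the most cosine-similar prototype."""
    return (F.normalize(queries, dim=-1) @ protos.T).argmax(dim=-1)
```

Under such a scheme, adding a new class only requires computing its prototype from the few available samples, which is why no gradient updates would be needed in the incremental phases.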

The framework operates in parallel with forward-compatible strategies and remains architecturally conservative, requiring no modifications to the underlying Vision Transformer (ViT). By attentively modulating the information aggregation process of the CLS token, CASP achieves more robust and generalizable learning, and measurements confirm the efficacy of this approach in scenarios with severely limited labelled samples per category, paving the way for more adaptable and efficient continual learning systems.

👉 More information
🗞 CASP: Few-Shot Class-Incremental Learning with CLS Token Attention Steering Prompts
🧠 ArXiv: https://arxiv.org/abs/2601.16773

Rohail T.

I am a quantum scientist exploring the frontiers of physics and technology. My work focuses on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.
