Scientists are tackling the persistent challenge of few-shot learning, the problem of identifying new objects from only a few examples, with a novel approach detailed in a new paper by Jiaying Wu, Can Gao, Jinglu Hu, and colleagues! The team, spanning Jiangsu Ocean University, Waseda University, and other institutions, presents PMCE, a Probabilistic Multi-Granularity Semantics with Caption-Guided Enhancement framework that improves performance by incorporating both coarse class-level knowledge and detailed instance-level descriptions. Unlike existing semantic methods, which focus solely on labelled support sets, PMCE dynamically retrieves relevant base-class information and refines both support prototypes and query features, yielding a substantial 7.71% performance gain on the MiniImageNet benchmark. This research promises to advance the field by offering a more robust and accurate method for learning from limited data, with potential applications in areas like image recognition and robotics.
PMCE tackles few-shot learning bias effectively, achieving state-of-the-art results
Scientists have developed a new framework, PMCE, to significantly improve few-shot learning, the ability of machines to recognise novel categories from just a few labelled examples! The research, detailed in a recent publication, addresses a critical challenge in this field: the tendency for prototypes, built from limited data, to be biased and fail to generalise effectively. This innovative approach constructs a nonparametric knowledge bank storing visual statistics and CLIP-encoded class name embeddings of base classes, enabling the retrieval of the most relevant base classes for each new category at meta-test time.
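The retrieval step described above can be pictured with a minimal sketch: given a CLIP-style name embedding for a novel class, rank the base classes in the knowledge bank by cosine similarity and keep the top k. The embeddings, class names, and function names here are all toy stand-ins, not the paper's actual implementation.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def retrieve_base_classes(novel_emb, bank, k=2):
    """Return the k base-class names whose name embeddings are most
    similar to the novel class's embedding (illustrative sketch)."""
    ranked = sorted(bank.items(), key=lambda kv: cosine(novel_emb, kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]

# Toy 3-d "embeddings" standing in for CLIP text features.
bank = {
    "husky": [0.9, 0.1, 0.0],
    "tabby": [0.1, 0.9, 0.0],
    "truck": [0.0, 0.1, 0.9],
}
print(retrieve_base_classes([0.8, 0.2, 0.1], bank, k=2))  # → ['husky', 'tabby']
```

In the real framework the bank also stores visual statistics per base class; retrieval by name similarity simply decides whose statistics get aggregated into the prior.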
The team achieved this breakthrough by aggregating these statistics into category-specific prior information, which is then seamlessly fused with support set prototypes using a simple Maximum A Posteriori (MAP) update. Simultaneously, a frozen BLIP captioner generates label-free image descriptions, and a lightweight enhancer, trained on base classes, optimises both support prototypes and query features. This optimisation is achieved through an inductive protocol incorporating consistency regularisation, effectively stabilising potentially noisy captions and refining image representations. Experiments conducted on four established benchmarks demonstrate that PMCE consistently outperforms strong baseline methods, achieving an impressive absolute gain of up to 7.71% over the strongest semantic competitor on the MiniImageNet dataset in the challenging 1-shot setting.
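A MAP update of this kind typically behaves like a shrinkage estimate: under a conjugate Gaussian prior, the fused prototype is a convex combination of the few-shot support mean and the retrieved prior mean, with the prior acting as tau pseudo-observations. The function below is a minimal sketch of that idea; the weighting scheme and the tau value are illustrative assumptions, not the paper's exact formulation.

```python
def map_fuse(support_mean, prior_mean, n_shots, tau=2.0):
    """MAP-style prototype update (sketch): the prior mean, aggregated
    from retrieved base classes, acts like tau pseudo-observations
    shrinking the scarce-data support mean toward it."""
    w = n_shots / (n_shots + tau)
    return [w * s + (1.0 - w) * p for s, p in zip(support_mean, prior_mean)]

# 1-shot: prototype leans heavily on the prior; 5-shot: on the support data.
print(map_fuse([1.0, 0.0], [0.0, 1.0], n_shots=1))  # ≈ [0.33, 0.67]
print(map_fuse([1.0, 0.0], [0.0, 1.0], n_shots=5))  # ≈ [0.71, 0.29]
```

This matches the intuition in the text: the fewer the shots, the more the biased support estimate is pulled toward the base-class prior.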
This work establishes a novel paradigm for few-shot learning by strategically leveraging multi-granularity semantics. Unlike previous methods that primarily focus on modifying support representations, PMCE uniquely enhances both support prototypes and query features, creating a more aligned and robust representation space. The researchers show that by identifying semantically relevant base categories using class-name embeddings and then refining image representations with label-free captions, they can significantly reduce bias in prototypes and improve generalisation performance. The introduction of a lightweight consistency regulariser further stabilises the learning process, ensuring reliable results even with limited data.
Furthermore, the study unveils a practical and efficient approach, training only a lightweight enhancement module while keeping the core backbone and vision-language encoders frozen. This design choice ensures that performance gains stem from the intelligent integration of semantics, rather than simply increasing model complexity. The code for PMCE is publicly available, facilitating further research and development in this rapidly evolving field, and opening avenues for applications in areas such as image recognition, object detection, and robotics where labelled data is scarce and adaptability is paramount.
Probabilistic Meta-Learning with Semantic Knowledge Transfer enables robust few-shot recognition
Scientists developed PMCE, a probabilistic few-shot learning framework leveraging multi-granularity semantics with caption-guided enhancement to address limitations in identifying novel categories from limited labeled samples! The study pioneers a nonparametric knowledge bank storing visual statistics for each category alongside CLIP-encoded class name embeddings of base classes, enabling improved generalization from scarce data. At meta-test time, the research team retrieved the most relevant base classes based on the similarity of class name embeddings for each novel category, effectively establishing semantic connections between known and unknown classes. These retrieved statistics were then aggregated into category-specific prior information and fused with support set prototypes using a Maximum A Posteriori (MAP) update, refining initial estimations and reducing bias.
Simultaneously, researchers harnessed a frozen BLIP captioner to generate label-free instance-level image descriptions, providing rich semantic context without requiring additional annotations. A lightweight enhancer, trained on base classes, was then employed to optimize both support prototypes and query features under an inductive protocol, ensuring consistent learning and mitigating the impact of noisy captions. This innovative approach achieves a stronger alignment between support and query images, as demonstrated by a cosine similarity of 0.90 compared to 0.56 in baseline methods utilizing only class-level semantics, illustrated in Figure 1! Experiments employed four benchmarks, including MiniImageNet, to rigorously evaluate PMCE’s performance against strong baselines, revealing consistent improvements across all datasets.
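The caption-guided enhancement and its consistency regulariser can be caricatured in a few lines: blend an image feature with the embedding of its generated caption, while a squared-L2 penalty keeps the result close to the original feature so a noisy caption cannot dominate. The residual-blend form, the alpha and weight values, and all names here are hypothetical illustrations of the mechanism, not the paper's enhancer.

```python
def enhance(feature, caption_emb, alpha=0.3):
    """Hypothetical residual enhancement: blend an image feature with
    the embedding of its generated caption (alpha is illustrative)."""
    return [(1 - alpha) * f + alpha * c for f, c in zip(feature, caption_emb)]

def consistency_penalty(original, enhanced, weight=0.1):
    """Squared-L2 consistency regulariser (sketch): keeps the enhanced
    feature close to the original so noisy captions cannot dominate."""
    return weight * sum((o - e) ** 2 for o, e in zip(original, enhanced))

feat = [1.0, 0.0, 0.0]
cap = [0.6, 0.8, 0.0]
enh = enhance(feat, cap)
print(enh)                             # blended feature
print(consistency_penalty(feat, enh))  # small when alpha is small
```

Because both support prototypes and query features pass through the same enhancement, they move into a shared space, which is the mechanism behind the improved support-query alignment (0.90 vs. 0.56 cosine similarity) reported above.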
Notably, the study achieved an absolute gain of up to 7.71% over the strongest semantic competitor on MiniImageNet in the 1-shot setting, demonstrating the effectiveness of the proposed methodology. The team measured performance using standard few-shot learning protocols, ensuring fair comparison and reliable results. This work's code is publicly available at https://anonymous0.4open.science/r/PMCE-275D, facilitating reproducibility and further research in the field. The PMCE framework's ability to integrate multi-granularity semantics and caption-guided enhancement represents a significant methodological advance in few-shot learning, paving the way for more robust and accurate category identification with limited data.
PMCE achieves state-of-the-art few-shot learning accuracy on multiple benchmarks
The research team developed PMCE, a novel approach to few-shot learning, achieving state-of-the-art results on multiple benchmark datasets. PMCE leverages vision-language models to enhance feature representation and employs a unique training objective combining supervised contrastive learning, prototype preservation, and caption consistency. Experiments demonstrate that PMCE significantly outperforms existing methods, particularly when utilizing a Swin-T backbone. On the MiniImageNet dataset, PMCE with a ResNet-12 backbone achieved an accuracy of 82.90% in the 5-way 1-shot setting and 92.29% in the 5-way 5-shot setting.
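For readers unfamiliar with the "5-way 1-shot" terminology in these results: evaluation proceeds over episodes, each sampling N classes with K labelled support images and a set of query images per class. A minimal sketch of episode sampling, with a hypothetical `dataset` layout mapping class names to image ids:

```python
import random

def sample_episode(dataset, n_way=5, k_shot=1, n_query=15, seed=0):
    """Sample one N-way K-shot episode (sketch): pick n_way classes,
    then k_shot support and n_query query images per class.
    `dataset` maps class name -> list of image ids (hypothetical)."""
    rng = random.Random(seed)
    classes = rng.sample(sorted(dataset), n_way)
    support, query = {}, {}
    for c in classes:
        imgs = rng.sample(dataset[c], k_shot + n_query)
        support[c], query[c] = imgs[:k_shot], imgs[k_shot:]
    return support, query

# Toy dataset: 6 classes with 20 image ids each.
toy = {f"class_{i}": list(range(20)) for i in range(6)}
support, query = sample_episode(toy)
print(len(support), len(query))  # → 5 5
```

Reported accuracies are averages over many such episodes, which is what makes the 1-shot setting (a single support image per class) so sensitive to prototype bias.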
Employing a Swin-T backbone further improved performance to 85.03% and 92.77% respectively. These results represent a substantial advancement over previous state-of-the-art methods. Similarly, on the CIFAR-FS dataset, PMCE with ResNet-12 reached 76.97% accuracy (5-way 1-shot) and 84.76% (5-way 5-shot), while the Swin-T backbone yielded 80.92% and 89.02% respectively. On the more challenging FC100 dataset, PMCE with ResNet-12 achieved 49.81% (5-way 1-shot) and 63.67% (5-way 5-shot), and the Swin-T backbone achieved 52.46% and 67.00%. The training objective incorporates a caption consistency loss, utilizing supervised contrastive learning to ensure coherent caption semantics and maintain inter-class separation.
A prototype-preserving term prevents excessive drift from the frozen backbone structure, ensuring stable base-class geometry. The overall objective combines these elements with a standard classification loss, tuned using a validation set. During meta-testing, the backbone and enhancer are frozen, with classification performed using enhanced features and logistic regression. The datasets used for evaluation included MiniImageNet, TieredImageNet, CIFAR-FS, and FC100, all standard benchmarks in the few-shot learning field.
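The overall objective described above is a weighted sum of three terms. The sketch below shows only that combination; the weight names and values are illustrative placeholders, since the article says only that the weights are tuned on a validation set.

```python
def total_loss(cls_loss, caption_consistency, proto_preserve,
               lam_c=1.0, lam_p=0.5):
    """Overall training objective (sketch): standard classification loss
    plus weighted caption-consistency and prototype-preservation terms.
    lam_c and lam_p are illustrative; the paper tunes them on a
    validation set."""
    return cls_loss + lam_c * caption_consistency + lam_p * proto_preserve

print(total_loss(0.8, 0.2, 0.1))  # → 1.05
```

Only the lightweight enhancer is trained against this objective; at meta-test time everything is frozen and a logistic-regression classifier is fit on the enhanced features.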
PMCE refines few-shot learning with knowledge banks, improving generalisation
Scientists have developed a new probabilistic framework, PMCE, to improve few-shot learning, a machine learning approach where models learn from very limited labelled data! This research addresses the challenge of biased prototypes and poor generalisation when estimating representations from scarce data, a common problem in few-shot scenarios. PMCE leverages multi-granularity semantics with caption-guided enhancement to construct a nonparametric knowledge bank storing visual statistics and class name embeddings of base classes! The framework operates by retrieving relevant base classes based on the similarity of class name embeddings for each novel category, then aggregating their statistics into category-specific prior information to refine support set prototypes via a MAP update.
Simultaneously, a frozen BLIP captioner generates instance-level image descriptions, and a lightweight enhancer optimises both support prototypes and query features, using a consistency regulariser to manage caption noise. Experiments across four benchmarks demonstrate that PMCE consistently outperforms strong baseline methods, achieving up to a 7.71% absolute gain on the MiniImageNet dataset in the 1-shot setting! The authors acknowledge that the confidence of the generated captions could be further explored to reduce semantic drift caused by noisy semantics, and future research will focus on this aspect. This work demonstrates a significant advance in few-shot learning by integrating semantic guidance with caption-based feature optimisation, producing more stable prototypes and better-aligned query representations within a unified metric space. The consistent improvements observed across multiple benchmarks highlight the potential of PMCE for applications where labelled data is scarce and learning must occur rapidly.
👉 More information
🗞 PMCE: Probabilistic Multi-Granularity Semantics with Caption-Guided Enhancement for Few-Shot Learning
🧠 ArXiv: https://arxiv.org/abs/2601.14111
