Methodology Highlights
Semantic Cloud Learning
Instead of optimizing a single static prompt vector, which is brittle, P2C learns a distribution (a "cloud") over the embedding space. The cloud covers a semantic region rather than one point, which makes the learned prompt more resilient to input variations and domain shifts.
Dynamic Prompt Denoising
We implement a noise generator based on a Gaussian mixture model (GMM) with an annealed schedule. The generator perturbs prompts during training, so optimization proceeds from coarse to fine over a smoother semantic landscape.
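As a rough illustration of annealed GMM noise, the sketch below perturbs a toy prompt vector with noise drawn from a two-component Gaussian mixture whose overall scale decays over training. The cosine schedule, the mixture parameters, and the function names (`annealed_scale`, `sample_gmm_noise`, `perturb_prompt`) are all assumptions made for this example. They are not taken from the P2C implementation, which may use a different schedule and a learned or differently parameterized mixture.

```python
import math
import random


def annealed_scale(step, total_steps, start=1.0, end=0.0):
    """Cosine-anneal the noise scale from `start` to `end` (assumed schedule)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return end + 0.5 * (start - end) * (1.0 + math.cos(math.pi * progress))


def sample_gmm_noise(dim, weights, means, stds, rng):
    """Draw one noise vector from an isotropic Gaussian mixture.

    A component is picked according to `weights`, then every coordinate
    is sampled from that component's Gaussian.
    """
    k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return [rng.gauss(means[k], stds[k]) for _ in range(dim)]


def perturb_prompt(prompt, step, total_steps, rng,
                   weights=(0.5, 0.5), means=(0.0, 0.0), stds=(0.1, 0.5)):
    """Add annealed GMM noise to a prompt embedding (hypothetical parameters)."""
    scale = annealed_scale(step, total_steps)
    noise = sample_gmm_noise(len(prompt), weights, means, stds, rng)
    return [p + scale * n for p, n in zip(prompt, noise)]


rng = random.Random(0)
prompt = [0.2, -0.1, 0.4, 0.0]  # toy 4-dimensional prompt embedding

early = perturb_prompt(prompt, step=0, total_steps=100, rng=rng)    # full noise
late = perturb_prompt(prompt, step=100, total_steps=100, rng=rng)   # noise annealed away
```

With this schedule the perturbation is largest at the first step and vanishes at the last one, which is one way to read "coarse to fine": early updates see a wide neighborhood of the prompt, late updates see the prompt itself.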
Auxiliary V-L Denoising
The vision-language (V-L) Mapper is re-tasked as a denoising autoencoder. It learns to reconstruct clean visual prompts from noisy text inputs, which encourages deep and robust cross-modal alignment.
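The denoising objective described above can be sketched as a reconstruction loss: corrupt the text prompt, pass it through the mapper, and compare the result with the clean visual prompt. The example below is a minimal sketch under several assumptions. The mapper is a plain linear layer, the corruption is a single Gaussian standing in for the GMM noise, the loss is mean squared error, and all names (`linear_map`, `denoising_loss`) are hypothetical. The actual P2C mapper and loss may differ.

```python
import random


def linear_map(weight, bias, x):
    """Apply a linear layer: one output per (row, bias) pair."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def denoising_loss(weight, bias, text_prompt, clean_visual_prompt,
                   noise_std, rng):
    """Reconstruction loss of the mapper on a noise-corrupted text prompt."""
    noisy_text = [t + rng.gauss(0.0, noise_std) for t in text_prompt]
    reconstructed = linear_map(weight, bias, noisy_text)
    return mse(reconstructed, clean_visual_prompt)


# Toy mapper from a 2-d text prompt to a 3-d visual prompt.
weight = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
bias = [0.0, 0.1, -0.1]
text_prompt = [0.3, -0.2]

# Assumption for this demo: the clean target is the mapper output on clean text.
clean_visual_prompt = linear_map(weight, bias, text_prompt)

rng = random.Random(0)
loss_without_noise = denoising_loss(weight, bias, text_prompt,
                                    clean_visual_prompt, 0.0, rng)
loss_with_noise = denoising_loss(weight, bias, text_prompt,
                                 clean_visual_prompt, 0.5, rng)
```

In this toy setup the loss is zero when no noise is added and positive once the text prompt is corrupted. Minimizing such a loss over many noise draws pushes the mapper to send a whole neighborhood of text prompts to the same visual prompt.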
Architecture Overview
Overview of the Points-to-Clouds (P2C) framework. The dual denoising mechanism learns a robust semantic CLOUD instead of a single POINT: text prompts are perturbed with annealed noise, while the V-L Mapper (F) is trained to reconstruct clean visual prompts from the noisy inputs.
State-of-the-Art Performance
Base-to-Novel Generalization (Accuracy in %, Averaged over 11 Datasets)
| Method | Base Accuracy | Novel Accuracy | Harmonic Mean (HM) |
|---|---|---|---|
| CoOp | 82.7 | 63.2 | 71.7 |
| CoCoOp | 80.5 | 71.7 | 75.8 |
| MaPLe | 82.3 | 75.1 | 78.6 |
| P2C (Ours) | 83.5 | 76.1 | 79.7 |
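The harmonic mean column follows the usual base-to-novel convention, HM = 2 · Base · Novel / (Base + Novel), which rewards methods that do well on both splits. The short check below recomputes it from the rounded table entries. The helper name `harmonic_mean` is chosen for this example, and because the inputs are already rounded to one decimal, the recomputed values can differ from the reported ones by a little under 0.1.

```python
def harmonic_mean(base_acc, novel_acc):
    """Harmonic mean of base and novel accuracy (both in percent)."""
    return 2.0 * base_acc * novel_acc / (base_acc + novel_acc)


# (base, novel, reported HM) copied from the table above.
rows = {
    "CoOp": (82.7, 63.2, 71.7),
    "CoCoOp": (80.5, 71.7, 75.8),
    "MaPLe": (82.3, 75.1, 78.6),
    "P2C (Ours)": (83.5, 76.1, 79.7),
}

recomputed = {name: harmonic_mean(base, novel)
              for name, (base, novel, _) in rows.items()}

for name, (base, novel, reported) in rows.items():
    print(f"{name:12s} recomputed={recomputed[name]:.2f} reported={reported}")
```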