From Points to Clouds

Learning Robust Semantic Distributions for Multi-modal Prompts

Supplementary Material - Project

Methodology Highlights

Semantic Cloud Learning

Instead of optimizing a brittle, static point vector, P2C learns a distribution (cloud) over the embedding space. This captures a robust semantic region resilient to input variations and domain shifts.

Dynamic Prompt Denoising

We implement a GMM-based noise generator whose noise scale follows an annealed schedule over training epochs. The noise perturbs the prompts during training, so the model learns over a smoother semantic landscape, moving from coarse to fine as the scale is annealed.
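
The following dependency-free sketch illustrates the idea above; it is not the project's code. Noise is drawn from a Gaussian mixture and multiplied by a scale that decays over training. The linear decay, the mixture weights, means, and standard deviations, and the function names `annealed_scale`, `sample_gmm_noise`, and `perturb_prompt` are all illustrative assumptions, since the text does not specify the schedule shape or the mixture parameters.

```python
import random


def annealed_scale(epoch, max_epoch, initial_scale=1.0, final_scale=0.0):
    """Linearly decay the noise scale over training (assumed schedule shape)."""
    if max_epoch <= 1:
        return final_scale
    progress = min(max(epoch / (max_epoch - 1), 0.0), 1.0)
    return initial_scale + (final_scale - initial_scale) * progress


def sample_gmm_noise(n, weights, means, stds, rng):
    """Draw n scalar samples from a 1-D Gaussian mixture."""
    samples = []
    for _ in range(n):
        # Pick a mixture component by weight, then sample from that Gaussian.
        k = rng.choices(range(len(weights)), weights=weights)[0]
        samples.append(rng.gauss(means[k], stds[k]))
    return samples


def perturb_prompt(prompt, epoch, max_epoch, rng):
    """Add annealed GMM noise to a prompt vector given as a list of floats."""
    scale = annealed_scale(epoch, max_epoch)
    # Hypothetical two-component mixture: one narrow and one wide Gaussian.
    noise = sample_gmm_noise(
        len(prompt), weights=[0.5, 0.5], means=[0.0, 0.0], stds=[0.1, 0.5], rng=rng
    )
    return [p + scale * e for p, e in zip(prompt, noise)]


rng = random.Random(0)
prompt = [0.2, -0.1, 0.4, 0.0]
early = perturb_prompt(prompt, epoch=0, max_epoch=10, rng=rng)  # full-strength noise
late = perturb_prompt(prompt, epoch=9, max_epoch=10, rng=rng)   # scale has decayed to 0
```

Under this assumed schedule, early epochs see heavily perturbed prompts (a wide cloud) and the final epoch sees the unperturbed prompt.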

Auxiliary V-L Denoising

The V-L Mapper is repurposed as a denoising autoencoder. It learns to reconstruct clean visual prompts from noisy text inputs, which encourages robust cross-modal alignment.
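
One possible reading of this objective, sketched in plain Python so the arithmetic is visible. Several things here are assumptions rather than details from the project: the mapper is a toy linear projection, the clean visual prompt is taken to be the mapper's output on the clean text prompt, the loss is a mean squared error, and the names `apply_mapper`, `mse`, and `denoising_loss` are hypothetical.

```python
import random


def apply_mapper(weight, text_prompt):
    """Toy linear stand-in for the V-L Mapper: visual = weight @ text."""
    return [sum(w * t for w, t in zip(row, text_prompt)) for row in weight]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def denoising_loss(weight, clean_text, noisy_text):
    """Gap between the visual prompt mapped from noisy text and the clean target."""
    target = apply_mapper(weight, clean_text)      # clean visual prompt (assumed target)
    prediction = apply_mapper(weight, noisy_text)  # mapped from the perturbed text prompt
    return mse(prediction, target)


rng = random.Random(0)
weight = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.4]]  # toy 3 -> 2 projection
clean_text = [1.0, 0.5, -1.0]
noisy_text = [t + rng.gauss(0.0, 0.1) for t in clean_text]

loss_noisy = denoising_loss(weight, clean_text, noisy_text)
loss_clean = denoising_loss(weight, clean_text, clean_text)  # no noise, so zero loss
```

In a gradient-based setup the clean target would presumably be held constant while the mapper is updated, so that minimizing this loss makes the mapper's output insensitive to the injected noise.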

Architecture Overview

Points-to-Clouds (P2C) framework. The dual denoising mechanism learns a robust semantic CLOUD instead of a POINT. Text prompts are perturbed with annealed noise, while the V-L Mapper (F) is trained to reconstruct clean visual prompts from the noisy inputs.

State-of-the-Art Performance

Base-to-Novel Generalization (Average over 11 Datasets)

Method        Base Accuracy (%)   Novel Accuracy (%)   Harmonic Mean (HM)
CoOp          82.7                63.2                 71.7
CoCoOp        80.5                71.7                 75.8
MaPLe         82.3                75.1                 78.6
P2C (Ours)    83.5                76.1                 79.7
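
The HM column is the harmonic mean of base and novel accuracy, HM = 2 * Base * Novel / (Base + Novel). The short check below recomputes it from the rounded table entries; the function name `harmonic_mean` is illustrative. Recomputed values can differ from the reported ones in the last digit, presumably because the reported HM was computed from unrounded accuracies.

```python
def harmonic_mean(base_acc, novel_acc):
    """Harmonic mean of base-class and novel-class accuracy."""
    return 2 * base_acc * novel_acc / (base_acc + novel_acc)


# (base, novel, reported HM) copied from the table above.
results = {
    "CoOp": (82.7, 63.2, 71.7),
    "CoCoOp": (80.5, 71.7, 75.8),
    "MaPLe": (82.3, 75.1, 78.6),
    "P2C (Ours)": (83.5, 76.1, 79.7),
}

recomputed = {name: harmonic_mean(b, n) for name, (b, n, _) in results.items()}
for name, (_, _, reported) in results.items():
    print(f"{name}: recomputed {recomputed[name]:.2f}, reported {reported}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a method cannot score well on HM by trading novel-class accuracy for base-class accuracy.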

Core Implementation Preview

core.py

# From core.py: Implementation of GMM Noise Generator
import torch.nn as nn
from torch.distributions import Categorical, MixtureSameFamily, Normal


class GaussianMixtureNoiseGenerator(nn.Module):
    def __init__(self, cfg, device):
        super().__init__()
        # Initialize GMM components from config
        self.num_components = cfg.TRAINER.PROMPT_DENOISING.GMM_COMPONENTS
        self.gmm_means = cfg.TRAINER.PROMPT_DENOISING.GMM_MEANS
        self.gmm_stds = cfg.TRAINER.PROMPT_DENOISING.GMM_STDS

        # ... (initialization code hidden for brevity) ...
        # forward() relies on self.mix_weights, self.means, and self.stds,
        # which are expected to be defined in the hidden code above.

    def forward(self, tensor_like):
        # Mixture of 1-D Gaussians: categorical weights over Normal components
        mix = Categorical(self.mix_weights)
        comp = Normal(self.means, self.stds)
        gmm = MixtureSameFamily(mix, comp)

        # Sample noise from the semantic cloud distribution,
        # one value per element of tensor_like
        noise = gmm.sample(tensor_like.shape)
        return noise.to(device=tensor_like.device, dtype=tensor_like.dtype)

# From core.py: Multi-modal Prompt Learner with DPD
class MultiModalPromptLearner(nn.Module):
    # ...
    def forward(self, epoch=None, max_epoch=None):
        # Start from the clean learnable context. The attribute contexts
        # (ctx_att1, ...) are set up in code elided from this preview.
        ctx = self.ctx

        # Get current annealed noise scale
        current_noise_scale = self._get_noise_scale(epoch, max_epoch)

        if current_noise_scale > 0:
            # Apply Dynamic Prompt Denoising (DPD)
            ctx = ctx + self._generate_noise(ctx, current_noise_scale)
            if self.use_atp:
                ctx_att1 = ctx_att1 + self._generate_noise(ctx_att1, current_noise_scale)
                # ... apply to other attributes ...

        # ... (construction of prompts elided in this preview) ...
        shared_ctx = self.proj(self.ctx)
        return prompts, shared_ctx, ...