- 🗓️ 2025/06/12: AlignedNorm code is released!
- 🗓️ 2026/05/01: AlignedNorm is accepted by ICML 2026 🎉
Prompt learning has become an efficient way to adapt vision-language models (VLMs) to downstream tasks. However, existing end-to-end and decoupled methods often optimize base and new classes in isolated, task-specific feature spaces, which can lead to local optima and limited generalization.
We introduce AlignedNorm, a simple prompt-learning method built upon the concept of a Coupled Prompt Field. Instead of treating base and new classes independently, the coupled field places them in a shared optimization space where their learning dynamics mutually constrain each other. AlignedNorm realizes this coupling by dynamically aligning learnable prompts with the native feature scale of the pretrained VLM.
From isolated optimization to a Coupled Prompt Field shared by base and new classes.
- A new perspective on prompt learning. We formulate base-to-new generalization through the Coupled Prompt Field, which encourages joint rather than isolated optimization.
- Diagnosis of representation degradation. We reveal that uncontrolled prompt learning causes norm drift and Entanglement Collapse, weakening the pretrained representation structure.
- Simple and effective alignment. AlignedNorm aligns prompt norms with the VLM's native feature scale at both intermediate and output levels, without introducing a complex architecture.
- Better geometric preservation. AlignedNorm maintains a more favorable balance between feature uniformity and semantic tolerance across base and new classes.
AlignedNorm mitigates norm drift and Entanglement Collapse during prompt learning.
AlignedNorm better preserves the geometric structure of the pretrained representation space.
As illustrated above, end-to-end methods optimize prompts within a single task path, while decoupled methods separate the optimization of base and new classes. In contrast, AlignedNorm couples the two learning dynamics inside the vision encoder. It aligns the norms of learnable prompt tokens with the corresponding native [CLS] representations at intermediate layers and further aligns their projected features at the output level. These lightweight alignment objectives keep prompt updates on the pretrained model's feature scale, reducing representation distortion while preserving transferable knowledge for both base and new classes.
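The two alignment levels described above can be written down as a pair of norm-matching penalties. The following is only a sketch under stated assumptions: the symbols ($p^{(\ell)}$ for a learnable prompt token at layer $\ell$, $c^{(\ell)}$ for the native [CLS] representation at the same layer, $\mathcal{S}$ for the set of aligned intermediate layers, $W$ for the output projection, and the weights $\lambda_{\text{mid}}$, $\lambda_{\text{out}}$) and the squared-difference form are illustrative choices, not the exact objective from the paper.

```latex
% Intermediate-level alignment: match prompt-token norms to [CLS] norms (assumed form)
\mathcal{L}_{\text{mid}}
  = \frac{1}{|\mathcal{S}|} \sum_{\ell \in \mathcal{S}}
    \left( \lVert p^{(\ell)} \rVert_2 - \lVert c^{(\ell)} \rVert_2 \right)^2

% Output-level alignment: match the norms of the projected features (assumed form)
\mathcal{L}_{\text{out}}
  = \left( \lVert W p^{(L)} \rVert_2 - \lVert W c^{(L)} \rVert_2 \right)^2

% Total objective: task loss plus the two lightweight alignment terms
\mathcal{L}
  = \mathcal{L}_{\text{task}}
  + \lambda_{\text{mid}} \, \mathcal{L}_{\text{mid}}
  + \lambda_{\text{out}} \, \mathcal{L}_{\text{out}}
```

Under this reading, both penalties vanish when the prompt tokens sit on the same feature scale as the pretrained [CLS] representations, which is the sense in which prompt updates stay on the pretrained model's feature scale.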
All commands below should be executed from the project root directory. Before running an experiment, directly modify DATA_ROOT in the corresponding script under scripts/alignednorm/:
DATA_ROOT="/path/to/your/datasets"
The scripts that require this setting are:
- Base-to-Novel: base2new_train.sh and base2new_test.sh
- Cross-Dataset: cross_datasets_train.sh and cross_datasets_test.sh
- Few-Shot: few_shot.sh
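If editing each script by hand is inconvenient, the DATA_ROOT line can be rewritten in one pass. The sketch below is not part of the repository: `set_data_root` is a hypothetical helper, and it assumes each script assigns DATA_ROOT on a line that begins with `DATA_ROOT=` and that the new path contains no `|` or `&` characters.

```shell
# Hypothetical helper (not shipped with the repository): rewrite the
# DATA_ROOT assignment in every .sh file of a directory.
set_data_root() {
    script_dir="$1"   # directory holding the scripts
    new_root="$2"     # dataset root to write into each script
    for script in "$script_dir"/*.sh; do
        [ -f "$script" ] || continue   # skip when the glob matches nothing
        tmp="$(mktemp)"
        # Only lines that start with DATA_ROOT= are replaced (assumption).
        sed "s|^DATA_ROOT=.*|DATA_ROOT=\"$new_root\"|" "$script" > "$tmp"
        cat "$tmp" > "$script"         # write back in place, keeping permissions
        rm -f "$tmp"
    done
}

# Example call from the project root:
#   set_data_root scripts/alignednorm /path/to/your/datasets
```

Writing through a temporary file avoids `sed -i`, whose syntax differs between GNU and BSD sed.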
By default, all scripts run three random seeds (1, 2, and 3) and summarize the results after completion.
Train on the base classes and evaluate the trained models on the new classes of all 11 datasets:
bash base_to_novel.sh
To run only one dataset, first train on its base classes and then evaluate on its new classes:
bash scripts/alignednorm/base2new_train.sh eurosat
bash scripts/alignednorm/base2new_test.sh eurosat
Train a 16-shot model on ImageNet and evaluate it on all target datasets:
bash cross_datasets.sh
To evaluate only one target dataset, run the training and evaluation stages separately:
bash scripts/alignednorm/cross_datasets_train.sh
bash scripts/alignednorm/cross_datasets_test.sh dtd
Run the 1-, 2-, 4-, 8-, and 16-shot settings on all 11 datasets:
bash few_shot.sh
To run all shot settings on a single dataset:
bash scripts/alignednorm/few_shot.sh eurosat
Use SEEDS before a command to run only the selected seed or seeds:
# Run only seed 1
SEEDS=1 bash scripts/alignednorm/few_shot.sh eurosat
# Run seeds 1 and 3 for Base-to-Novel
SEEDS="1 3" bash scripts/alignednorm/base2new_train.sh eurosat
SEEDS="1 3" bash scripts/alignednorm/base2new_test.sh eurosat
# Run the complete Cross-Dataset experiment using only seed 2
SEEDS=2 bash cross_datasets.sh
For Base-to-Novel and Cross-Dataset evaluation, use the same seeds for training and testing so that each evaluation script can locate its corresponding checkpoint. Existing output directories are skipped automatically; remove or rename the corresponding directory under output/ to rerun an experiment.
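The seed selection and skip behavior described above follow a common shell pattern, sketched here for illustration. This is not the repository's code: `run_seeds`, the `<out_root>/<dataset>/seed<N>` layout, and the echo placeholder standing in for training are all assumptions.

```shell
# Hypothetical sketch of a seed loop with a default seed list and a
# skip-if-exists check; names and directory layout are illustrative only.
run_seeds() {
    dataset="$1"
    out_root="$2"
    # Fall back to seeds 1, 2, and 3 when SEEDS is unset or empty.
    seeds="${SEEDS:-1 2 3}"
    for seed in $seeds; do
        out_dir="$out_root/$dataset/seed$seed"
        if [ -d "$out_dir" ]; then
            # An existing output directory marks the run as done.
            echo "skip $dataset seed $seed (found $out_dir)"
            continue
        fi
        mkdir -p "$out_dir"
        # A real script would launch training here.
        echo "run $dataset seed $seed"
    done
}
```

With this pattern, deleting or renaming a seed's output directory is what makes the next invocation run that seed again, and a test script given the same SEEDS value looks in the directories the training script created.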
- Release code.
- Release model weights and corresponding log files.
If you have any questions, please open an issue on GitHub or contact me by email (nkucsmq[at]gmail.com).
This codebase builds on MMRL/MMRL++ and clip_text_span.
If you find our paper or repo helpful for your research, please consider citing our paper and giving this repo a star⭐. Thank you!
@inproceedings{ma2026alignednorm,
title={AlignedNorm: Prompting Vision–Language Models via Coupled Prompt Field},
author={Ma, Qi and Wang, Chen-Yang and Gao, Dehong and Fan, Deng-Ping},
booktitle={ICML},
year={2026}
}



