During 2025, I plan to work on the fine-tuning of Anemoi models. This issue should serve as a roadmap and info-dump, as well as a discussion ground for different users.
Current State
The current state of fine-tuning in Anemoi revolves around re-training the entire model.
While this makes sense in pre-training the dynamics with single-step rollout and then finalizing the training with multi-step rollout, it is disadvantageous when we want to fine-tune on data with different distributions. This is the naive fine-tuning approach:
- Train model $M$ with weights $W$ on dataset (or task) $A$
- Train model $M'$, modifying the weights $W$ to $W'$ with a lower learning rate (and other tricks to make it work)
The main disadvantage is that the extensive pre-training often captures nuances that are "forgotten" when the whole model is retrained. This has been observed in practice and tends to lead to finicky training set-ups.
Implementation details
Warm-starting and forking of a training run are currently controlled by multiple config entries in training:
```yaml
# resume or fork a training from a checkpoint last.ckpt or specified in hardware.files.warm_start
run_id: null
fork_run_id: null
transfer_learning: False # activate to perform transfer learning
load_weights_only: False # only load model weights, do not restore optimiser states etc.
```
This has implications for traceability and the automation of training pipelines, but it also has drawbacks: forking a run has to be specified by a run_id that is expected to exist on the same system, which discourages collaboration and the sharing of checkpoints. These were design choices made early in the project, before we anticipated the breadth of adoption.
Proposed Solution
To address these limitations, I propose implementing a comprehensive fine-tuning capability in Anemoi that includes:
1. Enhanced Model Freezing
Building on PR #61, we need to extend the submodule freezing functionality to enable more granular control over which parts of the model are trainable during fine-tuning (see the sketch after the list below). This will allow:
- Freezing arbitrary submodules at multiple levels of the model hierarchy
- Partial freezing of specific parameter groups
- Possibly even differential learning rates for different model components
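As a rough illustration of the direction (not the final Anemoi API), here is a minimal sketch of pattern-based submodule freezing in plain PyTorch; the function name and the example module paths are assumptions.

```python
import re

import torch.nn as nn


def freeze_by_patterns(model: nn.Module, patterns: list[str]) -> None:
    """Freeze every parameter whose qualified name matches one of the regex patterns."""
    compiled = [re.compile(p) for p in patterns]
    for name, param in model.named_parameters():
        if any(p.match(name) for p in compiled):
            param.requires_grad = False


# Example (hypothetical module paths): freeze the first two encoder blocks and the
# whole processor, leaving the decoder and everything else trainable.
# freeze_by_patterns(model, [r"encoder\.block\.[01]\.", r"processor\."])
```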
2. Integration with PEFT Library
Rather than implementing Parameter-Efficient Fine-Tuning (PEFT) methods from scratch, I propose integrating with Hugging Face's PEFT library (https://huggingface.co/docs/peft/). This will provide access to multiple state-of-the-art fine-tuning methods including:
- LoRA (Low-Rank Adaptation): A technique that significantly reduces the number of trainable parameters by injecting trainable low-rank matrices into each layer of the model.
LoRA parameterizes the weight updates as $\Delta W = BA$, where $B \in \mathbb{R}^{d\times r}$ and $A \in \mathbb{R}^{r\times k}$ are low-rank matrices with $r \ll \min(d,k)$. Instead of fine-tuning all parameters of $W \in \mathbb{R}^{d\times k}$, we only need to train $r(d+k)$ parameters, resulting in significant memory savings while maintaining model performance. For a given matrix $W$, the output is computed as $h = Wx + BAx$, where only $B$ and $A$ are trained while $W$ remains frozen.
- QLoRA: Quantized LoRA for even more memory-efficient fine-tuning
- Prefix Tuning: Optimizes a small continuous task-specific vector (prefix) while keeping the model frozen
- Prompt Tuning: Fine-tunes continuous prompts prepended to inputs
- AdaLoRA: Adaptive budget allocation across weight matrices based on importance
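To give a sense of scale: with $d = k = 1024$ and $r = 8$, LoRA trains $r(d+k) = 16{,}384$ parameters instead of the $dk \approx 1.05$M parameters of the full matrix, i.e. roughly 1.6%. Below is a minimal, hedged sketch of injecting LoRA adapters with the PEFT library into a generic PyTorch module; the toy model and the target module names are placeholders, not the Anemoi graph model.

```python
import torch.nn as nn
from peft import LoraConfig, get_peft_model

# Toy stand-in for the model to adapt (placeholder, not Anemoi's architecture).
base_model = nn.Sequential(
    nn.Linear(256, 512),
    nn.GELU(),
    nn.Linear(512, 256),
)

lora_config = LoraConfig(
    r=8,                        # low-rank dimension
    lora_alpha=16,              # scaling factor
    lora_dropout=0.1,
    target_modules=["0", "2"],  # qualified names of the Linear layers to adapt
)

peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()  # only the injected low-rank A/B matrices are trainable
```

Because the base weights stay frozen, the resulting adapters are small enough to be shared independently of the full checkpoint.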
3. Decoupling of Checkpoint loading and IDs
Checkpoints should be loadable independently of run IDs and of the system they were produced on, or at least a side-loading mechanism should exist.
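A minimal sketch of what side-loading could look like, assuming a plain path argument and a checkpoint that stores weights under a state_dict key (both assumptions, not the current API):

```python
import torch
import torch.nn as nn


def side_load_weights(model: nn.Module, checkpoint_path: str):
    """Load model weights from an arbitrary local path, ignoring run IDs and optimiser state."""
    checkpoint = torch.load(checkpoint_path, map_location="cpu")
    # Accept both {"state_dict": ...} checkpoints and bare state dicts (assumed layout).
    state_dict = checkpoint.get("state_dict", checkpoint)
    missing, unexpected = model.load_state_dict(state_dict, strict=False)
    return missing, unexpected
```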
4. Enhanced Configuration System
Expand the configuration system to support:
```yaml
training:
  fine_tuning:
    enabled: True
    strategy: "lora"  # Options: "full", "freeze", "lora", "qlora", "prompt", "prefix", etc.
    checkpoint:
      source: "s3://anemoi-models/global-10day/v1.2.3/checkpoint.pt"  # Remote sources supported
      local_cache: "~/.anemoi/cache/"
    peft:
      rank: 8  # For LoRA-based methods
      alpha: 16  # Scaling factor
      dropout: 0.1
    freeze:
      modules: ["encoder.block.0", "encoder.block.1"]  # Explicit module paths
      patterns: ["encoder.block.[2-11]", "processor.*"]  # Regex patterns supported
    optimizer:
      differential_lr: True
      lr_groups:
        - modules: ["decoder.*"]
          lr: 1e-4
        - modules: ["peft_layers.*"]
          lr: 5e-4
```

It would also be possible to implement an optional fine-tuning config. This will become more clear during implementation, I believe.
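As a hedged illustration of how the differential_lr / lr_groups entries could translate into standard PyTorch optimizer parameter groups (the config access and helper name are assumptions):

```python
import re

import torch
import torch.nn as nn


def build_param_groups(model: nn.Module, lr_groups: list[dict], default_lr: float = 1e-4):
    """Assign each trainable parameter to the first lr_group whose pattern matches its name."""
    groups = [{"params": [], "lr": group["lr"]} for group in lr_groups]
    default_group = {"params": [], "lr": default_lr}
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        for group, cfg in zip(groups, lr_groups):
            if any(re.match(pattern, name) for pattern in cfg["modules"]):
                group["params"].append(param)
                break
        else:
            default_group["params"].append(param)
    return groups + [default_group]


# Hypothetical usage mirroring the config above:
# optimizer = torch.optim.AdamW(
#     build_param_groups(model, [{"modules": ["decoder.*"], "lr": 1e-4},
#                                {"modules": ["peft_layers.*"], "lr": 5e-4}])
# )
```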
5. Integration with Training Pipelines
Collaborate with colleagues working on training pipeline automation to ensure:
- Fine-tuning configurations can be versioned and tracked
- Automated experimentation can be conducted across fine-tuning hyperparameters
- Results from fine-tuning can be systematically compared and evaluated
6. Maintain full traceability of training settings
When implementing new configs and training pipelines, the different training stages should be reflected in the model's provenance metadata, so that each model remains fully traceable.
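For illustration, stage-level provenance could be as simple as appending one record per training stage to the checkpoint metadata; the keys and helper below are hypothetical, not an existing Anemoi structure.

```python
import datetime
from typing import Optional


def record_training_stage(metadata: dict, strategy: str, parent_checkpoint: Optional[str]) -> dict:
    """Append one provenance entry per training stage to the checkpoint metadata."""
    stages = metadata.setdefault("training_stages", [])
    stages.append(
        {
            "stage": len(stages) + 1,
            "strategy": strategy,                  # e.g. "pretrain", "rollout", "lora"
            "parent_checkpoint": parent_checkpoint,
            "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        }
    )
    return metadata
```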
Implementation Plan
Phase 1: Foundation
- Enhance the existing model freezing functionality (building on PR feat: Model Freezing #61)
- Create initial integration with PEFT library for LoRA
Phase 2: Core Functionality
- Implement enhanced configuration system
- Complete PEFT integration with all major methods
- Develop comprehensive testing suite for fine-tuning capabilities
- Create documentation and examples
Phase 3: Advanced Features and Integration
- Integrate with training pipeline automation
- Optimize performance for large-scale fine-tuning
- Create benchmarking tools for fine-tuning approaches
Success Criteria
The fine-tuning capability will be considered successful when:
- Users can fine-tune Anemoi models with a single configuration change
- Memory usage during fine-tuning is reduced by at least 70% compared to full fine-tuning
- Fine-tuned models maintain or improve performance metrics compared to current approaches
- Checkpoint sharing and collaboration becomes seamless across different systems
- Documentation and examples make the system approachable for new users
Alternatives Considered
Custom Implementation of PEFT Methods
While implementing our own versions of PEFT methods would give us maximum control, it would require significant development and maintenance effort. The Hugging Face PEFT library is well-maintained, extensively tested, and continuously updated with new methods, making it a more sustainable choice.
Adapter-Based Approaches
Traditional adapter approaches insert new modules between existing layers. While effective, this can change the model architecture significantly. LoRA and similar methods preserve the original architecture while fine-tuning, which aligns better with our goals. (Although technically PEFT also uses adapters...)
Full Model Distillation
Knowledge distillation could be used to transfer knowledge from the pre-trained model to a task-specific model. However, this approach requires training a new model from scratch for each task, which is computationally expensive and doesn't leverage the efficiency gains of modern fine-tuning techniques.
Additional Context
This work on fine-tuning capabilities aligns with broader industry trends toward more efficient adaptation of large models. By implementing these capabilities in Anemoi, we'll enable users to:
- Adapt global models to regional domains with minimal computational resources
- Fine-tune on specific weather phenomena without degrading general performance
- Build ensembles of specialized models derived from a common base
- Collaborate more effectively by sharing and building upon each other's work