Description
Identified an issue in the evaluation pipeline that leads to significantly degraded evaluation results when float16 is used together with flash_attention_2.
Problem
The evaluation code converts the model to float16 after loading the checkpoint and enables flash_attention_2 (lines 147-148 of src/modernvbert/contrastive_training/evaluate.py). This does not work correctly for this checkpoint and produces very poor evaluation results.
Key points:
- The checkpoint parameters are stored in float32.
- During evaluation, the model is:
  - loaded in float32,
  - then converted to float16,
  - and evaluated with flash_attention_2.
- This configuration results in incorrect scores.
- When the checkpoint is evaluated without forcing float16 and FlashAttention, the results are significantly better and consistent.
- Training emits warnings indicating that FlashAttention2 requires float16, suggesting it was not correctly enabled during training.
- This implies the model was not trained with FlashAttention2 in float16.
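To see why the cast alone can hurt, here is a minimal sketch (using numpy; not part of the repository) of the precision and range that float16 loses relative to float32:

```python
import numpy as np

# float16 has a 10-bit mantissa: integers above 2048 are no longer exact.
x = np.float32(2049.0)
print(np.float16(x))  # rounds to 2048.0

# float16 also underflows far earlier than float32:
# values below the smallest subnormal (~6e-8) collapse to 0.
eps = np.float32(1e-8)
print(np.float16(eps))  # 0.0

# Weights trained in float32 therefore lose information when cast down,
# which can shift activations enough to degrade evaluation scores.
np.random.seed(0)
w = np.random.randn(1000).astype(np.float32) * 1e-2
err = np.abs(w - w.astype(np.float16).astype(np.float32))
print(err.max() > 0)  # the round-trip cast is lossy
```

This is only an illustration of the dtype mismatch; the actual degradation comes from running a float32-trained checkpoint through this lossy cast at evaluation time.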
Suggested Fixes
- Do not convert the model to float16 after loading if the checkpoint was trained in float32.
- Only enable flash_attention_2 when the model was trained and stored in a compatible precision.
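One possible shape for the fix, as a sketch only: derive the loading arguments from the checkpoint's stored dtype instead of forcing float16. The helper below is hypothetical (not from the repository), though `torch_dtype` and `attn_implementation` are the real `from_pretrained` arguments in transformers.

```python
# Hypothetical helper: choose from_pretrained kwargs from the dtype the
# checkpoint was actually trained and stored in, instead of forcing float16.
def select_load_kwargs(checkpoint_dtype: str) -> dict:
    if checkpoint_dtype in ("float16", "bfloat16"):
        # FlashAttention2 only supports half-precision inputs.
        return {"torch_dtype": checkpoint_dtype,
                "attn_implementation": "flash_attention_2"}
    # float32 checkpoint: keep full precision and fall back to SDPA.
    return {"torch_dtype": checkpoint_dtype,
            "attn_implementation": "sdpa"}

# Usage sketch (the transformers call itself is not executed here):
# model = AutoModel.from_pretrained(ckpt_path, **select_load_kwargs("float32"))
print(select_load_kwargs("float32")["attn_implementation"])  # sdpa
print(select_load_kwargs("float16")["attn_implementation"])  # flash_attention_2
```

With this gating, a float32 checkpoint is evaluated in float32 without FlashAttention, matching the configuration that already produces good results.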