3 changes: 2 additions & 1 deletion bionemo-recipes/models/esm2/.dockerignore
@@ -1,3 +1,4 @@
Dockerfile
README.md
hf_to_te_checkpoint_export/
te_to_hf_checkpoint_export/

101 changes: 96 additions & 5 deletions bionemo-recipes/models/esm2/README.md
@@ -16,7 +16,7 @@ The ESM-2 implementation natively supports the following TransformerEngine-provided
| **Sequence Packing / THD input format** | ✅ Supported |
| **FP8 with THD input format** | ✅ Supported where FP8 is supported |
| **Import from HuggingFace checkpoints** | ✅ Supported |
| **Export to HuggingFace checkpoints** | ✅ Supported |

See [BioNemo Recipes](../../recipes/README.md) for more details on how to use these features to accelerate model
training and inference.
@@ -77,17 +77,108 @@ Training recipes are available in the `bionemo-recipes/recipes/` directory:

Generate converted ESM-2 checkpoints from existing HuggingFace transformers checkpoints:

```bash
mkdir -p hf_to_te_checkpoint_export
docker build -t esm2 .
docker run --rm -it --gpus all \
  -v $PWD/hf_to_te_checkpoint_export/:/workspace/bionemo/hf_to_te_checkpoint_export \
  -v $HOME/.cache/huggingface/:/root/.cache/huggingface \
  esm2 python export.py hf-to-te
```

> **Collaborator comment on lines +80 to +85:** I don't think we need to make folders for the te-to-hf path; we just need to show how to take an existing TE model and convert it to HF (e.g., through a Python code demo). Folks likely won't convert our TE models back to HF.

### TE to HF Transformers conversion

```bash
MODEL_TAG=esm2_t6_8M_UR50D  # specify which model to convert
mkdir -p te_to_hf_checkpoint_export
docker build -t esm2 .
docker run --rm -it --gpus all \
  -v $PWD/te_to_hf_checkpoint_export/:/workspace/bionemo/te_to_hf_checkpoint_export \
  -v $PWD/hf_to_te_checkpoint_export/$MODEL_TAG:/workspace/bionemo/hf_to_te_checkpoint_export/$MODEL_TAG \
  -v $HOME/.cache/huggingface/:/root/.cache/huggingface \
  esm2 python export.py te-to-hf --checkpoint-path /workspace/bionemo/hf_to_te_checkpoint_export/$MODEL_TAG
```
> **Collaborator comment on lines +90 to +99:** I think this could be just something like:
>
> ```python
> from transformers import AutoModel
>
> from esm.convert import convert_esm_te_to_hf
>
> te_model = AutoModel.from_pretrained("/path/to/te/checkpoint", trust_remote_code=True)
> hf_model = convert_esm_te_to_hf(te_model)
> hf_model.save_pretrained("/path/to/exported/model")
> ```

## Developer Conversion Workflow

This section explains how to convert between Hugging Face and Transformer Engine (TE) ESM2 model formats. The process demonstrates bidirectional conversion: from Hugging Face to TE format for optimized inference, and back to Hugging Face format for sharing and deployment. The workflow involves several key steps:

### Step 1: Load Original Hugging Face Model

First, load the original ESM2 model from Hugging Face:

```python
from transformers import AutoModelForMaskedLM

model_hf_original = AutoModelForMaskedLM.from_pretrained("facebook/esm2_t6_8M_UR50D")
```

This loads the pre-trained ESM2 model that will serve as our reference for comparison.
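As a quick, optional sanity check, you can confirm the expected model size. This snippet is illustrative only and assumes nothing beyond the standard `transformers` config API (`esm2_t6_8M_UR50D` is the 6-layer, roughly 8M-parameter variant):

```python
# Optional sanity check: t6_8M is the 6-layer, ~8M-parameter ESM-2 variant.
print(model_hf_original.config.num_hidden_layers)  # expected: 6
print(f"{sum(p.numel() for p in model_hf_original.parameters()) / 1e6:.1f}M parameters")
```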

### Step 2: Export to Transformer Engine Format

Convert the Hugging Face model to Transformer Engine format using the high-level export API:

```python
from pathlib import Path
from esm.export import export_hf_checkpoint

te_checkpoint_path = Path("te_checkpoint")
export_hf_checkpoint("esm2_t6_8M_UR50D", te_checkpoint_path)
```

> **Collaborator comment:** This isn't as obvious a function name as `convert_esm_te_to_hf`; let's just keep the existing API and add the reverse path?

This creates a Transformer Engine checkpoint that can be used for optimized inference.
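To confirm the export worked, you can load the TE checkpoint back through `transformers`. This is a minimal sketch: it assumes the checkpoint lands in a subfolder named after the model tag (as Step 3 below also assumes) and that the exported config references the custom TE model class, hence `trust_remote_code=True`:

```python
from transformers import AutoModelForMaskedLM

# Load the exported TE checkpoint (assumed layout: one subfolder per model tag).
model_te = AutoModelForMaskedLM.from_pretrained(
    str(te_checkpoint_path / "esm2_t6_8M_UR50D"),
    trust_remote_code=True,
)
```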

### Step 3: Export Back to Hugging Face Format

Convert the Transformer Engine checkpoint back to Hugging Face format:

```python
from esm.export import export_te_checkpoint

hf_export_path = Path("hf_export")
exported_model_path = te_checkpoint_path / "esm2_t6_8M_UR50D"
export_te_checkpoint(str(exported_model_path), hf_export_path)
```

This step creates a new Hugging Face model that should be functionally equivalent to the original.
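Before loading it, you can verify that the export directory contains the standard Hugging Face artifacts (`save_pretrained` writes at least `config.json` plus a weights file); a small sketch:

```python
# List the exported files; expect config.json and a weights file
# such as model.safetensors (exact names depend on the transformers version).
for f in sorted(hf_export_path.iterdir()):
    print(f.name)
```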

### Step 4: Load and Test the Exported Model

Load the exported model and perform validation:

```python
from transformers import AutoTokenizer
model_hf_exported = AutoModelForMaskedLM.from_pretrained(str(hf_export_path))
tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t6_8M_UR50D")
```

### Step 5: Validate Model Equivalence

Test the exported model against the original using masked language modeling:

```python
import torch
from transformers import DataCollatorForLanguageModeling

# Prepare test sequence; seed the RNG so the random masking is reproducible
torch.manual_seed(0)
sequence = "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG"
batch = tokenizer([sequence], return_tensors="pt")
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)
inputs = collator([{"input_ids": batch["input_ids"][0]}])

# Compare outputs
with torch.no_grad():
    outputs_original = model_hf_original(**inputs)
    outputs_exported = model_hf_exported(**inputs)

# Check differences
logits_diff = torch.abs(outputs_original.logits - outputs_exported.logits).max()
print(f"Max logits difference: {logits_diff:.2e}")

if outputs_original.loss is not None and outputs_exported.loss is not None:
    loss_diff = abs(outputs_original.loss - outputs_exported.loss)
    print(f"Loss difference: {loss_diff:.2e}")
```
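If you want this comparison to fail loudly (e.g., in CI) rather than just print, you can wrap it in an explicit tolerance check. The tolerances below are illustrative assumptions, not project-defined thresholds:

```python
# Raises an AssertionError if the exported model drifts beyond the tolerances.
torch.testing.assert_close(
    outputs_exported.logits,
    outputs_original.logits,
    rtol=1e-4,  # illustrative tolerance
    atol=1e-5,  # illustrative tolerance
)
print("Exported model matches the original within tolerance.")
```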

## Developer Guide

49 changes: 44 additions & 5 deletions bionemo-recipes/models/esm2/export.py
@@ -13,9 +13,10 @@

```python
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse
from pathlib import Path

from esm.export import export_hf_checkpoint, export_te_checkpoint


ESM_TAGS = [
    # … (model tag list collapsed in the diff view)
```

@@ -30,10 +31,48 @@

```python

def main():
"""Export the ESM2 models from Hugging Face to the Transformer Engine format."""
# TODO (peter): maybe add a way to specify the model to export or option to export all models?
for tag in ESM_TAGS:
print(f"Converting {tag}...")
export_hf_checkpoint(tag, Path("./checkpoint_export"))
    parser = argparse.ArgumentParser(description="Convert ESM2 models from Hugging Face to Transformer Engine format")

    subparsers = parser.add_subparsers(dest="conversion_type", required=True, help="Type of conversion to perform")

    hf_to_te_parser = subparsers.add_parser("hf-to-te", help="Convert from HuggingFace to Transformer Engine format")
    hf_to_te_parser.add_argument(
        "--model",
        type=str,
        choices=ESM_TAGS,
        help="Specific model tag to convert. If not provided, all models will be converted.",
    )
    hf_to_te_parser.add_argument(
        "--output-path",
        type=str,
        default="./hf_to_te_checkpoint_export",
        help="Output directory path for the converted model. Defaults to './hf_to_te_checkpoint_export'",
    )

    te_to_hf_parser = subparsers.add_parser("te-to-hf", help="Convert from Transformer Engine to HuggingFace format")
    te_to_hf_parser.add_argument(
        "--checkpoint-path", type=str, required=True, help="Path to the Transformer Engine checkpoint to convert"
    )
    te_to_hf_parser.add_argument(
        "--output-path",
        type=str,
        default="./te_to_hf_checkpoint_export",
        help="Output directory path for the converted model. Defaults to './te_to_hf_checkpoint_export'",
    )

    args = parser.parse_args()

    if args.conversion_type == "hf-to-te":
        if args.model:
            print(f"Converting {args.model} from HuggingFace to Transformer Engine format")
            export_hf_checkpoint(args.model, Path(args.output_path))
        else:
            for tag in ESM_TAGS:
                print(f"Converting {tag} from HuggingFace to Transformer Engine format")
                export_hf_checkpoint(tag, Path(args.output_path))
    else:
        print(f"Converting {args.checkpoint_path} from Transformer Engine to HuggingFace format")
        export_te_checkpoint(args.checkpoint_path, Path(args.output_path))


if __name__ == "__main__":
    main()
```
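With the subcommands above, typical invocations look like the following; the paths simply follow the argparse defaults shown in the script:

```bash
# Convert one model from HF to TE format (omit --model to convert all tags)
python export.py hf-to-te --model esm2_t6_8M_UR50D --output-path ./hf_to_te_checkpoint_export

# Convert a TE checkpoint back to HF format
python export.py te-to-hf --checkpoint-path ./hf_to_te_checkpoint_export/esm2_t6_8M_UR50D
```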