Multimodal Protein Language Model
This documentation provides an overview, installation instructions, usage examples, and an API reference for the multimodal_protein_language_model repository by ayyucedemirbas. The model performs sequence-to-structure/function prediction with a transformer-based encoder-decoder architecture that uses mixture-of-experts (MoE) feed-forward layers and accepts an optional structural image as a second input.
The MultimodalProteinModel integrates:
- Protein Sequence Encoder based on transformer layers with mixture-of-experts routing.
- Protein Structure/Function Decoder generating structural tokens.
- Image Encoder that embeds optional 2D structural images so they can be fused with the sequence features (multimodal fusion).
- Custom learning rate scheduler following the "Attention Is All You Need" warmup strategy.
Use cases include predicting protein secondary/tertiary structures, binding sites, or functional motifs, optionally guided by structural images.
Repository structure
```text
multimodal_protein_language_model/
├── README.md            # Minimal original readme
├── LICENSE              # License file (see the License section)
├── encoder.py           # Transformer encoder with MoE layers
├── decoder.py           # Transformer decoder with MoE layers
├── layers.py            # Core MultiheadAttention, MixtureOfExperts, positional encoding
├── model.py             # Complete MultimodalProteinModel class
├── preprocessing.py     # Sequence and structure tokenization utilities
└── training.py          # High-level training routine and entry point
```
Installation
- Clone the repository:
  ```bash
  git clone https://github.com/ayyucedemirbas/multimodal_protein_language_model.git
  cd multimodal_protein_language_model
  ```
- Create a virtual environment (recommended):
  ```bash
  python3 -m venv venv
  source venv/bin/activate
  ```
- Install the dependencies:
  ```bash
  pip install tensorflow numpy
  ```
Preprocessing
Two helper functions in preprocessing.py:
- `preprocess_protein_sequence(sequence: str, max_length: int, vocab: dict) -> tf.Tensor`: converts an amino acid sequence to integer tokens and pads or truncates it to `max_length`.
- `preprocess_structure_data(structure_data: List[str], max_length: int, vocab: dict) -> tf.Tensor`: converts structure tokens (e.g., secondary structure labels) to integers, adds start/end tokens, and pads or truncates the result to `max_length`.
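The tokenize, pad, and truncate steps of `preprocess_protein_sequence` can be illustrated with a minimal plain-Python sketch. It is not the repository's code: the helper name `tokenize_and_pad` is hypothetical, it returns a list instead of a `tf.Tensor`, and it assumes that unknown residues map to `<UNK>`, that padding is appended on the right with `<PAD>`, and that no start/end tokens are added to protein sequences.

```python
from typing import Dict, List


def tokenize_and_pad(sequence: str, max_length: int, vocab: Dict[str, int]) -> List[int]:
    """Hypothetical helper: map residues to IDs, then truncate/pad to max_length."""
    unk_id = vocab["<UNK>"]
    pad_id = vocab["<PAD>"]
    # Map each residue to its ID; residues missing from the vocab become <UNK>.
    ids = [vocab.get(residue, unk_id) for residue in sequence.upper()]
    # Truncate sequences that are too long ...
    ids = ids[:max_length]
    # ... and right-pad short ones with <PAD>.
    return ids + [pad_id] * (max_length - len(ids))


# IDs 0-3 are special tokens, so the 20 standard residues start at 4.
aa_vocab = {aa: i + 4 for i, aa in enumerate("ACDEFGHIKLMNPQRSTVWY")}
aa_vocab.update({"<PAD>": 0, "<START>": 1, "<END>": 2, "<UNK>": 3})

print(tokenize_and_pad("ACDIPK", max_length=10, vocab=aa_vocab))
# → [4, 5, 6, 11, 16, 12, 0, 0, 0, 0]
```

Reserving the low IDs for special tokens keeps every residue ID distinct from `<PAD>`, which matters later when padding masks are built from ID 0.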
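Example:
```python
from preprocessing import preprocess_protein_sequence, preprocess_structure_data

# Sample amino acid vocabulary: IDs 0-3 are reserved for special tokens,
# so the 20 standard residues start at 4 (vocabulary size 24).
aa_vocab = {aa: i + 4 for i, aa in enumerate("ACDEFGHIKLMNPQRSTVWY")}
aa_vocab.update({"<PAD>": 0, "<START>": 1, "<END>": 2, "<UNK>": 3})

seq_tensor = preprocess_protein_sequence("ACDIPK", max_length=10, vocab=aa_vocab)
```

Protein Encoder (encoder.py)
- Layers: token embedding, positional encoding, and a stack of `num_layers` `EncoderLayer` blocks.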
- `EncoderLayer`: multi-head self-attention (with dropout and layer normalization) followed by a mixture-of-experts feed-forward sub-layer.
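Each `EncoderLayer` routes tokens through a mixture of experts, and the encoder example uses `num_experts=8, k=2`. The plain-Python sketch below shows one common form of top-k gating: keep the k largest gate logits, apply a softmax over just those, and return the weighted sum of the selected experts' outputs. The toy experts, the gate logits, and the function name `top_k_moe` are made up for illustration; the actual `MixtureOfExperts` layer may compute its gate, normalize the weights, or add load-balancing terms differently.

```python
import math
from typing import Callable, List, Sequence


def top_k_moe(x: List[float], gate_logits: Sequence[float],
              experts: Sequence[Callable[[List[float]], List[float]]],
              k: int = 2) -> List[float]:
    """Combine the outputs of the k highest-scoring experts for one token."""
    # Indices of the k largest gate logits.
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    # Softmax over the selected logits only (subtract the max for stability).
    m = max(gate_logits[i] for i in top)
    exps = [math.exp(gate_logits[i] - m) for i in top]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the chosen experts' outputs; unselected experts never run.
    out = [0.0] * len(x)
    for w, i in zip(weights, top):
        y = experts[i](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out


# Four toy "experts" that scale their input by different factors.
experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]
print(top_k_moe([1.0, 1.0], gate_logits=[0.0, 0.0, 5.0, 5.0], experts=experts, k=2))
# → [3.5, 3.5]
```

Because only k of the experts run per token, the parameter count grows with `num_experts` while the per-token compute stays roughly that of k feed-forward layers.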
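```python
from encoder import ProteinEncoder

encoder = ProteinEncoder(
    num_layers=6, d_model=512, num_heads=8,
    d_ff=2048, num_experts=8, k=2,               # k: experts used per token (top-k routing)
    amino_acid_vocab_size=24, max_position=1024,  # 24 = 20 amino acids + 4 special tokens
    dropout_rate=0.1
)
# input_seq_tensor: integer token IDs, assumed shape (batch_size, seq_len)
enc_output = encoder(input_seq_tensor)
```

Protein Decoder (decoder.py)
- Layers: token embedding, positional encoding, and a stack of `num_layers` `DecoderLayer` blocks.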
- `DecoderLayer`: masked self-attention, encoder-decoder cross-attention, and a mixture-of-experts feed-forward sub-layer.
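In masked self-attention, position i may attend only to positions j ≤ i, so the decoder cannot look ahead at future structure tokens. The following plain-Python sketch of single-head scaled dot-product attention shows the idea. It assumes the common trick of replacing masked scores with a large negative value before the softmax, and the name `causal_attention` is hypothetical rather than part of layers.py.

```python
import math
from typing import List

Matrix = List[List[float]]


def causal_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head scaled dot-product attention where position i sees only j <= i."""
    d_k = len(k[0])
    out = []
    for i, qi in enumerate(q):
        # Scaled dot-product score against every key.
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
        # Look-ahead mask: future positions get a large negative score.
        scores = [s if j <= i else -1e9 for j, s in enumerate(scores)]
        # Numerically stable softmax over the (masked) scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output row is the attention-weighted average of the value vectors.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


q = k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0], [2.0], [3.0]]
print(causal_attention(q, k, v))
```

The first output row equals the first value vector, and changing the last value vector leaves the first two output rows unchanged, which is exactly what the look-ahead mask guarantees.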
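```python
from decoder import ProteinDecoder

decoder = ProteinDecoder(
    num_layers=6, d_model=512, num_heads=8,
    d_ff=2048, num_experts=8, k=2,
    target_vocab_size=structure_vocab_size,  # size of the structure-token vocabulary
    max_position=1024
)
# target_tokens: integer structure-token IDs, assumed shape (batch_size, target_len)
logits, attn_weights = decoder(target_tokens, enc_output)
```

Multimodal fusion (model.py)
- Image Encoder: three Conv2D + MaxPool blocks, followed by Flatten and a Dense projection to `d_model`.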
- Fusion: the image embedding is repeated along the sequence axis, concatenated with the sequence features, and projected back to `d_model` with `Dense(d_model)`.
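The fusion step can be sketched in plain Python: the single image embedding is repeated for every sequence position, concatenated with that position's sequence features, and passed through a linear projection back to `d_model`. The projection weights below are hand-picked stand-ins for the learned `Dense(d_model)` layer (bias and activation omitted), and the function name `fuse` is hypothetical.

```python
from typing import List


def fuse(seq_feats: List[List[float]], img_feat: List[float],
         proj: List[List[float]]) -> List[List[float]]:
    """Repeat img_feat per position, concatenate, and project with proj (2*d_model x d_model)."""
    fused = []
    for token_feat in seq_feats:
        # Concatenate this position's sequence features with the repeated image features.
        concat = token_feat + img_feat
        # Linear projection back to d_model (a Dense layer without bias/activation).
        fused.append([sum(c * proj[r][col] for r, c in enumerate(concat))
                      for col in range(len(proj[0]))])
    return fused


seq_feats = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]  # (seq_len=3, d_model=2)
img_feat = [10.0, 20.0]                            # (d_model,) from the image encoder
# Toy projection that adds the sequence half and the image half element-wise.
proj = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
print(fuse(seq_feats, img_feat, proj))
# → [[11.0, 20.0], [10.0, 21.0], [12.0, 22.0]]
```

Repeating the image vector lets every residue position see the same global structural context, while the output keeps the `(seq_len, d_model)` shape the decoder expects.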
Learning rate scheduler (model.py)
```python
import tensorflow as tf
from model import CustomLearningRateScheduler

lr_schedule = CustomLearningRateScheduler(d_model=512, warmup_steps=4000)
optimizer = tf.keras.optimizers.Adam(lr_schedule)
```

Training (training.py)
`train_multimodal_protein_model(...)` orchestrates preprocessing, dataset creation, model compilation, and training. Its inputs are:
- `protein_seqs`: list of amino acid sequences (strings).
- `structure_data`: list of structure-label sequences (lists or strings of labels).
- `structural_images`: optional array of image tensors.
- `batch_size`, `epochs`, model hyperparameters, and `checkpoint_path`.
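The warmup schedule from the `CustomLearningRateScheduler(d_model=512, warmup_steps=4000)` example can be checked numerically before training. Assuming it implements the formula from "Attention Is All You Need", lrate = d_model^-0.5 · min(step^-0.5, step · warmup_steps^-1.5), the rate rises linearly for `warmup_steps` steps, peaks, and then decays with the inverse square root of the step. The plain-Python function `transformer_lr` below is a hypothetical stand-in that evaluates this formula; it is not the class itself.

```python
def transformer_lr(step: int, d_model: int = 512, warmup_steps: int = 4000) -> float:
    """Learning rate from "Attention Is All You Need": linear warmup, then inverse-sqrt decay."""
    step = max(step, 1)  # avoid division by zero at step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


for s in (1, 1000, 4000, 16000):
    print(s, f"{transformer_lr(s):.6f}")
```

The peak rate is (d_model · warmup_steps)^-0.5, so a larger model or a longer warmup lowers the maximum learning rate.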
Example usage:
```python
from training import train_multimodal_protein_model

# Dummy data: one sequence and a short list of structure labels
protein_seqs = ["ACDEFGHIKLMNPQRS"]
structure_data = [["H", "E", "C", "C"]]

# Train; returns the model, the training history, and both vocabularies
model, history, aa_vocab, struct_vocab = train_multimodal_protein_model(
    protein_seqs, structure_data, epochs=5, batch_size=2
)
```

API reference

layers.py
- `MultiheadAttention`: `call([q, k, v], mask=None, training=None)` → `(output, attn_weights)`
- `ExpertLayer`: feed-forward sub-layer (a single expert).
- `MixtureOfExperts`: `call(x, training=None)` → gated MoE output.
- `positional_encoding(position, d_model)` → tensor of shape `(1, position, d_model)`
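The output shape `(1, position, d_model)` is consistent with the sinusoidal encoding from "Attention Is All You Need", where PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). The plain-Python function `positional_encoding_sketch` below builds that table as nested lists. It is a hypothetical stand-in, and whether layers.py interleaves sine and cosine (as here) or concatenates them is an assumption worth checking in the source.

```python
import math
from typing import List


def positional_encoding_sketch(position: int, d_model: int) -> List[List[List[float]]]:
    """Sinusoidal encodings as a nested list of shape (1, position, d_model)."""
    table = []
    for pos in range(position):
        row = []
        for i in range(d_model):
            # Dimensions 2j and 2j+1 share the frequency 1 / 10000^(2j / d_model).
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            # Even dimensions use sine, odd dimensions use cosine.
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return [table]  # leading batch axis of size 1


pe = positional_encoding_sketch(position=50, d_model=8)
print(len(pe), len(pe[0]), len(pe[0][0]))  # → 1 50 8
print(pe[0][0][:4])                        # → [0.0, 1.0, 0.0, 1.0]
```

The leading axis of size 1 lets the table broadcast across any batch size when it is added to the token embeddings.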
model.py
- `MultimodalProteinModel`:
  - `call((protein_seq, structure_targets, structural_image), training)` → `(logits, attention_weights)`
  - `train_step(data)` → dict with `'loss'` and `'accuracy'`
  - `create_masks(inp, tar)` → `(enc_padding_mask, combined_mask, dec_padding_mask)`
  - `metrics` property → `[loss_tracker, accuracy_metric]`
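`create_masks(inp, tar)` returns three masks. The plain-Python sketch below assumes the convention used in the TensorFlow Transformer tutorial: token ID 0 is padding, a mask value of 1 marks a position that must not be attended to, the decoder padding mask is built from the encoder input, and the combined mask is the element-wise maximum of the look-ahead mask and the target padding mask. Shapes are simplified to a single unbatched sequence, and the function name `create_masks_sketch` is hypothetical.

```python
from typing import List, Tuple


def create_masks_sketch(inp: List[int], tar: List[int], pad_id: int = 0
                        ) -> Tuple[List[int], List[List[int]], List[int]]:
    """Return (enc_padding_mask, combined_mask, dec_padding_mask); 1 = do not attend."""
    # Padding mask over the encoder input; reused by the decoder's cross-attention.
    enc_padding_mask = [1 if t == pad_id else 0 for t in inp]
    dec_padding_mask = list(enc_padding_mask)
    # Look-ahead mask: row i blocks every column j > i (future target tokens).
    n = len(tar)
    look_ahead = [[1 if j > i else 0 for j in range(n)] for i in range(n)]
    # Also block padded target positions.
    tar_pad = [1 if t == pad_id else 0 for t in tar]
    combined_mask = [[max(look_ahead[i][j], tar_pad[j]) for j in range(n)] for i in range(n)]
    return enc_padding_mask, combined_mask, dec_padding_mask


enc, combined, dec = create_masks_sketch(inp=[4, 5, 6, 0], tar=[1, 7, 0])
print(enc)       # → [0, 0, 0, 1]
print(combined)  # → [[0, 1, 1], [0, 0, 1], [0, 0, 1]]
```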
 
training.py
- `train_multimodal_protein_model(...)` → `(model, history, amino_acid_vocab, structure_vocab)`
License
This project is licensed under the GNU General Public License, Version 3 (GPL-3.0). You may use, modify, and redistribute it under the terms of that license.