Plug-and-play long context adaptation for diffusion language models
LongDLLM enables seamless extension of diffusion language models to handle long-context inputs (up to 128k tokens) with minimal code changes and a unified interface. We support Apple DiffuCoder-7B-Instruct and GSAI-ML LLaDA-8B-Instruct for long-context adaptation.
We fixed two memory inefficiencies in LLaDA: we remove the unnecessary materialization of the attention bias term (also reported in this discussion), and we modify the generation function to keep only the masked portion of the hidden states. Together, these optimizations cut the memory footprint by 60%, allowing us to process up to 131k input tokens on a single A6000 GPU.
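For intuition, here is a rough sketch of both optimizations (illustrative only, not the actual LongDLLM code). The first part estimates the size of a fully materialized attention bias at 131k tokens; the second shows the idea of projecting only masked positions through the LM head. The tensor shapes and the `masked_logits` helper are assumptions made for the example.

```python
# (1) Back-of-the-envelope: a fully materialized (seq_len x seq_len) float32
#     attention bias at 131k tokens is far larger than a 48 GB A6000.
seq_len = 131_072
bias_bytes = seq_len * seq_len * 4                          # float32
print(f"full attention bias: {bias_bytes / 1024**3:.0f} GiB")  # ~64 GiB

# (2) Keep only the hidden states at masked positions before the LM head,
#     instead of projecting every position in the sequence.
def masked_logits(hidden_states, mask_index, lm_head):
    """hidden_states: (batch, seq_len, dim); mask_index: (batch, seq_len) bool."""
    masked_hidden = hidden_states[mask_index]   # (num_masked, dim)
    return lm_head(masked_hidden)               # logits only where tokens are masked
```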
pip install longdllm

Installing FlashAttention is highly recommended; you can install it separately via pip install flash-attn --no-build-isolation.
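Assuming the standard flash-attn package, you can quickly confirm that it installed and imports correctly:

```python
# Sanity check that FlashAttention is importable after installation.
import flash_attn
print(flash_attn.__version__)
```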
import torch
from transformers import AutoModel, AutoTokenizer
from longdllm import adapt_for_long_context
# 1. Load your model as usual
model = AutoModel.from_pretrained(
"apple/DiffuCoder-7B-Instruct",
torch_dtype=torch.bfloat16,
trust_remote_code=True
)
# 2. Adapt for long context (128k tokens)
model = adapt_for_long_context(model, target_length=131072)
# 3. Generate with long sequences
tokenizer = AutoTokenizer.from_pretrained("apple/DiffuCoder-7B-Instruct")
inputs = tokenizer("Your long prompt here...", return_tensors="pt")
output = model.diffusion_generate(
inputs.input_ids,
attention_mask=inputs.attention_mask,
max_new_tokens=256,
steps=32, # Diffusion steps
temperature=0.3,
top_p=0.95,
alg="entropy"
)
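To inspect the result, decode the newly generated tokens. What diffusion_generate returns depends on the underlying model's implementation (some return raw token ids, others a generation-output object with a sequences field), so this sketch hedges for both cases:

```python
# Decode only the newly generated portion (sketch; handles either a raw tensor
# or an output object exposing `.sequences`).
sequences = output.sequences if hasattr(output, "sequences") else output
generated_ids = sequences[0][inputs.input_ids.shape[1]:]
print(tokenizer.decode(generated_ids, skip_special_tokens=True))
```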
⚠️ LLaDA Note: Patched methods ignore `attention_bias` for memory efficiency. This is safe per LLaDA issue #90.
from transformers import AutoTokenizer, AutoModel
from longdllm import adapt_for_long_context
# 1. Load and adapt LLaDA model
model = AutoModel.from_pretrained("GSAI-ML/LLaDA-8B-Instruct", trust_remote_code=True)
model = adapt_for_long_context(model, target_length=131072)
# 2. Use unified diffusion_generate interface
tokenizer = AutoTokenizer.from_pretrained("GSAI-ML/LLaDA-8B-Instruct")
inputs = tokenizer("Your instruction here...", return_tensors="pt")
outputs = model.diffusion_generate(
input_ids=inputs.input_ids,
max_new_tokens=512,
temperature=0.0,
steps=128,
block_length=128,
remasking='low_confidence'
)

Check out our example scripts to see LongDLLM in action:
- `examples/test_diffucoder.py` - DiffuCoder passkey retrieval test
- `examples/test_llada.py` - LLaDA passkey retrieval test
# Test DiffuCoder with 128k context
cd examples && python test_diffucoder.py
# Test LLaDA with 128k context
cd examples && python test_llada.py

Both examples demonstrate passkey retrieval: finding a hidden number in long documents, a common benchmark for long-context capabilities.
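For reference, a passkey-retrieval prompt can be constructed along these lines (a hypothetical helper for illustration; the bundled scripts may build their prompts differently):

```python
import random

def build_passkey_prompt(num_filler_lines=4000, seed=0):
    """Hide a random passkey inside a long stretch of filler text."""
    rng = random.Random(seed)
    passkey = rng.randint(10000, 99999)
    filler = ["The grass is green. The sky is blue. The sun is bright."] * num_filler_lines
    # Insert the passkey sentence at a random position in the filler.
    insert_at = rng.randint(0, num_filler_lines)
    filler.insert(insert_at, f"The pass key is {passkey}. Remember it.")
    prompt = "\n".join(filler) + "\nWhat is the pass key? The pass key is"
    return prompt, passkey

prompt, passkey = build_passkey_prompt()
```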
Want to experiment? You can provide custom RoPE rescale factors:
# Example: Exponential rescale factors (approximating optimized values)
import numpy as np
custom_factors = (
list(np.logspace(0, 1.5, 34)) + # 1.0 to ~31.6, exponentially spaced
list(np.linspace(16.3, 31.3, 30)) # Linear spacing for higher frequencies
)
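# (Assumption for illustration: with scaling_method='longrope', each factor is
# expected to divide one RoPE inverse frequency, so the list length should match
# half the rotary dimension -- 34 + 30 = 64 entries here.)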
model = adapt_for_long_context(
model,
target_length=65536, # Custom length
scaling_method='longrope',
rescale_factors=custom_factors
)

License: MIT
If you use LongDLLM in your research, please cite:
@misc{ge2025longcontext,
  title        = {Long-Context Extension for Language Diffusion Models up to 128k Tokens},
  url          = {https://albertge.notion.site/longcontext},
  author       = {Ge, Albert and Singh, Chandan and Zhang, Dinghuai and Peng, Letian and Zhuang, Yufan and Shang, Ning and Zhang, Li Lyna and Liu, Liyuan and Gao, Jianfeng},
  howpublished = {Albert Ge's Notion},
  year         = {2025},
  month        = sep,
}

- GitHub Issues: Report bugs or ask questions
- Email: Albert Ge for direct support