Add LLaDA 8b Diffusion model #14771

am17an · 2025-07-19T09:50:27Z

Continuing on #14644, this PR adds another diffusion model, https://huggingface.co/GSAI-ML/LLaDA-8B-Instruct, which has different semantics from the dream-7b model and overall seems to perform better.

There are very few similarities in how the two models generate tokens, so for now I've created two separate examples: llama-diffusion-dream-cli (for the earlier Dream model) and llama-diffusion-llada-cli (for the new LLaDA model). I've added a README as well.

I've uploaded a GGUF.

Example command
./build/bin/llama-diffusion-llada-cli -m llada-8b.gguf -p "Lily can run 12 kilometers per hour for 4 hours. After that, she runs 6 kilometers per hour. How many kilometers can she run in 8 hours?" --diffusion_steps 128 -ngl 99 --temp 0 -ub 128 --diffusion-visual

I would also like to add this to the server, but I'm not sure what API would be acceptable, so I'm hoping to have a discussion on that as well.

llama: fix llama-model fixup working

convert_hf_to_gguf.py

common/arg.cpp

examples/diffusion/README.md

CISC · 2025-07-20T17:24:30Z

convert_hf_to_gguf.py

+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+
+    def get_vocab_base(self) -> tuple[list[str], list[int], str]:


Suggested change

-    def __init__(self, *args, **kwargs):
-        super().__init__(*args, **kwargs)
-
-    def get_vocab_base(self) -> tuple[list[str], list[int], str]:
+    def get_vocab_base(self) -> tuple[list[str], list[int], str]:

CISC · 2025-07-20T17:25:15Z

convert_hf_to_gguf.py

+        except FileNotFoundError:
+            try:
+                self._set_vocab_llama_hf()
+            except (FileNotFoundError, TypeError):
+                # Llama 3
+                self._set_vocab_gpt2()


Suggested change

-        except FileNotFoundError:
-            try:
-                self._set_vocab_llama_hf()
-            except (FileNotFoundError, TypeError):
-                # Llama 3
-                self._set_vocab_gpt2()
+        except FileNotFoundError:
+            self._set_vocab_gpt2()

CISC · 2025-07-20T17:28:33Z

convert_hf_to_gguf.py

+        # Handle RoPE scaling similar to LlamaModel and Dream
+        rope_scaling = self.hparams.get("rope_scaling") or {}
+        if rope_scaling.get("rope_type", rope_scaling.get("type")) == "linear" and "factor" in rope_scaling:
+            self.gguf_writer.add_rope_scaling_type(gguf.RopeScalingType.LINEAR)
+            self.gguf_writer.add_rope_scaling_factor(rope_scaling["factor"])
+        elif rope_scaling.get("rope_type", rope_scaling.get("type")) == "yarn" and "factor" in rope_scaling:
+            self.gguf_writer.add_rope_scaling_type(gguf.RopeScalingType.YARN)
+            self.gguf_writer.add_rope_scaling_factor(rope_scaling["factor"])
+            self.gguf_writer.add_rope_scaling_orig_ctx_len(rope_scaling["original_max_position_embeddings"])


I don't think LLaDA supports this.
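To make the behaviour of that branch easier to reason about, here is a minimal standalone sketch of the same lookup. It returns a plain dict instead of calling the GGUF writer; the function name `rope_scaling_fields` and the dict keys are hypothetical and chosen only for illustration. If a model's config carries no `rope_scaling` entry, which is what the comment above suggests for LLaDA, neither branch fires and nothing is written.

```python
from typing import Any, Dict


def rope_scaling_fields(hparams: Dict[str, Any]) -> Dict[str, Any]:
    """Mirror the linear/yarn lookup from the diff, returning a plain dict.

    The returned keys are illustrative and are not GGUF metadata names.
    """
    # A missing key or an explicit null both collapse to an empty dict.
    rope_scaling = hparams.get("rope_scaling") or {}
    # Configs use either "rope_type" or the older "type" key.
    rope_type = rope_scaling.get("rope_type", rope_scaling.get("type"))

    fields: Dict[str, Any] = {}
    if "factor" not in rope_scaling:
        return fields

    if rope_type == "linear":
        fields["scaling_type"] = "linear"
        fields["factor"] = rope_scaling["factor"]
    elif rope_type == "yarn":
        fields["scaling_type"] = "yarn"
        fields["factor"] = rope_scaling["factor"]
        fields["orig_ctx_len"] = rope_scaling["original_max_position_embeddings"]
    return fields


no_scaling = rope_scaling_fields({"rope_scaling": None})
linear = rope_scaling_fields({"rope_scaling": {"type": "linear", "factor": 2.0}})
yarn = rope_scaling_fields({
    "rope_scaling": {
        "rope_type": "yarn",
        "factor": 4.0,
        "original_max_position_embeddings": 4096,
    }
})
```

With an absent or null `rope_scaling`, the result is an empty dict, so dropping the block would not change the converted file for such a model.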

ggerganov · 2025-07-21T05:05:23Z

I would like to avoid adding a second diffusion example: we would be increasing the maintenance effort for no significant benefit. The diffusion architecture is not yet well established.

We can think about extending the llama_sampler functionality to support these use cases. Since it is already modular, it would make more sense to implement the sampling logic there. Ideally there would be just one diffusion CLI example for all diffusion models, with different samplers attached.

am17an · 2025-07-21T05:37:17Z

> I would like to avoid adding a second diffusion example: we would be increasing the maintenance effort for no significant benefit. The diffusion architecture is not yet well established.
>
> We can think about extending the llama_sampler functionality to support these use cases. Since it is already modular, it would make more sense to implement the sampling logic there. Ideally there would be just one diffusion CLI example for all diffusion models, with different samplers attached.

Yeah, agreed. I initially wrote them as one example. However, passing arguments via the CLI for two separate sets of sampling parameters/algorithms was quite confusing to me and would be even more so for the end user, so for the sake of clarity I wrote them separately.
diffusion_generate_dream and diffusion_generate_llada are two different functions with the same outline (decode => sample => unmask), so there is an abstraction to be made. The only thing to clarify is how we pass separate sets of parameters to the example without overloading the same options (e.g. --diffusion-algorithm being supported in dream but not in llada, and vice versa). llama_sampler could be used as well, but I don't see how it would solve this particular problem.
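A minimal sketch of that shared outline, written in Python for brevity rather than against the llama.cpp API. Every name here (`MASK`, `diffusion_generate`, `select_even`, `make_block_select`, `toy_decode`) is hypothetical, and the two selection strategies are simplified stand-ins, not the actual Dream or LLaDA algorithms. The point is only that one loop can host different unmasking strategies, each carrying its own parameters.

```python
import math
from typing import Callable, List

MASK = -1  # hypothetical mask token id, for illustration only

# decode: takes the current tokens and proposes one token per position.
Decode = Callable[[List[int]], List[int]]
# select: given the masked positions, the step index, and the step count,
# returns the positions to commit on this step.
Select = Callable[[List[int], int, int], List[int]]


def diffusion_generate(tokens: List[int], decode: Decode,
                       select: Select, steps: int) -> List[int]:
    """Shared outline: decode => sample (pick positions) => unmask."""
    tokens = list(tokens)
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        proposals = decode(tokens)              # decode
        for i in select(masked, step, steps):   # sample
            tokens[i] = proposals[i]            # unmask
    return tokens


def select_even(masked: List[int], step: int, steps: int) -> List[int]:
    """Spread the remaining masked positions evenly over the remaining steps."""
    count = math.ceil(len(masked) / (steps - step))
    return masked[:count]


def make_block_select(block_length: int) -> Select:
    """Build a strategy that commits the leftmost unfinished block each step."""
    def select(masked: List[int], step: int, steps: int) -> List[int]:
        block = masked[0] // block_length
        return [i for i in masked if i // block_length == block]
    return select


def toy_decode(tokens: List[int]) -> List[int]:
    """Stand-in for a model forward pass: proposes 100 + position."""
    return [100 + i for i in range(len(tokens))]


prompt = [1, 2] + [MASK] * 6
out_even = diffusion_generate(prompt, toy_decode, select_even, steps=3)
out_block = diffusion_generate(prompt, toy_decode, make_block_select(4), steps=3)
```

Because strategy-specific settings such as `block_length` are bound when the strategy is constructed, the loop itself never sees them. That is one possible way to keep separate parameter sets from overloading a single set of options, though it leaves open how those sets would be exposed on the command line.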

github-actions bot added the examples and python (python script changes) labels Jul 19, 2025

am17an force-pushed the add_llada_8b branch from e362c14 to 87b3235 July 19, 2025 10:05

am17an requested a review from ggerganov July 19, 2025 10:06

Add support for Llada-8b: diffusion model

d81ce04

llama: fix llama-model fixup working

am17an force-pushed the add_llada_8b branch from 87b3235 to d27740c July 19, 2025 10:17

am17an requested a review from CISC July 19, 2025 11:05

am17an force-pushed the add_llada_8b branch 2 times, most recently from 77fc759 to e4b7346 July 19, 2025 14:41

Add README

5644f2f

am17an force-pushed the add_llada_8b branch from e4b7346 to 5644f2f July 19, 2025 14:59

CISC reviewed Jul 19, 2025

Fix README and convert_hf_to_gguf

6317827

am17an force-pushed the add_llada_8b branch from 7a5747d to 6317827 July 20, 2025 02:45

CISC reviewed Jul 20, 2025
