Description
Checklist
- 1. I have searched related issues but cannot get the expected help.
- 2. The bug has not been fixed in the latest version.
- 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- 4. If the issue you raised is not a bug but a question, please open a discussion at https://github.com/sgl-project/SpecForge/discussions/new/choose instead; otherwise, it will be closed.
- 5. Please use English, otherwise it will be closed.
Describe the bug
I cloned the latest SpecForge codebase and noticed that it now supports training with DFlash, so I launched a DFlash training job using the script below.
During training, both loss and accuracy behaved normally, and the evaluation set showed only mild overfitting, which did not seem significant. However, when I loaded the trained weights into the official DFlash benchmark script on the GSM8K dataset (https://github.com/z-lab/dflash/blob/main/run_benchmark.sh), I observed an acceptance rate of only 1.29/(1+3), which is extremely low and suggests that training has effectively failed.
I would like to ask whether anyone has successfully trained and run inference with a DFlash model. Any discussion or help in locating the root cause would be greatly appreciated.
For context, I have previously trained an Eagle3 model successfully, which suggests that my data preprocessing, training pipeline, and evaluation setup are generally correct.
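For clarity on the "1.29/(1+3)" figure above, here is a minimal sketch of how such an acceptance rate is typically computed in speculative decoding: the mean number of accepted tokens per verification step divided by the speculation window (1 target token plus the number of draft tokens). The function name and inputs are hypothetical, not from the DFlash benchmark script itself.

```python
def acceptance_rate(accepted_per_step, num_draft_tokens=3):
    """Mean accepted tokens per verification step, divided by the
    speculation window size (1 target token + num_draft_tokens drafts).
    Hypothetical helper; names and inputs are illustrative only."""
    mean_accepted = sum(accepted_per_step) / len(accepted_per_step)
    return mean_accepted / (1 + num_draft_tokens)

# With an average of 1.29 accepted tokens per step and 3 draft tokens,
# the rate is 1.29 / 4 = 0.3225, i.e. roughly a third of the window.
print(acceptance_rate([1.29]))
```

A well-trained draft model would normally accept a much larger fraction of the window, which is why this value points to a training problem rather than a benchmarking one.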
Below is the training script I used:
#!/bin/bash
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
ROOT_DIR=$(dirname "$SCRIPT_DIR")
export TORCHINDUCTOR_CACHE_DIR=$ROOT_DIR/cache/compiled_kernels
export SPECFORGE_DATA_NUM_PROC=32
NUM_GPUS=16
ATTENTION_BACKEND=sdpa
torchrun \
--standalone \
--nproc_per_node $NUM_GPUS \
$ROOT_DIR/scripts/train_dflash.py \
--target-model-path /data/weights/qwen3-8b \
--draft-config-path $ROOT_DIR/configs/qwen3-8b-dflash.json \
--train-data-path $ROOT_DIR/cache/dataset/perfectblend_train.jsonl \
--output-dir $ROOT_DIR/outputs/qwen3-8b-dflash-perfectblend-baseline \
--num-epochs 15 \
--batch-size 1 \
--learning-rate 1e-4 \
--max-length 2048 \
--chat-template qwen \
--attention-backend $ATTENTION_BACKEND \
--log-interval 100 \
--eval-interval 5000 \
--save-interval 10000 \
--eval-data-path $ROOT_DIR/cache/dataset/opc_test.jsonl \
--cache-dir $ROOT_DIR/cache \
--report-to tensorboard \
--target-model-backend sglang \
--resume

Looking forward to any insights or suggestions. Thanks!
Best regards,
BAI Fan
Reproduction
The training script is the same one shown above.
Environment
SpecForge-main