I implemented SLOT in vLLM as follows: I train the model to obtain a $\delta$ that matches my system prompt, save it, and then load it in vllm.model_runner. I am sure SLOT is implemented correctly, but accuracy drops on my downstream task (factuality evaluation). I tried several hyperparameter settings and none of them improved accuracy, so I now doubt how well this method generalizes. Here is my modified model_runner.py:
#### SLOT Begin Here
if not hasattr(self, 'ptuning_params'):
    print("Initializing Delta for SLOT")
    import os, json
    delta_path = os.environ.get("delta_path", "code/slot/saved_delta/systemprompt_v4_5_0.1.json")
    tensor_parallel_size_my = int(os.environ.get("tensor_parallel_size_my", 1))
    with open(delta_path, "r") as f:
        delta = json.load(f)["delta"] # shape
    delta = torch.tensor(delta[0], device=sample_hidden_states.device, dtype=sample_hidden_states.dtype)/tensor_parallel_size_my
    self.ptuning_params = delta
    assert delta.shape == sample_hidden_states.shape, f"Delta Shape Mismatch!!! delta.shape: {delta.shape}, sample_hidden_states.shape: {sample_hidden_states.shape}"
    print("Initializing END!")
logits = self.model.compute_logits(sample_hidden_states+self.ptuning_params, None)
#### SLOT End Here
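To make the load-and-add step easier to check outside vLLM, here is a minimal, torch-free sketch of the same idea. It rests on assumptions not stated above: the saved JSON is taken to hold `delta` with shape `[1, 1, H]` (so that `delta[0]` is `[1, H]`), the hidden states are taken to be `[N, H]`, and the helper names `save_delta`, `load_delta`, and `add_delta` are hypothetical. Unlike the `assert` in the snippet, this sketch only compares the hidden size `H` and broadcasts the single row of $\delta$ over all `N` rows.

```python
import json
import os
import tempfile


def save_delta(path, delta_values):
    """Write delta in the assumed layout {"delta": [[[v_0, ..., v_{H-1}]]]}."""
    with open(path, "w") as f:
        json.dump({"delta": [[[float(v) for v in delta_values]]]}, f)


def load_delta(path, scale=1.0):
    """Read delta[0] (assumed shape [1, H]) and divide every entry by `scale`."""
    with open(path, "r") as f:
        delta = json.load(f)["delta"]
    return [[v / scale for v in row] for row in delta[0]]


def add_delta(hidden_states, delta_rows):
    """Add a single [1, H] delta row to every row of [N, H] hidden states."""
    if len(delta_rows) != 1:
        raise ValueError(f"expected one delta row, got {len(delta_rows)}")
    delta_row = delta_rows[0]
    shifted = []
    for row in hidden_states:
        # Only the hidden size has to match; N may differ from call to call.
        if len(row) != len(delta_row):
            raise ValueError(
                f"hidden size mismatch: {len(row)} vs {len(delta_row)}"
            )
        shifted.append([h + d for h, d in zip(row, delta_row)])
    return shifted


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "delta.json")
        save_delta(path, [0.5, -1.0, 0.25])
        delta_rows = load_delta(path)
        hidden = [[1.0, 1.0, 1.0], [0.0, 2.0, 4.0]]  # N = 2, H = 3
        print(add_delta(hidden, delta_rows))
```

Checking only the last dimension is a deliberate choice in this sketch: if the number of rows changes between calls, a full-shape comparison made once at initialization would not cover later calls.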