
Conversation

@DanielNi868

Fix the transformers error by setting attn_implementation='eager' in from_pretrained (see the sketch below)

Modify checkpoints_load_func in score_logra.py and score_TRAK.py to fix the Hugging Face error

Update the README and document the transformers vmap error
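
For reference, a minimal sketch of the fix described above. The model name is an assumption (the examples build the model from their own config/args, as in the snippets quoted below); the point is only where the attn_implementation argument goes:

```python
from transformers import AutoModelForCausalLM

# Illustrative sketch: passing attn_implementation="eager" keeps the attention path
# compatible with torch.func / vmap-based attribution and avoids the
# AttentionMaskConverter error from the SDPA code path.
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",                       # assumption: the GPT-2 WikiText example
    attn_implementation="eager",  # avoid the SDPA path that breaks under vmap
)
model = model.cuda()
```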

--proj_dim 256 \
--proj_max_batch_size 8 \
--proj_type random_mask

Collaborator:

remove this file

--output_dir ../checkpoints \
--block_size 512 \
--seed ${SEED}

Collaborator:

remove this file

trust_remote_code=args.trust_remote_code,
attn_implementation="eager", # Use eager attention to avoid the vmap/SDPA error
)
model = model.cuda()
Collaborator:

No need to keep this troubleshooting information since it is no longer an issue.

Collaborator:

Just remove the troubleshooting message; there is no need to tell the user how the toolkit developers resolved the problem.

config=config,
low_cpu_mem_usage=args.low_cpu_mem_usage,
trust_remote_code=args.trust_remote_code,
attn_implementation="eager", # Use eager attention to avoid the vmap/SDPA error
Collaborator:

The only change needed in this PR is to add this line in score_TRAK and score_logra.

Collaborator:

If there is anything else you need to change in score_logra and score_TRAK, please comment on why it is needed to fix the transformers error regarding vmap. Otherwise, we may keep them unchanged.

@TheaperDeng TheaperDeng changed the title Fix the transformers' error and update the score_logra and score_TRAK [WIP] Fix the transformers' error and update the score_logra and score_TRAK Nov 10, 2025
@TheaperDeng TheaperDeng added bug Something isn't working work-in-progress labels Nov 10, 2025
@DanielNi868 (Author):

I deleted those files and updated README.md.

trust_remote_code=args.trust_remote_code,
attn_implementation="eager", # Use eager attention to avoid the vmap/SDPA error
)
model = model.cuda()
Collaborator:

Just remove the troubleshooting message; there is no need to tell the user how the toolkit developers resolved the problem.

config=config,
low_cpu_mem_usage=args.low_cpu_mem_usage,
trust_remote_code=args.trust_remote_code,
attn_implementation="eager", # Use eager attention to avoid the vmap/SDPA error
Collaborator:

If there is anything else you need to change in score_logra and score_TRAK, please comment on why it is needed to fix the transformers error regarding vmap. Otherwise, we may keep them unchanged.

@DanielNi868 (Author):

I updated README.md.

In TRAK, the function f in main should keep the unsqueeze(0) calls to avoid the dimension mismatch error; I think this is needed to fix the vmap error (see the sketch after the snippet).

```python
def f(params, batch):
    """
    Log-odds objective for TRAK.
    """
    input_ids, attention_mask, labels = batch

    # Re-add batch dimension removed by vmap
    input_ids = input_ids.unsqueeze(0).cuda()
    attention_mask = attention_mask.unsqueeze(0).cuda()
    labels = labels.unsqueeze(0).cuda()

    outputs = torch.func.functional_call(
        model,
        params,
        (input_ids,),  # Pass as tuple to avoid dimension issues
        kwargs={"attention_mask": attention_mask, "labels": labels},
    )
    logp = -outputs.loss
    return logp - torch.log(1 - torch.exp(logp))
```
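
For context, an illustrative toy (not the repo's code) of why the batch dimension disappears: torch.func.vmap maps the function over the batch axis, so inside the function each tensor arrives without its batch dimension, and unsqueeze(0) restores the (1, seq_len) shape the model expects:

```python
import torch

def per_sample(input_ids):
    # Inside vmap the mapped batch dimension is stripped, so shape is (seq_len,).
    restored = input_ids.unsqueeze(0)      # back to (1, seq_len), as the model expects
    return restored.sum(dim=1).squeeze(0)  # any per-sample reduction, returned as a tensor

batch = torch.randint(0, 50257, (4, 512))  # hypothetical (batch, seq_len) input_ids
out = torch.func.vmap(per_sample)(batch)   # each call sees one row of shape (512,)
print(out.shape)                           # torch.Size([4])
```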

try:
    from transformers.utils import send_example_telemetry
except ImportError:
    send_example_telemetry = None  # Not available in newer transformers versions
@DanielNi868 (Author) commented Nov 29, 2025:

With the original code I get ImportError: cannot import name 'send_example_telemetry' from 'transformers.utils'.

# Fix the import error in newer transformers versions
if send_example_telemetry is not None:
    send_example_telemetry("run_clm_no_trainer", args)

@DanielNi868 (Author) commented Nov 29, 2025:

I get the same ImportError: cannot import name 'send_example_telemetry' from 'transformers.utils'.

default="random_mask",
choices=["normal", "rademacher", "random_mask", "sjlt", "grass"],
help="Random projection type used for TRAK/TracIn (default: random_mask).",
)
@DanielNi868 (Author) commented Nov 29, 2025:

I got torch.OutOfMemoryError: CUDA out of memory when I did not have these three projection parameters.

input_ids = input_ids.unsqueeze(0).cuda()
attention_mask = attention_mask.unsqueeze(0).cuda()
labels = labels.unsqueeze(0).cuda()

@DanielNi868 (Author) commented Nov 29, 2025:

Without this I get IndexError: too many indices for tensor of dimension 2.

# Re-add batch dimension removed by vmap
input_ids = input_ids.unsqueeze(0).cuda()
attention_mask = attention_mask.unsqueeze(0).cuda()
labels = labels.unsqueeze(0).cuda()
@DanielNi868 (Author) commented Nov 29, 2025:

Without this I get IndexError: too many indices for tensor of dimension 2.

input_ids = input_ids.unsqueeze(0).cuda()
attention_mask = attention_mask.unsqueeze(0).cuda()
labels = labels.unsqueeze(0).cuda()

@DanielNi868 (Author) commented Nov 29, 2025:

Without this I get IndexError: too many indices for tensor of dimension 2.

if len(parts) == 2 and parts[1].isdigit():
    num_checkpoints = int(parts[1])
    requested_checkpoints = int(parts[1])
else:
@DanielNi868 (Author):

I ran this again and it works now; I think this modification can be deleted.


checkpoints = [str(p) for p in available_checkpoint_dirs[:requested_checkpoints]]

elif method in ["TracIn", "Grad-Dot", "Grad-Cos"]:
@DanielNi868 (Author):

I ran this again and it works now; I think this modification can be deleted.

method,
)
checkpoints = [str(p) for p in available_checkpoint_dirs[:requested_checkpoints]]
else:
@DanielNi868 (Author):

I ran this again and it works now; I think this modification can be deleted.

"proj_dim": 2048,
"proj_dim": args.proj_dim,
"proj_max_batch_size": args.proj_max_batch_size,
"proj_type": args.proj_type,
@DanielNi868 (Author) commented Nov 29, 2025:

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 128.00 GiB. GPU 0 has a total capacity of 44.35 GiB of which 41.56 GiB is free. Including non-PyTorch memory, this process has 2.79 GiB memory in use. Of the allocated memory 2.36 GiB is allocated by PyTorch, and 114.69 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting

"proj_dim": 2048,
"proj_dim": args.proj_dim,
"proj_max_batch_size": args.proj_max_batch_size,
"proj_type": args.proj_type,
@DanielNi868 (Author) commented Nov 29, 2025:

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 128.00 GiB. GPU 0 has a total capacity of 44.35 GiB of which 41.56 GiB is free. Including non-PyTorch memory, this process has 2.79 GiB memory in use. Of the allocated memory 2.36 GiB is allocated by PyTorch, and 114.69 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting
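
As a rough, hypothetical back-of-envelope (the real allocation depends on the projector implementation and on how work is chunked by proj_max_batch_size), the projection cost scales with num_params × proj_dim, which is why lowering proj_dim and using a sparse projector such as random_mask helps:

```python
# Rough, order-of-magnitude arithmetic only; a dense fp32 projection matrix is
# assumed here, while random_mask is sparse and the projection is chunked.
num_params = 124_000_000          # assumption: GPT-2 small, ~124M parameters
bytes_per_float = 4               # fp32

for proj_dim in (2048, 256):
    dense_proj_gib = num_params * proj_dim * bytes_per_float / 2**30
    print(f"dense fp32 projection matrix, proj_dim={proj_dim}: ~{dense_proj_gib:,.0f} GiB")
# dense fp32 projection matrix, proj_dim=2048: ~946 GiB
# dense fp32 projection matrix, proj_dim=256: ~118 GiB
```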

else:
    raise e

new_model.eval()
@DanielNi868 (Author):

I think this modification can be deleted

    from transformers.utils import send_example_telemetry
except ImportError:
    send_example_telemetry = None  # Not available in newer transformers versions

@DanielNi868 (Author):

same import error as in TRAK

Collaborator:

Just pin transformers==4.46.0 in requirements.txt.

# Fix the import error in newer transformers versions
if send_example_telemetry is not None:
    send_example_telemetry("run_clm_no_trainer", args)

@DanielNi868 (Author):

same import error as in TRAK


model_id = -1
model_id = 0 # Use checkpoint 0 (final checkpoint)
checkpoint = f"{args.output_dir}/{model_id}"
@DanielNi868 (Author) commented Nov 29, 2025:

FileNotFoundError: Checkpoint directory not found: /dattri/experiments/gpt2_wikitext/checkpoints/-1. Please ensure the checkpoint exists at this path.

else:
    raise e

model.eval()
@DanielNi868 (Author) commented Nov 29, 2025:

huggingface_hub.errors.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '../checkpoints/-1'. Use repo_type argument if needed.
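
For context, the HFValidationError appears because '../checkpoints/-1' does not exist locally, so from_pretrained falls back to treating the string as a Hub repo id. A minimal sketch (not the toolkit's actual loader; the guard and return value are assumptions) that surfaces the missing-directory case directly:

```python
from pathlib import Path
from transformers import AutoModelForCausalLM

def checkpoints_load_func(model, checkpoint):
    # Illustrative guard: if the local directory is missing, from_pretrained would
    # otherwise interpret "checkpoint" as a Hub repo id and raise HFValidationError.
    ckpt_dir = Path(checkpoint)
    if not ckpt_dir.is_dir():
        raise FileNotFoundError(f"Checkpoint directory not found: {ckpt_dir.resolve()}")
    model = AutoModelForCausalLM.from_pretrained(
        ckpt_dir,
        attn_implementation="eager",  # keep the eager-attention fix from this PR
    ).cuda()
    return model
```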

@jiaqima (Contributor) commented Nov 29, 2025:

@DanielNi868 please don't paste the error message to the PR.

```bash
return AttentionMaskConverter._expand_mask(mask=mask, dtype=dtype, tgt_len=tgt_len)
```
The troubleshooting can be avoided by setting the attn_implementation parameter to 'eager' in the from_pretrained function.
Collaborator:

Just delete the troubleshooting section.

from transformers.pytorch_utils import Conv1D
from dattri.task import AttributionTask

model_id = -1
Collaborator:

Here we need the fully trained model at id=-1.

checkpoint = f"{args.output_dir}/{model_id}"

def checkpoints_load_func(model, checkpoint):
    model = AutoModelForCausalLM.from_pretrained(checkpoint).cuda()
Collaborator:

What error message did you get for this function at lines 596-597?

@DanielNi868 DanielNi868 force-pushed the main branch 2 times, most recently from 032c7ab to aa51bd0 on December 25, 2025 at 23:40