
Conversation

@tahaEntesari
Contributor

What does this PR do?

Addresses #118
This PR adds several new features:

  • We add a new multi_loss base trainer class that all multi-loss trainers extend. To simplify the linear scalarization of multiple losses, we aggregate the per-loss weights into a single input, trainer.method_args.preferences, which takes a list, instead of having separate gamma and alpha inputs (see the sketch after this list).
  • We add our new PDU trainer and provide models unlearned with this method at huggingface.co/tamarsonha. For details of the unlearning method, see the paper.
  • We add a new evaluation script to use LLMs as a judge for evaluation.
  • We add a new Llama 2 13B model and provide pretrained checkpoints for the MUSE dataset on huggingface.co/tamarsonha. The models are pretrained for 10 epochs.
  • We add new fine-tuning configurations for the MUSE dataset. To streamline fine-tuning, we provide the merged dataset needed for MUSE fine-tuning at huggingface.co/tamarsonha.
  • To account for the changes introduced by the multi_loss base trainer, we searched the repository for the gamma and alpha keywords and updated the documentation accordingly.
  • We add documentation on PDU in the community folder.
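
As a rough illustration of the preferences interface described in the first bullet above (a minimal sketch with hypothetical function and variable names, not the repository's actual API), linear scalarization with a single weight list might look like this:

```python
import torch

# Hypothetical sketch: combine any number of loss terms with one list of weights,
# instead of hard-coding separate gamma/alpha arguments per loss.
def scalarize(losses: list[torch.Tensor], preferences: list[float]) -> torch.Tensor:
    assert len(losses) == len(preferences), "one preference weight per loss term"
    return sum(w * l for w, l in zip(preferences, losses))

# e.g. a forget loss and a retain loss weighted by preferences = [1.0, 0.5]
forget_loss, retain_loss = torch.tensor(2.0), torch.tensor(1.0)
total_loss = scalarize([forget_loss, retain_loss], [1.0, 0.5])  # tensor(2.5)
```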

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Have you gone through the contributions guide?
  • Are your changes documented? Read documentation guidelines here.

tahaEntesari and others added 2 commits May 30, 2025 16:35
Added llm_judge and all the files and configs required for it
Added multi_loss base trainer and made corresponding changes in other trainers.
Changed linear scalarization parameter input format to a list, allowing more losses to be added in a simple fashion. Made appropriate changes to all trainers and also documentation to reflect this.
Added finetuning for MUSE with the corresponding dataset
Added Llama 2 13B model. We provide checkpoints for this model on huggingface.co/tamarsonha
@Dornavineeth requested review from Dornavineeth and molereddy and removed request for molereddy June 9, 2025 05:10
@armanhtm

Hi dear Dorna and OU team,

I hope you're doing well. Thank you so much for your excellent repository; it has been incredibly helpful for us (the PDU team) in implementing and evaluating our method. I truly appreciate your time and effort.

I wanted to kindly ask if you could review our code and pull request, provide any feedback, and possibly merge it into your repository. We believe that PDU offers a new and interesting perspective on this problem, and we're excited to see more people engage with and use our method to make a meaningful impact.

Thank you once again for your amazing repository!

@molereddy
Collaborator

Hi, we've seen the PR, but didn't get a chance to read it carefully. Both of us maintainers just graduated, so it's been busy for us. Things might be slower as we find and hand over to newer maintainers, but we will try to get to reviewing PDU and get it in. If you have any cleanup or code structure improvements you can make, please do those in the meantime, as that would make it easier for us. Thank you very much for your contribution!

@molereddy
Collaborator

There are 40 changed files, and in particular I see changes to many files unrelated to your PR, as you seem to have done some standardization. Have you checked all the changed points and tested enough of them to know this won't break things? We don't have unit tests in the code and can't run things ourselves right now, so we are relying on contributors to verify all their changes.

Collaborator

(Unless I'm missing something) this is a metric and is implemented in evals, which must only contain evaluators. Look at this example from the same folder: configs/eval/muse.yaml shows what an evaluator is -- it's an evaluation suite containing several individual metrics. The YAML config for an LLM judge handler must be in the metrics folder of the eval suite it belongs to.

Contributor Author

The LLM judge is, in principle, an evaluator, not just a metric. The LLM judge performs its evaluation based on some metrics like accuracy, forgetting success, etc. As such, I do not think it should be a metric.
In the current implementation, the metrics are fixed. However, in a future release, it could be made more modular and contain some other metrics. I unfortunately don't have the time right now to make changes regarding this.

Collaborator
@molereddy Jun 21, 2025

depends on resolution of other comments

Collaborator
@molereddy Jun 21, 2025

It seems you are contributing multiloss as a new base trainer for most unlearning methods. That's not ideal, because there are other open PRs and other users relying on the old structure.

  • The multiloss class mentions primal-dual too many times to serve as a base trainer for other methods, especially ones as simple as GradDiff.

If you make this structural change, it would be better if the implementation were much cleaner: a general base trainer plus a more specific primal-dual class that modifies it further, while also ensuring consistency with other PRs.

Or simply leave the other methods as they are and add PDU separately.
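
A minimal sketch of the structure being suggested (class and method names here are hypothetical, not the PR's actual code): a generic multi-loss base that only scalarizes loss terms, and a PDU subclass that layers the primal-dual specifics on top.

```python
class MultiLossTrainer:
    """Generic base: subclasses only declare their loss terms; the base scalarizes them."""

    def __init__(self, preferences):
        self.preferences = preferences  # one weight per loss term

    def compute_loss_terms(self, model, inputs):
        raise NotImplementedError  # e.g. a GradDiff-style method returns [forget_loss, retain_loss]

    def compute_loss(self, model, inputs):
        terms = self.compute_loss_terms(model, inputs)
        return sum(w * t for w, t in zip(self.preferences, terms))


class PDUTrainer(MultiLossTrainer):
    """Primal-dual specifics (e.g. dual-variable updates) live only in this subclass."""

    def compute_loss(self, model, inputs):
        loss = super().compute_loss(model, inputs)
        # primal-dual bookkeeping would go here, keeping it out of simpler methods
        return loss
```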

Collaborator

I don't think we need to add new arguments to the default training args. You can add new arguments using +; the attribute doesn't need to already exist in the config. https://github.com/locuslab/open-unlearning/blob/main/docs/hydra.md has this documented.
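
For example (the command and argument name below are illustrative, not taken from the docs), an override like +trainer.args.some_new_arg=0.5 appended to the training command adds that attribute on the fly, without it having to exist in the default config.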

Contributor Author

I am aware of the + syntax for new arguments. I just felt this is something important and good to have in the default parameters. I'll leave the final decision regarding this to you.

Collaborator

Can we revert this? If people set eval_strategy=steps and forget to set eval_steps, this will evaluate on every step. The other option is to set eval_steps: 1.0.

Collaborator

depends on resolution of other comments

Collaborator

seems to me this should be a metric. @Dornavineeth thoughts?

Collaborator

depends on resolution of other comments

Collaborator

If this is a default prompt_generator file with a generic create_prompt, it shouldn't be this specific. Also, prompts might be better given through the config files than written here.

Collaborator

Better that we not give the prompts through configs, because a few characters, say """, will be tough to parse in Hydra.

Collaborator

@tahaEntesari It would be better to rename create_prompt to a more specific name.

Collaborator

Refer to the prior comment on the multiloss YAML.

@molereddy
Collaborator

You may want to link your paper here: https://github.com/locuslab/open-unlearning/blob/main/docs/links.md

@molereddy
Collaborator

Thank you for citing our work in your paper. We've just updated the README with a modified citation to our paper on OpenUnlearning. It would be great if you could change the citation to point to that instead!

@tahaEntesari
Contributor Author

I will revert my changes regarding the multi_loss file.
I had been very busy and hadn't been able to check this thread. I'll try to make the required changes soon.

# Conflicts:
#	community/methods/PDU/README.md
#	community/methods/PDU/run.sh
#	configs/trainer/PDU.yaml
#	src/trainer/unlearn/pdu.py
@molereddy
Collaborator

We can decide how to go about things on the basis of @Dornavineeth's review.

@armanhtm

Hi dear Dorna and Anmol,

I hope you are doing well. It looks like Taha has fixed the issues raised in the previous review of the pull request. Is there anything else to do? Please let us know if something is missing so we can address it quickly. Thank you so much for your consideration.

@Dornavineeth
Collaborator

Sorry. This has been delayed from my end. Will soon go over this PR.

@armanhtm

No problem at all. Thank you so much for your attention!

@Dornavineeth
Collaborator

@tahaEntesari Thank you for the considerable work you've put into this. A few thoughts after reviewing:

Trainer/Method

Components look excellent, really well done.

Eval

Regarding the llm_judge, I wonder if it might be cleaner to structure it as a metric instead. I understand your reasoning - framing it as a "judge" allows for multiple metrics under one umbrella, similar to how changing prompts leads to different evaluations. That said, this is also true for metrics like forget_QA_ROUGE and retain_QA_ROUGE, which share an implementation but are defined distinctly in the config.

When implementing MIA, we faced a similar design choice and leaned toward organizing metrics this way. You could implement llm_judge in a similar way. More specifically (see the sketch after this list):

a) Move this into src/evals/metrics/llm_judge/
b) Create a prompts/ subfolder, so it's easy for others to contribute and extend
c) Register prompt variants explicitly in src/evals/metrics/llm_judge/__init__.py
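
A rough sketch of what (c) could look like, assuming a simple registry of prompt variants (the file layout, class, and prompt names here are hypothetical, not the repository's actual API):

```python
# Hypothetical contents of src/evals/metrics/llm_judge/__init__.py

FORGET_PROMPT = "Judge whether the answer reveals the forgotten information:\n{response}"
UTILITY_PROMPT = "Judge the overall quality and correctness of the answer:\n{response}"

class LLMJudge:
    """Placeholder judge handler: each registered variant fixes a different prompt."""

    def __init__(self, prompt: str):
        self.prompt = prompt

    def build_prompt(self, response: str) -> str:
        # a real handler would send this prompt to an LLM and parse the verdict
        return self.prompt.format(response=response)

# explicit registry: adding a new prompt variant is a one-line change
LLM_JUDGE_METRICS = {
    "llm_judge_forget": LLMJudge(FORGET_PROMPT),
    "llm_judge_utility": LLMJudge(UTILITY_PROMPT),
}
```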

One other note: I noticed the current evaluator takes result files (like TOFU_EVAL.json) as arguments. This might restrict flexibility for users who work with different datasets, since we can't expect their results to be in TOFU_EVAL.json files. Ideally, the evaluation interface should accept inputs directly rather than rely on file outputs.

Given the scope of the changes around llm_judge and your availability, it makes sense to land the training parts first and revisit the evals piece in a separate PR. Can you remove the llm_judge from this PR and open a new PR containing the evals? I know it's tedious, and I'm sorry to ask, but this would really help keep the system modular and make things easier for other people.

PRO TIP: It's always better to have a number of smaller PRs, as they are easier to iterate on and review.

Thanks again for all your effort on this!

@tahaEntesari
Contributor Author

@Dornavineeth Thank you for your review and comments.
Ok. Let me remove the LLM judge components and let you know.

@tahaEntesari
Contributor Author

All LLM judge files have been removed from this PR. I also reverted the fine-tune config to its original format.

@Dornavineeth
Collaborator

Wow, this is super quick! Can you please resolve the conflicts (these should be minor)?
We can merge it into main after this. Also, can you open a new PR with the llm_judge so we can iterate on it?

@tahaEntesari
Contributor Author

Fixed the conflicts.
I will need to work on the llm judge, but unfortunately I don't have the time right now. Once things are a bit more settled, I can start looking into it and will take your suggestion about implementing it as a metric into consideration.

Collaborator
@Dornavineeth left a comment

Great Work!

@Dornavineeth changed the title from "Multi loss base, PDU, LLM as a Judge, Llama 2 13B, MUSE finetuning" to "PDU, Llama 2 13B, MUSE finetuning" Jul 20, 2025
@Dornavineeth merged commit fd825ea into locuslab:main Jul 20, 2025
1 check passed
@Dornavineeth
Collaborator

Dornavineeth commented Jul 20, 2025

@tahaEntesari Sounds good. You can open a [WIP] PR for llm_judge so the community can pick it up if interested. I’ve seen folks interested in implementing metrics via prompting LLMs using APIs; this might be a useful starting point for them.
