PDU, Llama 2 13B, MUSE finetuning #121
Conversation
- Added llm_judge and all the required files and configs for it.
- Added a multi_loss base trainer and made corresponding changes in the other trainers.
- Changed the linear scalarization parameter input format to a list, allowing more losses to be added in a simple fashion; made the appropriate changes to all trainers and to the documentation to reflect this.
- Added finetuning for MUSE with the corresponding dataset.
- Added the Llama 2 13B model. We provide checkpoints for this model on huggingface.co/tamarsonha
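The linear-scalarization change described in the PR summary (a list of losses with matching coefficients) can be sketched as follows. Function and argument names here are illustrative, not the repository's actual API:

```python
def scalarize(losses, coefficients):
    """Linear scalarization: combine several objective losses into one
    scalar, sum_i c_i * L_i. Accepting lists lets new losses be added
    without changing the trainer's signature."""
    if len(losses) != len(coefficients):
        raise ValueError("one coefficient is required per loss")
    return sum(c * l for c, l in zip(losses, coefficients))

# e.g. a forget loss weighted against a retain (utility) loss
total = scalarize([0.5, 1.0], [1.0, 0.1])
```

In a real trainer the entries would be scalar tensors rather than floats, but the combination logic is the same.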
Hi dear Dorna and OU team, I hope you're doing well. Thank you so much for your excellent repository; it has been incredibly helpful for us (the PDU team) in implementing and evaluating our method. I truly appreciate your time and effort. I wanted to kindly ask if you could review our code and pull request, provide any feedback, and possibly merge it into your repository. We believe that PDU offers a new and interesting perspective on this problem, and we're excited to see more people engage with and use our method to make a meaningful impact. Thank you once again for your amazing repository!

Hi, we've seen the PR but didn't get a chance to read it carefully. Both of us maintainers just graduated, so it's been busy for us. Things might be slower as we find and hand over to newer maintainers, but we will try to get to reviewing PDU and get it in. If there is any cleanup or code-structure improvement you can make, please do that in the meantime, as it would make things easier for us. Thank you very much for your contribution!

There are 40 files changed, and in particular I see changes to many files unrelated to your PR, as you seem to have done some standardization. Have you checked all the changed points and tested enough of them to know this won't break things? We don't have unit tests in the code and can't run things ourselves right now, so we are relying on contributors verifying all their changes.
configs/eval/llm_judge.yaml
Outdated
(Unless I'm missing something) this is a metric, and it is implemented in evals, which must only contain evaluators. Look at this example from the same folder: configs/eval/muse.yaml shows what an evaluator is -- an evaluation suite comprising several individual metrics. The YAML config for an LLM-judge handler must be in the metrics folder of the eval suite it belongs to.
The LLM judge is, in principle, an evaluator, not just a metric. The LLM judge performs its evaluation based on some metrics like accuracy, forgetting success, etc. As such, I do not think it should be a metric.
In the current implementation, the metrics are fixed. However, in a future release, it could be made more modular and contain some other metrics. I unfortunately don't have the time right now to make changes regarding this.
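For reference, the evaluator-vs-metric distinction being debated here can be sketched like this (hypothetical class names, not the repository's actual interfaces): a metric computes a single score, while an evaluator is a suite that runs several metrics.

```python
class Metric:
    """A single measurement, e.g. ROUGE on the forget set."""
    def __init__(self, fn):
        self.fn = fn
    def evaluate(self, model_outputs):
        return self.fn(model_outputs)

class Evaluator:
    """An evaluation suite: a named collection of individual metrics,
    mirroring how configs/eval/muse.yaml groups its metrics."""
    def __init__(self, metrics):
        self.metrics = metrics  # dict of name -> Metric
    def run(self, model_outputs):
        return {name: m.evaluate(model_outputs)
                for name, m in self.metrics.items()}

suite = Evaluator({"mean_len": Metric(lambda outs: sum(map(len, outs)) / len(outs))})
results = suite.run(["abc", "abcde"])
```

Under this framing, the question is whether the LLM judge is one `Metric` inside an existing suite or a whole `Evaluator` of its own.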
depends on resolution of other comments
configs/trainer/MultiLoss.yaml
Outdated
It seems you are contributing MultiLoss as a new base trainer for most unlearning methods. That's not ideal, because other open PRs and other users rely on the old structure.
- The multiloss trainer mentions primal-dual too many times to be a base trainer for other methods, especially ones as simple as GradDiff.
- If you make this structural change, the implementation should be much cleaner: a general base trainer plus a more specific primal-dual class that modifies it further, kept consistent with the other PRs.
- Or simply leave the other methods as they are and add PDU separately.
I don't think we need to add new arguments to the default TrainingArguments. You can add new arguments using `+`; the attribute doesn't need to already exist in the config. https://github.com/locuslab/open-unlearning/blob/main/docs/hydra.md has this documented.
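Hydra's `+` prefix, covered in the hydra.md doc linked above, adds a key that is absent from the default config instead of overriding an existing one. Conceptually (a plain-Python sketch of the semantics, not Hydra's implementation; the `dual_lr` argument name is made up for illustration):

```python
def apply_override(cfg, key, value, add=False):
    """Mimic Hydra CLI semantics: `key=value` overrides an existing
    entry, while `+key=value` (add=True) inserts a brand-new one."""
    *path, leaf = key.split(".")
    node = cfg
    for part in path:
        node = node.setdefault(part, {}) if add else node[part]
    if not add and leaf not in node:
        raise KeyError(f"unknown key {key!r}; use +{key} to add it")
    node[leaf] = value
    return cfg

cfg = {"trainer": {"args": {"learning_rate": 1e-5}}}
apply_override(cfg, "trainer.args.dual_lr", 0.01, add=True)  # like +trainer.args.dual_lr=0.01
```

This is why the new PDU-specific arguments don't need to be baked into the shared defaults.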
I am aware of the `+` syntax for new arguments. I just felt this is something important and good to have in the default parameters. I'll leave the final decision regarding this to you.
Can we revert this? If someone sets eval_strategy=steps and forgets to set eval_steps, this will evaluate on every step. The other option is to set eval_steps: 1.0.
depends on resolution of other comments
Seems to me this should be a metric. @Dornavineeth, thoughts?
depends on resolution of other comments
If this is a default prompt_generator file with a generic create_prompt, it shouldn't be this specific. Also, the prompts might be better supplied through the config files than written here.
Better that we not pass the prompts through configs, because a few characters (say, """) will be tough to parse in Hydra.
@tahaEntesari it would be better to rename create_prompt to a more specific name.
Refer to the prior comment on the MultiLoss YAML.
You may want to link your paper here: https://github.com/locuslab/open-unlearning/blob/main/docs/links.md

Thank you for citing our work in your paper. We've recently updated the README with a modified citation to our paper on OpenUnlearning. It would be great if you could change the citation to point to that instead!

I will revert my changes regarding the multi_loss file.
# Conflicts:
#   community/methods/PDU/README.md
#   community/methods/PDU/run.sh
#   configs/trainer/PDU.yaml
#   src/trainer/unlearn/pdu.py
We can decide how to go about things on the basis of @Dornavineeth's review.

Hi dear Dorna and Anmol, I hope you are doing well. It looks like Taha fixed the problems from the previous pull request. Is there anything else to do? Please let us know if something is missing so we can address it quickly. Thank you so much for your consideration.

Sorry, this has been delayed on my end. I will go over this PR soon.

No problem at all. Thank you so much for your attention!
@tahaEntesari Thank you for the considerable work you've put into this. A few thoughts after reviewing:

Trainer/Method

Components look excellent, really well done.

Eval

Regarding the llm_judge, I wonder if it might be cleaner to structure it as a metric instead. I understand your reasoning: framing it as a "judge" allows multiple metrics under one umbrella, similar to how changing prompts leads to different evaluations. That said, this is also true for metrics like forget_QA_ROUGE and retain_QA_ROUGE, which share an implementation but are defined distinctly in the config. When implementing MIA, we faced a similar design choice and leaned toward organizing metrics this way. You can implement it similarly; more specifically, a) move this into src/evals/metrics/llm_judge/.

One other note: I noticed the current evaluator takes result files (like TOFU_EVAL.json) as arguments. This restricts flexibility for users who work with different datasets, since we can't expect their results to be in TOFU_EVAL.json files. Ideally, the evaluation interface should accept inputs directly rather than rely on file outputs.

Given the scope of changes around llm_judge and your availability, it makes sense to land the training parts first and revisit the evals piece in a separate PR. Could you remove llm_judge from this PR and open a new PR for the evals? I know it's tedious, sorry to ask, but this would really help make the system more modular and easier for other people.

Thanks again for all your effort on this!
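The interface change suggested in the review (accept evaluation inputs directly instead of result-file paths) might look roughly like this; the function signature and judge callable are hypothetical, not the repository's actual API:

```python
def llm_judge_score(predictions, references, judge):
    """Score in-memory predictions directly, instead of reading them
    back from a result file such as TOFU_EVAL.json."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must align")
    scores = [judge(pred, ref) for pred, ref in zip(predictions, references)]
    return sum(scores) / len(scores)

# `judge` can be any callable (e.g. an API-backed LLM);
# here a trivial exact-match stand-in for demonstration:
exact_match = lambda pred, ref: float(pred.strip() == ref.strip())
score = llm_judge_score(["Paris", "Rome"], ["Paris", "Berlin"], exact_match)
```

Decoupling the metric from a specific result-file layout is what makes it reusable across datasets.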
@Dornavineeth Thank you for your review and comments.

All LLM judge files have been removed from this PR. I also reverted the fine-tune config to its original format.

Wow. This is super quick! Can you please resolve the conflicts (these should be minor)?

Fixed the conflicts.
Dornavineeth left a comment:
Great work!
@tahaEntesari Sounds good. You can open a [WIP] PR for llm_judge so the community can pick it up if interested. I've seen folks interested in implementing metrics via prompting LLMs using APIs; this might be a useful starting point for them.
What does this PR do?
Addresses #118
This PR adds several new features: