Before running, please define EXEHOME, OUTPUTHOME, and DATAHOME accordingly in the script.
- e.g.,
  EXEHOME=/home/username/SelfEval-Guided-Decoding/src
  DATAHOME=/home/username/SelfEval-Guided-Decoding/data
  OUTPUTHOME=/home/username/SelfEval-Guided-Decoding/outputs/${dtname}/${split}_outputs
We provide three types of example scripts: (1) running the baselines; (2) running our method; (3) LLM-based evaluation.
Note: adjust the variables dtname and split to select the dataset and split.
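As a rough sketch of how these variables fit together, the snippet below sets dtname and split first, because OUTPUTHOME expands them. The values gsm8k and test, and the helper variable REPO_ROOT, are assumptions made for illustration; use whatever names and paths the scripts in your checkout actually expect.

```shell
#!/usr/bin/env bash
# Hypothetical dataset name and split; replace with the values your run needs.
dtname=gsm8k
split=test

# REPO_ROOT is an illustrative helper, not a variable the scripts define.
REPO_ROOT=${REPO_ROOT:-/home/username/SelfEval-Guided-Decoding}

# The three variables the scripts expect, following the example paths above.
EXEHOME=${REPO_ROOT}/src
DATAHOME=${REPO_ROOT}/data
OUTPUTHOME=${REPO_ROOT}/outputs/${dtname}/${split}_outputs

echo "EXEHOME=${EXEHOME}"
echo "DATAHOME=${DATAHOME}"
echo "OUTPUTHOME=${OUTPUTHOME}"
```

Because OUTPUTHOME depends on dtname and split, changing either one sends results to a separate output directory per dataset and split.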
(main code: src/generate_code_baseline.py)
- arithmetic reasoning: run_baseline.sh
- symbolic reasoning: run_baseline_symbolic.sh
- commonsense reasoning: run_baseline_commonsense.sh
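The mapping from task category to baseline script listed above could be wrapped in a small helper like the one below. This is only a sketch: the function name baseline_script_for and the category keys are invented here, and the directory that holds the scripts is not stated in this section, so the snippet prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Return the baseline script name for a task category.
# The category keys are labels chosen for this sketch.
baseline_script_for() {
  case "$1" in
    arithmetic)  echo "run_baseline.sh" ;;
    symbolic)    echo "run_baseline_symbolic.sh" ;;
    commonsense) echo "run_baseline_commonsense.sh" ;;
    *) echo "unknown task category: $1" >&2; return 1 ;;
  esac
}

script=$(baseline_script_for symbolic)
echo "Would run: bash ${script}"
# prints: Would run: bash run_baseline_symbolic.sh
```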
(main code: src/generate_code.py)
- arithmetic reasoning
  - GSM8K: Ours (PAL), Ours (CoT)
  - AQUA: Ours (PAL)
  - SVAMP: Ours (PAL)
  - ASDiv: Ours (PAL)
  - TabMWP: Ours (PAL)
- symbolic reasoning
  - Date Understanding: Ours (PAL)
  - Object Counting: Ours (PAL)
- commonsense reasoning
  - CSQA: Ours (CoT)
  - StrategyQA: Ours (CoT)
  - Sports Understanding: Ours (CoT)
(main code: src/self_evaluate_code.py)