Commit 7a6257d

optimizer CLI and doc (#1151)
* optimizer CLI and doc
Signed-off-by: Mandana Vaziri <[email protected]>
1 parent 445a77c commit 7a6257d

File tree

13 files changed (+284, -235 lines)


.gitignore

Lines changed: 3 additions & 0 deletions
@@ -154,6 +154,9 @@ pdl-live/package-lock.json
 # Demo files
 pdl-rag-demo.db
 test.jsonl
+train.jsonl
+validation.jsonl
+experiments/
 
 # Built docs
 _site

README.md

Lines changed: 6 additions & 0 deletions
@@ -31,6 +31,12 @@ To install the `pdl` command line tool:
 pip install prompt-declaration-language
 ```
 
+## What's New
+
+Check out AutoPDL, PDL's prompt optimizer tool [Spiess et al. (2025)](https://openreview.net/forum?id=CAeISyE3aR)! AutoPDL can be used to optimize any part of a PDL program. This includes few-shot examples and textual prompts, as well as prompting patterns. It outputs an optimized PDL program with the optimal values.
+
+For a tutorial on how to use AutoPDL, see [AutoPDL](https://ibm.github.io/prompt-declaration-language/autopdl/).
+
 ## Example Program: A Basic LLM Call
 
 <img src="docs/assets/pdl-ui-3.png" width="500" align="right" alt="PDL GUI"/>

docs/autopdl.md

Lines changed: 45 additions & 146 deletions
Large diffs are not rendered by default.

examples/optimizer/bea19.pdl

Lines changed: 0 additions & 17 deletions
This file was deleted.

examples/optimizer/bea19_example.yml

Lines changed: 0 additions & 37 deletions
This file was deleted.
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+defs:
+  max_tokens: 1024
+lastOf:
+  - "Here are examples of grammatically incorrect sentences and their corrected versions:\n\n"
+  - for:
+      example: ${ demonstrations }
+    repeat:
+      text: "${ example.input } -> ${ example.output }"
+    join:
+      with: "\n\n"
+  - |+
+    Correct the following sentence:
+
+    ${ input }
+    Here's the corrected sentence:
+
+  - model: ${ model }
+    def: response
+    parameters:
+      max_tokens: ${ max_tokens }
+      temperature: 0
+
+  - if: ${ verify }
+    then:
+      lastOf:
+        - Do you think this was a correct answer? If not, generate a correct answer.
+        - model: ${ model }
+    else: ${ response }
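
The `for`/`repeat`/`join` block in the program above renders each demonstration as `input -> output`, with pairs separated by blank lines. As a rough sketch only (not part of this commit, and not PDL's implementation), and assuming `train.jsonl` records carry `input` and `output` fields as the configuration file below does, the same few-shot prefix could be rebuilt in plain Python:

```python
import json
from pathlib import Path


def render_demonstrations(jsonl_path: Path, k: int) -> str:
    """Rebuild the few-shot block that the PDL for/repeat/join produces."""
    demos = []
    with jsonl_path.open() as f:
        for line in f:
            demos.append(json.loads(line))
            if len(demos) == k:
                break
    # Each demonstration becomes "incorrect -> corrected"; pairs are blank-line separated.
    return "\n\n".join(f"{d['input']} -> {d['output']}" for d in demos)


# Hypothetical usage; the path follows the dataset layout used elsewhere in this commit.
# print(render_demonstrations(Path("grammar_correction_jsonl/train.jsonl"), k=3))
```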
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+pdl_path: grammar_correction.pdl # Path to the PDL file to optimize
+dataset:
+  train: grammar_correction_jsonl/train.jsonl # Path to the training split in JSONL format
+  test: grammar_correction_jsonl/test.jsonl # Path to the test split in JSONL format
+  validation: grammar_correction_jsonl/validation.jsonl # Path to the validation split in JSONL format
+
+demonstrations_variable_name: demonstrations # variable name to insert demonstrations into
+demonstration_columns:
+  - input # column name for the question in the dataset
+  - output # column name for the answer in the dataset
+
+instance_columns:
+  - input # column name for the question in the dataset
+
+groundtruth_column: output # column name for the ground truth in the dataset
+
+eval_pdl: eval_levenshtein.pdl # Path to the PDL file used for evaluation
+
+# budget: 2h # set a budget: a number of iterations, or a duration string, e.g. "2h"
+# budget_growth: double # double the validation set size each iteration,
+#   or to_max: reach max_test_set_size by the final iteration
+initial_validation_set_size: 2 # size of the validation set in the first iteration
+max_validation_set_size: 10 # maximum validation set size
+max_test_set_size: 10
+num_candidates: 10 # how many candidates to evaluate
+parallelism: 5 # how many threads to run evaluations across
+# shuffle_test: false # shuffle the test set
+# test_set_name: test # name of the test set
+# train_set_name: train # name of the train set
+# validation_set_name: validation # name of the validation set
+variables: # define discrete options to sample from
+  model: # set the ${ model } variable
+    - ollama_chat/granite3.3:8b
+    - ollama_chat/gpt-oss:20b
+  num_demonstrations: # overrides the number of demonstrations
+    - 0
+    - 3
+    - 5
+  verify:
+    - true
+    - false
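
The `variables` section above declares a discrete search space: 2 models, 3 demonstration counts, and 2 `verify` settings, i.e. 12 possible configurations, of which `num_candidates: 10` are evaluated. A minimal sketch of that space (illustration only, not the optimizer's actual sampling code):

```python
import itertools
import random

# Discrete options copied from the `variables` section of the config above.
search_space = {
    "model": ["ollama_chat/granite3.3:8b", "ollama_chat/gpt-oss:20b"],
    "num_demonstrations": [0, 3, 5],
    "verify": [True, False],
}

# Every combination of the declared options: 2 * 3 * 2 = 12 candidate configurations.
all_candidates = [
    dict(zip(search_space, values))
    for values in itertools.product(*search_space.values())
]
print(len(all_candidates))  # 12

# `num_candidates: 10` asks the optimizer to evaluate 10 of them.
sampled = random.sample(all_candidates, k=10)
```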
Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
+defs:
+  max_tokens: 1024
+  model: ollama_chat/gpt-oss:20b
+  num_demonstrations:
+    data: 5
+  verify:
+    data: false
+  demonstrations:
+    data:
+      - input: Related and Entities found using configured use relation direction. and Relation Type.
+        output: Related Entities found using configured relation direction and Relation Type.
+      - input: Thanks to Naumann IT Security Consulting's for reporting challenging the XSS got vulnerability.
+        output: Thanks to Naumann IT Security Consulting for reporting the XSS vulnerability.
+      - input: Besides he hates school, he is exhausted all the time, has no appetite, he has penalty of violent, depression, was not happy.
+        output: He hated school, he was exhausted all the time, had no appetite, he had outbreaks of violence, depression, and he was not happy.
+      - input: If your primary ID does not contain a signature, you can present a supplemental ID with photo and signature or a supple government ID with a photograph, as long as they are in the same name you used when you registerd.
+        output: If your primary ID does not contain a signature, you can present a supplemental ID with photo and signature or a supplemental government-issued ID with a photograph, as long as they are in the same name you used when you registered.
+      - input: We want to begin consultatiaon with public-use organisations who are users of these services to help brings experience, knowledge and information on user needs to shape the solution.
+        output: We want to begin consultations with public sector organisations who are users of these services to help bring experience, knowledge and information on user needs to shape the solution.
+lastOf:
+  - |+
+    Here are examples of grammatically incorrect sentences and their corrected versions:
+
+  - for:
+      example: ${ demonstrations }
+    repeat:
+      text: ${ example.input } -> ${ example.output }
+    join:
+      with: |2+
+
+
+  - |+
+    Correct the following sentence:
+
+    ${ input }
+    Here's the corrected sentence:
+
+  - def: response
+    model: ${ model }
+    parameters:
+      temperature: 0.0
+      max_tokens: ${ max_tokens }
+  - if: ${ verify }
+    then:
+      lastOf:
+        - Do you think this was a correct answer? If not, generate a correct answer.
+        - model: ${ model }
+    else: ${ response }
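
The config points at `eval_levenshtein.pdl` for scoring, but that file is not rendered in this diff. As an assumption about what a Levenshtein-style metric computes (a sketch, not the repository's evaluator), a normalized edit-distance similarity between a model response and the ground-truth correction could look like this:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[len(b)]


def similarity(prediction: str, groundtruth: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means an exact match."""
    if not prediction and not groundtruth:
        return 1.0
    return 1.0 - levenshtein(prediction, groundtruth) / max(len(prediction), len(groundtruth))
```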

examples/optimizer/process_bea19.py

Lines changed: 0 additions & 33 deletions
This file was deleted.
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+import json
+from pathlib import Path
+
+from datasets.dataset_dict import DatasetDict
+from datasets.load import load_dataset
+
+# Load dataset
+grammar_correction = load_dataset("agentlans/grammar-correction")
+if not isinstance(grammar_correction, DatasetDict):
+    raise TypeError(
+        f"Expected grammar_correction to be a DatasetDict, but got: {type(grammar_correction)}"
+    )
+
+# Hold out 1024 examples from train as the test split; the remainder is re-split into train and validation
+new_split = grammar_correction["train"].train_test_split(test_size=1024)
+grammar_correction["test"] = new_split["test"]
+
+val_split = new_split["train"].train_test_split()
+grammar_correction["train"] = val_split["train"]
+grammar_correction["validation"] = val_split["test"]
+
+# Output dir
+out_dir = Path("grammar_correction_jsonl")
+out_dir.mkdir(parents=True, exist_ok=True)
+
+
+# Save to JSONL
+def save_jsonl(dataset, path: Path) -> None:
+    with path.open("w") as f:
+        for item in dataset:
+            f.write(json.dumps(item) + "\n")
+
+
+for split in ["train", "validation", "test"]:
+    save_jsonl(grammar_correction[split], out_dir / f"{split}.jsonl")
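
A quick sanity check of the files this script writes, assuming the agentlans/grammar-correction records expose the `input` and `output` columns referenced by the optimizer config (a sketch, not part of the commit):

```python
import json
from pathlib import Path

out_dir = Path("grammar_correction_jsonl")
for split in ["train", "validation", "test"]:
    with (out_dir / f"{split}.jsonl").open() as f:
        records = [json.loads(line) for line in f]
    # Every record should expose the columns the optimizer config refers to.
    assert all({"input", "output"} <= record.keys() for record in records)
    print(f"{split}: {len(records)} records")
```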
