
Commit eef2a29

Update evaluation instructions, improve dataset generation section, and clarify external tools
1 parent d1dd75c commit eef2a29

File tree

1 file changed: +19 −26 lines

evaluation/README.md

Lines changed: 19 additions & 26 deletions
@@ -2,12 +2,12 @@
 
 ## `evals`: LLM evaluations to test and improve model outputs
 
-### Metrics
-
-[Extractiveness](https://huggingface.co/docs/lighteval/en/metric-list#automatic-metrics-for-generative-tasks):
+### Evaluation Metrics
 
 Natural Language Generation Performance:
 
+[Extractiveness](https://huggingface.co/docs/lighteval/en/metric-list#automatic-metrics-for-generative-tasks):
+
 * Extractiveness Coverage:
   - Percentage of words in the summary that are part of an extractive fragment with the article
 * Extractiveness Density:
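For intuition, the coverage and density metrics named above can be sketched in a few lines of Python. This is a minimal illustration assuming the usual greedy shared-fragment definition behind these metrics; whitespace tokenization and the helper names are simplifications, not the library's implementation:

```python
def extractive_fragments(article_tokens, summary_tokens):
    """Greedily find maximal token runs the summary shares with the article."""
    fragments = []
    i = 0
    while i < len(summary_tokens):
        best = 0
        for j in range(len(article_tokens)):
            k = 0
            while (i + k < len(summary_tokens) and j + k < len(article_tokens)
                   and summary_tokens[i + k] == article_tokens[j + k]):
                k += 1
            best = max(best, k)
        if best > 0:
            fragments.append(summary_tokens[i:i + best])
            i += best
        else:
            i += 1
    return fragments

def coverage(article, summary):
    # Fraction of summary words that lie inside some shared fragment.
    a, s = article.split(), summary.split()
    return sum(len(f) for f in extractive_fragments(a, s)) / len(s)

def density(article, summary):
    # Average squared fragment length: rewards long verbatim copies.
    a, s = article.split(), summary.split()
    return sum(len(f) ** 2 for f in extractive_fragments(a, s)) / len(s)
```

A fully copied summary gives coverage 1.0, and density grows with the length of the copied runs.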
@@ -23,7 +23,7 @@ API Performance:
 
 ### Test Data
 
-Generate the dataset file by connecting to a database of references
+Generate the dataset file by connecting to a database of research papers:
 
 Connect to the Postgres database of your local Balancer instance:
 
@@ -36,72 +36,63 @@ engine = create_engine("postgresql+psycopg2://balancer:balancer@localhost:5433/b
 Connect to the Postgres database of the production Balancer instance using a SQL file:
 
 ```
-# Install Postgres.app and add binaries to the PATH
+# Add Postgres.app binaries to the PATH
 echo 'export PATH="/Applications/Postgres.app/Contents/Versions/latest/bin:$PATH"' >> ~/.zshrc
 
 createdb <DB_NAME>
 pg_restore -v -d <DB_NAME> <PATH_TO_BACKUP>.sql
 ```
 
-```
-from sqlalchemy import create_engine
-engine = create_engine("postgresql://<USER>@localhost:5432/<DB_NAME>")
-```
-
 Generate the dataset CSV file:
 
 ```
+from sqlalchemy import create_engine
 import pandas as pd
 
+engine = create_engine("postgresql://<USER>@localhost:5432/<DB_NAME>")
+
 query = "SELECT * FROM api_embeddings;"
 df = pd.read_sql(query, engine)
 
-df['formatted_chunk'] = df.apply(lambda row: f"ID: {row['chunk_number']} | CONTENT: {row['text']}", axis=1)
+df['INPUT'] = df.apply(lambda row: f"ID: {row['chunk_number']} | CONTENT: {row['text']}", axis=1)
 
 # Ensure the chunks are joined in order of chunk_number by sorting the DataFrame before grouping and joining
 df = df.sort_values(by=['name', 'upload_file_id', 'chunk_number'])
-df_grouped = df.groupby(['name', 'upload_file_id'])['formatted_chunk'].apply(lambda chunks: "\n".join(chunks)).reset_index()
+df_grouped = df.groupby(['name', 'upload_file_id'])['INPUT'].apply(lambda chunks: "\n".join(chunks)).reset_index()
 
-df_grouped = df_grouped.rename(columns={'formatted_chunk': 'concatenated_chunks'})
 df_grouped.to_csv('<DATASET_CSV_PATH>', index=False)
 ```
 
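The sort-then-group-then-join step in the dataset script can be seen on toy data. This sketch reuses the column and key names from the diff; the sample rows are made up for illustration:

```python
import pandas as pd

# Toy stand-in for the api_embeddings table: one paper with chunks out of order.
df = pd.DataFrame([
    {"name": "paper-a", "upload_file_id": 1, "chunk_number": 2, "text": "world"},
    {"name": "paper-a", "upload_file_id": 1, "chunk_number": 1, "text": "hello"},
    {"name": "paper-b", "upload_file_id": 2, "chunk_number": 1, "text": "solo"},
])

df["INPUT"] = df.apply(lambda row: f"ID: {row['chunk_number']} | CONTENT: {row['text']}", axis=1)

# Sorting first guarantees the joined chunks appear in chunk_number order.
df = df.sort_values(by=["name", "upload_file_id", "chunk_number"])
grouped = df.groupby(["name", "upload_file_id"])["INPUT"].apply(lambda chunks: "\n".join(chunks)).reset_index()

print(grouped.loc[0, "INPUT"])
# ID: 1 | CONTENT: hello
# ID: 2 | CONTENT: world
```

Without the `sort_values` call, the chunks would be joined in whatever order the query returned them.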
-
 ### Running an Evaluation
 
-#### Test Input: Bulk model and prompt experimentation
+#### Bulk Model and Prompt Experimentation
 
 Compare the results of many different prompts and models at once
 
 ```
 import pandas as pd
 
-# Define the data
 data = [
-
     {
-        "Model Name": "<MODEL_NAME_1>",
-        "Query": """<YOUR_QUERY_1>"""
+        "MODEL": "<MODEL_NAME_1>",
+        "INSTRUCTIONS": """<YOUR_QUERY_1>"""
     },
-
     {
-        "Model Name": "<MODEL_NAME_2>",
-        "Query": """<YOUR_QUERY_2>"""
+        "MODEL": "<MODEL_NAME_2>",
+        "INSTRUCTIONS": """<YOUR_QUERY_2>"""
     },
 ]
 
-# Create DataFrame from records
 df = pd.DataFrame.from_records(data)
 
-# Write to CSV
 df.to_csv("<EXPERIMENTS_CSV_PATH>", index=False)
 ```
 
 
-#### Execute on the command line
+#### Execute on the Command Line
 
 
-Execute [using `uv` to manage depenendices](https://docs.astral.sh/uv/guides/scripts/) without manually managing enviornments:
+Execute [using `uv` to manage dependencies](https://docs.astral.sh/uv/guides/scripts/) without manually managing environments:
 
 ```sh
 uv run evals.py --experiments path/to/<EXPERIMENTS_CSV> --dataset path/to/<DATASET_CSV> --results path/to/<RESULTS_CSV>
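Before launching a run, the experiments file can be sanity-checked with a quick round-trip. This is a small sketch assuming only the `MODEL` and `INSTRUCTIONS` columns shown above; an in-memory buffer stands in for the real CSV path:

```python
import io

import pandas as pd

# Mirror the experiments CSV written above: one row per model/prompt combination.
data = [
    {"MODEL": "<MODEL_NAME_1>", "INSTRUCTIONS": "<YOUR_QUERY_1>"},
    {"MODEL": "<MODEL_NAME_2>", "INSTRUCTIONS": "<YOUR_QUERY_2>"},
]
buf = io.StringIO()
pd.DataFrame.from_records(data).to_csv(buf, index=False)
buf.seek(0)

# Read it back the way a consumer would, and check the expected shape.
loaded = pd.read_csv(buf)
assert list(loaded.columns) == ["MODEL", "INSTRUCTIONS"]
assert len(loaded) == 2
```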
@@ -156,3 +147,5 @@ plt.show()
 ```
 
 ### Contributing
+
+You're welcome to add LLM models to test in `server/api/services/llm_services`
