This repository was archived by the owner on Oct 25, 2024. It is now read-only.
[NeuralChat] RAG evaluation #1333
Open: Liangyx2 wants to merge 158 commits into main from yuxiang/evaluation
Commits (158):
- f820019 add retrieval dataset construction codes (Liangyx2)
- 06f8162 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- 5ef0332 Update llm_generate_raw_data.py (Liangyx2)
- ee1db83 Delete intel_extension_for_transformers/neural_chat/tools/evaluation/… (Liangyx2)
- 89597f2 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- b132d66 Delete intel_extension_for_transformers/neural_chat/tools/evaluation/… (Liangyx2)
- 8e955ce update (Liangyx2)
- 635b906 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- d7d3d03 Delete intel_extension_for_transformers/neural_chat/tools/evaluation/… (Liangyx2)
- c9fec02 Delete intel_extension_for_transformers/neural_chat/tools/evaluation/… (Liangyx2)
- 5e32113 Delete intel_extension_for_transformers/neural_chat/tools/evaluation/… (Liangyx2)
- f67622c Delete intel_extension_for_transformers/neural_chat/tools/evaluation/… (Liangyx2)
- f2e344a Delete intel_extension_for_transformers/neural_chat/tools/evaluation/… (Liangyx2)
- 383e5b3 Update prompt.py (Liangyx2)
- 81014d1 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- 4b7bec7 Update llm_generate_raw_data.py (Liangyx2)
- 0df51a6 Update llm_generate_raw_data.py (Liangyx2)
- 95b16bd Update retrieval_dataset_construction.py (Liangyx2)
- 80dd21b Update llm_generate_raw_data.py (Liangyx2)
- f495b22 Update mine_hard_negatives_check_similarity.py (Liangyx2)
- 593dee3 add test_evaluation.py to nightly test (Liangyx2)
- cf59b18 Update and rename requirements.txt to requirements_cpu.txt (Liangyx2)
- 40e0b0e Create requirements_cuda.txt (Liangyx2)
- bf1b1aa Update requirements.txt (Liangyx2)
- 5552ebc Update retrieval_dataset_construction.py (Liangyx2)
- d3b7579 Update llm_generate_raw_data.py (Liangyx2)
- f500b2b Update retrieval_dataset_construction.py (Liangyx2)
- b65c4bf Update llm_generate_raw_data.py (Liangyx2)
- c43ab73 Update test_evaluation.py (Liangyx2)
- feda3c0 Update retrieval_dataset_construction.py (Liangyx2)
- 1c2c22c Update mine_hard_negatives_check_similarity.py (Liangyx2)
- 55a5cda add README.md (Liangyx2)
- 7a74f86 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- 39754d0 Update README.md (Liangyx2)
- d7e95f0 add evaluate_retrieval.py (Liangyx2)
- 186ab43 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- 1496219 Update test_evaluation.py (Liangyx2)
- 03a768e [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- 128d587 Update test_evaluation.py (Liangyx2)
- 25177bd Merge branch 'main' into yuxiang/evaluation (XuehaoSun)
- 705752a add README.md (Liangyx2)
- 675fe2e Update prompt.py (Liangyx2)
- 988e542 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- d0c3c34 add llm_generate_truth.py and data (Liangyx2)
- be1106b [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- 48788d4 add ragas_evaluation.py (Liangyx2)
- 54cc6c0 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- e1b5585 Create requirements.txt (Liangyx2)
- 88a4293 Update llm_generate_truth.py (Liangyx2)
- 83060f9 Update evaluate_retrieval.py (Liangyx2)
- 76b1175 Update ragas_evaluation.py (Liangyx2)
- b775095 Update test_evaluation.py (Liangyx2)
- edbb32c Update llm_generate_truth.py (Liangyx2)
- 8962abf Update README.md (Liangyx2)
- 2ef4e05 Update README.md (Liangyx2)
- d2ab7d8 add README.md (Liangyx2)
- bcdf209 Update README.md (Liangyx2)
- 102649b Update README.md (Liangyx2)
- 36a28a4 Update README.md (Liangyx2)
- 548fdd9 Add files via upload (Liangyx2)
- 36448ea Delete intel_extension_for_transformers/neural_chat/tests/ci/tools/te… (Liangyx2)
- 26e3e9d Update requirements.txt (Liangyx2)
- e4793d3 Update README.md (Liangyx2)
- 0569b54 Update hn_mine.py (Liangyx2)
- 2d15ec0 Update README.md (Liangyx2)
- e8127e9 Update ragas_evaluation.py (Liangyx2)
- 321e9b6 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- f9b4dab Update requirements.txt (Liangyx2)
- 76dc219 Update README.md (Liangyx2)
- b9db553 Update README.md (Liangyx2)
- d7b68cb Update README.md (Liangyx2)
- 48de606 Update requirements.txt (Liangyx2)
- 415ebc8 Update ragas_evaluation.py (Liangyx2)
- f03badd Update test_evaluation.py (Liangyx2)
- 2b92e74 Update README.md (Liangyx2)
- 9091729 Update retrieval_dataset_construction.py (Liangyx2)
- be32736 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- 2c4f452 Update hn_mine.py (Liangyx2)
- c48f66a Update llm_generate_raw_data.py (Liangyx2)
- 654c44a Update mine_hard_negatives_check_similarity.py (Liangyx2)
- 5208c98 Update hn_mine.py (Liangyx2)
- ace1090 Update test_evaluation.py (Liangyx2)
- 83f10e9 Update ragas_evaluation.py (Liangyx2)
- ac0aef1 Update README.md (Liangyx2)
- 8deaabd Update README.md (Liangyx2)
- 2eb084c Update README.md (Liangyx2)
- 510e801 Update README.md (Liangyx2)
- dd1f37c Update README.md (Liangyx2)
- ed95d2d Update prompt.py (Liangyx2)
- e253f41 Update ragas_evaluation.py (Liangyx2)
- fc0b6b9 add evaluate_retrieval_auto.py (Liangyx2)
- 6f081b5 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- 746adec Update evaluate_retrieval_auto.py (Liangyx2)
- 100322e [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- 5e07789 Update evaluate_retrieval.py (Liangyx2)
- 0a2f742 Update ragas_evaluation.py (Liangyx2)
- 1752684 Update test_evaluation.py (Liangyx2)
- 2a2238e Update ragas_evaluation.py (Liangyx2)
- e8f0f9c Update README.md (Liangyx2)
- 8d65078 Update and rename evaluate_retrieval_auto.py to evaluate_retrieval_be… (Liangyx2)
- a951a89 Update evaluate_retrieval_benchmark.py (Liangyx2)
- 13921f6 add retrieval_benchmark.py (Liangyx2)
- 02c0813 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- d212d66 Update retrieval_benchmark.py (Liangyx2)
- 20529a4 add ragas_benchmark ragas_evaluation_benchmark (Liangyx2)
- 5026421 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- cfa7d9c Update retrieval_benchmark.py (Liangyx2)
- 8d1215e Update evaluate_retrieval_benchmark.py (Liangyx2)
- 3458a8e Update retrieval_benchmark.py (Liangyx2)
- 4effd37 Update ragas_evaluation_benchmark.py (Liangyx2)
- 3c38ae6 Update ragas_benchmark.py (Liangyx2)
- b02da07 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- a2a7de1 Update ragas_evaluation_benchmark.py (Liangyx2)
- 4191f4b [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- 35b2d7d Update evaluate_retrieval_benchmark.py (Liangyx2)
- 56037b9 Update ragas_evaluation_benchmark.py (Liangyx2)
- de44f0d add retrieval_benchmark.sh (Liangyx2)
- 67456e4 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- 2a91336 add ragas_benchmark.sh (Liangyx2)
- 8f05a34 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- c64ca3c add data.txt (Liangyx2)
- fbef1f6 Update ragas_benchmark.sh (Liangyx2)
- f50aeb4 Update ragas_evaluation_benchmark.py (Liangyx2)
- 84aea7c Update ragas_benchmark.sh (Liangyx2)
- ad1814a [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- 932562d Update and rename ragas_benchmark.py to ragas_superbenchmark.py (Liangyx2)
- 50d8c83 Update evaluate_retrieval_benchmark.py (Liangyx2)
- a4ea5dd Update retrieval_benchmark.sh (Liangyx2)
- 6e29d43 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- 702f9a9 Update and rename retrieval_benchmark.py to retrieval_superbenchmark.py (Liangyx2)
- 0452526 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- 008a892 add README.md (Liangyx2)
- 5303837 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- 8957b18 Update README.md (Liangyx2)
- 96f477c Update README.md (Liangyx2)
- c99856d Update README.md (Liangyx2)
- 19dfb93 Update README.md (Liangyx2)
- 99940f3 Update README.md (Liangyx2)
- 464d52b Update README.md (Liangyx2)
- da2e829 Update README.md (Liangyx2)
- 3ce2cb2 Update README.md (Liangyx2)
- 268d89c Update README.md (Liangyx2)
- 40fc2e9 Update README.md (Liangyx2)
- 13bb3b8 Update README.md (Liangyx2)
- 763bd1d Update README.md (Liangyx2)
- 092e951 add config file for rag evaluation (xmx-521)
- e931143 complete config superbenchmark (xmx-521)
- f0a0cd6 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- 895075b Merge branch 'main' into yuxiang/evaluation (XuhuiRen)
- 6b60154 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- c73a68f Create test_evaluation.py in CI (Liangyx2)
- c6f8906 Update requirements.txt (Liangyx2)
- 7c80ce2 Merge branch 'main' into yuxiang/evaluation (Liangyx2)
- 576ce57 Merge branch 'main' into yuxiang/evaluation (VincyZhang)
- 2a3ddd9 Merge branch 'main' into yuxiang/evaluation (Liangyx2)
- b4c0e67 Merge branch 'main' into yuxiang/evaluation (Liangyx2)
- e75bbe4 Update ragas_evaluation_benchmark.py (Liangyx2)
- a0853a8 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- (head) Merge branch 'main' into yuxiang/evaluation (Liangyx2)
intel_extension_for_transformers/neural_chat/tests/nightly/tools/test_evaluation.py (79 additions, 0 deletions)
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Copyright (c) 2023 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import unittest, os, shutil
from unittest.mock import patch
from intel_extension_for_transformers.neural_chat.tools.evaluation.data_augmentation import retrieval_dataset_construction, llm_generate_truth
from intel_extension_for_transformers.neural_chat.tools.evaluation.retriever import evaluate_retrieval


class TestEvaluation(unittest.TestCase):
    def setUp(self) -> None:
        if os.path.exists("data.jsonl"):
            os.remove("data.jsonl")
        if os.path.exists("data_minedHN.jsonl"):
            os.remove("data_minedHN.jsonl")
        if os.path.exists("data_minedHN_split.jsonl"):
            os.remove("data_minedHN_split.jsonl")
        if os.path.exists("ground_truth.jsonl"):
            os.remove("ground_truth.jsonl")
        if os.path.exists("output"):
            shutil.rmtree("output", ignore_errors=True)
        return super().setUp()

    def tearDown(self) -> None:
        if os.path.exists("data.jsonl"):
            os.remove("data.jsonl")
        if os.path.exists("data_minedHN.jsonl"):
            os.remove("data_minedHN.jsonl")
        if os.path.exists("data_minedHN_split.jsonl"):
            os.remove("data_minedHN_split.jsonl")
        if os.path.exists("ground_truth.jsonl"):
            os.remove("ground_truth.jsonl")
        if os.path.exists("output"):
            shutil.rmtree("output", ignore_errors=True)
        return super().tearDown()

    def test_retrieval_dataset_construction(self):
        argv = ['--llm_model', '/tf_dataset2/models/nlp_toolkit/neural-chat-7b-v3-1',
                '--embedding_model', '/tf_dataset2/inc-ut/gte-base',
                '--input', '/intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/assets/docs/retrieve_multi_doc/',
                '--output', 'data',
                '--range_for_sampling', '2-2',
                '--negative_number', '1']
        with patch('sys.argv', ['python retrieval_dataset_construction.py'] + argv):
            retrieval_dataset_construction.main()
        self.assertTrue(os.path.exists("data_minedHN_split.jsonl"))

    def test_llm_generate_truth(self):
        argv = ['--llm_model', '/tf_dataset2/models/nlp_toolkit/neural-chat-7b-v3-1',
                '--input', '/intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/tools/evaluation/data_augmentation/example.jsonl',
                '--output', 'ground_truth.jsonl']
        with patch('sys.argv', ['python llm_generate_truth.py'] + argv):
            llm_generate_truth.main()
        self.assertTrue(os.path.exists("ground_truth.jsonl"))

    def test_evaluate_retrieval(self):
        argv = ['--index_file_jsonl_path', '/intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/tools/evaluation/data_augmentation/candidate_context.jsonl',
                '--query_file_jsonl_path', '/intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/tools/evaluation/data_augmentation/example.jsonl',
                '--embedding_model', '/tf_dataset2/inc-ut/gte-base']
        with patch('sys.argv', ['python evaluate_retrieval.py'] + argv):
            result = evaluate_retrieval.main()
        self.assertIsNotNone(result)


if __name__ == '__main__':
    unittest.main()
```
```diff
@@ -34,6 +34,7 @@ langchain_core==0.1.18
 langid
 librosa
 markdown
+modelscope
 neural-compressor
 neural_speed
 num2words
```
intel_extension_for_transformers/neural_chat/tools/evaluation/__init__.py (16 additions, 0 deletions)
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Copyright (c) 2023 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
```
...nsion_for_transformers/neural_chat/tools/evaluation/data_augmentation/README.md (114 additions, 0 deletions)
# Retrieval Data Augmentation

## 1. Introduction
In this example, we show how to use data augmentation to construct a retrieval dataset.

* **Context to Question and Mine Hard Negatives**
Generates several specific, open-ended questions based on the context of the provided input file. Each question is directly related to its context, forming a query-positive pair suitable for constructing a retrieval dataset. We then sample negatives from the corpus by mining hard negatives, a widely used method for improving the quality of fine-tuned sentence embedding models.

* **Context, Question to Ground Truth**
Generates the correct answer based on the provided context and question. The answer is directly related to both, which makes it suitable for constructing a synthetic retrieval evaluation dataset.

## 2. Supported Devices
CPU, CUDA

## 3. Requirements
```
git clone https://github.com/intel/intel-extension-for-transformers.git
cd intel-extension-for-transformers/intel_extension_for_transformers/neural_chat
pip install -r requirements.txt
cd pipeline/plugins/retrieval
pip install -r requirements.txt
```
* **On CPU**
```
cd intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/tools/evaluation/data_augmentation
pip install -r requirements_cpu.txt
```

* **On CUDA**
```
cd intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/tools/evaluation/data_augmentation
pip install -r requirements_cuda.txt
```
## 4. Retrieval Dataset Construction
### Context to Questions and Mine Hard Negatives
* **On CPU**
```
cd intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/tools/evaluation
python -m data_augmentation.retrieval_dataset_construction \
    --llm_model <llm model path> \
    --embedding_model <embedding model path> \
    --input <your input file path>
```

* **On CUDA**
```
cd intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/tools/evaluation
python -m data_augmentation.retrieval_dataset_construction \
    --llm_model <llm model path> \
    --embedding_model <embedding model path> \
    --input <your input file path> \
    --use_gpu_for_searching True
```

**Some Important Arguments**:
- `llm_model`: The path of the LLM model.
- `embedding_model`: The path of the text embedding model.
- `input`: The path of the file/folder/link containing the content.
- `output`: The name of the output files. The default value is 'data', which produces 'data.jsonl', 'data_minedHN.jsonl', and 'data_minedHN_split.jsonl'.
- `temperature`: Modulates the next-token probabilities and influences the distribution of similarity scores. The default value is 0.8.
- `top_p`: If set to a float < 1, only the smallest set of most probable tokens whose probabilities add up to `top_p` or higher is kept for generation. The default value is 0.9.
- `top_k`: The number of highest-probability vocabulary tokens to keep for top-k filtering. The default value is 40.
- `repetition_penalty`: The parameter for repetition penalty; 1.0 means no penalty. The default value is 2.0.
- `max_new_tokens`: The maximum number of tokens to generate, ignoring the number of tokens in the prompt. The default value is 48.
- `do_sample`: Whether or not to use sampling; greedy decoding is used otherwise. The default value is True.
- `num_beams`: The number of beams for beam search; 1 means no beam search. The default value is 2.
- `num_return_sequences`: The number of independently computed returned sequences for each element in the batch. The default value is 2.
- `use_cache`: Whether the model should use the past key/value attentions (if applicable to the model) to speed up decoding. The default value is True.
- `range_for_sampling`: The range from which to sample negatives. For example, `2-100` means sampling `negative_number` negatives from the top-2 to top-100 documents. You can set a larger range to reduce the difficulty of the negatives (e.g., `60-300` samples negatives from the top-60 to top-300 passages). The default value is '2-10'.
- `negative_number`: The number of sampled negatives. The default value is 5.
- `use_gpu_for_searching`: Whether to use faiss-gpu to retrieve negatives. The default value is False.
- `similarity_threshold`: The cosine similarity threshold used to filter the generated queries. The default value is 0.6.

**Result**:
Three files will be generated; by default they are `data.jsonl`, `data_minedHN.jsonl`, and `data_minedHN_split.jsonl`. The third is the final output dataset, where each line is a dict like this:
```
{"query": str, "pos": List[str], "neg": List[str]}
```
`query` is the generated question, `pos` is a list of positive texts based on the context of the provided input file, and `neg` is a list of negative texts.
See [augmented_example.jsonl](https://github.com/intel/intel-extension-for-transformers/blob/master/intel_extension_for_transformers/neural_chat/tools/evaluation/data_augmentation/augmented_example.jsonl) for a sample data file.
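A quick way to sanity-check the final dataset is to load each line and verify the schema. This is a minimal sketch, not part of the PR; the helper name `load_retrieval_dataset` is hypothetical, and the default path is the default output name described above.

```python
import json

def load_retrieval_dataset(path="data_minedHN_split.jsonl"):
    """Load a query/pos/neg JSONL dataset, validating each line's schema."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, 1):
            ex = json.loads(line)
            # Each line must follow {"query": str, "pos": List[str], "neg": List[str]}
            assert isinstance(ex["query"], str), f"line {lineno}: query must be a string"
            assert isinstance(ex["pos"], list) and ex["pos"], f"line {lineno}: pos must be a non-empty list"
            assert isinstance(ex["neg"], list), f"line {lineno}: neg must be a list"
            examples.append(ex)
    return examples
```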

### Context, Question to Ground Truth
```
cd intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/tools/evaluation/data_augmentation
python llm_generate_truth.py \
    --llm_model <llm model path> \
    --input example.jsonl \
    --output ground_truth.jsonl
```

**Some Important Arguments**:
- `llm_model`: The path of the LLM model.
- `input`: The path of the JSON data including queries and positives, where each line is a dict like this: ```{"query": str, "pos": List[str]}```. See [example.jsonl](https://github.com/intel/intel-extension-for-transformers/blob/master/intel_extension_for_transformers/neural_chat/tools/evaluation/data_augmentation/example.jsonl) for a sample data file.
- `output`: The path of the output JSON data.
- `temperature`: Modulates the next-token probabilities and influences the distribution of similarity scores. The default value is 0.8.
- `top_p`: If set to a float < 1, only the smallest set of most probable tokens whose probabilities add up to `top_p` or higher is kept for generation. The default value is 0.9.
- `top_k`: The number of highest-probability vocabulary tokens to keep for top-k filtering. The default value is 40.
- `repetition_penalty`: The parameter for repetition penalty; 1.0 means no penalty. The default value is 2.0.
- `max_new_tokens`: The maximum number of tokens to generate, ignoring the number of tokens in the prompt. The default value is 48.
- `do_sample`: Whether or not to use sampling; greedy decoding is used otherwise. The default value is True.
- `num_beams`: The number of beams for beam search; 1 means no beam search. The default value is 2.
- `num_return_sequences`: The number of independently computed returned sequences for each element in the batch. The default value is 2.
- `use_cache`: Whether the model should use the past key/value attentions (if applicable to the model) to speed up decoding. The default value is True.

**Result**:
Each line of the output JSON data is a dict like this:
```
{"question": str, "context": List[str], "ground_truth": str}
```
`ground_truth` is the generated ground truth, based on the provided question and context.
See [ground_truth.jsonl](https://github.com/intel/intel-extension-for-transformers/blob/master/intel_extension_for_transformers/neural_chat/tools/evaluation/data_augmentation/ground_truth.jsonl) for a sample data file.
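The output file above can be consumed line by line with a small reader. This is an illustrative sketch, not part of the PR; the helper name `load_ground_truth` is hypothetical.

```python
import json

def load_ground_truth(path="ground_truth.jsonl"):
    """Yield (question, context, ground_truth) triples from the output JSONL."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            ex = json.loads(line)
            # Schema: {"question": str, "context": List[str], "ground_truth": str}
            yield ex["question"], ex["context"], ex["ground_truth"]
```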
intel_extension_for_transformers/neural_chat/tools/evaluation/data_augmentation/__init__.py (16 additions, 0 deletions)
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Copyright (c) 2023 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
```
intel_extension_for_transformers/neural_chat/tools/evaluation/data_augmentation/answer.jsonl (10 additions, 0 deletions)
```
{"question": "What types of platforms does the organization focus on?", "answer": "The organization focuses on delivering open software and hardware platforms with industry-defining standards, as well as leadership products, open and secure platforms, and resilient manufacturing."}
{"question": "What are the core values that drive our company's actions?", "answer": "The core values driving the company's actions include focusing on having a positive impact on business, society, and the planet by working together with talented individuals. They also emphasize delivering leadership products, open and secure platforms, and resilient manufacturing to support global digitalization and ensure customer success."}
{"question": "What types of companies does Intel invest in?", "answer": "Intel invests in public and private companies."}
{"question": "How has technology been central to our lives in recent years?", "answer": "In recent years, technology has become more essential as it permeates various aspects of our daily lives. This includes advancements in communication, entertainment, transportation, healthcare, and many other sectors. All these rely heavily on semiconductors, which play a crucial role in powering and enabling these technologies."}
{"question": "What is Intel's focus in terms of delivering leadership products?", "answer": "Intel's focus in terms of delivering leadership products includes providing open and secure platforms as well as resilient manufacturing for enabling global digitalization and fueling customer success."}
{"question": "How has Intel been affected by the COVID-19 pandemic so far, and what?", "answer": "Intel has not provided specific details on how they have been directly affected by the COVID-19 pandemic. However, it can be inferred that like many other companies, they might have experienced challenges related to supply chain disruptions, workforce adjustments, and potential changes in demand for their products due to the global economic impact of the pandemic."}
{"question": "How does the company protect personal data to prevent unauthorized access or misuse?", "answer": "The text provided doesn't specifically mention how the company protects personal data to prevent unauthorized access or misuse. However, it highlights the potential consequences of such incidents, which might imply that they have measures in place to minimize these risks."}
{"question": "What are the conditions for accessing third-party IP?", "answer": "The conditions for accessing third-party IP can vary depending on the specific agreement between the parties involved. However, generally, it includes ensuring availability on commercially reasonable terms or at all."}
{"question": "How many customers contribute to the majority of our revenue?", "answer": "A limited number of customers contribute to the majority of your revenue."}
{"question": "When does Intel plan to deliver on its goal of five manufacturing technology nodes in four years?", "answer": "Intel remains on track to deliver on this goal within four years."}
```
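Answer files like the one above can be joined with a ground-truth file on the question text before scoring. This is a hedged sketch, not part of the PR; the helper name `join_answers_with_truth` and the join-on-question convention are assumptions for illustration.

```python
import json

def join_answers_with_truth(answer_path, truth_path):
    """Match generated answers to ground-truth entries by question text."""
    truths = {}
    with open(truth_path, encoding="utf-8") as f:
        for line in f:
            ex = json.loads(line)
            truths[ex["question"]] = ex.get("ground_truth", "")
    pairs = []
    with open(answer_path, encoding="utf-8") as f:
        for line in f:
            ex = json.loads(line)
            pairs.append({
                "question": ex["question"],
                "answer": ex["answer"],
                # None when the question has no ground-truth entry
                "ground_truth": truths.get(ex["question"]),
            })
    return pairs
```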