Language models' definitions vs. dictionary definitions

Project Overview

This project compares definitions generated by large language models (ChatGPT, Gemini, and Meta AI) against established dictionary definitions (WordNet, Merriam-Webster, and dictionary.com) to assess how original the generated definitions are. High similarity between an AI-generated definition and a reputable dictionary's definition is treated as a signal of possible plagiarism.

Objective

To investigate whether AI-generated definitions mimic dictionary definitions or offer original content. Definitions are converted into embedding vectors with the sentence_transformers library, and the vectors are compared to detect similarities.

Method

Collect definitions for 10,000 common English words from several large language models (ChatGPT, Gemini, etc.).
Gather definitions from trusted dictionaries (WordNet, Merriam-Webster, and dictionary.com).
Convert the definitions into embedding vectors using two models from the sentence_transformers library: all-mpnet-base-v2 and bert-base-nli-mean-tokens.
Evaluate potential plagiarism by measuring vector similarity between AI and dictionary definitions using the k-nearest neighbors (k-NN) algorithm.
Provide statistics and plots to illustrate how well each model maintains originality.
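
The comparison step can be sketched as follows. This is a minimal illustration, not the repository's actual code: the function names `embed`, `cosine_similarity`, and `nearest_neighbors` are hypothetical, cosine similarity is assumed as the distance measure, and the toy 3-dimensional vectors stand in for real embeddings. `embed` imports sentence_transformers lazily, so the rest of the sketch runs without the library installed.

```python
import math


def embed(texts, model_name="all-mpnet-base-v2"):
    """Encode texts into vectors (hypothetical helper, not from the repo).

    sentence_transformers is imported lazily; the model is downloaded
    on first use.
    """
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_name).encode(texts)


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, candidates, k=1):
    """Return the k (index, similarity) pairs most similar to query, best first."""
    scored = [(i, cosine_similarity(query, c)) for i, c in enumerate(candidates)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for one AI definition and three dictionary definitions.
ai_vector = [0.9, 0.1, 0.0]
dictionary_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
top = nearest_neighbors(ai_vector, dictionary_vectors, k=2)
```

With real data, `ai_vector` and `dictionary_vectors` would come from `embed(...)`, and a similarity close to 1 for the nearest dictionary definition would suggest that the AI definition closely follows it.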

Installation

Clone the repository:

git clone https://github.com/2006coder/LMs-words-defs-vs-dictionaries-defs

Install dependencies:
```
pip install -r requirements.txt
```
Set up API keys (if required) for language models like ChatGPT, Gemini, etc.
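
One common way to supply API keys is through environment variables. The sketch below assumes that approach; the helper `require_api_key` and the variable name used in the usage note are hypothetical, so check the scripts for the names they actually expect.

```python
import os


def require_api_key(var_name, env=os.environ):
    """Return the key stored under var_name, or raise a clear error if it is missing.

    Hypothetical helper: the repository's scripts may load keys differently.
    """
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable before running the script."
        )
    return key
```

For example, `require_api_key("OPENAI_API_KEY")` would return the key if that variable is set in the shell and fail early with a readable message otherwise.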

Create your own evaluation:

Prepare the list of words you wish to analyze.
Run the script to collect definitions from LLMs and dictionaries. To speed this up, you can split 10000.csv into smaller .csv files and run the script on each of them in parallel.
Convert the definitions into vectors using the provided script.
Perform the similarity analysis and view the results in the output report.
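
Splitting the word list into smaller files, as suggested in the steps above, could look like the sketch below. It is an illustration under assumptions: `split_csv` is a hypothetical helper, the output file naming is invented, and the layout of 10000.csv (one word per row, no header) is assumed rather than confirmed. Pass `has_header=True` if the file starts with a header row.

```python
import csv
import tempfile
from pathlib import Path


def split_csv(source, out_dir, rows_per_file, has_header=False):
    """Split a CSV file into parts of at most rows_per_file data rows each.

    Returns the list of written paths. If has_header is True, the first
    row is repeated at the top of every part.
    """
    source = Path(source)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    with source.open(newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
    header = rows[:1] if has_header else []
    data = rows[1:] if has_header else rows

    paths = []
    for start in range(0, len(data), rows_per_file):
        part = out_dir / f"{source.stem}_part{start // rows_per_file + 1}.csv"
        with part.open("w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows(header + data[start:start + rows_per_file])
        paths.append(part)
    return paths


# Demo on a throwaway file; in the project the source would be 10000.csv.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "words.csv"
    demo.write_text("apple\nbanana\ncherry\ndate\nelder\n", encoding="utf-8")
    parts = split_csv(demo, Path(tmp) / "parts", rows_per_file=2)
    part_names = [p.name for p in parts]
    last_rows = parts[-1].read_text(encoding="utf-8").split()
```

Each resulting file can then be passed to a separate run of the collection script.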

Todo list:

1. Get the definitions from the dictionaries mentioned above.
2. Write the conclusion.
