AIR-Project

Project Overview

This information retrieval project aims to improve the effectiveness of document retrieval by combining BM25 first-stage retrieval with BERT-based reranking.

Methodology:

1. Dataset Analysis:

Thorough examination of the dataset to identify the most common topics.
Sentiment analysis using DistilBERT: https://huggingface.co/docs/transformers/model_doc/distilbert
The Train.jsonl dataset file is not included in this repository because of its size; it is available at https://huggingface.co/datasets/BeIR/signal1m-generated-queries
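One simple way to approximate the "most common topics" is to count frequent terms across the queries or documents. The sketch below assumes each line of Train.jsonl is a JSON object with a "text" field (a hypothetical field name; check the actual schema) and uses a small, illustrative stopword list. It is not the project's own analysis code.

```python
import json
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Small illustrative stopword list (an assumption, not the project's list).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "for", "on", "with", "at"}


def iter_texts(lines: Iterable[str], field: str = "text") -> Iterable[str]:
    """Yield one string field from each non-empty JSONL line.

    The default field name "text" is hypothetical; adjust it to the real schema.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        value = record.get(field)
        if isinstance(value, str):
            yield value


def most_common_terms(texts: Iterable[str], k: int = 10) -> List[Tuple[str, int]]:
    """Count lowercase alphanumeric tokens, skip stopwords, return the top k."""
    counts: Counter = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(k)


if __name__ == "__main__":
    sample = [
        '{"_id": "q1", "text": "stock market news for the week"}',
        '{"_id": "q2", "text": "market reaction to the election"}',
        '{"_id": "q3", "text": "election results and stock prices"}',
    ]
    print(most_common_terms(iter_texts(sample), k=3))
    # For the full file (hypothetical local path):
    #   with open("Train.jsonl", encoding="utf-8") as f:
    #       print(most_common_terms(iter_texts(f), k=20))
```

Term frequency is only a rough proxy for topics; it reads the file line by line, so it also works on a large JSONL file without loading it into memory.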

2. BM25 Algorithm:

Use of the BM25 algorithm for initial document retrieval.
Analysis of how well BM25 captures the relevance of documents to queries.
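For reference, a minimal from-scratch sketch of Okapi BM25 scoring over an in-memory corpus. The tokenizer, the default parameters k1 = 1.5 and b = 0.75, and the Lucene-style IDF are assumptions for illustration; the project notebook may use a library implementation instead.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and split on non-alphanumeric characters (a simple assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Okapi BM25 over an in-memory {doc_id: text} corpus."""

    def __init__(self, docs: Dict[str, str], k1: float = 1.5, b: float = 0.75) -> None:
        self.k1 = k1
        self.b = b
        self.term_freqs = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
        self.doc_lens = {doc_id: sum(tf.values()) for doc_id, tf in self.term_freqs.items()}
        n_docs = len(docs)
        self.avgdl = sum(self.doc_lens.values()) / n_docs if n_docs else 0.0
        # Document frequency: in how many documents each term occurs.
        doc_freq: Counter = Counter()
        for tf in self.term_freqs.values():
            doc_freq.update(tf.keys())
        # Lucene-style IDF; the "1 +" keeps it positive even for very common terms.
        self.idf = {
            term: math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            for term, df in doc_freq.items()
        }

    def score(self, query: str, doc_id: str) -> float:
        """Sum the BM25 contribution of each query term for one document."""
        tf = self.term_freqs[doc_id]
        rel_len = self.doc_lens[doc_id] / self.avgdl if self.avgdl else 0.0
        # Length normalization: longer-than-average documents are penalized.
        norm = self.k1 * (1 - self.b + self.b * rel_len)
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                total += self.idf[term] * f * (self.k1 + 1) / (f + norm)
        return total

    def search(self, query: str, top_k: int = 10) -> List[Tuple[str, float]]:
        """Score every document and return the top_k (doc_id, score) pairs."""
        scored = [(doc_id, self.score(query, doc_id)) for doc_id in self.term_freqs]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


if __name__ == "__main__":
    corpus = {
        "d1": "stock market rallies after election",
        "d2": "recipe for banana bread",
        "d3": "election results announced",
    }
    print(BM25(corpus).search("election stock market", top_k=2))
```

Scoring every document per query is fine for a toy corpus; at the scale of the Signal-1M collection an inverted index would be the practical choice.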

3. BERT Reranking:

Integration of BERT for reranking the documents retrieved by BM25.
The BERT model used is bert-base-nli-mean-tokens, a sentence-transformers model that maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for tasks such as clustering or semantic search. It is available at https://huggingface.co/sentence-transformers/bert-base-nli-mean-tokens
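The reranking step can be sketched as: embed the query and each BM25 candidate, then sort the candidates by cosine similarity to the query. The embedder is passed in as a function; `toy_embed` below is a hypothetical bag-of-words stand-in so the example runs without downloading a model. With the sentence-transformers library, the `encode` method of `SentenceTransformer("sentence-transformers/bert-base-nli-mean-tokens")` could be passed instead, since it returns a 768-dimensional vector for a single string.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Any function mapping text to a fixed-length vector.
Embedder = Callable[[str], Sequence[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rerank(query: str, candidates: List[Tuple[str, str]], embed: Embedder) -> List[Tuple[str, float]]:
    """Re-order BM25 candidates, given as (doc_id, text) pairs, by similarity to the query."""
    query_vec = embed(query)
    scored = [(doc_id, cosine(query_vec, embed(text))) for doc_id, text in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


# Hypothetical stand-in embedder: word counts over a tiny fixed vocabulary.
TOY_VOCAB = ["election", "results", "stock", "market", "banana", "bread"]


def toy_embed(text: str) -> List[float]:
    words = text.lower().split()
    return [float(words.count(term)) for term in TOY_VOCAB]


if __name__ == "__main__":
    bm25_candidates = [("d1", "banana bread recipe"), ("d2", "official election results")]
    # With the real model (requires sentence-transformers and a model download):
    #   from sentence_transformers import SentenceTransformer
    #   model = SentenceTransformer("sentence-transformers/bert-base-nli-mean-tokens")
    #   rerank("election results", bm25_candidates, embed=model.encode)
    print(rerank("election results", bm25_candidates, embed=toy_embed))
```

This is a bi-encoder setup: query and documents are embedded independently, which matches how a sentence-transformers embedding model is used. A cross-encoder would instead score each query-document pair jointly.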

4. Query-Document Relationship Refinement:

Correction of inaccuracies in query-document assignments.
Ensuring that queries are appropriately matched with relevant documents.
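One hypothetical way to surface inaccurate query-document assignments is to flag pairs whose assigned document does not appear in the top-k results of a retrieval run, and to propose the top-ranked document as a candidate for manual review. The function below, its one-document-per-query input format, and the `search` callable are assumptions for illustration, not the project's actual procedure.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A retrieval function: takes query text, returns doc ids ranked best-first.
SearchFn = Callable[[str], List[str]]


def flag_suspect_pairs(
    queries: Dict[str, str],
    assignments: Dict[str, str],
    search: SearchFn,
    k: int = 10,
) -> List[Tuple[str, str, Optional[str]]]:
    """Return (query_id, assigned_doc_id, suggested_doc_id) for every query whose
    assigned document is missing from the top-k retrieved results."""
    suspects = []
    for query_id, assigned_doc in assignments.items():
        top_k = search(queries[query_id])[:k]
        if assigned_doc not in top_k:
            # Suggest the best-ranked document; None if retrieval returned nothing.
            suggestion = top_k[0] if top_k else None
            suspects.append((query_id, assigned_doc, suggestion))
    return suspects


if __name__ == "__main__":
    queries = {"q1": "election results", "q2": "banana bread"}
    assignments = {"q1": "d3", "q2": "d1"}  # q2 looks mis-assigned
    fake_rankings = {"election results": ["d3", "d1"], "banana bread": ["d2"]}
    print(flag_suspect_pairs(queries, assignments, lambda q: fake_rankings.get(q, []), k=2))
    # prints [('q2', 'd1', 'd2')]
```

Flagged pairs are only candidates: a document missing from the top k may still be relevant, so each suggestion would need checking before the assignment is changed.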

Repository contents:

CLEAN DATASETS
Data_Cleaning.ipynb
Dataset_Analysis.ipynb
Final presentation.pdf
README.md
Ranking_BM25_and_ReRanking_with_BERT.ipynb

Repository: cverazam/AIR-Project
