Question-Answering System for Geoportale Nazionale Archeologia (GNA)

This repository contains the codebase and evaluation material for the design and development of a Question-Answering (QA) system for the Geoportale Nazionale per l’Archeologia (GNA). The project was carried out as part of the internship in preparation for my Master’s thesis in Digital Humanities and Digital Knowledge at the University of Bologna.

Project Overview

The GNA QA system is a retrieval-augmented question-answering assistant that answers natural-language queries using the official GNA documentation as its knowledge base. It combines web crawling, document chunking, vector-based retrieval, and answer generation in a modular and scalable architecture.
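The chunking and retrieval steps can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the function names (`chunk_text`, `embed`, `retrieve`), the chunk size, and the overlap are assumptions, and a bag-of-words count vector stands in for the dense Sentence Transformers embeddings and the FAISS index so that the example runs with the standard library alone.

```python
import math
import re
from collections import Counter


def chunk_text(text, size=40, overlap=10):
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text):
    """Toy stand-in for a dense embedding (assumption): a bag-of-words count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=3):
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    docs = [
        "the sitemap lists wiki pages",
        "feedback is stored in sqlite",
        "embeddings are stored in a vector index",
    ]
    print(retrieve("where is feedback stored", docs, k=1))
```

In a real pipeline, `embed` would be replaced by a multilingual embedding model and the sorted scan by a vector index lookup; the overall flow of chunk, embed, and rank stays the same.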

The application is available at:

👉 gna-assistant-ai.streamlit.app

Main features include:

Focused crawling and sitemap generation from the GNA wiki operating manual
Chunked document processing and metadata annotation
Dense embedding generation using multilingual Sentence Transformers
Retrieval-augmented generation
Citation-aware prompting
Streamlit-based user interface and feedback tracking
Evaluation suite for retrieval metrics (Precision@k, Recall@k, MRR)
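As an illustration of citation-aware prompting, the sketch below numbers the retrieved chunks and asks the model to cite them by number. It is a hedged example under stated assumptions: the function name `build_prompt`, the chunk fields `text` and `url`, the instruction wording, and the `example.org` URLs are all hypothetical and are not taken from `rag_sys.py`.

```python
def build_prompt(question, chunks):
    """Assemble a prompt whose sources are numbered so the answer can cite them as [n].

    `chunks` is a list of dicts with hypothetical keys "text" and "url".
    """
    sources = "\n".join(
        f"[{i}] ({chunk['url']}) {chunk['text']}"
        for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the sources below. "
        "Cite each claim with its source number, e.g. [1]. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


if __name__ == "__main__":
    # Placeholder chunks; the URLs are illustrative only.
    retrieved = [
        {"text": "First retrieved passage.", "url": "https://example.org/page-a"},
        {"text": "Second retrieved passage.", "url": "https://example.org/page-b"},
    ]
    print(build_prompt("What does the manual say?", retrieved))
```

Keeping the source URL next to each number lets the interface turn a cited `[n]` back into a link to the original page.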

📂 Structure

generate_sitemap.py, create_chunks.py, vector_store.py: Knowledge base preparation
rag_sys.py: Retrieval-Augmented Generation pipeline
main.py: Streamlit application logic
feedback_handling.py: Feedback management
evaluate_retrieval.py: Evaluation framework
create_test_data.py: Test set generation
main_preprocess.py: Combined pipeline for sitemap, chunking, and vectorization
data/: Document chunks, test datasets, metrics, logs
feedback/: Local SQLite database for user feedback
sitemap/: XML sitemap of the GNA website
OCR/: OCR-related scripts
.faiss_db/: FAISS vector store
.streamlit/: Streamlit configuration files
requirements.txt: Python dependencies
packages.txt: Additional system requirements for Streamlit Cloud

📊 Evaluation

Automated evaluation was performed using a synthetic test set of 400 domain-specific questions. Key metrics include:

Precision@5
Recall@5
MRR (Mean Reciprocal Rank)
Average retrieval time
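For reference, the three ranking metrics can be computed as in the sketch below. It uses the standard definitions and is not the code in `evaluate_retrieval.py`; the function names and the chunk identifiers in the usage example are assumptions made for illustration.

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved items that are relevant."""
    return sum(1 for item in retrieved[:k] if item in relevant) / k


def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    return sum(1 for item in set(retrieved[:k]) if item in relevant) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant item, or 0.0 if none was retrieved."""
    for rank, item in enumerate(retrieved, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (retrieved, relevant) pairs, one pair per question."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ret, rel) for ret, rel in runs) / len(runs)


if __name__ == "__main__":
    # Hypothetical chunk identifiers for a single test question.
    retrieved = ["c3", "c1", "c7", "c2", "c9"]
    relevant = {"c1", "c2", "c4"}
    print(precision_at_k(retrieved, relevant, 5))  # 0.4
    print(recall_at_k(retrieved, relevant, 5))
    print(reciprocal_rank(retrieved, relevant))    # 0.5
```

Here two of the five retrieved chunks are relevant, so Precision@5 is 2/5; two of the three relevant chunks were found, so Recall@5 is 2/3; and the first relevant chunk sits at rank 2, giving a reciprocal rank of 1/2.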

Acknowledgments

This project was supervised and supported by Mario Caruso and Simone Persiani from BUP Solutions, whose guidance and technical insights were instrumental throughout the internship. I sincerely thank them for their time, encouragement, and valuable mentorship.
