Topic modelling comparison suite

A project aimed at comparing the results of topic modelling tasks performed with various techniques:

Gensim implementation of LDA (Latent Dirichlet Allocation).
prodLDA: implemented.
LLM_TopicModel (implemented): a generative-AI-powered topic model that outputs topic labels directly from the input texts.
BERTopic: https://github.com/MaartenGr/BERTopic .
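One simple way to compare the output of two of these models is to match their topics by top-word overlap. The sketch below is only an illustration: this README does not state which comparison metrics the suite uses, and the function names (`jaccard`, `best_matches`) and the word lists are hypothetical, not taken from the repository or its results.

```python
from typing import List, Tuple


def jaccard(a: List[str], b: List[str]) -> float:
    """Jaccard similarity between two word lists, treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def best_matches(
    model_a: List[List[str]], model_b: List[List[str]]
) -> List[Tuple[int, int, float]]:
    """For each topic in model_a, find the most similar topic in model_b.

    Each model is a list of topics; each topic is a list of its top words.
    Assumes model_b contains at least one topic.
    Returns (index_in_a, index_in_b, jaccard_score) tuples.
    """
    matches = []
    for i, words_a in enumerate(model_a):
        scores = [jaccard(words_a, words_b) for words_b in model_b]
        j = max(range(len(scores)), key=scores.__getitem__)
        matches.append((i, j, scores[j]))
    return matches


# Made-up top words, for illustration only (not results from the suite).
lda_topics = [
    ["peace", "security", "council", "war"],
    ["development", "economic", "trade", "aid"],
]
bertopic_topics = [
    ["economic", "development", "growth", "trade"],
    ["peace", "security", "conflict", "war"],
]

matches = best_matches(lda_topics, bertopic_topics)
for a_idx, b_idx, score in matches:
    print(f"topic A{a_idx} best matches topic B{b_idx} (Jaccard {score:.2f})")
```

Word overlap ignores topic order and model-specific weights, so it can be applied to any model that exposes a list of top words per topic. It does not capture synonyms, and LLM_TopicModel produces labels rather than word lists, so it would need a different comparison.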

The dataset used for this experiment is a corpus of the UN General Debates held from 1946 to 2023. Link: https://www.kaggle.com/datasets/namigabbasov/united-nations-general-debate-corpus-1946-2023

Requirements

An Ollama installation to run LLM_TopicModel, with the deepseek-r1:8b model pulled from Ollama (ollama pull deepseek-r1:8b).
The packages listed in requirements.txt (pip install -r requirements.txt).

Having a GPU is recommended to run LLM_TopicModel.
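Before running the LLM-based model, it can help to confirm that the required model is available locally. The following is a minimal sketch, not part of the repository: it assumes Ollama is serving on its default address (http://localhost:11434) and uses Ollama's `/api/tags` endpoint, which lists locally available models. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import List

OLLAMA_HOST = "http://localhost:11434"  # assumption: Ollama's default local address
REQUIRED_MODEL = "deepseek-r1:8b"       # model named in the requirements


def parse_model_names(tags_payload: dict) -> List[str]:
    """Extract model names from the JSON body returned by GET /api/tags."""
    return [m.get("name", "") for m in tags_payload.get("models", [])]


def list_local_models(host: str = OLLAMA_HOST, timeout: float = 5.0) -> List[str]:
    """Ask a running Ollama server which models are available locally."""
    with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
        return parse_model_names(json.load(resp))


def model_is_pulled(model_names: List[str], required: str = REQUIRED_MODEL) -> bool:
    """Return True if the required model appears in the list of local models."""
    return required in model_names
```

With the Ollama server running, `model_is_pulled(list_local_models())` returns `False` if the model still needs to be pulled. `list_local_models` raises `urllib.error.URLError` if the server is not reachable.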
