Code for GloCOM: A Short Text Neural Topic Model via Global Clustering Context

Preparing Libraries

Python 3.10.14

Install the following libraries

numpy==1.26.3
scipy==1.10.1
sentence-transformers==2.7.0
torch==2.4.1+cu124
torchvision==0.19.1+cu124
gensim==4.3.3
scikit-learn==1.5.1
tqdm==4.66.5
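
To confirm that the environment matches the pins above, a small check along these lines may help. This is a minimal sketch and not part of the repository: the `check_versions` helper is hypothetical, and it only compares the installed version strings against the versions listed above using the standard library.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the list above; local build tags such as "+cu124" are kept as-is.
REQUIRED = {
    "numpy": "1.26.3",
    "scipy": "1.10.1",
    "sentence-transformers": "2.7.0",
    "torch": "2.4.1+cu124",
    "torchvision": "0.19.1+cu124",
    "gensim": "4.3.3",
    "scikit-learn": "1.5.1",
    "tqdm": "4.66.5",
}


def check_versions(required=REQUIRED):
    """Return {package: (expected, found)} for every mismatch.

    `found` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in required.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in check_versions().items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every listed package is installed at the pinned version.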

Install Java, which is needed to run the Palmetto jar.
Download this Java jar to ./evaluations and rename it to palmetto.jar.

Download and extract this processed Wikipedia corpus to ./evaluations/wiki_data as an external reference corpus.

Here is the folder structure:

    |- evaluations
        |- wiki_data
            |- wikipedia_bd/
            |- wikipedia_bd.histogram
        |- ...
        |- palmetto.jar
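
Before launching an evaluation, it can be useful to verify that this layout is in place. The sketch below is an assumption-laden helper, not code from the repository: `missing_paths` and `java_available` are hypothetical names, and the paths are simply the ones shown in the tree above, resolved relative to the repository root.

```python
import shutil
from pathlib import Path

# Paths taken from the folder structure above, relative to the repository root.
EXPECTED = [
    "evaluations/palmetto.jar",
    "evaluations/wiki_data/wikipedia_bd",
    "evaluations/wiki_data/wikipedia_bd.histogram",
]


def missing_paths(root=".", expected=EXPECTED):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


def java_available():
    """Return True if a `java` executable can be found on PATH."""
    return shutil.which("java") is not None


if __name__ == "__main__":
    for rel in missing_paths():
        print("missing:", rel)
    if not java_available():
        print("java was not found on PATH")
```

Running the script from the repository root prints nothing when all three paths exist and `java` is on the PATH.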

Running

To run and evaluate our model, run the following command:

python run.py --model GloCOM --num_topics 50 --data_dir data/SearchSnippets

You can also specify additional arguments when running the model:

--aug_coef <float>       # Default: 0.5  - Coefficient for augmentation
--prior_var <float>      # Default: 0.1  - Prior variance
--weight_loss_ECR <float> # Default: 60.0 - Weight for ECR loss
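
To try several values of one of these arguments, the command line can be assembled programmatically. The following is a rough sketch, not part of the repository: `build_command` is a hypothetical helper, the swept values are illustrative only, and the flags are the ones documented above. It prints the commands rather than executing them.

```python
import shlex
import sys


def build_command(model="GloCOM", num_topics=50, data_dir="data/SearchSnippets", **extra):
    """Build the argument list for one run.py invocation.

    Keyword arguments in `extra` are turned into `--name value` pairs.
    """
    cmd = [
        sys.executable, "run.py",
        "--model", model,
        "--num_topics", str(num_topics),
        "--data_dir", data_dir,
    ]
    for flag, value in extra.items():
        cmd += [f"--{flag}", str(value)]
    return cmd


# Illustrative values only; 60.0 is the documented default for --weight_loss_ECR.
for weight in (40.0, 60.0, 80.0):
    print(shlex.join(build_command(weight_loss_ECR=weight)))
```

Each printed line can be pasted into a shell, or the list returned by `build_command` can be passed to `subprocess.run` from the repository root.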

KNNTM Running

We provide the KNNTM optimal transport (OT) distances and code in this link. After unzipping the file, the folder structure should look like this:

```
|- data
    |- SearchSnippets
        |- KNNTM/
            |- M_coo.npz
            |- M_cos.npz
```
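
To check that the two distance files were unzipped to the right place, their contents can be listed without loading them. This sketch relies only on the fact that a `.npz` file is a zip archive of `.npy` members; it makes no assumption about which arrays the files contain, and `list_npz_members` is a hypothetical helper, not a function from the repository.

```python
import zipfile
from pathlib import Path

# Location taken from the folder structure above.
KNNTM_DIR = Path("data/SearchSnippets/KNNTM")


def list_npz_members(path):
    """Return the member file names stored in a .npz archive."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


if __name__ == "__main__":
    for name in ("M_coo.npz", "M_cos.npz"):
        path = KNNTM_DIR / name
        if path.exists():
            print(name, list_npz_members(path))
        else:
            print(name, "is missing")
```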

To run the KNNTM model, run the following command:

python run.py --model KNNTM --num_topics 50 --data_dir data/SearchSnippets

You can also specify additional arguments when running the model:

--alpha <float> # Default: 1.0
--num_k <int> # Default: 30 
--eta <float> # Default: 0.2 
--rho <float> # Default: 0.6 
--p_epochs <int> # Default: 20
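
For reference, the options and defaults listed above could be declared with `argparse` roughly as follows. This is only an illustration of the documented defaults, not the actual parser in run.py, which may define these arguments differently; `build_knntm_parser` is a hypothetical name.

```python
import argparse


def build_knntm_parser():
    """Declare the KNNTM options listed above with their documented defaults."""
    parser = argparse.ArgumentParser(description="KNNTM options (illustrative sketch)")
    parser.add_argument("--alpha", type=float, default=1.0)
    parser.add_argument("--num_k", type=int, default=30)
    parser.add_argument("--eta", type=float, default=0.2)
    parser.add_argument("--rho", type=float, default=0.6)
    parser.add_argument("--p_epochs", type=int, default=20)
    return parser


# Override one option and keep the documented defaults for the rest.
args = build_knntm_parser().parse_args(["--num_k", "50"])
print(vars(args))
```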

Acknowledgement

Parts of this implementation are based on TopMost. We also use Palmetto to evaluate topic coherence.
