theme-ontology/lto-classification-benchmark

A Classification Benchmark based on the Literary Theme Ontology

This repository contains the datasets and code described in the manuscript "A Classification Benchmark based on the Literary Theme Ontology".

The summary and subtitle train and test sets can be found in the folder dataset. In the file names, the suffix _sub refers to the subtitle datasets and the suffix _sum to the summary datasets.

The code for all the experiments in the manuscript can be found in six notebooks. Note that for the SVM_LogReg experiments, an X (features) and a y (labels) dataset need to be created from the train and test datasets.

  • The code for the bag-of-words classifications (logistic regression and SVM) and the FastText experiments can be found in SVM_LogReg.ipynb.
  • The code for the SetFit experiments can be found in setfit.ipynb.
  • The code for the zero-shot experiments can be found in the notebooks with the prefix prompting_zero_shot. The suffix _sub refers to the subtitle datasets, the suffix _sum to the summary datasets.
  • The code for the few-shot experiments can be found in the notebooks with the prefix prompting_few_shot. The suffix _sub refers to the subtitle datasets, the suffix _sum to the summary datasets.
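
The X and y split needed for the SVM_LogReg experiments could look roughly like the sketch below. It is only an illustration: the helper name split_features_labels, the CSV format, and the column names text and label are assumptions, not taken from the repository, so adapt them to the actual files in the dataset folder.

```python
import csv
import io


def split_features_labels(rows, text_col="text", label_col="label"):
    """Split an iterable of dict rows into texts (X) and labels (y).

    The column names are hypothetical defaults; pass the real ones
    used by the files in the dataset folder.
    """
    X, y = [], []
    for row in rows:
        X.append(row[text_col])
        y.append(row[label_col])
    return X, y


# Tiny in-memory stand-in for a train file. The real data format and
# column names are assumptions here.
sample = io.StringIO(
    "text,label\n"
    "A crew explores a derelict ship.,theme_a\n"
    "Two rivals fall in love.,theme_b\n"
)
X_train, y_train = split_features_labels(csv.DictReader(sample))
print(X_train)
print(y_train)
```

To use it on a real file, open the file and pass csv.DictReader(file_handle) together with the actual column names. The same call would be repeated for the test set.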

Questions and Feedback

If you have a technical question about the manuscript, feel free to post it as an issue.

For more open-ended inquiries, we encourage starting a discussion.
