Codernob/IndicDialogue

IndicDialogue

Usage

Download the raw dataset from Google Drive: https://drive.google.com/file/d/1GgI-ebyLE1J6rkE6KzpB3e5TkRn2UYty/view?usp=drive_link

Extract raw_dataset.zip in the repository directory.
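If you prefer to script the extraction step, here is a minimal sketch using Python's standard `zipfile` module. It assumes `raw_dataset.zip` has already been downloaded by hand into the current directory. The helper name `extract_raw_dataset` and its default arguments are hypothetical and are not part of this repository.

```python
import zipfile
from pathlib import Path


def extract_raw_dataset(zip_path="raw_dataset.zip", dest="."):
    """Extract the archive into dest and return the names of its members.

    Hypothetical helper: the default path assumes the archive sits in the
    current working directory, as described in the step above.
    """
    zip_path = Path(zip_path)
    if not zip_path.is_file():
        raise FileNotFoundError(f"{zip_path} not found; download it first")
    with zipfile.ZipFile(zip_path) as archive:
        # extractall creates dest if it does not exist yet
        archive.extractall(dest)
        return archive.namelist()


# Example usage (run from the repository directory):
# names = extract_raw_dataset()
# print(f"Extracted {len(names)} entries")
```

Unzipping by hand with any archive tool works just as well; the script is only useful if you want the setup to be repeatable.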

Install the dependencies:

pip install -r requirements.txt

Run

python make_dataset.py --clean

Citation

If you use my code or ideas from my paper in your work, please cite my paper.

@article{arnob2024indicdialogue,
  title={IndicDialogue: A dataset of subtitles in 10 Indic languages for Indic language modeling},
  author={Arnob, Noor Mairukh Khan and Faiyaz, A and Fuad, Md Mubtasim and Al Masud, Shah Murtaza Rashid and Das, Baivab and Mridha, MF},
  journal={Data in Brief},
  volume={55},
  pages={110690},
  year={2024},
  publisher={Elsevier}
}
