Hi,
Thank you for your work.
I noticed that the model is trained on cDNA data, while the tokeniser seems to use an RNA vocabulary (https://github.com/oxpig/CaLM/blob/main/calm/alphabet.py).
Can you please clarify the data preprocessing pipeline?