BioCLIP 2.5 will be the new family of models (BioCLIP 2.5-H, BioCLIP 2.5-L, etc., where the letter indicates the size of the underlying CLIP model).
- All of these models are or will be trained on the revised TreeOfLife-200M, which adds 19M images and further cleaning; the taxonomy remains unchanged.
- Text embeddings for each model will be stored in a folder named after that model here.
- This is a breaking change for the pybioclip embed feature (as explained below).
- The pybioclip default model will remain BioCLIP 2 because 2.5 is considerably larger, but 2.5 can still be used with pybioclip (we may switch the default to 2.5-L once that model is ready).
The question for pybioclip is how to handle this breaking change to the embed feature, since the embeddings are hard-coded to be pulled from `embeddings/txt_emb_species.npy` and `embeddings/txt_emb_species.json` at the repo root (see code).
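One possible approach, sketched below as a suggestion rather than a decided design, is to resolve the embedding paths from the model name: look in the per-model folder first, and fall back to the legacy root-level `embeddings/` paths only for the default BioCLIP 2 model. The model names, the `EMBEDDING_FOLDERS` mapping, and the `resolve_embedding_paths` helper are all hypothetical; the real folder names in the repo may differ. The `exists` callable stands in for whatever file check pybioclip would actually use (e.g. a local cache lookup or a Hub listing).

```python
from typing import Callable, List, Tuple

# Hypothetical mapping from model names to per-model embedding folders.
# The actual folder names in the embeddings repo may differ.
EMBEDDING_FOLDERS = {
    "bioclip-2": "bioclip-2",
    "bioclip-2.5-h": "bioclip-2.5-h",
    "bioclip-2.5-l": "bioclip-2.5-l",
}

# Assumption: the current root-level embeddings/ files belong to BioCLIP 2.
LEGACY_MODEL = "bioclip-2"
LEGACY_FOLDER = "embeddings"
NPY_NAME = "txt_emb_species.npy"
JSON_NAME = "txt_emb_species.json"


def embedding_paths(model_name: str) -> List[Tuple[str, str]]:
    """Return candidate (npy, json) path pairs, per-model layout first."""
    candidates = []
    folder = EMBEDDING_FOLDERS.get(model_name)
    if folder is not None:
        candidates.append((f"{folder}/{NPY_NAME}", f"{folder}/{JSON_NAME}"))
    # Only the default model may use the legacy root-level files, so a
    # 2.5 model never silently receives BioCLIP 2 embeddings.
    if model_name == LEGACY_MODEL:
        candidates.append((f"{LEGACY_FOLDER}/{NPY_NAME}", f"{LEGACY_FOLDER}/{JSON_NAME}"))
    return candidates


def resolve_embedding_paths(model_name: str, exists: Callable[[str], bool]) -> Tuple[str, str]:
    """Pick the first candidate pair where both files exist."""
    for npy_path, json_path in embedding_paths(model_name):
        if exists(npy_path) and exists(json_path):
            return npy_path, json_path
    raise FileNotFoundError(f"No text embeddings found for model {model_name!r}")
```

Trying the per-model folder first means the new layout wins as soon as it is published, while the legacy fallback keeps older repo revisions working for the default model without touching the 2.5 models.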
Related to #160