Address embedding update with updated models #161

@egrace479

BioCLIP 2.5 will be the new family of models (BioCLIP 2.5-H, BioCLIP 2.5-L, etc., where the suffix letter indicates the size of the underlying CLIP model).

  • All of these models are (or will be) trained on the revised TreeOfLife-200M, which adds 19M images and further data cleaning; the taxonomy is unchanged.
  • All text embeddings will be stored in per-model folders, named after each model, here.
    • This is a breaking change for the pybioclip embed feature (as explained below).
  • The pybioclip default model will remain BioCLIP 2 because BioCLIP 2.5 is considerably larger, but 2.5 can still be used with pybioclip (we may switch the default to 2.5-L once it is ready).

The open question for pybioclip is how to handle this breaking change to the embed feature, since the embeddings are currently a hard-coded pull from embeddings/txt_emb_species.npy and embeddings/txt_emb_species.json at the repo root (see code).
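One possible way to make the lookup model-aware while keeping older releases working is to resolve the embedding paths from the model name and fall back to the current root-level files. The sketch below is only an illustration under assumptions: the per-model folder layout (embeddings/<model_name>/...), the function names, and the legacy_fallback flag are all hypothetical and are not part of pybioclip's actual code. File existence is passed in as a plain callable so the logic stays independent of where the files are hosted.

```python
from pathlib import PurePosixPath
from typing import Callable, List, Optional, Tuple

# Current hard-coded location of the species text embeddings (repo root).
LEGACY_DIR = PurePosixPath("embeddings")
EMB_FILE = "txt_emb_species.npy"
NAMES_FILE = "txt_emb_species.json"


def candidate_embedding_paths(
    model_name: Optional[str], legacy_fallback: bool = True
) -> List[Tuple[str, str]]:
    """Return (npy, json) path pairs to try, most specific first.

    ASSUMPTION (hypothetical layout): per-model folders live at
    embeddings/<model_name>/; the real folder names may differ.
    """
    candidates = []
    if model_name:
        model_dir = LEGACY_DIR / model_name
        candidates.append((str(model_dir / EMB_FILE), str(model_dir / NAMES_FILE)))
    if legacy_fallback or not model_name:
        # Old root-level files; only valid if they were built with the chosen model.
        candidates.append((str(LEGACY_DIR / EMB_FILE), str(LEGACY_DIR / NAMES_FILE)))
    return candidates


def resolve_embedding_paths(
    model_name: Optional[str],
    exists: Callable[[str], bool],
    legacy_fallback: bool = True,
) -> Tuple[str, str]:
    """Pick the first candidate pair whose .npy and .json files both exist."""
    for npy_path, json_path in candidate_embedding_paths(model_name, legacy_fallback):
        if exists(npy_path) and exists(json_path):
            return npy_path, json_path
    raise FileNotFoundError(f"No text embeddings found for model {model_name!r}")
```

For example, with only the old root-level files available, resolve_embedding_paths("bioclip-2.5-h", {"embeddings/txt_emb_species.npy", "embeddings/txt_emb_species.json"}.__contains__) returns the root-level pair ("bioclip-2.5-h" is a made-up folder name). Passing legacy_fallback=False turns a missing per-model folder into a FileNotFoundError instead of silently reusing root-level embeddings computed with a different model, which is probably the safer choice once embeddings are model-specific.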

Related to #160

Metadata

Assignees: No one assigned
Labels: question (Further information is requested)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
