Here's the embedding code:
from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoModel, AutoTokenizer
import numpy as np
model_ort = ORTModelForFeatureExtraction.from_pretrained('BAAI/bge-small-en-v1.5', file_name="onnx/model.onnx")
tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-small-en-v1.5')
model = AutoModel.from_pretrained('BAAI/bge-small-en-v1.5')
...
inputs = tokenizer(documents, padding=True, truncation=True, return_tensors='pt', max_length=512)
embeddings = model(**inputs)[0][:, 0].detach().numpy()
It works, but only on the CPU. When I tried calling to("mps"), it didn't work.
How can I use MPS in this scenario?
Thanks
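
One possible approach, as a minimal sketch: with the PyTorch `AutoModel`, both the model and the tokenized inputs need to be on the same device, and the result has to be moved back to the CPU before `.numpy()` is called. The helper names `pick_device` and `embed_cls` are hypothetical, introduced only for this illustration. This assumes a PyTorch build with MPS support, and it covers only the PyTorch model, not `model_ort`.

```python
import torch


def pick_device() -> torch.device:
    """Return the MPS device when this PyTorch build can use it, else the CPU."""
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


def embed_cls(model, tokenizer, documents, device):
    """Hypothetical helper: CLS-token embeddings computed on `device`."""
    # Move the model weights to the target device and disable dropout.
    model = model.to(device).eval()
    inputs = tokenizer(
        documents,
        padding=True,
        truncation=True,
        return_tensors="pt",
        max_length=512,
    )
    # The input tensors must live on the same device as the model.
    inputs = {name: tensor.to(device) for name, tensor in inputs.items()}
    with torch.no_grad():
        last_hidden_state = model(**inputs)[0]
    # NumPy only reads CPU memory, so copy the result back before converting.
    return last_hidden_state[:, 0].cpu().numpy()
```

With the `model`, `tokenizer`, and `documents` from the question, this would be called as `embeddings = embed_cls(model, tokenizer, documents, pick_device())`. As far as I know, `ORTModelForFeatureExtraction` runs through ONNX Runtime execution providers rather than PyTorch devices, so `to("mps")` would likely not apply to `model_ort`.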