@@ -101,7 +101,7 @@ model=BAAI/bge-large-en-v1.5
 revision=refs/pr/5
 volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
 
-docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:0.4.0 --model-id $model --revision $revision
+docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:0.5 --model-id $model --revision $revision
 ```
 
 And then you can make requests like
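The request example itself lies outside this hunk and is elided by the diff. As a sketch, an embedding request against the server started above (assuming it is listening on `127.0.0.1:8080` as per the `-p 8080:80` mapping) would look like:

```shell
curl 127.0.0.1:8080/embed \
    -X POST \
    -d '{"inputs":"What is Deep Learning?"}' \
    -H 'Content-Type: application/json'
```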
@@ -242,15 +242,15 @@ Options:
 
 Text Embeddings Inference ships with multiple Docker images that you can use to target a specific backend:
 
-| Architecture                        | Image                                                                       |
-| ----------------------------------- | --------------------------------------------------------------------------- |
-| CPU                                 | ghcr.io/huggingface/text-embeddings-inference:cpu-0.4.0                     |
-| Volta                               | NOT SUPPORTED                                                               |
-| Turing (T4, RTX 2000 series, ...)   | ghcr.io/huggingface/text-embeddings-inference:turing-0.4.0 (experimental)   |
-| Ampere 80 (A100, A30)               | ghcr.io/huggingface/text-embeddings-inference:0.4.0                         |
-| Ampere 86 (A10, A40, ...)           | ghcr.io/huggingface/text-embeddings-inference:86-0.4.0                      |
-| Ada Lovelace (RTX 4000 series, ...) | ghcr.io/huggingface/text-embeddings-inference:89-0.4.0                      |
-| Hopper (H100)                       | ghcr.io/huggingface/text-embeddings-inference:hopper-0.4.0 (experimental)   |
+| Architecture                        | Image                                                                     |
+| ----------------------------------- | ------------------------------------------------------------------------- |
+| CPU                                 | ghcr.io/huggingface/text-embeddings-inference:cpu-0.5                     |
+| Volta                               | NOT SUPPORTED                                                             |
+| Turing (T4, RTX 2000 series, ...)   | ghcr.io/huggingface/text-embeddings-inference:turing-0.5 (experimental)   |
+| Ampere 80 (A100, A30)               | ghcr.io/huggingface/text-embeddings-inference:0.5                         |
+| Ampere 86 (A10, A40, ...)           | ghcr.io/huggingface/text-embeddings-inference:86-0.5                      |
+| Ada Lovelace (RTX 4000 series, ...) | ghcr.io/huggingface/text-embeddings-inference:89-0.5                      |
+| Hopper (H100)                       | ghcr.io/huggingface/text-embeddings-inference:hopper-0.5 (experimental)   |
 
 **Warning**: Flash Attention is turned off by default for the Turing image as it suffers from precision issues.
 You can turn Flash Attention v1 ON by using the `USE_FLASH_ATTENTION=True` environment variable.
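For instance, combining the environment variable with the Turing image tag introduced in this diff (a sketch; the model name here is just an example, not part of this section):

```shell
model=BAAI/bge-large-en-v1.5
volume=$PWD/data

# USE_FLASH_ATTENTION=True re-enables Flash Attention v1 on the Turing image
docker run --gpus all -e USE_FLASH_ATTENTION=True -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:turing-0.5 --model-id $model
```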
@@ -279,7 +279,7 @@ model=<your private model>
 volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
 token=<your cli READ token>
 
-docker run --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:0.4.0 --model-id $model
+docker run --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:0.5 --model-id $model
 ```
 
 ### Using Re-rankers models
@@ -297,7 +297,7 @@ model=BAAI/bge-reranker-large
 revision=refs/pr/4
 volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
 
-docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:0.4.0 --model-id $model --revision $revision
+docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:0.5 --model-id $model --revision $revision
 ```
 
 And then you can rank the similarity between a query and a list of passages with:
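That example is also elided by the diff. A sketch of a re-ranking request, assuming the `/rerank` endpoint of the TEI API and a server on port 8080 as above (the query and passages are illustrative):

```shell
curl 127.0.0.1:8080/rerank \
    -X POST \
    -d '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not fun.", "Deep learning is a subset of machine learning."]}' \
    -H 'Content-Type: application/json'
```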
@@ -317,7 +317,7 @@ You can also use classic Sequence Classification models like `SamLowe/roberta-ba
 model=SamLowe/roberta-base-go_emotions
 volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
 
-docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:0.4.0 --model-id $model
+docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:0.5 --model-id $model
 ```
 
 Once you have deployed the model you can use the `predict` endpoint to get the emotions most associated with an input:
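The `predict` call itself falls outside this hunk. As a sketch, assuming a server on port 8080 as above (the input text is illustrative):

```shell
curl 127.0.0.1:8080/predict \
    -X POST \
    -d '{"inputs":"I like you."}' \
    -H 'Content-Type: application/json'
```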