System Info
GPU being used:
Fri Oct 24 20:39:42 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.02 Driver Version: 581.42 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce GTX 1650 On | 00000000:0A:00.0 On | N/A |
| 0% 45C P8 N/A / 45W | 842MiB / 4096MiB | 2% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 30 G /Xwayland N/A |
| 0 N/A N/A 33 G /Xwayland N/A |
+-----------------------------------------------------------------------------------------+
And my OS:
$ neofetch
nemo@DESKTOP-ICS0GDD
--------------------
OS: Ubuntu 22.04.5 LTS on Windows 10 x86_64
Kernel: 6.6.87.2-microsoft-standard-WSL2
Uptime: 1 hour, 8 mins
Packages: 704 (dpkg), 6 (snap)
Shell: bash 5.1.16
Theme: Adwaita [GTK3]
Icons: Adwaita [GTK3]
Terminal: vscode
CPU: AMD Ryzen 5 3600 (12) @ 3.593GHz
GPU: d653:00:00.0 Microsoft Corporation Device 008e
Memory: 1239MiB / 7913MiB
And the Docker version:
$ docker version
Client:
Version: 28.3.2
API version: 1.51
Go version: go1.24.5
Git commit: 578ccf6
Built: Wed Jul 9 16:12:50 2025
OS/Arch: linux/amd64
Context: default
Server: Docker Desktop 4.44.2 (202017)
Engine:
Version: 28.3.2
API version: 1.51 (minimum version 1.24)
Go version: go1.24.5
Git commit: e77ff99
Built: Wed Jul 9 16:13:55 2025
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.7.27
GitCommit: 05044ec0a9a75232cad458027ca83437aae3f4da
runc:
Version: 1.2.5
GitCommit: v1.2.5-0-g59923ef
docker-init:
Version: 0.19.0
GitCommit: de40ad0
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction
I'm trying to run this locally inside WSL2. I use this command to launch the server, and this is the output:
$ make start-embed-server
docker run \
--gpus all \
-p 8080:80 --name embed \
--env RUST_BACKTRACE=full \
-v "./embeddings/data:/data" --pull always --rm \
ghcr.io/huggingface/text-embeddings-inference:turing-1.8.2 \
--model-id Qwen/Qwen3-Embedding-0.6B \
--max-batch-tokens 1024 \
--max-concurrent-requests 1 \
--max-client-batch-size 1
turing-1.8.2: Pulling from huggingface/text-embeddings-inference
Digest: sha256:600c06ef2ea5ee804a6cd656fe357aa8bf0977cdff7271756b6536e98912c589
Status: Image is up to date for ghcr.io/huggingface/text-embeddings-inference:turing-1.8.2
2025-10-24T18:40:32.279522Z INFO text_embeddings_router: router/src/main.rs:203: Args { model_id: "Qwe*/*****-*********-0.6B", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 1, max_batch_tokens: 1024, max_batch_requests: None, max_client_batch_size: 1, auto_truncate: false, default_prompt_name: None, default_prompt: None, dense_path: None, hf_api_token: None, hf_token: None, hostname: "827e7e97024c", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, disable_spans: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", prometheus_port: 9000, cors_allow_origin: None }
2025-10-24T18:40:32.363020Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:42: Starting download
2025-10-24T18:40:32.363055Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `1_Pooling/config.json`
2025-10-24T18:40:32.363115Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_bert_config.json`
2025-10-24T18:40:32.540194Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_roberta_config.json`
2025-10-24T18:40:32.694410Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_distilbert_config.json`
2025-10-24T18:40:32.821505Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_camembert_config.json`
2025-10-24T18:40:32.952155Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_albert_config.json`
2025-10-24T18:40:33.079634Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_xlm-roberta_config.json`
2025-10-24T18:40:33.207285Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_xlnet_config.json`
2025-10-24T18:40:33.331654Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `config_sentence_transformers.json`
2025-10-24T18:40:33.331741Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `config.json`
2025-10-24T18:40:33.331760Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `tokenizer.json`
2025-10-24T18:40:33.331785Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:72: Model artifacts downloaded in 968.769707ms
2025-10-24T18:40:33.848108Z WARN text_embeddings_router: router/src/lib.rs:190: Could not find a Sentence Transformers config
2025-10-24T18:40:33.848147Z INFO text_embeddings_router: router/src/lib.rs:194: Maximum number of tokens per request: 32768
2025-10-24T18:40:33.848406Z INFO text_embeddings_core::tokenization: core/src/tokenization.rs:38: Starting 12 tokenization workers
2025-10-24T18:40:34.616653Z INFO text_embeddings_router: router/src/lib.rs:242: Starting model backend
2025-10-24T18:40:34.618113Z INFO text_embeddings_backend: backends/src/lib.rs:553: Downloading `model.safetensors`
2025-10-24T18:40:34.618574Z INFO text_embeddings_backend: backends/src/lib.rs:421: Model weights downloaded in 464.065µs
2025-10-24T18:40:34.618629Z INFO download_dense_modules: text_embeddings_backend: backends/src/lib.rs:652: Downloading `modules.json`
2025-10-24T18:40:34.618755Z INFO text_embeddings_backend: backends/src/lib.rs:433: Dense modules downloaded in 145.581µs
2025-10-24T18:40:35.601738Z INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:503: Starting Qwen3 model on Cuda(CudaDevice(DeviceId(1)))
2025-10-24T18:40:44.488836Z INFO text_embeddings_router: router/src/lib.rs:260: Warming up model
2025-10-24T18:40:48.369640Z WARN text_embeddings_router: router/src/lib.rs:319: Invalid hostname, defaulting to 0.0.0.0
2025-10-24T18:40:48.371041Z INFO text_embeddings_router::http::server: router/src/http/server.rs:1852: Starting HTTP server: 0.0.0.0:80
2025-10-24T18:40:48.371065Z INFO text_embeddings_router::http::server: router/src/http/server.rs:1853: Ready
From another console:
$ curl http://localhost:8080/info
{"model_id":"Qwen/Qwen3-Embedding-0.6B","model_sha":null,"model_dtype":"float16","model_type":{"embedding":{"pooling":"last_token"}},"max_concurrent_requests":1,"max_input_length":32768,"max_batch_tokens":1024,"max_batch_requests":null,"max_client_batch_size":1,"auto_truncate":false,"tokenization_workers":12,"version":"1.8.2","sha":"d7af1fcc509902d8cc66cebf5a61c5e8e000e442","docker_label":"sha-d7af1fc"}
The issue happens when I run:
$ curl http://localhost:8080/embed -X POST -H 'Content-Type: application/json' -d '{"inputs": ["hello world"]}'
curl: (52) Empty reply from server
And the container's console shows:
thread 'tokio-runtime-worker' panicked at core/src/queue.rs:87:14:
Queue background task dropped the receiver or the receiver is too behind. This is a bug.: "Full(..)"
stack backtrace:
0: 0x623ba0dc2eff - <unknown>
1: 0x623ba09afe33 - <unknown>
2: 0x623ba0dc2652 - <unknown>
3: 0x623ba0dc2d63 - <unknown>
4: 0x623ba0dc2427 - <unknown>
5: 0x623ba0dfef18 - <unknown>
6: 0x623ba0dfee79 - <unknown>
7: 0x623ba0dff31c - <unknown>
8: 0x623ba0210bef - <unknown>
9: 0x623ba0210f95 - <unknown>
10: 0x623ba0f56d5d - <unknown>
11: 0x623ba0f560f4 - <unknown>
12: 0x623ba0f557fd - <unknown>
13: 0x623ba0f54afa - <unknown>
14: 0x623ba103c06e - <unknown>
15: 0x623ba103f230 - <unknown>
16: 0x623ba102fbf4 - <unknown>
17: 0x623ba1033b0b - <unknown>
18: 0x623ba0e0002b - <unknown>
19: 0x74a23727dac3 - <unknown>
20: 0x74a23730ebf4 - clone
21: 0x0 - <unknown>
make: *** [Makefile:5: start-embed-server] Error 139
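For context, the panic message suggests the router's request queue rejected a send because a bounded channel was already full. A minimal sketch of that failure mode, assuming a bounded tokio mpsc channel (the actual channel type and logic in core/src/queue.rs may differ), looks like:

use tokio::sync::mpsc;

fn main() {
    // Bounded channel standing in for the queue's internal command channel;
    // capacity 1 makes the "Full" case trivial to hit in this illustration.
    let (tx, _rx) = mpsc::channel::<&str>(1);

    // The first send fills the channel while the (simulated) background
    // task has not drained it yet.
    tx.try_send("first request").unwrap();

    // A second send now returns TrySendError::Full; expect()-ing it panics
    // with a message shaped like the log above ("... This is a bug.: Full(..)").
    tx.try_send("second request")
        .expect("Queue background task dropped the receiver or the receiver is too behind. This is a bug.");
}

With max-batch-tokens 1024 and max-concurrent-requests 1, a single request hitting this path right after warmup is what the log shows, but that is only a reading of the panic message, not a confirmed diagnosis.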
Expected behavior
The server should return the embedding instead of crashing. If I run a Python 3.13 container (also with --gpus all) and install sentence-transformers, I can use the model:
root@190c68a05b95:/# python3
Python 3.13.9 (main, Oct 21 2025, 11:49:28) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from sentence_transformers import SentenceTransformer
>>> model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")
>>> print(model.encode(["Hello World"]))
[[ 0.00581264 -0.00302087 -0.01198636 ... 0.00724208 -0.00435534
0.00336421]]