-
Is this a bug, or am I doing something wrong? > podman run \
-p 8000:8000 \
--device nvidia.com/gpu=all \
ghcr.io/ggml-org/llama.cpp:server-cuda \
-hf ggml-org/gpt-oss-20b-GGUF \
--port 8000 \
--host 0.0.0.0 \
--jinja \
-v > curl -v -s http://127.0.0.1:8000/v1/chat/completions -s --json '{
"messages": [
{
"role": "user",
"content": "Hello!"
},
{
"role": "assistant",
"content": "I have n"
}
]
}' | jq
{
"choices": [
{
"finish_reason": "stop",
"index": 0,
"message": {
"role": "assistant",
"reasoning_content": "We need to respond politely asking for clarification.",
"content": "Hi there! 👋\n\nIt looks like your message got a bit garbled with all those ellipses. Could you please restate the problem or the question you’re working on? A quick rundown of:\n\n1. **What you’re trying to solve** (e.g., a math problem, a programming task, etc.) \n2. **The exact statement or formula** (e.g., “I have \\(n = 1, 2, \\dots, N\\)” or “sum of…”) \n3. **Any constraints or special conditions** you’re aware of.\n\nOnce we have a clearer picture, I’ll be happy to help you out!"
}
}
],
"created": 1756728870,
"model": "gpt-3.5-turbo",
"system_fingerprint": "b6327-4d74393b",
"object": "chat.completion",
"usage": {
"completion_tokens": 492,
"prompt_tokens": 72,
"total_tokens": 564
},
"id": "chatcmpl-Bu9wkCe2pcxwdRzzOKjNieBWZZbCcKMw",
"__verbose": {
"index": 0,
"content": " = 1 … … …………… … ... ……...? The problem …... ...……? ...??……\n\nWe need to interpret. User says: \"Hello! I have n = 1 … … ………… … … ...…?? The problem …... ...…? ...??……\". The user is unclear. Might be a math problem but truncated. They likely want help with a problem about n = 1 ... something. The context might be about a sequence or something like that. The user says \"I have n=1 … … ………… … … ...…?? The problem …... ...…? ...??……\". They likely want to ask about something like \"I have n=1,..., maybe 100, what's sum of something?\" But the message is garbled.\n\nWe must ask clarifying questions. Also try to guess what they want: maybe they have a problem about summation, like \"I have n=1,...,N, find sum of something\"? The user might be using ellipses to indicate a range. The phrase \"I have n = 1 … … ………… … … ...…??\" might be \"I have n=1,...,N? The problem ...?\" It's ambiguous.\n\nWe can respond: \"Hello! Could you please clarify the problem? The statement is not clear.\" Provide a request for clarification. Also maybe ask about the nature: math, programming, etc. Also mention that ellipses may indicate missing details.\n\nLet's produce a helpful answer: ask clarifying questions.<|end|><|start|>assistant<|channel|>analysis<|message|>We need to respond politely asking for clarification.<|end|><|start|>assistant<|channel|>final<|message|>Hi there! 👋\n\nIt looks like your message got a bit garbled with all those ellipses. Could you please restate the problem or the question you’re working on? A quick rundown of:\n\n1. **What you’re trying to solve** (e.g., a math problem, a programming task, etc.) \n2. **The exact statement or formula** (e.g., “I have \\(n = 1, 2, \\dots, N\\)” or “sum of…”) \n3. **Any constraints or special conditions** you’re aware of.\n\nOnce we have a clearer picture, I’ll be happy to help you out!",
"tokens": [],
"id_slot": 0,
"stop": true,
"model": "gpt-3.5-turbo",
"tokens_predicted": 492,
"tokens_evaluated": 72,
"generation_settings": {
"n_predict": -1,
"seed": 4294967295,
"temperature": 0.800000011920929,
"dynatemp_range": 0.0,
"dynatemp_exponent": 1.0,
"top_k": 40,
"top_p": 0.949999988079071,
"min_p": 0.05000000074505806,
"top_n_sigma": -1.0,
"xtc_probability": 0.0,
"xtc_threshold": 0.10000000149011612,
"typical_p": 1.0,
"repeat_last_n": 64,
"repeat_penalty": 1.0,
"presence_penalty": 0.0,
"frequency_penalty": 0.0,
"dry_multiplier": 0.0,
"dry_base": 1.75,
"dry_allowed_length": 2,
"dry_penalty_last_n": 131072,
"dry_sequence_breakers": [
"\n",
":",
"\"",
"*"
],
"mirostat": 0,
"mirostat_tau": 5.0,
"mirostat_eta": 0.10000000149011612,
"stop": [],
"max_tokens": -1,
"n_keep": 0,
"n_discard": 0,
"ignore_eos": false,
"stream": false,
"logit_bias": [],
"n_probs": 0,
"min_keep": 0,
"grammar": "",
"grammar_lazy": false,
"grammar_triggers": [],
"preserved_tokens": [
200003,
200005,
200006,
200007,
200008
],
"chat_format": "GPT-OSS",
"reasoning_format": "auto",
"reasoning_in_content": false,
"thinking_forced_open": false,
"samplers": [
"penalties",
"dry",
"top_n_sigma",
"top_k",
"typ_p",
"top_p",
"min_p",
"xtc",
"temperature"
],
"speculative.n_max": 16,
"speculative.n_min": 0,
"speculative.p_min": 0.75,
"timings_per_token": false,
"post_sampling_probs": false,
"lora": []
},
"prompt": "<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.\nKnowledge cutoff: 2024-06\nCurrent date: 2025-09-01\n\nReasoning: medium\n\n# Valid channels: analysis, commentary, final. Channel must be included for every message.<|end|><|start|>user<|message|>Hello!<|end|><|start|>assistantI have n",
"has_new_line": true,
"truncated": false,
"stop_type": "eos",
"stopping_word": "",
"tokens_cached": 563,
"timings": {
"prompt_n": 1,
"prompt_ms": 23.163,
"prompt_per_token_ms": 23.163,
"prompt_per_second": 43.172300651901736,
"predicted_n": 492,
"predicted_ms": 2910.808,
"predicted_per_token_ms": 5.916276422764228,
"predicted_per_second": 169.02523285630656
}
},
"timings": {
"prompt_n": 1,
"prompt_ms": 23.163,
"prompt_per_token_ms": 23.163,
"prompt_per_second": 43.172300651901736,
"predicted_n": 492,
"predicted_ms": 2910.808,
"predicted_per_token_ms": 5.916276422764228,
"predicted_per_second": 169.02523285630656
}
} |
Beta Was this translation helpful? Give feedback.
Answered by
aldehir
Sep 1, 2025
Replies: 1 comment
-
Try > curl -v -s http://127.0.0.1:8000/v1/chat/completions -s --json '{
"messages": [
{
"role": "user",
"content": "Hello!"
},
{
"role": "assistant",
"content": "<|channel|>analysis<|message|><|end|><|start|>assistant<|channel|>final<|message|>I have n"
}
]
}' | jq response{
"choices": [
{
"finish_reason": "stop",
"index": 0,
"message": {
"role": "assistant",
"content": " patterns matching 1..~8 which give them even and !- etc. In this because *applied RPP and after graph.\n\nThe consistent the # the pattern to for up to any problem of your... Not. I'm order of these should we to them your. We are I to and by using perfect left you. And I'm\n\nGiven a M© and but and problem the.\n\nI think you have to be aware. The answer might be a few '0's? \n\nAP for any helpful 2 patterns...\n\nI hope you find this helpful. If you have any other questions, let me know."
}
}
],
"created": 1756758383,
"model": "gpt-3.5-turbo",
"system_fingerprint": "b6328-8edb5c46",
"object": "chat.completion",
"usage": {
"completion_tokens": 128,
"prompt_tokens": 81,
"total_tokens": 209
},
"id": "chatcmpl-btFyTsD1po8zRQ2TxYv1xqtTKVLR9mRg",
"__verbose": {
"index": 0,
"content": " patterns matching 1..~8 which give them even and !- etc. In this because *applied RPP and after graph.\n\nThe consistent the # the pattern to for up to any problem of your... Not. I'm order of these should we to them your. We are I to and by using perfect left you. And I'm\n\nGiven a M© and but and problem the.\n\nI think you have to be aware. The answer might be a few '0's? \n\nAP for any helpful 2 patterns...\n\nI hope you find this helpful. If you have any other questions, let me know.",
"tokens": [],
"id_slot": 2,
"stop": true,
"model": "gpt-3.5-turbo",
"tokens_predicted": 128,
"tokens_evaluated": 81,
"generation_settings": {
"n_predict": -1,
"seed": 4294967295,
"temperature": 1.0,
"dynatemp_range": 0.0,
"dynatemp_exponent": 1.0,
"top_k": 1000,
"top_p": 1.0,
"min_p": 0.0,
"top_n_sigma": -1.0,
"xtc_probability": 0.0,
"xtc_threshold": 0.10000000149011612,
"typical_p": 1.0,
"repeat_last_n": 64,
"repeat_penalty": 1.0,
"presence_penalty": 0.0,
"frequency_penalty": 0.0,
"dry_multiplier": 0.0,
"dry_base": 1.75,
"dry_allowed_length": 2,
"dry_penalty_last_n": 267264,
"dry_sequence_breakers": [
"\n",
":",
"\"",
"*"
],
"mirostat": 0,
"mirostat_tau": 5.0,
"mirostat_eta": 0.10000000149011612,
"stop": [],
"max_tokens": -1,
"n_keep": 0,
"n_discard": 0,
"ignore_eos": false,
"stream": false,
"logit_bias": [],
"n_probs": 0,
"min_keep": 0,
"grammar": "",
"grammar_lazy": false,
"grammar_triggers": [],
"preserved_tokens": [
200003,
200005,
200006,
200007,
200008
],
"chat_format": "GPT-OSS",
"reasoning_format": "auto",
"reasoning_in_content": false,
"thinking_forced_open": false,
"samplers": [
"penalties",
"dry",
"top_n_sigma",
"top_k",
"typ_p",
"top_p",
"min_p",
"xtc",
"temperature"
],
"speculative.n_max": 16,
"speculative.n_min": 0,
"speculative.p_min": 0.75,
"timings_per_token": false,
"post_sampling_probs": false,
"lora": []
},
"prompt": "<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.\nKnowledge cutoff: 2024-06\nCurrent date: 2025-09-01\n\nReasoning: high\n\n# Valid channels: analysis, commentary, final. Channel must be included for every message.<|end|><|start|>user<|message|>Hello!<|end|><|start|>assistant<|channel|>analysis<|message|><|end|><|start|>assistant<|channel|>final<|message|>I have n",
"has_new_line": true,
"truncated": false,
"stop_type": "eos",
"stopping_word": "",
"tokens_cached": 208,
"timings": {
"prompt_n": 81,
"prompt_ms": 69.476,
"prompt_per_token_ms": 0.8577283950617284,
"prompt_per_second": 1165.8702285681386,
"predicted_n": 128,
"predicted_ms": 1642.158,
"predicted_per_token_ms": 12.829359375,
"predicted_per_second": 77.94621467605432
}
},
"timings": {
"prompt_n": 81,
"prompt_ms": 69.476,
"prompt_per_token_ms": 0.8577283950617284,
"prompt_per_second": 1165.8702285681386,
"predicted_n": 128,
"predicted_ms": 1642.158,
"predicted_per_token_ms": 12.829359375,
"predicted_per_second": 77.94621467605432
}
} The prefill needs to use the harmony response format and since it's a generation, should contain an analysis channel. In this example I left it empty, i.e. no "reasoning". With this, the parsing breaks because it begins parsing from the end of the prefill. Your best bet might be to exclude |
Beta Was this translation helpful? Give feedback.
0 replies
Answer selected by
dstoc
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Try
response