
Commit 570da12

[FEATURE] add json schema support (#115)
* Add --json-response flag for structured API responses

  Adds a new CLI flag that enables JSON response formatting:
  - Adds json_response field to the RequestFuncInput model
  - Modifies the OpenAI backend to apply JSON formatting when the flag is enabled
  - Includes response_format and chat_template_kwargs settings
  - Prompts the model to avoid premature JSON closure

* change prompt

* add --disable-thinking separately

* slight prompt change

* update README

* Implement JSON schema support for structured outputs

  - Add --json-schema-file and --json-schema-inline CLI arguments
  - Add --json-response-prompt for customizable JSON formatting messages
  - Extend the RequestFuncInput and Client classes with json_schema support
  - Update the OpenAI chat completions backend to use the proper JSON schema format
  - Add sample JSON schema files for testing
  - Maintain backward compatibility with the existing --json-response flag

* Enhance the JSON schema system with flexible prompt handling

  - Replace --json-response-prompt with a unified --json-prompt argument
  - Add @file syntax support for loading prompts from files
  - Add an --include-schema-in-prompt flag to include the schema in the prompt text
  - Implement comprehensive input validation with clear error messages
  - Simplify backend prompt logic with consistent schema formatting
  - Add extensive README documentation with examples and usage patterns
  - Remove the deprecated --json-response-prompt for a cleaner API
  - Fix error handling for malformed JSON responses in streaming mode

* Fix overly general exception handling in main.py

  - Replace broad Exception catches with specific exception types
  - Use OSError and PermissionError for file operations
  - Use json.JSONDecodeError for JSON parsing errors
  - Improve error messages with more specific context

* Clean up sample schemas, keeping only simple_schema.json

  - Remove complex_schema.json and sample_response_schema.json
  - Keep simple_schema.json as the primary example schema
  - Update simple_schema.json with an improved structure

* Simplify the JSON schema documentation in the README

  - Remove verbose examples and compatibility notes
  - Keep only the essential file-based and inline schema usage
  - Reference tests/data/simple_schema.json for an example schema
  - Make the documentation concise and focused

* Refactor JSON validation into the parse_args function

  - Move JSON argument validation from run_main() to parse_args()
  - Create a validate_json_args() function for better separation of concerns
  - Process and validate JSON arguments early, during argument parsing
  - Store the processed custom_prompt and json_schema in the args namespace
  - Keep the same validation logic, now in its proper location
  - Follow the pattern of the other argument validations in parse_args()

* Consolidate the JSON schema arguments into a unified --json-schema flag

  Replace the separate --json-schema-file and --json-schema-inline arguments with a single --json-schema that supports both inline JSON and @file syntax, matching the pattern established by --json-prompt.

* Clean up code to address review comments

* Update README.md
  Co-authored-by: Benjamin Chislett <[email protected]>

* Update README.md
  Co-authored-by: Benjamin Chislett <[email protected]>

---------

Co-authored-by: Benjamin Chislett <[email protected]>
1 parent 74816b1 commit 570da12

File tree

5 files changed: +268 −2 lines

README.md

Lines changed: 19 additions & 1 deletion
@@ -59,6 +59,11 @@ After benchmarking, the results are saved to `output-file.json` (or specified by
 | `--disable-tqdm` | Specify to disable tqdm progress bar. |
 | `--best-of` | Number of best completions to return. |
 | `--use-beam-search` | Use beam search for completions. |
+| `--json-response` | Request responses in JSON object format from the API. |
+| `--json-prompt` | Custom instructions appended to the end of the original prompt when using one of the JSON modes (by default, no additional context is added). Supports inline text or file input with `@file` syntax (e.g., `--json-prompt @prompt.txt`). |
+| `--json-schema` | JSON schema for structured output validation. Supports an inline JSON string or file input with `@file` syntax (e.g., `--json-schema @schema.json`). |
+| `--include-schema-in-prompt` | Include the JSON schema in the prompt text for better LLM comprehension. Requires `--json-schema` to be specified. |
+| `--disable-thinking` | Disable thinking mode in chat templates. |
 | `--output-file` | Output json file to save the results. |
 | `--debug` | Log debug messages. |
 | `--profile` | Use Torch Profiler. The endpoint must be launched with VLLM_TORCH_PROFILER_DIR to enable profiler. |
@@ -72,6 +77,18 @@ After benchmarking, the results are saved to `output-file.json` (or specified by
 | `--top-p` | Top-P to use for sampling. Defaults to None, or 1.0 for backends which require it to be specified. |
 | `--top-k` | Top-K to use for sampling. Defaults to None. |
 
+### JSON Schema Support
+
+For structured JSON outputs with schema validation:
+
+```bash
+# File-based schema (see tests/data/simple_schema.json for an example)
+fib benchmark --json-schema @tests/data/simple_schema.json -n 20 -rps 10 --backend openai-chat --endpoint /v1/chat/completions
+
+# Inline schema
+fib benchmark --json-schema '{"type":"object","properties":{"answer":{"type":"string"}},"required":["answer"]}' -n 20 -rps 10 --backend openai-chat --endpoint /v1/chat/completions
+```
+
 In addition to providing these arguments on the command-line, you can use `--config-file` to pre-define the parameters for your use case. Examples are provided in `examples/`
 
 ### Output
@@ -180,4 +197,5 @@ Mean ITL (ms): 9.35
 Median ITL (ms): 8.00
 P99 ITL (ms): 89.88
 ==================================================
-```
+```
+

src/flexible_inference_benchmark/engine/backend_functions.py

Lines changed: 43 additions & 1 deletion
@@ -43,6 +43,11 @@ class RequestFuncInput(BaseModel):
     top_p: Optional[float] = None
     top_k: Optional[int] = None
     run_id: Optional[str] = None
+    json_response: bool = False
+    custom_prompt: str = ""
+    disable_thinking: bool = False
+    json_schema: Optional[Dict[str, Any]] = None
+    include_schema_in_prompt: bool = False
 
 
 class RequestFuncOutput(BaseModel):
@@ -448,13 +453,50 @@ async def async_request_openai_chat_completions(
     with otel_span as span:
         async with aiohttp.ClientSession(timeout=AIOHTTP_TIMEOUT) as session:
             assert not request_func_input.use_beam_search
+
+            # Apply custom prompt and schema formatting
+            append_msg = ""
+
+            # 1. Append custom prompt when provided
+            if request_func_input.custom_prompt:
+                append_msg += request_func_input.custom_prompt
+
+            # 2. Include schema in prompt if requested
+            if request_func_input.include_schema_in_prompt and request_func_input.json_schema:
+                if append_msg:
+                    append_msg += "\n\n"
+                append_msg += "Please follow this JSON schema for your response:\n```json\n"
+                append_msg += json.dumps(request_func_input.json_schema, indent=2)
+                append_msg += "\n```"
+
+            # Apply the combined message to content
+            if append_msg:
+                if isinstance(content_body, str):
+                    content_body += append_msg
+                else:
+                    content_body[-1]["text"] += append_msg
+
             payload = {
                 "model": request_func_input.model,
                 "messages": [{"role": "user", "content": content_body}],
                 "max_tokens": request_func_input.output_len,
                 "stream": request_func_input.stream,
                 "ignore_eos": request_func_input.ignore_eos,
            }
+
+            # Select the response format: a schema takes precedence over the
+            # plain --json-response JSON-object mode
+            if request_func_input.json_schema:
+                payload["response_format"] = {
+                    "type": "json_schema",
+                    "json_schema": {"name": "response", "schema": request_func_input.json_schema, "strict": True},
+                }
+            elif request_func_input.json_response:
+                payload["response_format"] = {"type": "json_object"}
+
+            # Add thinking control if flag is enabled
+            if request_func_input.disable_thinking:
+                payload["chat_template_kwargs"] = {"enable_thinking": False}
+
             if request_func_input.stream:
                 payload["stream_options"] = {"include_usage": True}
             apply_sampling_params(payload, request_func_input, always_top_p=False)
@@ -505,7 +547,7 @@ async def async_request_openai_chat_completions(
                     delta = None
                     content = None
                     reasoning_content = None
-                    if request_func_input.stream and len(data["choices"]) > 0:
+                    if request_func_input.stream and "choices" in data and len(data["choices"]) > 0:
                         delta = data["choices"][0]["delta"]
                         content = delta.get("content", None)
                         reasoning_content = delta.get("reasoning_content", None)
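
For reference, a minimal sketch (not part of the commit) of the request body the backend above produces when a schema is supplied. The `response_format` structure mirrors the diff; the model name, prompt, and token limit are placeholders:

```python
import json

# Hypothetical schema; any JSON-object schema works here.
json_schema = {
    "type": "object",
    "properties": {"answer": {"type": "string"}},
    "required": ["answer"],
}

payload = {
    "model": "my-model",  # placeholder
    "messages": [{"role": "user", "content": "Reply in JSON."}],
    "max_tokens": 256,
    "stream": False,
    "ignore_eos": True,
    # --json-schema path: constrained decoding against the supplied schema
    "response_format": {
        "type": "json_schema",
        "json_schema": {"name": "response", "schema": json_schema, "strict": True},
    },
}

# --json-response path (mutually exclusive with --json-schema):
# payload["response_format"] = {"type": "json_object"}

print(json.dumps(payload, indent=2))
```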

src/flexible_inference_benchmark/engine/client.py

Lines changed: 39 additions & 0 deletions
@@ -44,6 +44,11 @@ def __init__(
         top_p: Optional[float] = None,
         top_k: Optional[int] = None,
         run_id: Optional[str] = None,
+        json_response: bool = False,
+        custom_prompt: str = "",
+        disable_thinking: bool = False,
+        json_schema: Optional[Dict[str, Any]] = None,
+        include_schema_in_prompt: bool = False,
     ):
         self.backend = backend
         self.api_url = api_url
@@ -66,6 +71,11 @@ def __init__(
         self.top_p = top_p
         self.top_k = top_k
         self.run_id = run_id or str(uuid.uuid4())
+        self.json_response = json_response
+        self.custom_prompt = custom_prompt
+        self.disable_thinking = disable_thinking
+        self.json_schema = json_schema
+        self.include_schema_in_prompt = include_schema_in_prompt
 
     @property
     def request_func(
@@ -178,6 +188,11 @@ async def benchmark(
                 top_p=self.top_p,
                 top_k=self.top_k,
                 run_id=self.run_id,
+                json_response=self.json_response,
+                custom_prompt=self.custom_prompt,
+                disable_thinking=self.disable_thinking,
+                json_schema=self.json_schema,
+                include_schema_in_prompt=self.include_schema_in_prompt,
             )
             for (data_sample, media_sample) in zip(data, requests_media)
         ]
@@ -221,6 +236,12 @@ async def validate_url_endpoint(
             temperature=self.temperature,
             top_p=self.top_p,
             top_k=self.top_k,
+            run_id=self.run_id,
+            json_response=self.json_response,
+            custom_prompt=self.custom_prompt,
+            disable_thinking=self.disable_thinking,
+            json_schema=self.json_schema,
+            include_schema_in_prompt=self.include_schema_in_prompt,
         )
         return await self.send_request(-1, data, 0, None, None)
 
@@ -239,6 +260,15 @@ async def start_torch_profiler(self) -> Union[RequestFuncOutput, Any]:
             stream=self.stream,
             cookies=self.cookies,
             logprobs=self.logprobs,
+            temperature=self.temperature,
+            top_p=self.top_p,
+            top_k=self.top_k,
+            run_id=self.run_id,
+            json_response=self.json_response,
+            custom_prompt=self.custom_prompt,
+            disable_thinking=self.disable_thinking,
+            json_schema=self.json_schema,
+            include_schema_in_prompt=self.include_schema_in_prompt,
         )
         return await self.signal_profiler(0, data, 0, None, None)
 
@@ -257,5 +287,14 @@ async def stop_torch_profiler(self) -> Union[RequestFuncOutput, Any]:
             stream=self.stream,
             cookies=self.cookies,
             logprobs=self.logprobs,
+            temperature=self.temperature,
+            top_p=self.top_p,
+            top_k=self.top_k,
+            run_id=self.run_id,
+            json_response=self.json_response,
+            custom_prompt=self.custom_prompt,
+            disable_thinking=self.disable_thinking,
+            json_schema=self.json_schema,
+            include_schema_in_prompt=self.include_schema_in_prompt,
        )
        return await self.signal_profiler(0, data, 0, None, None)

src/flexible_inference_benchmark/main.py

Lines changed: 151 additions & 0 deletions
@@ -470,6 +470,37 @@ def add_benchmark_subparser(subparsers: argparse._SubParsersAction) -> Any: # t
 
     benchmark_parser.add_argument("--use-beam-search", action="store_true", help="Use beam search for completions.")
 
+    benchmark_parser.add_argument(
+        "--json-response", action="store_true", help="Request responses in JSON format from the API."
+    )
+
+    benchmark_parser.add_argument(
+        "--json-prompt",
+        type=str,
+        default="",
+        help="Custom prompt message to append when using JSON modes. "
+        "Supports inline text or file input with @file syntax (e.g., --json-prompt @prompt.txt). "
+        "Always appended when specified, regardless of JSON mode type.",
+    )
+
+    benchmark_parser.add_argument(
+        "--json-schema",
+        type=str,
+        help="JSON schema for structured output validation. "
+        "Supports inline JSON string or file input with @file syntax (e.g., --json-schema @schema.json).",
+    )
+
+    benchmark_parser.add_argument(
+        "--include-schema-in-prompt",
+        action="store_true",
+        help="Include the JSON schema in the prompt text for better LLM comprehension. "
+        "Requires --json-schema to be specified.",
+    )
+
+    benchmark_parser.add_argument(
+        "--disable-thinking", action="store_true", help="Disable thinking mode in chat templates."
+    )
+
     benchmark_parser.add_argument(
         "--output-file",
         type=str,
@@ -515,6 +546,114 @@ def add_benchmark_subparser(subparsers: argparse._SubParsersAction) -> Any: # t
     return benchmark_parser
 
 
+def validate_json_args(args: argparse.Namespace) -> None:
+    """Validate JSON-related arguments and load files."""
+    if args.subcommand != 'benchmark':
+        return
+
+    # Process JSON prompt with @file support
+    custom_prompt = ""
+    if args.json_prompt:
+        if args.json_prompt.startswith("@"):
+            # File-based prompt loading
+            prompt_file_path = args.json_prompt[1:]  # Remove @ prefix
+            try:
+                with open(prompt_file_path, 'r', encoding='utf-8') as f:
+                    custom_prompt = f.read().strip()
+                if not custom_prompt:
+                    logger.error(f"Prompt file '{prompt_file_path}' is empty")
+                    sys.exit(1)
+                logger.info(f"Loaded custom prompt from {prompt_file_path}")
+            except FileNotFoundError:
+                logger.error(f"Prompt file '{prompt_file_path}' does not exist")
+                sys.exit(1)
+            except UnicodeDecodeError as e:
+                logger.error(f"Cannot read prompt file '{prompt_file_path}': {e}")
+                sys.exit(1)
+            except (OSError, PermissionError) as e:
+                logger.error(f"Failed to load prompt file '{prompt_file_path}': {e}")
+                sys.exit(1)
+        else:
+            # Inline prompt
+            custom_prompt = args.json_prompt
+
+    # Store processed prompt back to args
+    args.json_prompt = custom_prompt
+
+    # Process JSON schema if provided
+    json_schema = None
+    original_json_schema = getattr(args, 'json_schema', None)
+    if args.json_schema:
+        if args.json_schema.startswith("@"):
+            # File-based schema loading
+            schema_file_path = args.json_schema[1:]  # Remove @ prefix
+            try:
+                with open(schema_file_path, 'r') as f:
+                    json_schema = json.load(f)
+                # Basic validation that it's a valid JSON schema structure
+                if not isinstance(json_schema, dict):
+                    logger.error("JSON schema must be a JSON object")
+                    sys.exit(1)
+                logger.info(f"Loaded JSON schema from {schema_file_path}")
+            except FileNotFoundError:
+                logger.error(f"JSON schema file '{schema_file_path}' does not exist")
+                sys.exit(1)
+            except (OSError, PermissionError) as e:
+                logger.error(f"Failed to load JSON schema file '{schema_file_path}': {e}")
+                sys.exit(1)
+            except json.JSONDecodeError as e:
+                logger.error(f"Invalid JSON in schema file '{schema_file_path}': {e}")
+                sys.exit(1)
+        else:
+            # Inline schema
+            try:
+                json_schema = json.loads(args.json_schema)
+                # Basic validation that it's a valid JSON schema structure
+                if not isinstance(json_schema, dict):
+                    logger.error("JSON schema must be a JSON object")
+                    sys.exit(1)
+                logger.info("Loaded inline JSON schema")
+            except json.JSONDecodeError as e:
+                logger.error(f"Invalid JSON in inline schema: {e}")
+                sys.exit(1)
+
+    # Store processed schema back to args
+    args.json_schema = json_schema
+
+    # Comprehensive input validation
+    # 1. Check for contradictory flag combinations
+    if json_schema and args.json_response:
+        logger.error("Cannot use both --json-response and --json-schema together")
+        logger.error("Suggestion: Choose either --json-response or --json-schema")
+        sys.exit(2)
+
+    # 2. Check for schema-dependent flags without schema
+    if args.include_schema_in_prompt:
+        if not json_schema:
+            logger.error("--include-schema-in-prompt requires a JSON schema")
+            logger.error("Suggestion: Add --json-schema <schema> or --json-schema @file")
+            sys.exit(3)
+
+    # 3. File size warnings (optional)
+    if original_json_schema and original_json_schema.startswith("@"):
+        schema_file_path = original_json_schema[1:]
+        try:
+            file_size = os.path.getsize(schema_file_path)
+            if file_size > 1024 * 1024:  # 1MB
+                logger.warning(f"Large schema file ({file_size / (1024*1024):.1f}MB) may impact performance")
+        except OSError:
+            pass  # File size check is optional
+
+    if args.json_prompt and args.json_prompt.startswith("@"):
+        prompt_file_path = args.json_prompt[1:]
+        try:
+            file_size = os.path.getsize(prompt_file_path)
+            if file_size > 100 * 1024:  # 100KB
+                logger.warning(f"Large prompt file ({file_size / 1024:.1f}KB) may impact performance")
+        except OSError:
+            pass  # File size check is optional
+
+
 def parse_args() -> argparse.Namespace:
 
     parser = argparse.ArgumentParser(description="CentML Inference Benchmark")
@@ -630,6 +769,9 @@ def fail(msg: str) -> None:
     if args.num_trials > MAX_TRIALS:
         logger.warning(f"High num_trials value ({args.num_trials}) may slow down prompt generation")
 
+    # Validate JSON-related arguments
+    validate_json_args(args)
+
     return args
 
 
@@ -716,6 +858,10 @@ def run_main(args: argparse.Namespace) -> None:
     endpoint = args.endpoint.strip("/")
     args.api_url = f"{base_url}/{endpoint}"
 
+    # JSON processing and validation handled in parse_args()
+    custom_prompt = args.json_prompt
+    json_schema = getattr(args, 'json_schema', None)
+
     client = Client(
         args.backend,
         args.api_url,
@@ -736,6 +882,11 @@ def run_main(args: argparse.Namespace) -> None:
         args.top_p,
         args.top_k,
         run_id=run_id,
+        json_response=args.json_response,
+        custom_prompt=custom_prompt,
+        disable_thinking=args.disable_thinking,
+        json_schema=json_schema,
+        include_schema_in_prompt=getattr(args, 'include_schema_in_prompt', False),
     )
     # disable verbose output for validation of the endpoint. This is done to avoid confusion on terminal output.
     client_verbose_value = client.verbose
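
The @file convention implemented by `validate_json_args` condenses to the following standalone sketch (a hypothetical helper, with the logger-and-exit error handling reduced to plain exceptions):

```python
import json
from pathlib import Path
from typing import Any, Dict

def resolve_json_schema(value: str) -> Dict[str, Any]:
    """Resolve a --json-schema value: inline JSON, or '@path' to load from a file."""
    text = Path(value[1:]).read_text(encoding="utf-8") if value.startswith("@") else value
    schema = json.loads(text)  # raises json.JSONDecodeError on malformed input
    if not isinstance(schema, dict):
        raise ValueError("JSON schema must be a JSON object")
    return schema

print(resolve_json_schema('{"type": "object"}'))        # inline form
# resolve_json_schema("@tests/data/simple_schema.json")  # file form
```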

tests/data/simple_schema.json

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+{
+  "type": "object",
+  "properties": {
+    "answer": {
+      "type": "string",
+      "minLength": 2000,
+      "description": "The answer to the question"
+    },
+    "reasoning": {
+      "type": "string",
+      "description": "Explanation of the reasoning"
+    }
+  },
+  "required": ["answer"],
+  "additionalProperties": false
+}
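
Note the `minLength: 2000` on `answer`, which presumably forces long completions under the schema constraint. A quick conformance check is possible with the third-party `jsonschema` package (an assumption of this snippet, not a dependency the commit adds):

```python
from jsonschema import validate  # pip install jsonschema

schema = {
    "type": "object",
    "properties": {
        "answer": {"type": "string", "minLength": 2000},
        "reasoning": {"type": "string"},
    },
    "required": ["answer"],
    "additionalProperties": False,
}

instance = {"answer": "x" * 2000}  # "answer" must be at least 2000 characters
validate(instance=instance, schema=schema)  # raises ValidationError on mismatch
print("instance conforms to the schema")
```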
