
Commit 11fc254

Authored by Dawei DW12 Zhang (zhangdw156) and co-authors
[BFCL] Support custom Base URL, API Key, and Tokenizer for remote OpenAI-compatible endpoints #1280 (#1281)
**Description**

This PR addresses limitations when evaluating models served via remote OpenAI-compatible endpoints (e.g., vLLM deployed on cloud GPU clusters, RunPod, or behind enterprise gateways). Previously, the handler assumed a rigid `host:port` structure and lacked authentication support. Additionally, when connecting to a remote endpoint, the handler would fail to load the tokenizer if it tried to access a path that exists only on the remote server.

**Key Changes**

- Custom authentication & routing:
  - Added support for `REMOTE_OPENAI_BASE_URL` to allow full control over the endpoint URL (resolves issues with SSL/HTTPS and custom sub-paths).
  - Added support for `REMOTE_OPENAI_API_KEY` to enable authentication for secured endpoints.
- Remote tokenizer support:
  - Added `REMOTE_OPENAI_TOKENIZER_PATH`. Since the `OSSHandler` needs to load the tokenizer locally for prompt formatting, this variable lets users point to a local Hugging Face path or model ID, preventing an `OSError` when the handler tries to load a non-existent local path derived from the remote server configuration.
- Documentation:
  - Updated `.env.example` and `README.md` to document these new configuration options.

**Related Issue**

Fixes #1280

**Type of Change**

- [x] New feature (non-breaking change which adds functionality)
- [x] Documentation update

**Checklist**

- [x] My code follows the style guidelines of this project
- [x] I have performed a self-review of my own code
- [x] I have updated the documentation accordingly
- [x] Existing local server setups remain backward compatible

---------

Co-authored-by: Dawei DW12 Zhang <zhangdw12@Lenovo.com>
Co-authored-by: zhangdw <zhangdw.cs@gmail.com>
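The precedence described above (explicit `REMOTE_OPENAI_*` variables taking priority, with a fallback to the existing `host:port` defaults so local setups stay backward compatible) can be sketched as follows. `resolve_endpoint_config` and its default arguments are illustrative names for this sketch, not identifiers from the PR:

```python
import os

def resolve_endpoint_config(local_endpoint: str = "localhost", local_port: str = "1053"):
    # Prefer the REMOTE_OPENAI_* variables; otherwise rebuild the legacy
    # http://host:port/v1 URL so existing local server setups keep working.
    base_url = os.getenv("REMOTE_OPENAI_BASE_URL", f"http://{local_endpoint}:{local_port}/v1")
    api_key = os.getenv("REMOTE_OPENAI_API_KEY", "EMPTY")
    return base_url, api_key

# No REMOTE_OPENAI_* variables set: the legacy local defaults win.
os.environ.pop("REMOTE_OPENAI_BASE_URL", None)
os.environ.pop("REMOTE_OPENAI_API_KEY", None)
print(resolve_endpoint_config())  # ('http://localhost:1053/v1', 'EMPTY')

# Remote endpoint configured: the environment variables take precedence.
os.environ["REMOTE_OPENAI_BASE_URL"] = "https://your-vllm-server.com/v1"
os.environ["REMOTE_OPENAI_API_KEY"] = "your-api-key-here"
print(resolve_endpoint_config())  # ('https://your-vllm-server.com/v1', 'your-api-key-here')
```

Because the fallback reproduces the old URL exactly, unsetting both variables restores the pre-PR behavior.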
1 parent 9b8a520 commit 11fc254

File tree

3 files changed: +42 −5 lines


berkeley-function-call-leaderboard/README.md (8 additions, 0 deletions)

````diff
@@ -245,6 +245,14 @@ LOCAL_SERVER_ENDPOINT=localhost
 LOCAL_SERVER_PORT=1053
 ```
 
+For remote deployments (e.g., via RunPod, ngrok, or enterprise gateways) that require custom authentication or use non-standard base URLs, you can specify a full base URL and API key:
+
+```bash
+REMOTE_OPENAI_BASE_URL=https://your-vllm-server.com/v1
+REMOTE_OPENAI_API_KEY=your-api-key-here
+REMOTE_OPENAI_TOKENIZER_PATH=/path/to/local/tokenizer # Optional: specify local tokenizer for local/remote endpoints
+```
+
 #### (Alternate) Script Execution for Generation
 
 For those who prefer using script execution instead of the CLI, you can run the following command:
````

berkeley-function-call-leaderboard/bfcl_eval/.env.example (7 additions, 1 deletion)

```diff
@@ -44,5 +44,11 @@ NOVITA_API_KEY=sk-XXXXXX
 LOCAL_SERVER_ENDPOINT=localhost
 LOCAL_SERVER_PORT=1053
 
+# [OPTIONAL] For custom local/remote OpenAI-compatible server configuration (e.g., vLLM deployments)
+# These allow custom base URL and API key for OpenAI-compatible endpoints
+# REMOTE_OPENAI_BASE_URL=https://your-vllm-server.com/v1
+# REMOTE_OPENAI_API_KEY=your-api-key-here
+# REMOTE_OPENAI_TOKENIZER_PATH=/path/to/local/tokenizer # Optional: specify local tokenizer for local/remote endpoints
+
 # [OPTIONAL] For WandB to log the generated .csv in the format 'entity:project
-WANDB_BFCL_PROJECT=ENTITY:PROJECT
+WANDB_BFCL_PROJECT=ENTITY:PROJECT
```

berkeley-function-call-leaderboard/bfcl_eval/model_handler/local_inference/base_oss_handler.py (27 additions, 4 deletions)

```diff
@@ -42,8 +42,11 @@ def __init__(
         self.local_server_endpoint = os.getenv("LOCAL_SERVER_ENDPOINT", "localhost")
         self.local_server_port = os.getenv("LOCAL_SERVER_PORT", LOCAL_SERVER_PORT)
 
-        self.base_url = f"http://{self.local_server_endpoint}:{self.local_server_port}/v1"
-        self.client = OpenAI(base_url=self.base_url, api_key="EMPTY")
+        # Support custom base_url and api_key for remote/local OpenAI-compatible deployments (e.g., vLLM)
+        # Use REMOTE_OPENAI_* variables to avoid conflicts with main OPENAI_* variables
+        self.base_url = os.getenv("REMOTE_OPENAI_BASE_URL", f"http://{self.local_server_endpoint}:{self.local_server_port}/v1")
+        self.api_key = os.getenv("REMOTE_OPENAI_API_KEY", "EMPTY")
+        self.client = OpenAI(base_url=self.base_url, api_key=self.api_key)
 
     @override
     def inference(
@@ -111,8 +114,28 @@ def spin_up_local_server(
             "trust_remote_code": True,
         }
 
-        self.tokenizer = AutoTokenizer.from_pretrained(**load_kwargs)
-        config = AutoConfig.from_pretrained(**load_kwargs)
+        # For remote OpenAI-compatible endpoints, use specified tokenizer path if provided
+        is_remote_endpoint = bool(os.getenv("REMOTE_OPENAI_BASE_URL"))
+        tokenizer_path = os.getenv("REMOTE_OPENAI_TOKENIZER_PATH", self.model_path_or_id)
+
+        if is_remote_endpoint and os.getenv("REMOTE_OPENAI_TOKENIZER_PATH"):
+            # Use specified tokenizer for remote endpoints
+            tokenizer_kwargs = {
+                "pretrained_model_name_or_path": tokenizer_path,
+                "trust_remote_code": True,
+            }
+            try:
+                self.tokenizer = AutoTokenizer.from_pretrained(**tokenizer_kwargs)
+                config = AutoConfig.from_pretrained(**tokenizer_kwargs)
+                print(f"Loaded tokenizer from REMOTE_OPENAI_TOKENIZER_PATH: {tokenizer_path}")
+            except Exception as e:
+                print(f"Failed to load tokenizer from {tokenizer_path}, falling back to model path: {e}")
+                self.tokenizer = AutoTokenizer.from_pretrained(**load_kwargs)
+                config = AutoConfig.from_pretrained(**load_kwargs)
+        else:
+            # Standard loading for local models or when no specific tokenizer path is provided
+            self.tokenizer = AutoTokenizer.from_pretrained(**load_kwargs)
+            config = AutoConfig.from_pretrained(**load_kwargs)
 
         if hasattr(config, "max_position_embeddings"):
             self.max_context_length = config.max_position_embeddings
```
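The tokenizer selection in the handler boils down to one decision: the custom path applies only when a remote base URL is also set; otherwise the handler falls back to the model path (and the real code additionally wraps loading in a try/except fallback). A standalone sketch of that decision, with `pick_tokenizer_source` as an illustrative name not present in the PR:

```python
import os

def pick_tokenizer_source(model_path_or_id: str) -> str:
    # Mirrors the condition in spin_up_local_server: the custom tokenizer
    # path is honored only when REMOTE_OPENAI_BASE_URL marks a remote endpoint.
    is_remote_endpoint = bool(os.getenv("REMOTE_OPENAI_BASE_URL"))
    custom_path = os.getenv("REMOTE_OPENAI_TOKENIZER_PATH")
    if is_remote_endpoint and custom_path:
        return custom_path
    return model_path_or_id

# Local setup: no env vars set, so the model path is used directly.
os.environ.pop("REMOTE_OPENAI_BASE_URL", None)
os.environ.pop("REMOTE_OPENAI_TOKENIZER_PATH", None)
print(pick_tokenizer_source("my-org/my-model"))  # my-org/my-model

# Remote endpoint with a local tokenizer copy: the custom path wins.
os.environ["REMOTE_OPENAI_BASE_URL"] = "https://your-vllm-server.com/v1"
os.environ["REMOTE_OPENAI_TOKENIZER_PATH"] = "/path/to/local/tokenizer"
print(pick_tokenizer_source("my-org/my-model"))  # /path/to/local/tokenizer
```

Gating on both variables is what keeps purely local setups unaffected: setting `REMOTE_OPENAI_TOKENIZER_PATH` alone changes nothing.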
