
Commit 11fc254

Authored by Dawei DW12 Zhang (zhangdw156) and co-authors
[BFCL] Support custom Base URL, API Key, and Tokenizer for remote OpenAI-compatible endpoints #1280 (#1281)
**Description**

This PR addresses limitations when evaluating models served via remote OpenAI-compatible endpoints (e.g., vLLM deployed on cloud GPU clusters, RunPod, or behind enterprise gateways). Previously, the handler assumed a rigid `host:port` structure and lacked authentication support. Additionally, when connecting to a remote endpoint, the handler would fail to load the tokenizer if it tried to access a path that exists only on the remote server.

**Key Changes**

- Custom authentication & routing:
  - Added support for `REMOTE_OPENAI_BASE_URL` to allow full control over the endpoint URL (resolves issues with SSL/HTTPS and custom sub-paths).
  - Added support for `REMOTE_OPENAI_API_KEY` to enable authentication for secured endpoints.
- Remote tokenizer support:
  - Added `REMOTE_OPENAI_TOKENIZER_PATH`. Since the `OSSHandler` needs to load the tokenizer locally for prompt formatting, this variable lets users point to a local Hugging Face path or model ID, preventing an `OSError` when the handler tries to load a non-existent local path derived from the remote server configuration.
- Documentation:
  - Updated `.env.example` and `README.md` to document these new configuration options.

**Related Issue**

Fixes #1280

**Type of Change**

- [x] New feature (non-breaking change which adds functionality)
- [x] Documentation update

**Checklist**

- [x] My code follows the style guidelines of this project
- [x] I have performed a self-review of my own code
- [x] I have updated the documentation accordingly
- [x] Existing local server setups remain backward compatible

---------

Co-authored-by: Dawei DW12 Zhang <zhangdw12@Lenovo.com>
Co-authored-by: zhangdw <zhangdw.cs@gmail.com>
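The precedence described above (explicit `REMOTE_OPENAI_*` variables taking priority, with a fallback to the existing `host:port` defaults so local setups stay backward compatible) can be sketched as follows. `resolve_endpoint_config` and its default arguments are illustrative names for this sketch, not identifiers from the PR:

```python
import os

def resolve_endpoint_config(local_endpoint: str = "localhost", local_port: str = "1053"):
    # Prefer the REMOTE_OPENAI_* variables; otherwise rebuild the legacy
    # http://host:port/v1 URL so existing local server setups keep working.
    base_url = os.getenv("REMOTE_OPENAI_BASE_URL", f"http://{local_endpoint}:{local_port}/v1")
    api_key = os.getenv("REMOTE_OPENAI_API_KEY", "EMPTY")
    return base_url, api_key

# No REMOTE_OPENAI_* variables set: the legacy local defaults win.
os.environ.pop("REMOTE_OPENAI_BASE_URL", None)
os.environ.pop("REMOTE_OPENAI_API_KEY", None)
print(resolve_endpoint_config())  # ('http://localhost:1053/v1', 'EMPTY')

# Remote endpoint configured: the environment variables take precedence.
os.environ["REMOTE_OPENAI_BASE_URL"] = "https://your-vllm-server.com/v1"
os.environ["REMOTE_OPENAI_API_KEY"] = "your-api-key-here"
print(resolve_endpoint_config())  # ('https://your-vllm-server.com/v1', 'your-api-key-here')
```

Because the fallback reproduces the old URL exactly, unsetting both variables restores the pre-PR behavior.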
1 parent 9b8a520 commit 11fc254

File tree

3 files changed: +42 −5 lines


berkeley-function-call-leaderboard/README.md (8 additions, 0 deletions)

````diff
@@ -245,6 +245,14 @@ LOCAL_SERVER_ENDPOINT=localhost
 LOCAL_SERVER_PORT=1053
 ```
 
+For remote deployments (e.g., via RunPod, ngrok, or enterprise gateways) that require custom authentication or use non-standard base URLs, you can specify a full base URL and API key:
+
+```bash
+REMOTE_OPENAI_BASE_URL=https://your-vllm-server.com/v1
+REMOTE_OPENAI_API_KEY=your-api-key-here
+REMOTE_OPENAI_TOKENIZER_PATH=/path/to/local/tokenizer # Optional: specify local tokenizer for local/remote endpoints
+```
+
 #### (Alternate) Script Execution for Generation
 
 For those who prefer using script execution instead of the CLI, you can run the following command:
````

berkeley-function-call-leaderboard/bfcl_eval/.env.example (7 additions, 1 deletion)

```diff
@@ -44,5 +44,11 @@ NOVITA_API_KEY=sk-XXXXXX
 LOCAL_SERVER_ENDPOINT=localhost
 LOCAL_SERVER_PORT=1053
 
+# [OPTIONAL] For custom local/remote OpenAI-compatible server configuration (e.g., vLLM deployments)
+# These allow custom base URL and API key for OpenAI-compatible endpoints
+# REMOTE_OPENAI_BASE_URL=https://your-vllm-server.com/v1
+# REMOTE_OPENAI_API_KEY=your-api-key-here
+# REMOTE_OPENAI_TOKENIZER_PATH=/path/to/local/tokenizer # Optional: specify local tokenizer for local/remote endpoints
+
 # [OPTIONAL] For WandB to log the generated .csv in the format 'entity:project
-WANDB_BFCL_PROJECT=ENTITY:PROJECT
+WANDB_BFCL_PROJECT=ENTITY:PROJECT
```

berkeley-function-call-leaderboard/bfcl_eval/model_handler/local_inference/base_oss_handler.py (27 additions, 4 deletions)

```diff
@@ -42,8 +42,11 @@ def __init__(
         self.local_server_endpoint = os.getenv("LOCAL_SERVER_ENDPOINT", "localhost")
         self.local_server_port = os.getenv("LOCAL_SERVER_PORT", LOCAL_SERVER_PORT)
 
-        self.base_url = f"http://{self.local_server_endpoint}:{self.local_server_port}/v1"
-        self.client = OpenAI(base_url=self.base_url, api_key="EMPTY")
+        # Support custom base_url and api_key for remote/local OpenAI-compatible deployments (e.g., vLLM)
+        # Use REMOTE_OPENAI_* variables to avoid conflicts with main OPENAI_* variables
+        self.base_url = os.getenv("REMOTE_OPENAI_BASE_URL", f"http://{self.local_server_endpoint}:{self.local_server_port}/v1")
+        self.api_key = os.getenv("REMOTE_OPENAI_API_KEY", "EMPTY")
+        self.client = OpenAI(base_url=self.base_url, api_key=self.api_key)
 
     @override
     def inference(
@@ -111,8 +114,28 @@ def spin_up_local_server(
             "trust_remote_code": True,
         }
 
-        self.tokenizer = AutoTokenizer.from_pretrained(**load_kwargs)
-        config = AutoConfig.from_pretrained(**load_kwargs)
+        # For remote OpenAI-compatible endpoints, use specified tokenizer path if provided
+        is_remote_endpoint = bool(os.getenv("REMOTE_OPENAI_BASE_URL"))
+        tokenizer_path = os.getenv("REMOTE_OPENAI_TOKENIZER_PATH", self.model_path_or_id)
+
+        if is_remote_endpoint and os.getenv("REMOTE_OPENAI_TOKENIZER_PATH"):
+            # Use specified tokenizer for remote endpoints
+            tokenizer_kwargs = {
+                "pretrained_model_name_or_path": tokenizer_path,
+                "trust_remote_code": True,
+            }
+            try:
+                self.tokenizer = AutoTokenizer.from_pretrained(**tokenizer_kwargs)
+                config = AutoConfig.from_pretrained(**tokenizer_kwargs)
+                print(f"Loaded tokenizer from REMOTE_OPENAI_TOKENIZER_PATH: {tokenizer_path}")
+            except Exception as e:
+                print(f"Failed to load tokenizer from {tokenizer_path}, falling back to model path: {e}")
+                self.tokenizer = AutoTokenizer.from_pretrained(**load_kwargs)
+                config = AutoConfig.from_pretrained(**load_kwargs)
+        else:
+            # Standard loading for local models or when no specific tokenizer path is provided
+            self.tokenizer = AutoTokenizer.from_pretrained(**load_kwargs)
+            config = AutoConfig.from_pretrained(**load_kwargs)
 
         if hasattr(config, "max_position_embeddings"):
             self.max_context_length = config.max_position_embeddings
```
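The tokenizer selection in the handler boils down to one decision: the custom path applies only when a remote base URL is also set; otherwise the handler falls back to the model path (and the real code additionally wraps loading in a try/except fallback). A standalone sketch of that decision, with `pick_tokenizer_source` as an illustrative name not present in the PR:

```python
import os

def pick_tokenizer_source(model_path_or_id: str) -> str:
    # Mirrors the condition in spin_up_local_server: the custom tokenizer
    # path is honored only when REMOTE_OPENAI_BASE_URL marks a remote endpoint.
    is_remote_endpoint = bool(os.getenv("REMOTE_OPENAI_BASE_URL"))
    custom_path = os.getenv("REMOTE_OPENAI_TOKENIZER_PATH")
    if is_remote_endpoint and custom_path:
        return custom_path
    return model_path_or_id

# Local setup: no env vars set, so the model path is used directly.
os.environ.pop("REMOTE_OPENAI_BASE_URL", None)
os.environ.pop("REMOTE_OPENAI_TOKENIZER_PATH", None)
print(pick_tokenizer_source("my-org/my-model"))  # my-org/my-model

# Remote endpoint with a local tokenizer copy: the custom path wins.
os.environ["REMOTE_OPENAI_BASE_URL"] = "https://your-vllm-server.com/v1"
os.environ["REMOTE_OPENAI_TOKENIZER_PATH"] = "/path/to/local/tokenizer"
print(pick_tokenizer_source("my-org/my-model"))  # /path/to/local/tokenizer
```

Gating on both variables is what keeps purely local setups unaffected: setting `REMOTE_OPENAI_TOKENIZER_PATH` alone changes nothing.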
