Skip to content
Discussion options

You must be logged in to vote

For local LLMs, use the helper function below

async def generate_aiohttp(router_address: str, prompt_text: str, sampling_params: dict):
    models_url = f"http://{router_address}/v1/models"
    completions_url = f"http://{router_address}/v1/completions"
    
    async with aiohttp.ClientSession(timeout=aiohttp.ClientTimeout(total=60)) as session:
        try:
            async with session.get(models_url) as model_resp:
                model_resp.raise_for_status()
                models_data = await model_resp.json()
                model_name = models_data["data"][0]["id"]
        except Exception as e:
            logger.error(f"Failed to fetch model: {e}")
            return {}
      …

Replies: 1 comment

Comment options

You must be logged in to vote
0 replies
Answer selected by nnp02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Category
Q&A
Labels
None yet
1 participant