### Description
In `agent_ppo_train.py`, the metric is computed as
`rollout_probs_diff = calculate_log_prob_diff(actor_probs, rollout_probs, response_mask_bool)`,
while the mask is built as:
```
attention_mask = batch.batch["attention_mask"]
responses = batch.batch["responses"]
response_length = responses.size(1)
response_mask = attention_mask[:, -response_length:]
```
In agentic RL, the tool-call (tool-output) part of the response should be masked out. If the mask is just `attention_mask[:, -response_length:]`, those tokens are still counted and the metric is inaccurate.
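A minimal sketch of the fix being suggested, assuming a per-response-token tool mask is available that is 0 on tokens returned by a tool. The `tool_mask` name and shape are illustrative assumptions, not the actual rLLM/verl API; plain Python lists stand in for tensors:

```python
# Hedged sketch: AND the attention-derived response slice with a
# hypothetical tool mask so tool-output tokens are excluded from the
# rollout-vs-actor log-prob diff metric.

def combined_response_mask(attention_mask, tool_mask, response_length):
    """Element-wise AND of the attention response slice and a tool mask.

    attention_mask: 0/1 per token over the full sequence
    tool_mask:      0/1 per response token (0 = tool-returned token)
    response_length: number of response tokens at the end of the sequence
    """
    response_slice = attention_mask[-response_length:]
    return [a & t for a, t in zip(response_slice, tool_mask)]

# Example: 6-token sequence, 4 response tokens, the middle two of which
# are tool output and should not contribute to the metric.
mask = combined_response_mask(
    attention_mask=[1, 1, 1, 1, 1, 1],
    tool_mask=[1, 0, 0, 1],
    response_length=4,
)
```

With the plain attention slice the middle two tool tokens would be 1; the combined mask zeroes them, so `calculate_log_prob_diff` would only average over model-generated tokens.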
### Steps to Reproduce
See the mask construction in `agent_ppo_train.py` described above.
### Error Output / Traceback
### rLLM Version
0.2.1post
### Training Backend
verl
### Python Version
3.9
### GPU / CUDA Version
No response
### vLLM Version (if applicable)
No response
### Training Script / Config
### Additional Context
No response