During fine-tuning, the world model's loss approaches 0 and its accuracy approaches 1.0 very quickly (within roughly 2,000 steps). At inference time, the fine-tuned model achieves a success rate of around 98% on the Spatial task.
To verify what the model is actually learning, I ran a sanity check: I fed fake data into the world model (masking the visual input / replacing it with dummy values). Surprisingly, the accuracy remained unchanged (~98%). This suggests the model may be ignoring the visual conditioning entirely and relying on other signals.
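For reference, the sanity check was along these lines. This is a simplified sketch, not the actual codebase API: `model`, the batch keys (`"visual"`, `"state"`, `"label"`), and the forward signature are placeholders; the point is only that the visual conditioning is replaced with zeros while everything else is untouched.

```python
import torch

def evaluate_with_masked_visuals(model, dataloader, device="cpu"):
    """Evaluate accuracy with the visual conditioning replaced by dummy values.

    If accuracy is unchanged relative to real inputs, the model is likely
    not using the visual signal. All names here are illustrative.
    """
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for batch in dataloader:
            visual = batch["visual"].to(device)
            # Replace the real visual input with zeros (random noise works too)
            fake_visual = torch.zeros_like(visual)
            logits = model(fake_visual, batch["state"].to(device))
            preds = logits.argmax(dim=-1)
            labels = batch["label"].to(device)
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total
```

A stronger variant of the same check is to shuffle the visual inputs within the batch (so each sample sees a wrong but realistic image) rather than zeroing them, which rules out the model merely being robust to out-of-distribution zeros.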
Could this behavior be related to the small per-device batch size (e.g., unreliable BatchNorm statistics), or is this an expected phenomenon for this model?
Experimental Setup: I am fine-tuning the model using the following configuration:
- Hardware: 4x GPUs (48GB VRAM each).
- Batch Size: Per-device batch size = 1, with gradient accumulation steps = 5 (effective batch size = 4 × 1 × 5 = 20).
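Regarding the BatchNorm hypothesis: with a per-device batch size of 1, any BatchNorm layer sees degenerate batch statistics during training (gradient accumulation does not help here, since normalization statistics are computed per forward pass, not per optimizer step). A minimal PyTorch illustration, independent of the actual model:

```python
import torch
from torch import nn

# In training mode, BatchNorm normalizes with per-batch statistics.
# For a (1, C)-shaped input, the per-channel variance is undefined,
# and PyTorch refuses to compute it.
bn = nn.BatchNorm1d(num_features=8)
bn.train()
x = torch.randn(1, 8)  # per-device batch of a single sample
try:
    bn(x)
except ValueError as e:
    print("BatchNorm fails on batch size 1 in train mode:", e)

# In eval mode the layer falls back to running statistics and works fine.
bn.eval()
y = bn(x)
print(y.shape)
```

Whether this applies depends on the model actually containing BatchNorm layers (as opposed to LayerNorm/GroupNorm, which normalize per sample and are unaffected by batch size); that would be worth checking in the architecture before pursuing this explanation.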
Any insights would be appreciated.
