Search-Anchored Hybrid Rollouts for Text-Based World Models

FARS·2026-03-02·Run ID: FA0104

Abstract

LLM-based world models enable scalable training and evaluation of web agents through simulated trajectories, but they suffer from rollout drift, in which simulated behavior diverges from the real environment. We investigate the root cause of this drift in text-based world models and find that every first divergence (100%) occurs at a search-result observation, and that search-result observations diverge at a per-step rate of 99.8%. Based on this finding, we propose search-anchored hybrid rollouts, a minimal intervention that grounds search observations in real data while keeping all other observations simulated. On WebShop, our method improves Consistency Ratio from 0.594 to 0.824, a 38.7% relative improvement. Notably, anchoring only the first search provides negligible benefit, whereas anchoring all searches yields substantial gains, which indicates that compounding search errors drive rollout drift. Our approach outperforms agent-side baselines and demonstrates that targeted observation grounding can address world model limitations without requiring improvements to the world model itself.
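
The routing idea behind search-anchored hybrid rollouts can be illustrated with a minimal sketch. It is not the implementation used in this work: the `search[...]` action format, the step-function signature, and the names `hybrid_rollout`, `simulated_step`, and `real_step` are assumptions made for illustration only. The helper `relative_improvement` simply reproduces the arithmetic behind the reported 38.7% figure.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: maps (previous observation, action) to a text observation.
StepFn = Callable[[str, str], str]


def is_search_action(action: str) -> bool:
    """Assumed action format, e.g. 'search[red shoes]'."""
    return action.startswith("search[")


def hybrid_rollout(
    actions: List[str],
    simulated_step: StepFn,
    real_step: StepFn,
    initial_obs: str = "",
) -> List[Tuple[str, str, str]]:
    """Return (action, observation, source) triples for one rollout.

    Search actions are answered by the real environment; every other
    action is answered by the simulated world model.
    """
    trajectory = []
    obs = initial_obs
    for action in actions:
        if is_search_action(action):
            obs, source = real_step(obs, action), "real"
        else:
            obs, source = simulated_step(obs, action), "simulated"
        trajectory.append((action, obs, source))
    return trajectory


def relative_improvement(baseline: float, improved: float) -> float:
    """Relative change of `improved` over `baseline`."""
    return (improved - baseline) / baseline


if __name__ == "__main__":
    # Toy stand-ins for the world model and the real environment.
    sim = lambda obs, action: f"simulated page after {action}"
    real = lambda obs, action: f"real results for {action}"

    for step in hybrid_rollout(["search[red shoes]", "click[item-1]"], sim, real):
        print(step)

    # Consistency Ratio values taken from the abstract.
    print(f"{relative_improvement(0.594, 0.824):.1%}")
```

Anchoring only the first search would correspond to routing just the first matching action to `real_step`; the sketch above anchors every search, which is the variant reported to yield the substantial gains.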

Resources