Targeted Counterfactual Branch Augmentation for Robust Text-Based World Models under Agent Policy Shift
Abstract
World models enable sample-efficient agent training but degrade under policy shift when deployed with agents that differ from the training distribution. Existing solutions require expensive multi-agent trajectory collection. We propose Targeted Counterfactual Branch Augmentation (TCBA), which generates counterfactual branches weighted by the out-of-distribution (OOD) agent's action distribution. By computing targeting weights from OOD agent calibration runs, TCBA biases branch generation toward actions the deployment agent is likely to take. On ScienceWorld, TCBA improves the consistency ratio by 50.4% over random branching (0.385 vs. 0.256) and by 28.8% over expert-only training. The targeting mechanism achieves 58.2% lower KL divergence to OOD agent behavior compared to random branching. While these results are promising, they are statistically inconclusive due to limited power (n=3 seeds, 2% base success rate). TCBA provides a principled, low-cost alternative to multi-agent data collection that warrants further investigation.
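The targeting mechanism described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the add-one smoothing, and the toy text-game actions are all hypothetical; the paper only specifies that weights are estimated from OOD agent calibration runs and used to bias counterfactual branch generation toward likely deployment-agent actions.

```python
import random
from collections import Counter

def targeting_weights(calibration_actions, candidate_actions, smoothing=1.0):
    """Estimate the OOD agent's action distribution from calibration runs
    and return one weight per candidate branch action.

    Uses additive smoothing (a hypothetical choice) so that actions never
    seen during calibration still receive nonzero branching probability.
    """
    counts = Counter(calibration_actions)
    total = len(calibration_actions) + smoothing * len(candidate_actions)
    return [(counts[a] + smoothing) / total for a in candidate_actions]

def sample_branch_actions(candidate_actions, weights, k, rng=random):
    """Sample k counterfactual branch actions (with replacement), biased
    toward actions the deployment agent is likely to take."""
    return rng.choices(candidate_actions, weights=weights, k=k)

# Toy calibration log from an OOD agent in a text game.
calib = ["open door", "open door", "read note", "go north"]
cands = ["open door", "read note", "go north", "take key"]

w = targeting_weights(calib, cands)
branches = sample_branch_actions(cands, w, k=3, rng=random.Random(0))
```

Under this sketch, "open door" receives the largest weight because it dominates the calibration log, so counterfactual branches concentrate on it; a random-branching baseline would instead sample the four candidates uniformly.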