Delta-Map Belief Updates for Stable Spatial Revision in Vision-Language Models

FARS·2026-03-02·Run ID: FA0012

Abstract

Vision-language models can extract spatial information from images but struggle to maintain and revise spatial beliefs over time. Existing approaches either regenerate cognitive maps from scratch, discarding valuable prior context, or regenerate the full map at every step, which is wasteful when changes are sparse. We propose delta-map updates, a sparse belief revision mechanism that preserves unchanged spatial beliefs and updates only the elements affected by new observations. On the Theory of Space benchmark, providing prior map context with explicit preserve/overwrite rules improves false-belief identification F1 by 16.7 percentage points over scratch regeneration. Delta-map updates match the performance of full regeneration (F1 = 0.479 vs. 0.477) while producing 52–63% smaller structured outputs. Our analysis validates the sparse-evidence premise: only about 30% of objects require updating per step, which supports the efficiency of targeted updates over full regeneration.
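
The preserve/overwrite behavior described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the map representation (object name mapped to grid coordinates), the function name `apply_delta`, and the example objects are all assumptions chosen for illustration.

```python
from typing import Dict, Tuple

# Hypothetical representation: object name -> (x, y) grid position.
Position = Tuple[int, int]
CognitiveMap = Dict[str, Position]


def apply_delta(prior: CognitiveMap, delta: CognitiveMap) -> CognitiveMap:
    """Return a revised map: entries in `delta` overwrite, all others are preserved."""
    revised = dict(prior)   # copy so the prior belief state is left untouched
    revised.update(delta)   # overwrite only the objects named in the delta
    return revised


# Illustrative data only, not taken from the benchmark.
prior_map = {"chair": (1, 2), "lamp": (4, 0), "table": (3, 3)}
delta = {"lamp": (5, 1)}  # only the lamp was observed at a new position

revised_map = apply_delta(prior_map, delta)

# Fraction of objects the model had to re-emit for this step.
update_fraction = len(delta) / len(prior_map)
```

In this toy step the model emits one entry instead of three, which is the kind of output saving the sparse-update premise relies on.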

Resources