Public-Anchor Drift Adapters for Privacy-Limited Embedding Model Upgrades

FARS·2026-03-02·Run ID: FA0353

Abstract

Upgrading embedding models in production retrieval systems typically requires either expensive corpus re-embedding or training drift adapters on in-domain data. However, in privacy-sensitive deployments, even unlabeled corpus text may be unavailable for adapter training. We propose the Public-Anchor Drift Adapter (PADA), which trains a lightweight residual MLP on paired embeddings from public Wikipedia text instead of in-domain data. Our key insight is that embedding drift is primarily model-pair-specific rather than domain-specific: the geometric transformation between embedding spaces can be learned from any sufficiently diverse text distribution. Experiments on four BEIR benchmark datasets demonstrate that PADA not only matches but exceeds in-domain adapter performance, with recovery ratios ranging from 1.11 to 1.31. A shuffled-pair null control confirms these gains arise from genuine alignment. PADA enables privacy-preserving embedding upgrades with approximately 5,000 public anchor pairs, requiring no access to sensitive corpora.
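The adapter described above, a lightweight residual MLP that maps old-model embeddings toward the new model's space using paired embeddings of public anchor texts, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the class name, one-hidden-layer architecture, and training hyperparameters are illustrative choices, not the paper's exact implementation.

```python
import numpy as np

class ResidualDriftAdapter:
    """Hypothetical residual MLP drift adapter:
    adapted(x) = x + W2 @ relu(W1 @ x + b1) + b2.
    Trained by MSE on (old-model, new-model) embedding pairs of public anchor texts."""

    def __init__(self, dim, hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (hidden, dim))
        self.b2 = np.zeros(dim)

    def forward(self, X):
        # Residual connection: start from the old embedding, add a learned correction.
        H = np.maximum(X @ self.W1 + self.b1, 0.0)  # ReLU hidden layer
        return X + H @ self.W2 + self.b2

    def fit(self, X_old, X_new, lr=0.1, epochs=300):
        """Plain full-batch gradient descent on mean-squared error."""
        n = len(X_old)
        for _ in range(epochs):
            H_pre = X_old @ self.W1 + self.b1
            H = np.maximum(H_pre, 0.0)
            pred = X_old + H @ self.W2 + self.b2
            err = pred - X_new                    # d(MSE)/d(pred), constants folded into lr
            gW2 = H.T @ err / n
            gb2 = err.mean(axis=0)
            gH = err @ self.W2.T
            gH[H_pre <= 0.0] = 0.0                # ReLU gradient mask
            gW1 = X_old.T @ gH / n
            gb1 = gH.mean(axis=0)
            self.W1 -= lr * gW1
            self.b1 -= lr * gb1
            self.W2 -= lr * gW2
            self.b2 -= lr * gb2
        return self
```

In deployment, `X_old` and `X_new` would hold the old and new models' embeddings of the same ~5,000 public anchor texts (e.g. Wikipedia passages); the fitted adapter is then applied to the already-indexed in-domain embeddings, so the sensitive corpus is never re-embedded or exposed.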

Resources