Auditing Norm-Clipped L2-Laplacian Token-Embedding Obfuscation Against Sequence-Aware Reconstruction
Abstract
Text embeddings transmitted to remote servers for NLP services can be inverted to recover sensitive user text. Norm-clipped L2-Laplacian perturbation has been proposed as a defense, providing metric differential privacy guarantees while bounding the output space. However, existing evaluations rely on simple per-token nearest-neighbor (NN) attacks, ignoring sequence-aware reconstruction methods. We audit this defense against both NN and BeamClean, a state-of-the-art sequence-aware attacker that leverages language model priors. At the operating point proposed in prior work (30--50% clip rate), we find that both attackers achieve near-perfect reconstruction: 99.98% Token-ASR and 100% Canary-EM. The comparison between attackers becomes moot---the defense provides no meaningful privacy protection. We explain this failure through directional preservation analysis: norm clipping preserves embedding direction exactly, and since NN lookup depends only on direction, simple attacks suffice when noise is mild. Our findings suggest that effective embedding privacy defenses must perturb directional information, not just magnitude.
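The directional-preservation argument can be illustrated with a minimal sketch. The snippet below (all names, dimensions, and the synthetic vocabulary are illustrative assumptions, not the paper's actual setup) shows that L2 norm clipping rescales an embedding without rotating it, so cosine similarity to the original is exactly 1 and a cosine-based NN lookup over the vocabulary recovers the same token:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical vocabulary of token embeddings (size and dimension are illustrative).
vocab = rng.normal(size=(1000, 64))

def clip_norm(v, c):
    """Scale v so its L2 norm is at most c; the direction is unchanged."""
    n = np.linalg.norm(v)
    return v * min(1.0, c / n)

def nn(q, vocab):
    """Nearest neighbor by cosine similarity."""
    sims = vocab @ q / (np.linalg.norm(vocab, axis=1) * np.linalg.norm(q))
    return int(np.argmax(sims))

e = vocab[42] * 5.0            # an embedding whose norm exceeds the clip bound
clipped = clip_norm(e, c=1.0)  # norm shrinks to 1.0, direction is untouched

# Cosine similarity between the clipped and original vectors is exactly 1,
# so a direction-only NN attack is unaffected by clipping alone.
cos = clipped @ e / (np.linalg.norm(clipped) * np.linalg.norm(e))
same_token = nn(clipped, vocab) == nn(e, vocab)
```

This is why clipping on its own contributes nothing against an NN attacker; only the additive noise perturbs direction, and when that noise is mild the perturbation is too small to change the nearest neighbor.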