Auditing HNSW Index Leakage: Recovering Embedding Geometry from Graph Topology
Abstract
Hierarchical Navigable Small World (HNSW) graphs are the dominant indexing structure for approximate nearest neighbor search in vector databases and retrieval-augmented generation systems. While stored vectors are typically treated as sensitive, index files are often considered harmless metadata. We challenge this assumption by demonstrating that HNSW topology leaks geometric information beyond what is explicitly present in adjacency lists. We propose degree-penalized geodesic embedding, which counteracts small-world shortcut distortion by penalizing paths through high-degree hub nodes, then applies landmark multidimensional scaling to reconstruct approximate coordinates. On high-dimensional text embeddings (MSMARCO-10K, 768-d), our method achieves 28% improvement in kNN recovery over adjacency-only baselines (Recall@10: 0.4164 vs.\ 0.3248). The attack is dimension-dependent---effective on high-dimensional embeddings typical of modern RAG systems but not on low-dimensional data---identifying the regime where topology leakage is most concerning. Our findings suggest that HNSW index files should be treated as sensitive artifacts.