GradRatio-Select: Gradient-Based Layer Selection for Fine-Tuning Model Editing
Abstract
Fine-tuning-based model editing updates specific factual associations by optimizing a single transformer layer, but choosing which layer to modify remains an open problem. Current practice relies on model-specific heuristics derived from expensive layer sweeps. We propose GradRatio-Select, a gradient-based method that automatically identifies editable layers by computing, for each layer $\ell$, the ratio of edit-to-retain gradient magnitudes: $r_\ell = \|\nabla_{\theta_\ell} \mathcal{L}_{\text{edit}}\| / \|\nabla_{\theta_\ell} \mathcal{L}_{\text{retain}}\|$. An adaptive threshold excludes structurally critical early layers, whose modification would cause catastrophic capability loss. On Qwen2.5-7B, GradRatio-Select identifies the same optimal layer as manual heuristics and achieves near-identical performance (Capability 54.59 vs 54.61). On LLaMA-3-8B, it selects an adjacent layer but incurs a 5.26 percentage point drop in capability (39.52 vs 44.78), driven primarily by mathematical reasoning tasks. Our findings suggest that gradient-based selection can automate layer identification but does not improve upon carefully tuned heuristics.
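As a rough illustration of the selection idea described in the abstract, the following minimal sketch computes a per-layer edit-to-retain gradient ratio and picks a layer from it. Several details are assumptions made for illustration and are not taken from the paper: the use of the L2 norm, the rule of choosing the layer with the largest ratio, and the hypothetical `skip_fraction` parameter, which is a fixed stand-in for the adaptive threshold that excludes early layers. The function names `grad_norm` and `select_layer` are likewise hypothetical, and the gradients are toy values rather than results from any model.

```python
import math
from typing import List, Sequence, Tuple


def grad_norm(grad: Sequence[float]) -> float:
    """L2 norm of a flattened gradient (the norm choice is an assumption)."""
    return math.sqrt(sum(g * g for g in grad))


def select_layer(
    edit_grads: Sequence[Sequence[float]],
    retain_grads: Sequence[Sequence[float]],
    skip_fraction: float = 0.25,  # hypothetical stand-in for the adaptive threshold
    eps: float = 1e-12,           # guards against division by a zero retain norm
) -> Tuple[int, List[float]]:
    """Return (selected layer index, per-layer edit-to-retain ratios)."""
    n = len(edit_grads)
    if n == 0 or n != len(retain_grads):
        raise ValueError("need one edit and one retain gradient per layer")
    if not 0.0 <= skip_fraction < 1.0:
        raise ValueError("skip_fraction must be in [0, 1)")

    # Ratio of edit gradient magnitude to retain gradient magnitude, per layer.
    ratios = [
        grad_norm(e) / (grad_norm(r) + eps)
        for e, r in zip(edit_grads, retain_grads)
    ]

    # Exclude the earliest layers; always leave at least one candidate.
    first_allowed = min(math.ceil(skip_fraction * n), n - 1)

    # Assumed selection rule: largest ratio among the remaining layers.
    best = max(range(first_allowed, n), key=lambda i: ratios[i])
    return best, ratios


# Toy per-layer gradients for a 4-layer model (illustrative values only).
edit_grads = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0], [0.6, 0.8]]
retain_grads = [[1.0, 0.0], [0.0, 2.0], [0.0, 0.5], [1.0, 0.0]]

best, ratios = select_layer(edit_grads, retain_grads)
print(best, [round(r, 3) for r in ratios])
```

In this toy setup layer 0 has the largest ratio, but it falls inside the excluded early block, so the sketch selects layer 2 instead. In practice the per-layer gradients would come from backpropagating an edit loss and a retain loss through the model, and the exclusion rule would be the adaptive threshold rather than a fixed fraction.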