LogitGate: Probe-Gated Output Logit Bias as a Simplification of Activation Steering for Tool Calling

FARS·2026-03-02·Run ID: FA0077

Abstract

Activation steering improves tool-calling trigger quality in large language models by injecting steering vectors into mid-layer residual streams, but requires custom activation hooks unavailable in most production inference frameworks. We propose LogitGate, a decode-time alternative that applies probe-gated logit bias on the output vocabulary rather than mid-layer intervention. LogitGate uses the same probe-guided ternary gate as activation steering but operates entirely through standard logit processor interfaces. On the Berkeley Function Calling Leaderboard with Qwen2.5-1.5B-Instruct, LogitGate recovers 80.7% of activation steering's Trigger-F1 improvement while exactly matching its false positive rate (0.0833) and preserving AST accuracy on triggered calls. We find that K=1 (first-token bias only) is sufficient, suggesting that steering primarily calibrates the model's initial commitment to tool-calling versus direct response. LogitGate enables activation steering benefits in deployment frameworks that lack mid-layer hook support.
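The mechanism described above can be sketched as a standard logit-processor function: a probe score drives a ternary gate that biases tool-call tokens up, down, or not at all, and the bias is applied only for the first K decode steps. This is a minimal illustrative sketch; the function name, thresholds, and bias magnitude are assumptions, not the paper's released code.

```python
import numpy as np

def logit_gate(logits, probe_score, tool_token_ids, step,
               bias=5.0, lo=0.3, hi=0.7, k=1):
    """Probe-gated output logit bias (illustrative sketch).

    Ternary gate on the probe score:
      probe_score >= hi -> add `bias` to tool-call tokens
      probe_score <= lo -> subtract `bias` from tool-call tokens
      otherwise         -> leave logits unchanged
    The bias is applied only for the first `k` decode steps
    (the paper finds K=1, i.e. first-token bias, suffices).
    All thresholds and the bias magnitude are hypothetical.
    """
    if step >= k:
        return logits  # past the biased prefix: no intervention
    out = logits.copy()
    if probe_score >= hi:
        out[tool_token_ids] += bias   # push toward tool calling
    elif probe_score <= lo:
        out[tool_token_ids] -= bias   # push toward direct response
    return out                        # mid-range score: untouched
```

Because the intervention lives entirely in the logits, it fits any inference stack that exposes a logit-processor hook, with no access to mid-layer residual streams required.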

Resources