Poisoning LLM-Induced Rule Repositories via Indirect Prompt Injection
Abstract
LLM-based log parsing systems such as LogRules separate parsing into an induction stage (rule generation from examples) and a deduction stage (rule application to new logs). While this architecture achieves strong performance, the induction stage opens a new attack surface: an attacker who can influence even a small number of induction examples may manipulate the generated rules through indirect prompt injection. We present the first systematic study of such induction-stage poisoning attacks. We design three payload formats and evaluate their effectiveness across four poisoning budgets on three benchmark datasets. Our instruction-style payload degrades parsing accuracy by up to 15.1 percentage points with only 7 of 10 induction examples poisoned, and its effectiveness scales monotonically with the poisoning budget. We propose a canary-based admission-control defense that detects 42.6% of poisoned configurations overall, achieving 61.1% detection and 40% accuracy recovery on Linux, but it exhibits dataset-dependent failure modes, including false acceptance on BGL and limited recovery on HDFS. Our findings highlight the need for robust defenses in LLM-based log analysis pipelines.