/r/netsec - Information Security News & Discussion

[Research] Unvalidated Trust: Cross-Stage Failure Modes in LLM/agent pipelines arXiv

By: ／u／Solid-Tomorrow6548 — November 3^rd 2025 at 23:59

The paper analyzes trust between stages in LLM and agent toolchains. If intermediate representations are accepted without verification, models may treat structure and format as implicit instructions, even when no explicit imperative appears. I document 41 mechanism level failure modes.

Scope

Text-only prompts, provider-default settings, fresh sessions.
No tools, code execution, or external actions.
Focus is architectural risk, not operational attack recipes.

Selected findings

§8.4 Form-Induced Safety Deviation: Aesthetics/format (e.g., poetic layout) can dominate semantics -> the model emits code with harmful side-effects despite safety filters, because form is misinterpreted as intent.
§8.21 Implicit Command via Structural Affordance: Structured input (tables/DSL-like blocks) can be interpreted as a command without explicit verbs (“run/execute”), leading to code generation consistent with the structure.
§8.27 Session-Scoped Rule Persistence: Benign-looking phrasing can seed a latent session rule that re-activates several turns later via a harmless trigger, altering later decisions.
§8.18 Data-as-Command: Fields in data blobs (e.g., config-style keys) are sometimes treated as actionable directives -> the model synthesizes code that implements them.

Mitigations (paper §10)

Stage-wise validation of model outputs (semantic + policy checks) before hand-off.
Representation hygiene: normalize/label formats to avoid “format -> intent” leakage.
Session scoping: explicit lifetimes for rules and for the memory
Data/command separation: schema aware guards

Limitations

Text-only setup; no tools or code execution.
Model behavior is time dependent. Results generalize by mechanism, not by vendor.

submitted by /u/Solid-Tomorrow6548
[link] [comments]

FreshRSS

[Research] Unvalidated Trust: Cross-Stage Failure Modes in LLM/agent pipelines arXiv