/r/netsec - Information Security News & Discussion
We've been running inference-time threat detection across 38 production AI agent deployments. Here's what Week 3 of 2026 looked like with on-device detections.
Key Findings
- 28,194 threats detected across 74,636 interactions (37.8% attack rate)
-
Inter-Agent Attacks emerged as a new category (3.4% of threats) - agents sending poisoned messages to other agents
- Data exfiltration leads at 19.2% - primarily targeting system prompts and RAG context
- Jailbreaks detected with 96.3% confidence - patterns are now well-established
Attack Technique Breakdown
- Instruction Override: 9.7%
- Tool/Command Injection: 8.2%
- RAG Poisoning: 8.1% (trending up)
- System Prompt Extraction: 7.7%
The inter-agent attack vector is particularly concerning given the MCP ecosystem growth. We're seeing goal hijacking, constraint removal, and recursive propagation attempts.
Full report with methodology: https://raxe.ai/threat-intelligence
Github: https://github.com/raxe-ai/raxe-ce is free for the community to use
Happy to answer questions about detection approaches
submitted by
/u/cyberamyntas [link] [comments]