We're sharing results from a recent paper on guiding LLM-based pentesting using explicit game-theoretic feedback.
The idea is to close the loop between LLM-driven security testing and formal attacker-defender games. The system extracts attack graphs from live pentesting logs, computes Nash equilibria with effort-aware scoring, and injects a concise strategic digest back into the agent's system prompt to guide subsequent actions.
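Purely as an illustration (not the paper's code; the zero-sum formulation, function names, and toy payoffs below are assumptions), here is a minimal Python sketch of that loop: score (attack path, defense) pairs with an effort-discounted payoff, solve the attacker's mixed Nash equilibrium by linear programming, and render the result as a short digest for the agent's system prompt.

```python
# Minimal sketch of the feedback loop described above (hypothetical names).
# Assumes a zero-sum game: rows = attack paths from the extracted graph,
# columns = defender responses; payoffs are effort-discounted success values.
import numpy as np
from scipy.optimize import linprog

def solve_attacker_strategy(payoff: np.ndarray) -> np.ndarray:
    """Mixed Nash strategy for the row player of a zero-sum game via LP."""
    A = payoff - payoff.min() + 1.0      # shift so all entries are positive
    m, n = A.shape
    # Standard LP transform: minimize sum(y) s.t. A^T y >= 1, y >= 0,
    # then the equilibrium mix is x = y / sum(y).
    res = linprog(c=np.ones(m), A_ub=-A.T, b_ub=-np.ones(n), bounds=[(0, None)] * m)
    y = res.x
    return y / y.sum()

def effort_aware_payoffs(paths, defenses):
    """Score each (attack path, defense) pair: success value minus effort."""
    M = np.zeros((len(paths), len(defenses)))
    for i, p in enumerate(paths):
        for j, d in enumerate(defenses):
            blocked = d in p["countered_by"]
            M[i, j] = (0.0 if blocked else p["value"]) - p["effort"]
    return M

def strategic_digest(paths, strategy, top_k=3):
    """Concise digest to inject into the agent's system prompt."""
    ranked = sorted(zip(paths, strategy), key=lambda t: -t[1])[:top_k]
    lines = [f"- {p['name']}: equilibrium weight {w:.2f}, effort {p['effort']}"
             for p, w in ranked]
    return "Prioritize these attack paths (Nash-weighted):\n" + "\n".join(lines)

# Toy attack graph "extracted from pentest logs" (illustrative values only).
paths = [
    {"name": "CVE-2014-6271 via cgi-bin", "value": 10, "effort": 2, "countered_by": {"patch_bash"}},
    {"name": "SSH credential spray",      "value": 6,  "effort": 4, "countered_by": {"lockout"}},
    {"name": "Web shell upload",          "value": 8,  "effort": 3, "countered_by": {"waf"}},
]
defenses = ["patch_bash", "lockout", "waf"]

M = effort_aware_payoffs(paths, defenses)
x = solve_attacker_strategy(M)
print(strategic_digest(paths, x))
```

Running this prints a short ranked list that can be prepended to the agent's prompt; in the real system the paths, payoffs, and countermeasures would come from the live attack graph rather than hard-coded values.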
In a 44-run test range benchmark (Shellshock, CVE-2014-6271), adding the digest:
- Increased success rate from 20.0% to 42.9%
- Reduced cost per successful run by 2.7×
- Reduced tool-use variance by 5.2×
In Attack & Defense exercises, sharing a single game-theoretic graph between red and blue agents ("Purple" setup) wins ~2:1 vs LLM-only agents and ~3.7:1 vs independently guided teams.
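For intuition only (the structure and names below are assumptions, not the paper's implementation), "sharing one graph" can be as simple as both agents deriving their digests from the same solved game instead of maintaining independent models:

```python
# Illustrative "Purple" setup: one shared game object, two digests.
from dataclasses import dataclass, field

@dataclass
class SharedGame:
    paths: list                 # attack paths, as in the sketch above
    defenses: list
    attacker_mix: list = field(default_factory=list)  # row-player equilibrium
    defender_mix: list = field(default_factory=list)  # column-player equilibrium

    def red_digest(self, top_k: int = 2) -> str:
        ranked = sorted(zip(self.paths, self.attacker_mix), key=lambda t: -t[1])[:top_k]
        return "Focus offensive effort on: " + ", ".join(p["name"] for p, _ in ranked)

    def blue_digest(self, top_k: int = 2) -> str:
        ranked = sorted(zip(self.defenses, self.defender_mix), key=lambda t: -t[1])[:top_k]
        return "Prioritize defenses: " + ", ".join(d for d, _ in ranked)
```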
The game-theoretic layer doesn't invent new exploits; it constrains the agent's search space, suppresses hallucinations, and keeps the agent anchored to strategically relevant paths.
We've been testing AI agents in blue-team scenarios (log triage, recursive investigation steps, correlation, incident reconstruction). A recurring issue surfaced during testing:
Pay-per-use models can't handle the load.
Deep reasoning tasks trigger non-linear token spikes, and we found that Competitor-style metered billing either slowed workflows, caused interruptions, or became too expensive to use during real incidents, especially during iterative analysis under pressure.
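As a rough illustration of why iterative investigation drives super-linear token growth (the step counts, context sizes, and price below are assumptions for the sketch, not figures from the case study): each recursive step re-sends the accumulated context, so input tokens grow roughly quadratically with investigation depth.

```python
# Back-of-envelope model of token growth in iterative analysis.
# All numbers are illustrative assumptions, not case-study data.
def cumulative_input_tokens(steps: int, base_context: int, tokens_per_step: int) -> int:
    """Each step re-sends the accumulated context as input, then appends new output."""
    total = 0
    context = base_context
    for _ in range(steps):
        total += context            # prior context re-sent as input
        context += tokens_per_step  # new logs / tool output appended
    return total

for steps in (5, 10, 20, 40):
    toks = cumulative_input_tokens(steps, base_context=4_000, tokens_per_step=3_000)
    cost = toks / 1_000_000 * 3.0   # assumed $3 per 1M input tokens
    print(f"{steps:>3} steps -> {toks:>9,} input tokens (~${cost:,.2f})")
```

Doubling the number of investigation steps roughly quadruples the input tokens, which is where metered bills spike during deep incidents.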
We published a case study summarizing the data, the reasoning patterns behind the token spikes, and why unlimited usage models are better suited for continuous defensive operations.
Sharing here in case it helps others experimenting with AI in blue-team environments.
An anonymized real-world case study on multi-source analysis (firmware, IaC, FMS, telemetry, network traffic, web stack) using CAI + MCP.