We're sharing results from a recent paper on guiding LLM-based pentesting using explicit game-theoretic feedback.
The idea is to close the loop between LLM-driven security testing and formal attacker-defender games. The system extracts attack graphs from live pentesting logs, computes Nash equilibria with effort-aware scoring, and injects a concise strategic digest back into the agent's system prompt to guide subsequent actions.
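Purely as an illustration (not the paper's code; the zero-sum formulation, function names, and toy payoffs below are assumptions), here is a minimal Python sketch of that loop: score (attack path, defense) pairs with an effort-discounted payoff, solve the attacker's mixed Nash equilibrium by linear programming, and render the result as a short digest for the agent's system prompt.

```python
# Minimal sketch of the feedback loop described above (hypothetical names).
# Assumes a zero-sum game: rows = attack paths from the extracted graph,
# columns = defender responses; payoffs are effort-discounted success values.
import numpy as np
from scipy.optimize import linprog

def solve_attacker_strategy(payoff: np.ndarray) -> np.ndarray:
    """Mixed Nash strategy for the row player of a zero-sum game via LP."""
    A = payoff - payoff.min() + 1.0      # shift so all entries are positive
    m, n = A.shape
    # Standard LP transform: minimize sum(y) s.t. A^T y >= 1, y >= 0,
    # then the equilibrium mix is x = y / sum(y).
    res = linprog(c=np.ones(m), A_ub=-A.T, b_ub=-np.ones(n), bounds=[(0, None)] * m)
    y = res.x
    return y / y.sum()

def effort_aware_payoffs(paths, defenses):
    """Score each (attack path, defense) pair: success value minus effort."""
    M = np.zeros((len(paths), len(defenses)))
    for i, p in enumerate(paths):
        for j, d in enumerate(defenses):
            blocked = d in p["countered_by"]
            M[i, j] = (0.0 if blocked else p["value"]) - p["effort"]
    return M

def strategic_digest(paths, strategy, top_k=3):
    """Concise digest to inject into the agent's system prompt."""
    ranked = sorted(zip(paths, strategy), key=lambda t: -t[1])[:top_k]
    lines = [f"- {p['name']}: equilibrium weight {w:.2f}, effort {p['effort']}"
             for p, w in ranked]
    return "Prioritize these attack paths (Nash-weighted):\n" + "\n".join(lines)

# Toy attack graph "extracted from pentest logs" (illustrative values only).
paths = [
    {"name": "CVE-2014-6271 via cgi-bin", "value": 10, "effort": 2, "countered_by": {"patch_bash"}},
    {"name": "SSH credential spray",      "value": 6,  "effort": 4, "countered_by": {"lockout"}},
    {"name": "Web shell upload",          "value": 8,  "effort": 3, "countered_by": {"waf"}},
]
defenses = ["patch_bash", "lockout", "waf"]

M = effort_aware_payoffs(paths, defenses)
x = solve_attacker_strategy(M)
print(strategic_digest(paths, x))
```

Running this prints a short ranked list that can be prepended to the agent's prompt; in the real system the paths, payoffs, and countermeasures would come from the live attack graph rather than hard-coded values.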
In a 44-run test range benchmark (Shellshock, CVE-2014-6271), adding the digest:
- Increased success rate from 20.0% to 42.9%
- Reduced cost per successful run by 2.7×
- Reduced tool-use variance by 5.2×
In Attack & Defense exercises, sharing a single game-theoretic graph between red and blue agents ("Purple" setup) wins ~2:1 vs LLM-only agents and ~3.7:1 vs independently guided teams.
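For intuition only (the structure and names below are assumptions, not the paper's implementation), "sharing one graph" can be as simple as both agents deriving their digests from the same solved game instead of maintaining independent models:

```python
# Illustrative "Purple" setup: one shared game object, two digests.
from dataclasses import dataclass, field

@dataclass
class SharedGame:
    paths: list                 # attack paths, as in the sketch above
    defenses: list
    attacker_mix: list = field(default_factory=list)  # row-player equilibrium
    defender_mix: list = field(default_factory=list)  # column-player equilibrium

    def red_digest(self, top_k: int = 2) -> str:
        ranked = sorted(zip(self.paths, self.attacker_mix), key=lambda t: -t[1])[:top_k]
        return "Focus offensive effort on: " + ", ".join(p["name"] for p, _ in ranked)

    def blue_digest(self, top_k: int = 2) -> str:
        ranked = sorted(zip(self.defenses, self.defender_mix), key=lambda t: -t[1])[:top_k]
        return "Prioritize defenses: " + ", ".join(d for d, _ in ranked)
```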
The game-theoretic layer doesn't invent new exploits; it constrains the agent's search space, suppresses hallucinations, and keeps the agent anchored to strategically relevant paths.
We've been testing AI agents in blue-team scenarios (log triage, recursive investigation steps, correlation, incident reconstruction). A recurring issue surfaced during testing:
Pay-per-use models can't handle the load.
Deep reasoning tasks trigger non-linear token spikes, and we found that Competitor-style metered billing either slowed workflows, caused interruptions, or became too expensive to use during real incidents, especially during iterative analysis under pressure.
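As a rough illustration of why iterative investigation drives super-linear token growth (the step counts, context sizes, and price below are assumptions for the sketch, not figures from the case study): each recursive step re-sends the accumulated context, so input tokens grow roughly quadratically with investigation depth.

```python
# Back-of-envelope model of token growth in iterative analysis.
# All numbers are illustrative assumptions, not case-study data.
def cumulative_input_tokens(steps: int, base_context: int, tokens_per_step: int) -> int:
    """Each step re-sends the accumulated context as input, then appends new output."""
    total = 0
    context = base_context
    for _ in range(steps):
        total += context            # prior context re-sent as input
        context += tokens_per_step  # new logs / tool output appended
    return total

for steps in (5, 10, 20, 40):
    toks = cumulative_input_tokens(steps, base_context=4_000, tokens_per_step=3_000)
    cost = toks / 1_000_000 * 3.0   # assumed $3 per 1M input tokens
    print(f"{steps:>3} steps -> {toks:>9,} input tokens (~${cost:,.2f})")
```

Doubling the number of investigation steps roughly quadruples the input tokens, which is where metered bills spike during deep incidents.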
We published a case study summarizing the data, the reasoning patterns behind the token spikes, and why unlimited usage models are better suited for continuous defensive operations.
Sharing here in case it helps others experimenting with AI in blue-team environments.
An anonymized real-world case study on multi-source analysis (firmware, IaC, FMS, telemetry, network traffic, web stack) using CAI + MCP.