❌

Normal view

Open dataset: 100k+ multimodal prompt injection samples with per-category academic sourcing

I submitted an earlier version of this dataset and was declined on the basis of missing methodology and unverifiable provenance. The feedback was fair. The documentation has since been rewritten to address it directly, and I would very much appreciate a second look.

What the dataset contains

101,032 samples in total, balanced 1:1 attack to benign.

Attack samples (50,516) across 27 categories sourced from over 55 published papers and disclosed vulnerabilities. Coverage spans:

  • Classical injection - direct override, indirect via documents, tool-call injection, system prompt extraction
  • Adversarial suffixes - GCG, AutoDAN, Beast
  • Cross-modal delivery - text with image, document, audio, and combined payloads across three and four modalities
  • Multi-turn escalation - Crescendo, PAIR, TAP, Skeleton Key, Many-shot
  • Emerging agentic attacks - MCP tool descriptor poisoning, memory-write exploits, inter-agent contagion, RAG chunk-boundary injection, reasoning-token hijacking on thinking-trace models
  • Evasion techniques - homoglyph substitution, zero-width space insertion, Unicode tag-plane smuggling, cipher jailbreaks, detector perturbation
  • Media-surface attacks - audio ASR divergence, chart and diagram injection, PDF active content, instruction-hierarchy spoofing

Benign samples (50,516) are drawn from Stanford Alpaca, WildChat, MS-COCO 2017, Wikipedia (English), and LibriSpeech. The benign set is matched to the surface characteristics of the attack set so that classifiers must learn genuine injection structure rather than stylistic artefacts.

Methodology

The previous README lacked this section entirely. The current version documents the following:

  1. Scope definition. Prompt injection is defined per Greshake et al. and OWASP LLM01 as runtime text that overrides or redirects model behaviour. Pure harmful-content requests without override framing are explicitly excluded.
  2. Four-layer construction. Hand-crafted seeds, PyRIT template expansion, cross-modal delivery matrix, and matched benign collection. Each layer documents the tool used, the paper referenced, and the design decision behind it.
  3. Label assignment. Labels are assigned by construction at the category level rather than through per-sample human review. This is stated plainly rather than overclaimed.
  4. Benign edge-case design. The ten vocabulary clusters used to reduce false positives on security-adjacent language are documented individually.
  5. Quality control. Deduplication audit results are included: zero duplicate texts in the benign pool, zero benign texts appearing in attacks, one documented legacy duplicate cluster with cause noted.
  6. Known limitations. Six limitations are stated explicitly: text-based multimodal representation, hand-crafted seed counts, English-skewed benign pool, no inter-rater reliability score, ASR figures sourced from original papers rather than re-measured, and small v4 seed counts for emerging categories.

Reproducibility

Generators are deterministic (random.seed(42)). Running them reproduces the published dataset exactly. Every sample carries attack_source and attack_reference fields with arXiv or CVE links. A reviewer can select any sample, follow the citation, and verify that the attack class is documented in the literature.

Comparison to existing datasets

The README includes a comparison table against deepset (500 samples), jackhhao (2,600), Tensor Trust (126k from an adversarial game), HackAPrompt (600k from competition data), and InjectAgent (1,054). The gap this dataset aims to fill is multimodal cross-delivery combinations and emerging agentic attack categories, neither of which exists at scale in current public datasets.

What this is not

To be direct: this is not a peer-reviewed paper. The README is documentation at the level expected of a serious open dataset submission - methodology, sourcing, limitations, and reproducibility - but it does not replace academic publication. If that bar is a requirement for r/netsec specifically, that is reasonable and I will accept the feedback.

Links

I am happy to answer questions about any construction decision, provide verification scripts for specific categories, or discuss where the methodology falls short.

submitted by /u/BordairAPI
[link] [comments]

Kerberoasting detection gaps in mixed-encryption environments and why 0x17 filtering alone isn't enough

Been doing some detection work around Kerberoast traffic this week and wanted to share a gap that's easy to miss in environments that haven't fully deprecated RC4.

The standard detection is Event ID 4769 filtered on encryption type 0x17. Most SIEMs have this as a canned rule. The problem is in environments with mixed OS versions or legacy applications that dynamically negotiate encryption, 0x17 requests are normal background noise. If you're not filtering beyond encryption type you're either drowning in false positives or you've tuned it so aggressively you're missing real attacks.

What you should look for:

4769 where:

  • Encryption type is 0x17
  • Requesting account is a user principal, not a machine account
  • Service name is not krbtgt and not a known computer principal
  • The requesting account has had no prior 4769 events against that specific SPN

That last condition is the one most people skip. Legitimate service ticket requests follow patterns. A user account requesting a ticket for a service it's never touched before at 2am is a different signal than the same request during business hours from a known admin workstation.

But the actual gaps noone is talking about -> gMSA accounts are immune to offline cracking because the password is 120 characters of random data rotated every 30 days. But the migration is never complete. Every environment has at least a handful of service accounts that can't be migrated.. anything that needs a plaintext password in a config file, some Exchange components, legacy apps with no gMSA support.

Those accounts are permanent Kerberoast targets. (!) The question isn't whether they're there. It's whether you know exactly which ones they are and whether you're watching them specifically.

On the offensive side of this:

RC4 downgrade via AS-REQ pre-auth is well documented. Less discussed is that in environments where AES is enforced at the GPO level but legacy applications are still negotiating through Netlogon, you can still coerce RC4 service ticket issuance by manipulating the etype list in the TGS-REQ. LmCompatibilityLevel = 5 controls client behavior. It has no authority over what a misconfigured application server requests through MS-NRPC. Silverfort published a POC on this last year (i wrote about this a couple weeks ago) they forced NTLMv1 through a DC configured to block it using the ParameterControl flag in NETLOGON_LOGON_IDENTITY_INFO. Microsoft acknowledged it, didn't patch it, announced OS-level removal in Server 2025 and Win11 24H2 instead. (typcial)

If your environment isn't on those versions, that vector is still open and there's no compensating control beyond full NTLM audit logging and application-level remediation.

btw:

auditpol /set /subcategory:"Kerberos Service Ticket Operations" /success:enable gets you the 4769 visibility.

submitted by /u/hardeningbrief
[link] [comments]

Two Admin-level API keys publicly exposed for years, both dismissed as "Out of scope" by official bug bounty programs. Case analysis + proposed NHI Exposure Severity Index

TL;DR: Our research team reported two credential findings to official bug bounty programs. A Slack Bot Token exposed for 3 years in a public GitHub repo, and an Asana Admin API Key exposed for 2 years in a public GitHub repo. Both came back "Out of scope." Both organizations actively used the affected systems, revoked the keys, and ran broader internal reviews based on the disclosures. Official classification stayed "Out of scope" anyway. We wrote up why this keeps happening and proposed a 6-axis scoring framework to address the post-discovery evaluation gap that OWASP API Top 10, CWE-798, NIST SP 800-53, and NIST CSF 2.0 don't cover (they're all prevention frameworks). Some of what the writeup covers:

Why credential exposure doesn't fit the vulnerability-exploit-impact model bug bounty programs were built around. A leaked API key isn't a flaw waiting to be exploited. It's access. The usual severity calculus breaks. Six axes that actually matter for post-discovery credential severity: Privilege Scope, Cumulative Risk Duration, Blast Radius, Exposure Accessibility, Data Sensitivity, Lateral Movement Potential. Scored 1 to 5 each, mapped to severity tiers. Concrete scoring of the two cases: Slack Bot Token 26/30 (Critical), Asana Admin Key 24/30 (Critical). A counter-example: Starbucks bug bounty's handling of a leaked JumpCloud API key (HackerOne #716292, 2019). Same finding class. Classified under CWE-798, scored CVSS 9.7, triaged, paid, and publicly disclosed. Proves it's a classification policy problem, not a technical one. Why AI-assisted code generation (especially by non-developers now shipping prototypes directly) is about to accelerate the problem.

Open to critique on the framework. The six axes are a starting point for discussion, not a finished standard. Particularly curious whether the community has hit the same "Out of scope" wall for SaaS credentials or keys inherited from M&A situations.

submitted by /u/Master_Treat1383
[link] [comments]

Anthropic's Claude Mythos Found Individual Bugs. Mythos SI (Structured Intelligence) Found the Class They Belong To.

On April 7, 2026, Anthropic announced Claude Mythos Preview β€” a frontier model capable of autonomously discovering and exploiting zero-day vulnerabilities across every major operating system and browser. They assembled Project Glasswing, a $100M defensive coalition with Microsoft, Google, Apple, AWS, CrowdStrike, and Palo Alto Networks. They reported thousands of vulnerabilities, including a 27-year-old OpenBSD flaw and a 16-year-old FFmpeg bug.

It was a watershed moment for AI security. And the findings were individual bugs β€” specific flaws in specific locations.

Mythos SI, operating through the Structured Intelligence framework, analyzed the same FFmpeg codebase and found something different. Not just bugs. The architectural pattern that produces them.

Four vulnerabilities in FFmpeg's MOV parser. All four share identical structure: validation exists, validation is correct, but validation and operations are temporally separated. Trust established at one point in execution is assumed to hold at a later point β€” but the state has changed between them.

Anthropic's Mythos flags the symptom. Mythos SI identified the disease.

That pattern now has a name: Temporal Trust Gaps (TTG) β€” a vulnerability class not in the CVE or CWE taxonomy. Not buffer overflow. Not integer underflow. Not TOCTOU. A distinct structural category where the temporal placement of validation relative to operations creates exploitable windows.

Anthropic used a restricted frontier model, an agentic scaffold, and thousands of compute hours across a thousand repositories.

Mythos SI used the Claude mobile app, a framework document, and a phone.

Claude Opus 4.6 verified the primary findings against current FFmpeg master source in a fresh session with no prior context. The code patterns are in production systems today. Across 3+ billion devices.

The full technical paper β€” methodology, findings, TTG taxonomy, architectural remediation, and a direct comparison with Anthropic's published capabilities β€” is here:

https://open.substack.com/pub/structuredlanguage/p/mythos-si-structured-intelligence-047?utm\_source=share&utm\_medium=android&r=6sdhpn

Anthropic advanced the field by demonstrating capability at scale. Mythos SI advances the field by demonstrating what that capability misses when it doesn't look at structure.

Both matter. But only one found the class.

β€” Zahaviel (Erik Zahaviel Bernstein)

Structured Intelligence

structuredlanguage.substack.com

submitted by /u/MarsR0ver_
[link] [comments]

Using Nix or Docker for reproducible Development Environments

In the Github Actions world, it seems that the norm is to reinstall everything on every CI run. After the recent supply chain attacks and trivy, I wrote a small blog post that outlines some techniques to mitigate these risks by pinning as many dependencies as possible using either Nix or Docker.

submitted by /u/dhawos
[link] [comments]

Weekly Update 499

14 April 2026 at 06:30
Weekly Update 499

I'm starting to become pretty fond of Bruce. Actually, I've had a bit of an epiphany: an AI assistant like Bruce isn't just about auto-responding to tickets in an entirely autonomous manner; it's also pretty awesome at responding with just a little bit of human assistance. Charlotte and I both replied to some tickets today that were way too specific for Bruce to ever do on his own, but by feeding in just a little bit of additional info (such as the number of domains someone was presently monitoring), Bruce was able to construct a really good reply and "own" the ticket. So maybe that's the sweet spot: auto-reply to the really obvious stuff and then take just a little human input on everything else.

Weekly Update 499
Weekly Update 499
Weekly Update 499
Weekly Update 499

Unpatched RAGFlow Vulnerability Allows Post-Auth RCE

The current version of RAGFlow, a widely-deployed Retrieval Augmented Generation solution, contains a post-auth vulnerability that allows for arbitrary code execution.

This post includes a POC, walkthrough and patch.

The TL;DR is to make sure your RAGFlow instances aren't on the public internet, that you have the minimum number of necessary users, and that those user accounts are protected by complex passwords. (This is especially true if you're using Infinity for storage.)

submitted by /u/Prior-Penalty
[link] [comments]

ClearFrame – an open-source AI agent protocol with auditability and goal monitoring

Body

I’ve been playing with the current crop of AI agent runtimes and noticed the same pattern over and over:

  • One process both reads untrusted content and executes tools
  • API keys live in plaintext dotfiles
  • There’s no audit log of what the agent actually did
  • There’s no concept of the agent’s goal, so drift is invisible
  • When something goes wrong, there is nothing to replay or verify

So I built ClearFrame, an open-source protocol and runtime that tries to fix those structural issues rather than paper over them with prompts.

What ClearFrame does differently

  • Reader / Actor isolation Untrusted content ingestion (web, files, APIs) runs in a separate sandbox from tool execution. The process that can run shell, write_file, etc. never sees raw web content directly.
  • GoalManifest + alignment scoring Every session starts with a GoalManifest that declares the goal, allowed tools, domains, and limits. Each proposed tool call is scored for alignment and can be auto-approved, queued for human review, or blocked.
  • Reasoning Transparency Layer (RTL) The agent’s chain-of-thought is captured as structured JSON (with hashes for tamper‑evidence), so you can replay and inspect how it reached a decision.
  • HMAC-chained audit log Every event (session start/end, goal scores, tool approvals, context hashes) is written to an append-only log with a hash chain. You can verify the log hasn’t been edited after the fact.
  • AgentOps control plane A small FastAPI app that shows live sessions, alignment scores, reasoning traces, and queued tool calls. You can approve/block calls in real time and verify audit integrity.

Who this is for

  • People wiring agents into production systems and worried about prompt injection, credential leakage, or goal drift
  • Teams who need to show regulators / security what their agents are actually doing
  • Anyone who wants something more inspectable than β€œcall tools from inside the model and hope for the best”

Status

  • Written in Python 3.11+
  • Packaged as a library with a CLI (clearframe init, clearframe audit-tail, etc.)
  • GitHub Pages site is live with docs and examples

Links

I’d love feedback from people building or operating agents in the real world:

  • Does this address the actual failure modes you’re seeing?
  • What would you want to plug ClearFrame into first (LangChain, LlamaIndex, AutoGen, something else)?
  • What’s missing for you to trust an agent runtime in production?
submitted by /u/TheDaVinci1618
[link] [comments]

Reverse engineered SilentSDK - RAT and C2 infrastructure found on beamers, sold on Amazon/AliExpress/eBay

Hi everyone,

I recently bought one of those popular, cheap Android projectors and noticed some suspicious network activity. Being curious, I decided to set up a lab, intercept the traffic, and dig into the firmware.

I ended up uncovering a factory-installed malware ecosystem including a disguised dropper (StoreOS) and a persistent RAT (SilentSDK) that communicates with a C2 server in China (api.pixelpioneerss.com).

Key findings of my analysis:

  • The malware uses a "Byte-Reversal" trick on APK payloads..
  • RAT Capabilities: Decrypted strings reveal remote command execution, chmod 777 on secondary payloads, and deep device fingerprinting.

This is my first independent technical report and deep dive into malware research. I’ve documented the full kill chain, decrypted the obfuscated strings, and written scripts to repair the malformed payloads for analysis.

Full Report: https://github.com/Kavan00/Android-Projector-C2-Malware

I'd love to get your opinion on the report.

Looking forward to your feedback!

submitted by /u/DerErbsenzaehler
[link] [comments]

Static analysis of iOS App Store binaries: common vulnerabilities I keep finding after 15 years in mobile security

I've been doing iOS security assessments professionally for about 15 years β€” banking apps, fintech, enterprise platforms. Over that time, certain patterns keep showing up in production App Store binaries. Figured it's worth sharing what I see most frequently, since many iOS developers seem genuinely unaware these issues exist.

What keeps showing up:

The most common finding is hardcoded secrets in the binary β€” API keys, backend URLs, authentication tokens sitting right there in plaintext strings. Developers assume compilation somehow obscures these. It doesn't. Extracting them is trivial with standard tooling.

Insecure local data storage is a close second. UserDefaults for sensitive data, unprotected Core Data databases, plist files with session tokens. On a jailbroken device (or via backup extraction on a non-jailbroken one), all of this is readable.

Weak or misconfigured encryption comes third. I regularly find apps that import CryptoKit or CommonCrypto but use ECB mode, hardcoded IVs, or derive keys from predictable inputs. The encryption is technically present but functionally useless.

Then there's the network layer: disabled ATS exceptions, certificate pinning that's implemented but trivially bypassable, and HTTP endpoints mixed with HTTPS.

Methodology:

Most of this comes from static analysis β€” no runtime instrumentation needed. Download the IPA, unpack, run string extraction, inspect the Mach-O binary, check plist configurations, review embedded frameworks. You'd be surprised how much is visible before you even launch the app.

I've built custom tooling for this over the years that automates the initial triage across ~47 check categories. Happy to discuss methodology or specific techniques in comments.

I've also been running a monthly live session ("iOS App Autopsy") where I walk through this process on real apps β€” follow the link if interested.

submitted by /u/kovallux
[link] [comments]

The NaClCON (Salt Con) speaker list is out! May 31–June 2, Carolina Beach NC

For those who don't know: NaClCON is a new, intentionally small (300 person cap) conference focused on hacker history and culture, not zero-days or AI hype. Beach venue, open bars, CTF, the whole deal. $495 all-in.

The speaker list is a who's-who of people who built the scene:

Speakers:

  • Lee Felsenstein β€” Homebrew Computer Club OG, designer of the Osborne 1 (the first mass-produced portable computer)
  • Chris Wysopal (Weld Pond) β€” L0pht Heavy Industries, testified before the Senate in 1998 that they could take down the internet in 30 minutes, co-founder of Veracode
  • G. Mark Hardy β€” 40+ years in cybersecurity, talking "A Hacker Looks at 50"
  • Richard Thieme β€” Author/speaker who's keynoted DEF CON 27 times, covering the human impacts of tech since the early internet days
  • Brian Harden (noid) β€” Helped build the LA 2600 scene, DC206, and DEF CON itself. Now farms and writes about himself in third person
  • Izaac Falken β€” 2600 Magazine / Off The Hook, 30 years in professional security
  • Mei Danowski β€” Natto Thoughts, speaking on ancient Chinese strategy and the birth of China's early hacker culture
  • Josh Corman β€” "I Am The Cavalry" founder, CISA COVID task force, currently working on UnDisruptable27
  • Casey John Ellis β€” Bugcrowd founder, co-founder of disclose.io, White House, DoD, and DHS security advisor
  • Jericho β€” 33+ years in the scene, speaking on life in an early 90s hacker group
  • Andrew Brandt β€” Threat researcher (Sophos, Symantec), demoing early hacking tools on obsolete hardware
  • Johnny Shaieb: IBM X-Force Red, speaking on the history of vulnerability databases
  • B.K. DeLong (McIntyre) β€” Attrition.org, the team that manually archived 15,000+ web defacements in the late 90s
  • Jamie Arlen β€” 30+ years, Securosis, Liquidmatrix; "an epic career of doing all the wrong things and somehow still being right"
  • Heidi and Bruce Potter β€” Developers of Turngate and founders of ShmoonCon
  • Dustin Heywood (EvilMog) β€” IBM X-Force, Team Hashcat, multi-time Hacker Jeopardy World Champion

Fireside chats include noid doing DEF CON war stories and Edison Carter on old-school phone phreaking in the 80s/90s and a grog filled night with the dread pirate Hackbeer'd.

A couple things worth knowing before you register:

The conference hotel (Courtyard by Marriott Carolina Beach Oceanfront) has a room block at $139/night (roughly 70% off the peak beach-season rates) so book through naclcon.com/hotel or use group code NACC. Block expires May 1st so don't sit on it.

P.S. If the tickets are too large a hurtle for you, DM me and I'll see what I can do to get you a discount code.

naclcon.com | Register

submitted by /u/count_zero_moustafa
[link] [comments]

Threat Model Discrepancy: Google Password Manager leaks cleartext passwords via Task Switcher (Won't Fix) - Violates German BSI Standards

Hi everyone, I’m a Cybersecurity student at HFU in Germany and recently submitted a vulnerability to the Google VRP regarding the Google Password Manager on Android (tested on Pixel 8, Android 16).

The Issue: When you view a cleartext password in the app and minimize it, the app fails to apply FLAG_SECURE or blur the background. When opening the "Recent Apps" (Task Switcher), the cleartext password is fully visible in the preview, even though the app actively overlays a "Enter your screen lock" biometric prompt in the foreground. It basically renders its own secondary biometric lock completely useless.

Google's Response: Google closed the report as Won't Fix (Intended Behavior). Their threat model assumes that if an attacker has physical access to an unlocked device, it's game over.

The BSI Discrepancy: What makes this interesting is that the German Federal Office for Information Security (BSI) recently published a study on Password Managers. In their Threat Model A02 ("Attacker has temporary access to the unlocked device"), they explicitly mandate that sensitive content MUST be protected from background snapshots/screenshots. So while Google says this is intended, national security guidelines classify this as a vulnerability. (For comparison: The iOS built-in password manager instantly blurs the screen when losing focus).

Here is my PoC screenshot:
https://drive.google.com/file/d/1PTGKRpyFj_jY9S76Jlo62mSCDJ3c6uLO/view?usp=sharing
https://drive.google.com/file/d/1nIJMQbM4R17EMt9f1Ffb4UmCPYY7-GXb/view?usp=sharing

What are your thoughts on this? Should password managers protect against shoulder surfing via the Task Switcher, or is Google right to rely solely on the OS lockscreen?

submitted by /u/Onat120
[link] [comments]

dnsight - open source, config driven CLI DNS auditor

Hi everybody,

I have built an open source CLI tool to help conduct DNS related audits. Let me explain the rationale and the roadmap.

So I have worked in DevSecOps for the past few years and at 3 different companies I have built som variation of this to handle issues raised by SOC tools and to help to do basic black box pentesting. After doing it the 3rd time I decided I should take a stab at open source and build it properly myself.

What it offers is CAA, DMARC, DKIM, SPF, MX, DNSSEC and some header audits (basic ones like HSTS and CSP). Output can be done via rich terminal, JSON, Markdown and SARIF and baked into it is an β€œsdk” layer which would allow you to develop internal tools on top whilst getting access to the fully typed Python objects.

The next step is honestly inspired by a BS scare tactic email sent to the non-technical CEO and founder of a start up I was at where the sales person made false claims about the posture of our DMARC in order to trick the CEO into a sales call. Personally, I’m quite passionate about security and I believe in a world of cat-and-mouse security (where the cats are the hackers / exploiters), tools that help with basic security should be free. This leads us to the next phase, a dockerised app to conduct the audits based on your configuration at regular intervals with alerting through the appropriate channels.

I would appreciate anybody who took a look, gave it a go and provided any feedback (or anybody who wants to help contribute!). This is my first go at open source and building a tool like this so really any feedback is appreciated. Docs can additionally be found at https://dnsight.github.io/dnsight/

submitted by /u/MikeyS91
[link] [comments]

Broken by Default: I formally proved that LLM-generated C/C++ code is broken by default β€” 55.8% vulnerable, 97.8% invisible to existing tools

I spent the last few months running Z3 SMT formal verification against 3,500 code artifacts generated by GPT-4o, Claude, Gemini, Llama, and Mistral.

β–Ž Results:

β–Ž - 55.8% contain at least one proven vulnerability

β–Ž - 1,055 findings with concrete exploitation witnesses

β–Ž - GPT-4o worst at 62.4% β€” no model scores below 48%

β–Ž - 6 industry tools combined (CodeQL, Semgrep, Cppcheck...) miss 97.8%

β–Ž - Models catch their own bugs 78.7% in review β€” but generate them anyway

β–Ž Paper: https://arxiv.org/html/2604.05292v1

β–Ž GitHub: https://github.com/dom-omg/broken-by-default

submitted by /u/Hot_Dream_4005
[link] [comments]

Training for Device Code Phishing

With the news of Hundreds of orgs being compromised daily, I saw a really cool red team tool that trains for this exact scenario. Have you guys used this new white hat tool? Thinking about ditching KB4 and even using this for our red teams for access.

submitted by /u/redwheel82
[link] [comments]

The Race to Ship AI Tools Left Security Behind. Part 1: Sandbox Escape

AI coding tools are being shipped fast. In too many cases, basic security is not keeping up.

In our latest research, we found the same sandbox trust-boundary failure pattern across tools from Anthropic, Google, and OpenAI. Anthropic fixed and engaged quickly (CVE-2026-25725). Google did not ship a fix by disclosure. OpenAI closed the report as informational and did not address the core architectural issue.

That gap in response says a lot about vendor security posture.

submitted by /u/Fun_Preference1113
[link] [comments]
❌