Demonstration: prompt-injection failures in a simulated help-desk LLM

By: ／u／thePROFITking

I built this as a small demonstration to explore prompt-injection and instruction-override failure modes in help-desk-style LLM deployments.

The setup mirrors common production patterns (role instructions, refusal logic, bounded data access) and is intended to show how those controls can be bypassed through context manipulation and instruction override.

I’m interested in feedback on realism, missing attack paths, and whether these failure modes align with what others are seeing in deployed systems.

This isn’t intended as marketing - just a concrete artefact to support discussion.

submitted by /u/thePROFITking
[link] [comments]

FreshRSS

Demonstration: prompt-injection failures in a simulated help-desk LLM