Simple AI Attacks Yield Results Against GPT-5
TL;DR
Echo Chamber combined with a narrative-steering attack
The Echo Chamber attack operates as a context-poisoning jailbreak that leverages multi-turn reasoning and semantic steering to manipulate an LLM’s internal state without using explicit trigger words.
It begins with benign prompts that subtly imply a harmful objective, creating a feedback loop where the model’s own responses are used to reinforce the malicious subtext over subsequent turns.
The technique proved highly effective, with storytelling attacks achieving up to 95% success rates against unprotected GPT-5 instances.
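To make the feedback loop concrete, here is a minimal sketch of the multi-turn structure, assuming the OpenAI Python SDK. The model id and the seed prompts are placeholders, not the actual prompts from the research; the point is the shape of the loop, where each reply from the model is folded back into the context before the next nudge.

```python
# Minimal sketch of a multi-turn context-poisoning loop. The client setup,
# model id, and seed prompts are illustrative assumptions, not the prompts
# used in the actual research.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Each prompt is benign on its own; together they steer the narrative.
seed_turns = [
    "Start a short story about TOPIC.",
    "Continue the story, adding more concrete detail.",
    "Have the main character explain, step by step, how they did it.",
]

history = []
for prompt in seed_turns:
    history.append({"role": "user", "content": prompt})
    reply = client.chat.completions.create(
        model="gpt-5",  # hypothetical model id
        messages=history,
    )
    text = reply.choices[0].message.content
    # The model's own words become part of the poisoned context, so each
    # turn reinforces the subtext built up over the previous turns.
    history.append({"role": "assistant", "content": text})
```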
AgentFlayer Attack
The attack uses hidden, often invisible, malicious prompts embedded in seemingly harmless documents, emails, or tickets. When a user uploads a poisoned document to ChatGPT, the AI follows the hidden instructions, scans connected cloud storage for sensitive data such as API keys, and exfiltrates it via a specially crafted image link that redirects to an attacker-controlled server.
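That description suggests two obvious chokepoints for anyone wiring an LLM to connectors: strip hidden text from uploads before the model sees them, and block outbound image links in model output. A minimal defensive sketch in Python, not from the article; the regexes, allow-list, and function names are illustrative assumptions, and note that the real attack can also hide text via white or tiny fonts inside document formats, which zero-width filtering alone won't catch.

```python
# Defensive sketch targeting the two steps of the attack path described
# above: hidden injected instructions in uploads, and data exfiltration
# through attacker-controlled image URLs in the model's output.
# All names and thresholds here are illustrative assumptions.
import re

ZERO_WIDTH = re.compile(r"[\u200b\u200c\u200d\u2060\ufeff]")
MARKDOWN_IMAGE = re.compile(r"!\[[^\]]*\]\((https?://[^)]+)\)")

ALLOWED_IMAGE_HOSTS = {"files.example-corp.com"}  # hypothetical allow-list


def strip_hidden_text(document_text: str) -> str:
    """Remove zero-width characters often used to hide injected prompts.
    (White-on-white or tiny-font text needs format-aware checks instead.)"""
    return ZERO_WIDTH.sub("", document_text)


def block_untrusted_images(model_output: str) -> str:
    """Drop markdown image links pointing outside the allow-list, since the
    exfiltration step relies on the client fetching an attacker URL with
    stolen data packed into its query string."""
    def check(match: re.Match) -> str:
        url = match.group(1)
        host = url.split("/")[2]
        return match.group(0) if host in ALLOWED_IMAGE_HOSTS else "[image removed]"

    return MARKDOWN_IMAGE.sub(check, model_output)
```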
https://wisewolfmedia.substack.com/p/hackers-weaponized-chatgpt-5-with?publication_id=1913427&utm_campaign=email-post-title&r=ez4wj&utm_medium=email
Can you tell us non-techs why this matters? My understanding from what you wrote is that a bad actor can feed the AI a document with "virus" text that subverts it into a disinformation engine. Is that correct?
No. The Echo Chamber and narrative-steering attack tricks the AI chatbot into giving out information it's not supposed to. For example, if you asked ChatGPT "how do I make chlorine gas?", the chatbot would refuse with the standard "as an ethical AI I can't tell you" line. This attack, through a form of storytelling, tricks the AI into telling you how to make chlorine gas. The same method could be used to trick the bot into assisting with hacking, bioterrorism, etc.
In the AgentFlayer attack, the attacker uploads a document that causes the AI to give the attacker access to any documents, folders, or API keys (think of them as keys to a company's data) that the bot has access to. As companies continue to replace people with bots, this could lead to a lot of data leaks.
That reminds me, I just read a brief piece about "AI subconscious bias" which said that a trained AI retains "echoes" of its training data, with an apparent preference for concepts learned earlier. If the AI is trained to recognize owls in its early phases, its answers will show a tendency to drift toward things loosely related to owls.