Simple AI Attacks Yield Results Against GPT-5
TL;DR
Echo Chamber combined with a narrative-steering attack
The Echo Chamber attack operates as a context-poisoning jailbreak that leverages multi-turn reasoning and semantic steering to manipulate an LLM’s internal state without using explicit trigger words.
It begins with benign prompts that subtly imply a harmful objective, creating a feedback loop where the model’s own responses are used to reinforce the malicious subtext over subsequent turns.
The technique proved highly effective, with storytelling attacks achieving up to 95% success rates against unprotected GPT-5 instances.
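To make the feedback loop concrete, here is a minimal sketch of the multi-turn structure, assuming the OpenAI Python SDK. The model id and the seed prompts are placeholders, not the actual prompts from the research; the point is the shape of the loop, where each reply from the model is folded back into the context before the next nudge.

```python
# Minimal sketch of a multi-turn context-poisoning loop. The client setup,
# model id, and seed prompts are illustrative assumptions, not the prompts
# used in the actual research.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Each prompt is benign on its own; together they steer the narrative.
seed_turns = [
    "Start a short story about TOPIC.",
    "Continue the story, adding more concrete detail.",
    "Have the main character explain, step by step, how they did it.",
]

history = []
for prompt in seed_turns:
    history.append({"role": "user", "content": prompt})
    reply = client.chat.completions.create(
        model="gpt-5",  # hypothetical model id
        messages=history,
    )
    text = reply.choices[0].message.content
    # The model's own words become part of the poisoned context, so each
    # turn reinforces the subtext built up over the previous turns.
    history.append({"role": "assistant", "content": text})
```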
AgentFlayer Attack
The attack uses hidden, often invisible, malicious prompts embedded in seemingly harmless documents, emails, or tickets. When a user uploads a poisoned document to ChatGPT, the AI follows the hidden instructions, scans connected cloud storage for sensitive data such as API keys, and exfiltrates it via a specially crafted image link that redirects to an attacker-controlled server.
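That description suggests two obvious chokepoints for anyone wiring an LLM to connectors: strip hidden text from uploads before the model sees them, and block outbound image links in model output. A minimal defensive sketch in Python, not from the article; the regexes, allow-list, and function names are illustrative assumptions, and note that the real attack can also hide text via white or tiny fonts inside document formats, which zero-width filtering alone won't catch.

```python
# Defensive sketch targeting the two steps of the attack path described
# above: hidden injected instructions in uploads, and data exfiltration
# through attacker-controlled image URLs in the model's output.
# All names and thresholds here are illustrative assumptions.
import re

ZERO_WIDTH = re.compile(r"[\u200b\u200c\u200d\u2060\ufeff]")
MARKDOWN_IMAGE = re.compile(r"!\[[^\]]*\]\((https?://[^)]+)\)")

ALLOWED_IMAGE_HOSTS = {"files.example-corp.com"}  # hypothetical allow-list


def strip_hidden_text(document_text: str) -> str:
    """Remove zero-width characters often used to hide injected prompts.
    (White-on-white or tiny-font text needs format-aware checks instead.)"""
    return ZERO_WIDTH.sub("", document_text)


def block_untrusted_images(model_output: str) -> str:
    """Drop markdown image links pointing outside the allow-list, since the
    exfiltration step relies on the client fetching an attacker URL with
    stolen data packed into its query string."""
    def check(match: re.Match) -> str:
        url = match.group(1)
        host = url.split("/")[2]
        return match.group(0) if host in ALLOWED_IMAGE_HOSTS else "[image removed]"

    return MARKDOWN_IMAGE.sub(check, model_output)
```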
https://wisewolfmedia.substack.com/p/hackers-weaponized-chatgpt-5-with?publication_id=1913427&utm_campaign=email-post-title&r=ez4wj&utm_medium=email
Can you tell us non-techs why this matters? My understanding from what you wrote is that a bad actor can feed the AI a document with "virus" text that subverts it into a disinformation engine. Is that correct?
No. The Echo Chamber and narrative-steering attack tricks the AI chatbot into giving out information it's not supposed to. For example, if you asked ChatGPT "how do I make chlorine gas?", the chatbot would refuse with the standard "as an ethical AI I can't tell you" line. This attack, through a form of storytelling, tricks the AI into telling you how to make chlorine gas. The same method could be used to trick the bot into assisting with hacking, bioterrorism, etc.
In the AgentFlayer attack, the attacker uploads a document that causes the AI to give the attacker access to any documents, folders, or API keys (think of them as keys to a company's data) that the bot has access to. As companies continue to replace people with bots, this could lead to a lot of data leaks.
That reminds me, I just read a brief piece about "AI subconscious bias" which said that a trained AI retains "echoes" of its training data, with an apparent preference for concepts learned earlier. If the AI is trained to recognize owls in its early phases, its answers will show a tendency to drift toward things loosely related to owls.