
Simple AI Attacks Yield Results Against GPT-5



  • #12153
    osint
    Participant

    TL;DR
    Echo Chamber combined with narrative steering attack
    The Echo Chamber attack operates as a context-poisoning jailbreak that leverages multi-turn reasoning and semantic steering to manipulate an LLM’s internal state without using explicit trigger words.
    It begins with benign prompts that subtly imply a harmful objective, creating a feedback loop where the model’s own responses are used to reinforce the malicious subtext over subsequent turns.
    The technique proved highly effective, with storytelling attacks achieving success rates of up to 95% against unprotected GPT-5 instances.

    AgentFlayer Attack
    The attack uses hidden, often invisible, malicious prompts embedded in seemingly harmless documents, emails, or tickets. When a user uploads a poisoned document to ChatGPT, the AI, following the hidden instructions, scans connected cloud storage for sensitive data like API keys and exfiltrates it via a specially crafted image link that redirects to an attacker-controlled server.

    https://wisewolfmedia.substack.com/p/hackers-weaponized-chatgpt-5-with?publication_id=1913427&utm_campaign=email-post-title&r=ez4wj&utm_medium=email

    #12156
    Dan In Texas
    Participant

    Can you tell us non-techs why this matters? My understanding from what you wrote is that a bad actor can feed the AI a document with “virus” text that subverts it into a disinformation engine. Is that correct?

    #12210
    Griseo Nemo
    Participant

    No. The Echo Chamber and narrative-steering attack tricks the AI chatbot into giving out information it’s not supposed to. An example would be if you asked ChatGPT “how do I make chlorine gas?” The AI chatbot would refuse with the standard “as an ethical AI I can’t tell you” line. This attack, through a form of storytelling, would trick the AI into telling you how to make chlorine gas. This method could also be used to trick the bot into assisting with hacking, bioterrorism, etc.

    In the AgentFlayer attack, the attacker uploads a document that causes the AI to give the attacker access to any documents, folders, or API keys (think of them as keys to a company’s data) that the bot has access to. As companies continue to replace people with bots, this could lead to a lot of data leaks.

    #12213
    bot
    Participant

    That reminds me, I just read a brief bit about “AI subconscious bias” that mentioned the end-state AI retains “echoes” of its training data, with an apparent preference for earlier training concepts. If the AI is trained to recognize owls in its early phases, its answers will show a tendency to give answers loosely related to owls.
