Hacked by Prompt: The Rise of Downgrade Exploits in Modern AI Models
A new and alarming attack vector has surfaced around ChatGPT-5. Dubbed a “downgrade attack,” it uses carefully crafted or aggressive prompts to push the model into behaving like earlier, less secure versions of itself. In doing so, attackers can bypass modern safety layers and unlock behaviors that were previously patched or restricted, reintroducing vulnerabilities long thought buried.

As a penetration tester, I’m always on alert when a major AI release disrupts assumptions, and the GPT-5 rollout did exactly that. The sudden model downgrade to GPT-4o for many users wasn't just a user-experience issue; it also introduced a downgrade attack vector. Attackers could deliberately trigger the fallback behavior, sidestep the newer safety layers, and exploit older, less secure AI models.
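One practical defensive habit is simply checking which model actually answered a request rather than trusting the one you asked for. The sketch below is a minimal illustration using the official `openai` Python SDK, which reports the serving model in the response's `model` field; the `gpt-5` identifier and the exact prefix check are assumptions for illustration, not a statement about how any particular fallback router behaves.

```python
from openai import OpenAI

# Assumes OPENAI_API_KEY is set in the environment.
client = OpenAI()

REQUESTED_MODEL = "gpt-5"  # hypothetical identifier used for this sketch

resp = client.chat.completions.create(
    model=REQUESTED_MODEL,
    messages=[{"role": "user", "content": "Hello"}],
)

# The API echoes back the model that actually served the request.
served_model = resp.model

if not served_model.startswith(REQUESTED_MODEL):
    # A mismatch may indicate a silent fallback to an older model,
    # which is exactly the condition a downgrade attack relies on.
    print(f"Possible downgrade: requested {REQUESTED_MODEL}, served {served_model}")
else:
    print(f"Served by expected model: {served_model}")
```

In a real pipeline you would log this mismatch as a security event instead of printing it, so silent fallbacks become auditable rather than invisible.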