Wednesday, December 24, 2025

Top 5 This Week

- Advertisement -
spot_img

Related Posts

- Advertisement -

AI Browsers Face Permanent Prompt Injection Security Risk

OpenAI has officially acknowledged a worrying reality: their Atlas AI browser, like all AI-powered browsers, will never be completely safe from prompt injection attacks. Most likely.

Example of how a prompt injection attack text could look like. Image credit: OpenAI

Key Takeaways:

  • Prompt injection attacks against AI browsers cannot be fully eliminated, only managed through continuous defense updates
  • OpenAI developed an AI-powered attacker bot using reinforcement learning to discover vulnerabilities before hackers exploit them
  • Current AI browsers pose significant risks due to their access to sensitive data like emails and payment systems, outweighing practical benefits for most users

These attacks trick AI agents into executing hidden commands embedded in websites or emails, and the company now says this vulnerability is essentially permanent.

“Prompt injection, much like scams and social engineering on the web, is unlikely to ever be fully ‘solved,’” OpenAI stated in a Monday blog post. The company admitted that “agent mode” in ChatGPT Atlas “expands the security threat surface.”

Atlas launched in October, and security researchers immediately demonstrated its flaws. Within hours, they showed how simple text hidden in Google Docs could hijack the browser’s behavior. Brave’s security team published findings the same day, explaining that indirect prompt injection poses systematic challenges for all AI browsers, including Perplexity’s Comet.

The UK’s National Cyber Security Centre issued a warning earlier this month confirming that prompt injection attacks against generative AI applications “may never be totally mitigated.” The agency advised cybersecurity professionals to focus on reducing impact rather than expecting complete prevention.

Fighting an Endless Battle

OpenAI describes prompt injection as “a long-term AI security challenge” requiring continuous defense strengthening. The company’s strategy involves what it calls a “proactive, rapid-response cycle” designed to identify attack methods internally before hackers discover them.

This approach mirrors tactics from Anthropic and Google, which emphasize layered defenses and constant stress-testing. Google’s recent work concentrates on architectural and policy-level controls for agentic systems.

OpenAI’s distinctive contribution is its “LLM-based automated attacker”—an AI bot trained through reinforcement learning to act as a hacker seeking ways to inject malicious instructions into AI agents.

The bot tests attacks in simulation environments, observing how the target AI processes and responds to each attempt. It analyzes the AI’s internal reasoning, adjusts the attack, and repeats the process. This inside access to reasoning patterns gives OpenAI’s bot advantages external attackers lack, theoretically enabling faster vulnerability discovery.

“Our [reinforcement learning]-trained attacker can steer an agent into executing sophisticated, long-horizon harmful workflows that unfold over tens (or even hundreds) of steps,” OpenAI wrote. “We also observed novel attack strategies that did not appear in our human red teaming campaign or external reports.”

Real-World Exploitation Examples

OpenAI demonstrated how its automated attacker inserted a malicious email into a test inbox. When the AI agent scanned emails, it followed hidden instructions and sent a resignation message instead of creating an out-of-office reply. After security updates, “agent mode” successfully detected the injection attempt and alerted the user.

The company emphasizes that large-scale testing and faster patch cycles can strengthen systems before real-world attacks occur. However, an OpenAI spokesperson declined to share whether security updates have produced measurable reductions in successful injections. The spokesperson noted that OpenAI has collaborated with third parties to strengthen Atlas against prompt injection since before launch.

The Risk-Benefit Calculation

Rami McCarthy, principal security researcher at cybersecurity firm Wiz, describes reinforcement learning as valuable for adapting to attacker behavior but insufficient alone.

“A useful way to reason about risk in AI systems is autonomy multiplied by access,” McCarthy explained.

“Agentic browsers tend to sit in a challenging part of that space: moderate autonomy combined with very high access,” McCarthy said. “Many current recommendations reflect that trade-off. Limiting logged-in access primarily reduces exposure, while requiring review of confirmation requests constrains autonomy.”

OpenAI recommends users reduce their risk by limiting AI agent access and requiring confirmation before actions like sending messages or making payments. Atlas receives training to request user approval for these sensitive operations. The company also advises giving agents specific instructions rather than broad permissions with vague directives like “take whatever action is needed.”

“Wide latitude makes it easier for hidden or malicious content to influence the agent, even when safeguards are in place,” OpenAI noted.

Despite OpenAI prioritizing Atlas protection against prompt injections, McCarthy questions whether the benefits justify the risks.

“For most everyday use cases, agentic browsers don’t yet deliver enough value to justify their current risk profile,” McCarthy said. “The risk is high given their access to sensitive data like email and payment information, even though that access is also what makes them powerful. That balance will evolve, but today the trade-offs are still very real.”


Written by Alius Noreika

Source link

- Advertisement -
Newsdesk
Newsdeskhttps://www.european.express
European Express News aims to cover news that matter to increase the awareness of citizens all around geographical Europe.

Popular Articles