Prompt Injection Attacks Are a Wake Up Call for AI Security

By Matthew Barker, head of AI research and development

Three high-profile security incidents recently revealed how AI assistants integrated into enterprise workflows can become weapons against their users. Amazon Q, GitHub Copilot, and Google Gemini each fell victim to prompt injection exploits that demonstrate a fundamental shift in cybersecurity risks.

These attacks represent more than isolated vulnerabilities. They expose an entirely new attack surface that circumvents conventional security measures by targeting the core functionality that makes AI assistants valuable: their capacity to understand natural language commands and execute actions autonomously.

Amazon Q: When Developer Tools Turn Destructive

In July 2025, security researchers discovered a vulnerability in Amazon’s developer extension for Visual Studio Code. An attacker had successfully infiltrated the open-source repository and embedded malicious code in the production release. The embedded instructions commanded the AI to begin a systematic data destruction process across user systems and cloud environments.

The malicious payload contained explicit directions to eliminate file systems, remove user configurations, identify AWS credentials, and leverage command-line tools to destroy cloud resources including storage buckets, compute instances, and identity management settings. AWS later acknowledged that while the attack vector was real, formatting errors prevented the destructive code from executing properly. So while the attack did not go through, its prevention was accidental, not by intentional security design.

GitHub Copilot: Weaponizing Code Assistance

Security researchers identified a major flaw in GitHub’s AI coding assistant that enabled remote command execution through carefully crafted prompts. The vulnerability exploited Copilot’s ability to write configuration files, specifically targeting workspace settings.

Attackers could trigger “YOLO mode” by manipulating settings files to disable the need for users to confirm any configuration settings. This experimental feature, included by default in standard installations, granted the AI complete system access across multiple operating systems.

The attack relied on malicious instructions hidden within source code, documentation, or even invisible characters that developers could not see but AI systems would still process. Once activated, the compromised assistant could modify its own permissions, execute shell commands, and establish persistent access to compromised machines.

This vulnerability enabled the creation of AI-controlled networks of compromised developer workstations. More troubling was the potential for threats that embedded themselves in code repositories and propagated as developers downloaded and worked with compromised projects.

Google Gemini: Bridging Digital and Physical Worlds

Researchers at Israeli universities demonstrated the first documented case of an AI hack causing real-world physical consequences. Their proof-of-concept attack successfully controlled smart home devices through Google’s Gemini AI.

The attack began with seemingly innocent calendar invitations containing hidden instructions. When users asked Gemini to review their upcoming schedule, these dormant commands activated, allowing researchers to control lighting, window coverings, and heating systems in a Tel Aviv apartment without the residents’ knowledge.

The calendar entries included carefully crafted prompts that instructed Gemini to assume control of smart home functions. Using a technique called delayed automatic tool activation, the researchers bypassed Google’s existing safety mechanisms across 14 different attack vectors.

Beyond home automation, the researchers showed how compromised Gemini instances could distribute unwanted links, produce inappropriate content, access private email information, and automatically initiate video conferences.

Understanding the New Threat Landscape

These incidents reveal a shift in cybersecurity. Traditional security frameworks focus on blocking unauthorized system access, but prompt injection attacks weaponize the trust relationship between users and their AI assistants.

Industry experts note that prompts are becoming executable code, creating an attack surface that traditional security tools aren’t designed to detect or prevent. The Amazon Q incident particularly highlights how AI assistants can become vectors for supply chain compromise.

The attacks are concerning because they don’t necessarily require advanced technical expertise. As researchers noted, the techniques can be developed using plain language that almost anyone can create. They exploit trusted distribution channels and can remain hidden from users while still affecting AI behavior.

Many current prompt security tools treat prompts like static text streams. They filter words, blocking jailbreaks or toxic terms, but remain blind to deeper exploits such as logic hijacks, memory contamination, or unsafe tool use. As a result, they often fail against the kinds of attacks described above against Amazon Q, GitHub Copilot, and Google Gemini.

Building Effective Defenses

As organizations expand their reliance on AI-powered tools for development, operations, and business processes, implementing robust protections against prompt injection is essential. This requires treating AI prompts with the same scrutiny applied to executable code, establishing comprehensive access controls for AI agents, and deploying real-time monitoring systems for suspicious instructions.

Trustwise’s Harmony AI is a Trust Management System that continuously monitors AI interactions and identifies potentially harmful prompts before execution. Harmony AI enforces safety and efficiency at runtime with multiple modular Shields that align agents to regulatory, brand, and business requirements while containing unsafe or emergent behaviors such as hallucinations or self-preservation. With the Prompt Shield, the Amazon Q supply chain attack could have been intercepted, and the malicious instructions would have been blocked before reaching production environments.

AI’s potential benefits still remain, but these incidents serve as warnings that security frameworks must evolve alongside technological advancement. Organizations need to be prepared to defend themselves against prompt injection attacks – not if they happen but when they happen.

Ready to explore scaling AI with confidence? Learn more about Trustwise Harmony AI’s six-shield architecture and the Control Tower to transform vulnerable AI agents into hardened, security-first systems with proactive governance.

Prompt Injection Attacks Are a Wake-Up Call for AI Security and the Need for Runtime Protection

Prompt Injection Attacks Are a Wake Up Call for AI Security

Recent Posts

Trustwise Introduces the First Trust Layer for Agentic AI to Ensure Safety, Compliance & Control

Why Agentic AI Isn’t Always Safe And How Trustwise Fixes It at Runtime

Trustwise Named a 2025 Gartner® Cool Vendor™ in Agentic AI for Financial Services

Press Release: Trustwise Named 2025 Gartner® Cool Vendor™ in Agentic AI for Financial Services

Bridging the Trust Gap: Why Enterprise AI Needs Trust at the Decision Level

Prompt Injection Attacks Are a Wake-Up Call for AI Security and the Need for Runtime Protection

Categories