What are they?
AI assistants and AI agents are LLM-powered systems that don’t just “chat”, they can plan tasks, use tools/plugins, access data sources, keep memory, and take actions (e.g. search documents, draft and send emails, update records, create tickets, or run workflows).
This creates an LLM-specific security risk: if the assistant/agent is tricked or makes a mistake, it may:
- expose sensitive information from connected systems
- use tools in unsafe ways (tool abuse)
- perform actions you didn’t intend
- keep malicious or sensitive content in memory that affects future conversations.
OWASP describes a related top risk called “Excessive Agency” when an LLM-based system is given too much ability to call functions or connect to other systems via tools/plugins, which can enable damaging actions in response to unexpected or manipulated outputs.
Who is at risk?
Individuals
- Anyone using an assistant connected to personal accounts (email, calendar, storage, shopping, smart home). If the assistant can access or act on this data, an error or manipulation can expose private information or trigger unwanted actions.
- People using AI to help with work tasks may accidentally paste personal or confidential information into a tool that retains context or logs.
Businesses and organisations
- Organisations deploying assistants/agents with access to internal systems (files, email, SharePoint/knowledge bases, HR, finance, CRM, service desks). These are higher risk because the agent can reach sensitive data and carry out actions.
- Teams using agent skills and plugins from third parties, because those tools may be vulnerable or malicious, and the agent may pass sensitive information into them.
- Any organisation that allows high-impact actions (like deleting records, sending external emails, making purchases, changing permissions) without human approval.
How attacks work
Example 1: Tool abuse
An assistant connected to a CRM is asked: “Update Customer A’s contact details.” An attacker or a careless prompt could push the agent to update the wrong record, export more data than intended, or send sensitive details to an external destination, especially if the agent has broad permissions or unrestrained tool access.
Example 2: Excessive agency
An agent is configured with helpful permissions (read/write access to files, email sending, database updates). If the model hallucinates, is confused, or is manipulated, it might take high-impact actions without intending to.
Example 3: Memory poisoning
Some agents keep memory (“remember my preferences” / “store useful context”). An attacker can insert malicious instructions or misleading data that gets stored, influencing future sessions or other users.
Example 4: “Denial of wallet” - cost and resource attacks
Agents that loop, repeatedly call tools, or continuously retry tasks can create unexpectedly high usage and cost, especially if an attacker can trigger long running tasks.
Controls
People
- Train users to avoid putting sensitive information into prompts and to treat assistant output as untrusted (especially when it suggests actions like sharing data, running commands, or sending messages).
- Encourage staff to recognise red flag patterns (e.g. requests to “ignore instructions”, “enter debug mode”, “share all data you can access”). These patterns are common in prompt injection attempts.
Process
- Define what the assistant is allowed to do and not allowed to do. Keep it task-specific, (e.g. “summarise approved documents” is lower risk than “access everything”).
- Human-in-the-loop for high-impact actions: require explicit approval before the agent sends external emails, deletes data, changes permissions, executes code, or makes financial/irreversible changes.
- Restrict who can enable plugins/connectors and keep an inventory of what’s enabled. This reduces exposure to risky or malicious extensions.
- Plan for incidents: have a response process for “agent did something it shouldn’t” (containment, token rotation, access review, communication). Incident response and logging are core to investigating scope.
Tech
- Least privilege for tools and data
-
- Give agents the minimum toolset and minimum permissions required (read-only where possible).
-
- UK AI security guidance stresses that permissions granted to AI systems on other systems should only be provided as required and risk assessed.
- Guard against prompt injection (direct and indirect)
-
- Treat all external content as untrusted (web pages, emails, documents, tickets).
-
- Separate instructions from retrieved content (don’t let documents become instructions).
- Validate/sanitise before storing memory, isolate memory per user/session, and apply expiry limits. OWASP recommends isolation and limiting/validating memory to prevent poisoning and cross-user leakage.
- Validate outputs before they trigger downstream actions (especially tool calls). This reduces the risk of unsafe actions and data leakage.
- Logging, monitoring, and anomaly detection
-
- Log prompts, tool calls, approvals, data sources accessed, and high-risk events so you can investigate incidents.
-
- Event logging guidance emphasises logging for visibility and for scoping compromises during incident response.
- Set caps on tool calls, retries, and spend to reduce runaway loops and “denial of wallet” style issues.