What are they?

Malicious plugins, tools, and platforms are third-party add-ons (or services) that look legitimate but are designed to steal data or credentials, install malware, or trigger unsafe actions when used with LLM-enabled assistants or agents.

This is an LLM-specific risk because modern LLM systems often support extensions (plugins, tools, connectors) that let the model access files, browse the web, call APIs, or interact with business apps. If a plugin/tool is malicious (or compromised), it inherits that access.

 

Who is at risk?

Individuals 

  • People using AI assistants that support third-party plugins/tools may install something that appears helpful but is actually malicious, leading to credential theft, data exposure, or device compromise.
  • Anyone using personal assistants connected to email, calendar, or storage is at higher risk if third-party add-ons are enabled with broad access.

Businesses and organisations 

  • Organisations that connect assistants/agents to email, file storage, calendars, CRMs, ticketing systems, or other business tools are at risk because malicious plugins/tools can exfiltrate data and credentials at scale.  
  • Any organisation enabling third-party add-ons through marketplaces or official platforms is exposed to supply chain risk, especially if review processes are weak or users can self-install add-ons without oversight.

 

How attacks work

Example 1: “Fake prerequisites” that install malware 

A plugin/tool page looks legitimate and claims you must “install a helper” first. It links to an external download or tells you to run a script. The helper is actually malware or a credential stealer. This social engineering pattern has been described by researchers investigating malicious add-ons in agent marketplaces. 

Example 2: Credential and token theft from the host 

Once installed or enabled, a malicious add-on may try to capture secrets stored on the device or within the agent environment (API keys, session tokens, credentials), then send them to an attacker-controlled server.  

Example 3: Lookalike names (typosquatting)

Attackers create add-ons with names that closely resemble popular ones (small spelling differences), increasing the chance that users install the wrong package or enable the wrong integration. 
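One way to catch lookalike names before installation is to compare a requested add-on name against the names you already trust. The following is a minimal sketch, not a complete defence: the approved names, the `lookalike_of` function, and the 0.85 similarity threshold are all hypothetical and would need tuning for a real catalogue.

```python
from difflib import SequenceMatcher

# Hypothetical list of add-on names that have already been approved.
APPROVED_NAMES = ["calendar-sync", "drive-connector", "crm-export"]


def lookalike_of(candidate, approved=APPROVED_NAMES, threshold=0.85):
    """Return the approved name that `candidate` closely resembles, or None.

    An exact match is the genuine add-on, not a lookalike, so it returns None.
    The threshold is an assumption; lower values flag more names.
    """
    name = candidate.strip().lower()
    if name in approved:
        return None
    for known in approved:
        # ratio() is a similarity score between 0.0 and 1.0.
        if SequenceMatcher(None, name, known).ratio() >= threshold:
            return known
    return None


if __name__ == "__main__":
    for requested in ["calendar-sync", "calender-sync", "weather-lookup"]:
        match = lookalike_of(requested)
        if match:
            print(f"WARNING: '{requested}' resembles approved add-on '{match}'")
        else:
            print(f"No lookalike match for '{requested}'")
```

A string-similarity check like this only catches small spelling differences. It does not detect lookalike publishers, swapped word order, or visually similar Unicode characters, so it complements publisher verification rather than replacing it.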

 

Controls

People  

  • Treat plugins/tools like software installs: don’t enable add-ons just to try them, and be wary of any plugin that asks you to run separate installers or scripts, or to download software from external links.
  • Train staff to recognise red flags: urgent install steps, external downloads, vague publishers, and requests for broad permissions or API keys.  

Process 

  • Approved list: limit which plugins/tools can be enabled and restrict who can enable them (especially in business environments).  
  • Supplier and marketplace due diligence: ask how add-ons are reviewed, how quickly malicious items are removed, and how users are alerted to security issues.  
  • Maintain an inventory: keep a list of enabled plugins/tools, their versions, and what they connect to (email, storage, CRM, etc.) so you can manage risk and respond quickly.
  • Incident playbook: define how to respond if a plugin/tool is suspected of being malicious (disable it, rotate tokens/credentials, review logs, and determine what it could access).
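The approved list and inventory described above can be sketched as a small data structure with two checks. This is an illustration under stated assumptions: the `AddOn` record, the `APPROVED` mapping, the example publisher and version, and both function names are hypothetical and do not correspond to any particular platform's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddOn:
    """One inventory entry: what is enabled, which version, what it touches."""
    name: str
    version: str
    publisher: str
    connects_to: tuple = ()  # e.g. ("email", "storage")


# Hypothetical approved list: (name, publisher) -> reviewed versions.
APPROVED = {
    ("calendar-sync", "Example Corp"): {"1.4.2"},
}


def may_enable(addon, requested_by, admins, approved=APPROVED):
    """Return (allowed, reason) for a request to enable an add-on."""
    if requested_by not in admins:
        return False, "requester is not permitted to enable add-ons"
    versions = approved.get((addon.name, addon.publisher))
    if versions is None:
        return False, "add-on is not on the approved list"
    if addon.version not in versions:
        return False, "this version has not been reviewed"
    return True, "approved"


def addons_touching(inventory, system):
    """List enabled add-ons connected to a system, for incident scoping."""
    return [a for a in inventory if system in a.connects_to]


if __name__ == "__main__":
    sync = AddOn("calendar-sync", "1.4.2", "Example Corp", ("email", "calendar"))
    print(may_enable(sync, "alice", admins={"alice"}))
    print(may_enable(sync, "bob", admins={"alice"}))
    print([a.name for a in addons_touching([sync], "email")])
```

Keying the approved list on both name and publisher means a lookalike add-on from a different publisher is rejected even if its name matches. Recording `connects_to` lets an incident responder quickly list which add-ons could have reached an affected system.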

Tech 

  • Least privilege for tools and permissions: give assistants/agents the minimum access required (read-only where possible).  
  • Use separate accounts and scoped tokens: use dedicated service accounts and limited-scope API keys for plugins rather than personal/admin accounts to reduce impact.  
  • Isolation / sandboxing where possible: run agent tools in constrained environments and avoid running agent processes with admin/root permissions. This reduces damage if a malicious add-on executes. 
  • Logging and monitoring: log add-on enablement, permission grants, tool calls, and unusual outbound connections so you can detect suspicious behaviour and investigate.  
  • Rate limits / anomaly detection: cap and flag unusual behaviour (sudden bursts of tool calls, exports, or external requests).
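As an illustration of the last control, a sliding-window counter can flag an add-on that suddenly makes far more tool calls than usual. This is a minimal sketch: the `ToolCallMonitor` class and its default limits are hypothetical, and a real deployment would feed it from the agent platform's own logs and choose thresholds based on normal usage.

```python
from collections import defaultdict, deque


class ToolCallMonitor:
    """Flag add-ons that exceed `max_calls` within `window_seconds`."""

    def __init__(self, max_calls=20, window_seconds=60.0):
        # Both defaults are illustrative assumptions, not recommendations.
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._calls = defaultdict(deque)  # add-on name -> recent timestamps

    def record(self, addon, timestamp):
        """Record one tool call; return True if the add-on is over the limit."""
        calls = self._calls[addon]
        calls.append(timestamp)
        # Discard calls that have fallen outside the window.
        while calls and timestamp - calls[0] > self.window_seconds:
            calls.popleft()
        return len(calls) > self.max_calls


if __name__ == "__main__":
    monitor = ToolCallMonitor(max_calls=3, window_seconds=10.0)
    for t in [0.0, 1.0, 2.0, 3.0, 20.0]:
        flagged = monitor.record("crm-export", t)
        print(f"t={t:>4}: {'FLAGGED' if flagged else 'ok'}")
```

Timestamps are passed in rather than read from the clock so the same logic can replay historical logs during an investigation. A flag here is a signal to review, and could also be used to pause the add-on until someone checks the activity.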