AI incidents can look like normal cyber incidents (malware, account compromise, data breach), but when LLMs are involved the trigger and evidence can be different (e.g. prompt injection, unsafe tool use, or a compromised plugin). Because AI tools are increasingly embedded in business workflows and connected to data sources, incident response planning and logging are critical to understanding what happened, containing damage, and recovering quickly.

This page gives a practical incident response approach for LLM-enabled systems, including: 

  • Why AI-focused incident response matters 
  • What to log so you can investigate properly 
  • How to triage and contain AI-related incidents 

 

Why it matters

  1. AI incidents scale quickly

When an LLM is connected to email, file storage, internal documents, or business systems, the potential impact of a single failure can increase (e.g. more data exposure or unintended actions).  

  2. AI incidents are often “text-driven”

Some AI attacks and failures are triggered by natural language (prompts, documents, web pages, tickets), which can manipulate the model (e.g. prompt injection) or cause unsafe outputs/actions.  

  3. Evidence can disappear if you don’t plan ahead 

If you don’t collect the right logs (prompts, tool calls, retrieved sources, plugin changes), it can be hard to prove what data was accessed, what actions occurred, or how the system was manipulated.  

  4. Security guidance stresses preparation + monitoring
  • Our guidance on incident response & recovery for smaller businesses highlights the importance of planning, identifying what’s happening, and using logs to help determine cause and impact. 
  • NIST’s incident response guidance is designed to help organisations prepare, reduce impact, and improve detection/response/recovery effectiveness.  
  • The UK’s AI Cyber Security Code of Practice includes requirements to create and maintain incident management and recovery plans and to monitor system behaviour. 

 

What to log so you can investigate and determine impact

  1. Identity and access 
  • user identity (account ID), role, and authentication method 
  • source IP/device ID (where possible) 
  • session IDs / correlation IDs to link activity together 
  2. Prompt and conversation data 
  • the prompt text (or a redacted version) 
  • conversation/session ID and timestamps 
  • model/system prompt version (so you know what rules existed at the time) 

Practical tip: If storing full prompts is too sensitive, store: 

  • a hashed copy + metadata (time/user/model) 
  • redacted content (mask PII/secrets) 

This balances forensic value and privacy, consistent with the idea that logs should support investigations while being protected.  
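The tip above can be sketched as a small logging helper. This is a minimal illustration, not a production redaction pipeline: the regex patterns, field names, and `log_record` function are all hypothetical, and a real deployment would use a proper DLP or redaction service.

```python
import hashlib
import re

# Illustrative patterns only -- real deployments should use a DLP library
# or vendor redaction service rather than hand-rolled regexes.
PII_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<EMAIL>"),
    (re.compile(r"\b(?:\d[ -]?){13,16}\b"), "<CARD>"),
]

def redact(text: str) -> str:
    """Mask obvious PII/secrets before the prompt is written to logs."""
    for pattern, placeholder in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text

def log_record(prompt: str, user: str, model: str, ts: str) -> dict:
    """Store a hash of the full prompt plus a redacted copy and metadata."""
    return {
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "prompt_redacted": redact(prompt),
        "user": user,
        "model": model,
        "timestamp": ts,
    }
```

The hash lets investigators later confirm whether a specific prompt (obtained from another source) matches a logged event, without the log itself holding the sensitive text.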

  3. Retrieval and data source traces 

If your system retrieves documents to answer questions (e.g. “chat with your documents”): 

  • which documents/records were retrieved (IDs, paths, URLs) 
  • which connectors were used (SharePoint, file shares, email) 
  • access decisions (allowed/blocked) 
  4. Tool/connector/plugin actions 

For assistants/agents with tools: 

  • every tool call (tool name, parameters, target system) 
  • whether the call was approved (human-in-the-loop) 
  • result status (success/fail) and returned data size 
  5. Output handling 
  • model output (or redacted output) 
  • where it was rendered/used (web UI, email template, database query builder, script runner) 
  6. Configuration and change events 
  • plugin/connector enablement/disablement 
  • permission changes (scopes widened, new service accounts) 
  • model version changes, policy/guardrail changes 
  • key rotation events 
  7. Security telemetry 
  • authentication failures and unusual sign-in patterns 
  • DLP alerts (if used) 
  • network egress anomalies (unexpected uploads) 
  • endpoint alerts on agent hosts (malware/stealers) 
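The tool-call fields listed above work best as one structured record per call, tied together by a session/correlation ID. A minimal sketch (field names are illustrative, not a standard schema):

```python
import json
from datetime import datetime, timezone

def tool_call_event(session_id: str, user_id: str, tool: str,
                    params: dict, approved: bool, status: str,
                    result_bytes: int) -> str:
    """Emit one JSON line per tool call so events can be correlated later.

    The session_id links this record to the prompt, identity, and
    retrieval records for the same conversation.
    """
    event = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "session_id": session_id,      # correlation ID across record types
        "user_id": user_id,
        "tool": tool,
        "params": params,              # redact sensitive parameters in practice
        "approved": approved,          # human-in-the-loop decision
        "status": status,              # success/fail
        "result_bytes": result_bytes,  # unusually large values can indicate exfiltration
    }
    return json.dumps(event)
```

JSON lines are easy to centralise and query later; the same shape can carry retrieval and configuration-change events too.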

 

How to triage and contain AI-related incidents

Step 1 — Confirm the incident type 

Start by classifying the report into one (or more) of these: 

  • Data exposure   
  • Prompt injection / manipulation   
  • Agent/tool abuse   
  • Output-handling exploit (XSS/SSRF/code execution paths) 
  • Supply chain issue   

Step 2 — Decide severity  

Use impact-based severity questions: 

  • Confidentiality: Was sensitive data accessed or shared externally?  
  • Integrity: Did the assistant change records, permissions, configurations, or content?  
  • Availability: Is the system down, looping, or causing unexpected cost spikes?  

Also ask: 

  • Is the incident ongoing?  
  • Does the agent have broad permissions?  
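The impact questions above can be folded into a simple severity helper. The thresholds here are purely illustrative, a sketch to tune against your own risk appetite, not a standard scoring scheme:

```python
def triage_severity(confidentiality: bool, integrity: bool,
                    availability: bool, ongoing: bool,
                    broad_permissions: bool) -> str:
    """Map the CIA impact questions and the two follow-ups to a coarse band.

    Any ongoing incident, or one involving a broadly-permissioned agent,
    is escalated regardless of how many impact types are confirmed.
    """
    impact = sum([confidentiality, integrity, availability])
    if impact == 0:
        return "low"
    if ongoing or broad_permissions or impact >= 2:
        return "high"
    return "medium"
```
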

Step 3 — Immediate containment  

Choose containment actions based on type.

If it looks like data exposure:

  • Disable the affected connector/source (e.g. stop indexing or querying the dataset).  
  • Restrict access to the AI feature to a smaller group until understood.  

If it looks like prompt injection:

  • Block/strip the malicious content source (document/URL/email) and prevent it being retrieved again.  
  • Add temporary guardrails: higher scrutiny, stricter output filtering, or human approval gates.  

If it looks like agent/tool abuse:

  • Disable tool access (or switch to read-only tools) and require approvals for high-impact actions.  
  • Revoke/rotate tokens and credentials used by the agent’s connectors.  

If it looks like insecure output handling:

  • Disable any feature that renders/executes output (HTML preview, script execution, query execution) until outputs are sanitised/encoded.  
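For output handling, “sanitised/encoded” usually means encoding model output for the context it lands in before re-enabling the feature. A minimal sketch for the web-UI case, using Python's standard-library `html.escape` (the function name `render_model_output` is hypothetical):

```python
import html

def render_model_output(output: str) -> str:
    """Encode model output before it reaches a web UI, so injected markup
    (e.g. a <script> tag produced via prompt injection) is displayed as
    text rather than executed by the browser."""
    return html.escape(output)
```

The same principle applies to other sinks: parameterise queries rather than concatenating model output into SQL, and never pass raw output to a script runner.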

If it looks like supply chain compromise:

  • Disable the suspicious plugin/tool/platform integration immediately.  
  • Rotate keys/tokens the integration could have accessed and monitor for misuse.  

Tip: Containment should be reversible and prioritise stopping ongoing harm first. This aligns with standard incident response outcomes (detect → respond → recover).  
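One way to make containment reversible is to gate AI capabilities behind feature flags that can be flipped off (and later back on) without deleting configuration or data. A minimal sketch; the flag names are hypothetical examples:

```python
# Reversible "kill switches": capabilities default on, and containment
# flips a flag rather than tearing down configuration.
CONTAINMENT_FLAGS = {
    "connector.sharepoint.enabled": True,   # data exposure containment
    "tools.write_actions.enabled": True,    # agent/tool abuse containment
    "output.html_preview.enabled": True,    # insecure output handling containment
}

def contain(flag: str) -> None:
    """Disable a capability to stop ongoing harm."""
    CONTAINMENT_FLAGS[flag] = False

def restore(flag: str) -> None:
    """Reverse the containment action once the incident is understood."""
    CONTAINMENT_FLAGS[flag] = True
```
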

Step 4 — Scope the incident  

Use the logs you collected to answer: 

  • Which users/sessions were involved?  
  • What prompts were used (or what content was retrieved)?  
  • Which documents/records were accessed?  
  • Which tools were called and what actions occurred?  
  • Was data sent externally (emails, webhooks, uploads)?  
  • Were there configuration changes (plugins enabled, permissions broadened)?  
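If the logs were collected as correlated records (as described in the logging section), scoping reduces to filtering by the affected session IDs and summarising. A sketch, assuming JSON-style event dicts with a shared `session_id` field; the field and tool names are illustrative:

```python
def scope_incident(events: list[dict], session_ids: set[str]) -> dict:
    """Pull everything linked to the affected sessions from a combined
    event log and summarise: users involved, documents touched, tools
    called, and anything that sent data externally."""
    involved = [e for e in events if e.get("session_id") in session_ids]
    return {
        "users": sorted({e["user_id"] for e in involved if "user_id" in e}),
        "documents": sorted({d for e in involved for d in e.get("documents", [])}),
        "tools": sorted({e["tool"] for e in involved if "tool" in e}),
        "external_sends": [e for e in involved
                           if e.get("tool") in {"send_email", "webhook", "upload"}],
    }
```
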

Step 5 — Eradicate and recover  

Typical recovery steps include: 

  • Patch/update affected components (including AI orchestration layers, connectors, and dependencies).  
  • Reset credentials/tokens and reduce permissions (least privilege).  
  • Restore from known good state if data integrity was impacted.  
  • Re-enable features gradually with stronger controls (approvals, filtering, monitoring). 

 

Building your incident response plan

A) Prepare

  • Identify critical systems and data your AI tools can access. 
  • Maintain contact lists and escalation routes (IT provider, vendors, CSC reporting, regulator if relevant).  
  • Define how to disable or isolate AI features quickly.

B) Enable evidence collection

  • Centralise logs and ensure sufficient retention to investigate.  
  • Ensure logs are protected from unauthorised access/modification.  

C) Test your response

  • Run tabletop exercises: prompt injection scenario, data disclosure scenario, compromised connector scenario.