
What is it?

Overreliance is a vulnerability that arises when people or systems treat an LLM’s output as trustworthy and correct without appropriate checks. This can lead to poor decisions, security mistakes, or unsafe actions, because LLMs can generate convincing text that is incorrect, incomplete, or misleading.

LLMs are particularly risky to over-trust because: 

  • They can produce false information (hallucinations) while sounding confident  
  • They can be gullible and influenced by leading prompts  
  • They can be manipulated by attackers (e.g. via prompt injection) into unsafe outputs 
  • People may assume “the AI checked it” when it may not have 

Overreliance becomes a bigger issue when LLM outputs are used to: 

  • Make business or security decisions 
  • Produce customer-facing responses 
  • Generate code/configuration 
  • Trigger actions through automated workflows 

 

Who is at risk?

Individuals 

  • People using AI for advice may be misled into unsafe steps (clicking links, installing tools, sharing information, or following inaccurate guidance).  
  • Anyone who treats AI summaries as fact without checking original sources is at higher risk of misunderstanding or acting on incorrect information.  

Businesses and organisations 

  • Teams using LLMs for research, reports, policy, or decision support; errors here can create financial, legal, or reputational risk.  
  • Developers and IT staff using LLMs for troubleshooting or generating code, where incorrect or unsafe instructions can weaken security.  
  • Organisations deploying customer chatbots or internal assistants, where confident but wrong outputs can mislead users or cause operational harm. 

 

How attacks work

Example 1: “Confident nonsense” used to mislead 

A user asks an LLM for urgent advice (“What should I do about this suspicious email?”). The LLM provides a confident but incorrect answer (e.g. “It looks safe; click the link”). If the user trusts it without checking, they may fall for a scam. LLMs are known to produce incorrect statements that appear plausible. 

Example 2: Unsafe instructions accepted as best practice 

A user asks an LLM for steps to fix a system problem and gets a risky suggestion (e.g. disabling security settings, copying secrets into a tool, running commands they don’t understand). If followed blindly, this can create vulnerabilities. Treating LLM outputs as untrusted is important because LLMs can be wrong and can be influenced by adversarial inputs. 

Example 3: Bad information flows into business decisions 

A manager asks an LLM to summarise a document or topic and uses the summary for decisions. If the LLM misses key details, invents facts, or misrepresents the source, this can lead to incorrect actions. 

 

Controls

People 

  • Set the mindset: LLM outputs are suggestions, not facts. Encourage users to verify important claims and treat outputs as untrusted, especially for security, legal, medical, or financial topics.  
  • Train staff on common failure modes: hallucinations, persuasive tone, and susceptibility to manipulation.  

Process  

  • Define “safe uses” vs “high-risk uses”:  
    • Safer: drafting generic text, summarising non-sensitive public material (with checks).  
    • Higher risk: decisions, security configuration, code deployment, legal/HR decisions, anything with sensitive data.  
  • Require human review for high-impact outputs (customer advice, policy decisions, system changes, external communications).  
  • Use escalation paths: if an answer affects safety, money, legal position, or security, it should be checked by a qualified person or trusted source.  
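The escalation-path idea above can be sketched in code. This is a minimal, illustrative example: the topic keywords, function names, and "hold for review" behaviour are all assumptions, not a production risk classifier.

```python
# Hypothetical sketch of an escalation path: hold LLM answers that touch
# high-risk areas for a qualified human reviewer. Keyword lists are
# illustrative only; a real system would use richer classification.
HIGH_RISK_TOPICS = {
    "safety": ["disable", "override", "bypass"],
    "money": ["payment", "transfer", "invoice", "refund"],
    "legal": ["contract", "liability", "dismissal"],
    "security": ["firewall", "credentials", "password", "certificate"],
}

def needs_escalation(answer: str) -> list[str]:
    """Return the risk areas an answer touches; empty means lower risk."""
    text = answer.lower()
    return [area for area, words in HIGH_RISK_TOPICS.items()
            if any(word in text for word in words)]

def handle_answer(answer: str) -> str:
    """Release low-risk answers; queue high-risk ones for human review."""
    areas = needs_escalation(answer)
    if areas:
        return f"HELD FOR REVIEW ({', '.join(areas)})"
    return answer
```

Keyword matching is deliberately crude here; the point is the routing decision, not the detection method.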

Tech 

  • Make verification easy: link to sources, keep citations, and encourage users to open the original document rather than trusting summaries alone (especially for internal decision-making). This directly mitigates the incorrect information risk.  
  • Prevent auto-action based on AI output: avoid systems where LLM responses automatically trigger changes, emails, payments, scripts, or security configuration without checks. This reduces the impact of wrong or manipulated outputs.  
  • Use guardrails for sensitive domains: implement internal policies or technical controls that block risky behaviours (e.g. pasting secrets, sharing customer data, or generating high-risk instructions). Overreliance risks increase when LLMs are connected to tools and data, so restricting access reduces impact.
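The "prevent auto-action" control can be made concrete with a small gate that sits between the model and any side effect. This is a sketch under assumptions: `ProposedAction`, `ActionGate`, and the approve/execute flow are hypothetical names, not a real framework API.

```python
# Hypothetical sketch: LLM output can only *propose* an action; a human
# must approve it before anything executes. Class and method names are
# illustrative, not a real library.
from dataclasses import dataclass, field

@dataclass
class ProposedAction:
    description: str          # e.g. "send refund email to customer"
    approved: bool = False

@dataclass
class ActionGate:
    pending: list = field(default_factory=list)

    def propose(self, description: str) -> ProposedAction:
        """Record an LLM-suggested action without executing it."""
        action = ProposedAction(description)
        self.pending.append(action)
        return action

    def approve(self, action: ProposedAction) -> None:
        """Called by a human reviewer, never by the model."""
        action.approved = True

    def execute(self, action: ProposedAction) -> str:
        if not action.approved:
            raise PermissionError("LLM output alone cannot trigger actions")
        return f"executed: {action.description}"
```

The design choice is that execution fails closed: without an explicit human approval, a wrong or manipulated model output cannot cause payments, emails, or configuration changes.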
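A simple guardrail of the kind described, blocking prompts that appear to contain secrets before they reach the model, might look like the sketch below. The patterns are illustrative and far from exhaustive; a real deployment would use a maintained secrets-detection tool.

```python
import re

# Hypothetical sketch: screen prompts for likely secrets before sending
# them to an LLM. Patterns are examples only, not a complete ruleset.
SECRET_PATTERNS = [
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),       # PEM private keys
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),                     # AWS access key ID shape
    re.compile(r"(?i)\b(password|api[_-]?key|secret)\s*[:=]\s*\S+"),
]

def check_prompt(prompt: str) -> bool:
    """Return True if the prompt looks safe to send, False if it may contain a secret."""
    return not any(pattern.search(prompt) for pattern in SECRET_PATTERNS)
```

Blocking at the boundary like this limits what an over-trusting user can leak, and it also limits the blast radius if the model is later manipulated into echoing its inputs.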