MCP and A2A in AI Agent Protocols — Security considerations (II) — OWASP Top 10 for LLM…
MCP and A2A in AI Agent Protocols — Security considerations (II) — OWASP Top 10 for LLM Applications
Intro
LLM applications blur traditional trust boundaries — inputs, models, and outputs can all be threat surfaces. Threat modeling must evolve to include AI-specific attack paths (e.g., prompt injections, model behavior drift). Observability (e.g., logging prompts, outputs, decisions) is essential for accountability and compliance. Security posture should treat the LLM as an untrusted user by default.
The OWASP Top 10 for LLM Applications is a community-driven initiative to identify and mitigate security risks specific to Large Language Models (LLMs). The 2025 edition addresses new attack vectors introduced by AI integration into real-world systems.
The Top 10 LLM Security Risks (2025)
LLM01: Prompt Injection
A Prompt Injection Vulnerability occurs when user prompts alter the LLM’s behavior or output in unintended ways.
Prompt injection involves manipulating model responses through specific inputs to alter its behavior, which can include bypassing safety measures.
Threat: Users craft inputs to manipulate LLM behavior or bypass safeguards.
- Direct Injection: Attackers directly input malicious prompts. The input can be either intentional (i.e., a malicious actor deliberately crafting a prompt to exploit the model) or unintentional (i.e., a user inadvertently providing input that triggers unexpected behavior). Either way, the user’s prompt input directly can alter the behavior of the model in unintended or unexpected ways.
- Indirect: Prompts embedded in external content (e.g., web pages). The content may have in the external content data that when interpreted by the model, alters the behavior of the model in unintended or unexpected way.
Impacts: Data leakage, command execution, biased outputs.
Mitigations:
- Role constraints, input/output filtering, privilege controls, human-in-the-loop reviews.
- Adversarial testing and semantic validation.
LLM02: Sensitive Information Disclosure
LLMs, especially when embedded in applications, risk exposing sensitive data, proprietary algorithms, or confidential details through their output. This can result in unauthorized data access, privacy violations, and intellectual property breaches.
Threat: LLMs reveal private or proprietary data unintentionally.
- Examples: Regurgitating training data, leaking system prompts or API keys.
Mitigations:
- Data anonymization, fine-tuning to avoid memorization, rate-limiting, access controls.
LLM03: Insecure Plugin or Supply Chain Integration
LLM supply chains are susceptible to various vulnerabilities, which can affect the integrity of training data, models, and deployment platforms. These risks can result in biased outputs, security breaches, or system failures. While traditional software vulnerabilities focus on issues like code flaws and dependencies, in ML the risks also extend to third-party pre-trained models and data.
Threat: Vulnerabilities introduced by third-party tools, APIs, or data.
Risks: Malicious plugin behavior, unvetted API access, outdated dependencies.
Mitigations:
- Vetting and sandboxing third-party components, applying SBOMs, and strict versioning.
LLM04: Training Data and Model Poisoning
Data poisoning can target different stages of the LLM lifecycle, including pre-training (learning from general data), fine-tuning (adapting models to specific tasks), and embedding (converting text into numerical vectors). Understanding these stages helps identify where vulnerabilities may
originate. Data poisoning is considered an integrity attack since tampering with training data impacts the model’s ability to make accurate predictions. The risks are particularly high with external data sources, which may contain unverified or malicious content.
Threat: Attackers manipulate training or fine-tuning data to introduce biases or backdoors.
Examples: Poisoned web data scraped into training sets.
Mitigations:
Dataset validation, diverse sourcing, integrity checks.
LLM05: Insecure Output Handling
Improper Output Handling refers specifically to insufficient validation, sanitization, and handling of the outputs generated by large language models before they are passed downstream to other components and systems.
The following conditions can increase the impact of this vulnerability:
- The application grants the LLM privileges beyond what is intended for end users, enabling escalation of privileges or remote code execution.
- The application is vulnerable to indirect prompt injection attacks, which could allow an attacker to gain privileged access to a target user’s environment.
- 3rd party extensions do not adequately validate inputs.
- Lack of proper output encoding for different contexts (e.g., HTML, JavaScript, SQL)
- Insufficient monitoring and logging of LLM outputs
- Absence of rate limiting or anomaly detection for LLM usage
Threat: LLM-generated content (e.g., HTML, code) is executed or rendered unsafely.
Examples: Cross-site scripting (XSS), command injection via generated shell commands.
Mitigations:
- Treat output as untrusted; sanitize or validate before use; apply secure-by-design output layers.
LLM06: Excessive Agency
Excessive Agency is the vulnerability that enables damaging actions to be performed in response to unexpected, ambiguous or manipulated outputs from an LLM, regardless of what is causing the LLM to malfunction
Threat: LLMs are given too much decision-making or operational autonomy.
Risks: Autonomous actions (e.g., sending emails, writing files) without oversight.
Mitigations:
Principle of least privilege, human approval for high-impact operations, and logging.
LLM07: System Prompt Leakage
Risk that the system prompts or instructions used to steer the behavior of the model can also contain sensitive information that was not intended to be discovered. Accordingly, sensitive data such as credentials, connection
strings, etc. should not be contained within the system prompt language.
Threat: Internal prompts (e.g., role instructions) are leaked to users or attackers.
Impacts: Enables jailbreaks, system manipulation, or exposure of internal logic.
Mitigations: Masking, prompt obfuscation, clear separation of user/system prompts.
LLM08: Insecure Vector/Embedding Handling
Retrieval Augmented Generation (RAG) is a model adaptation technique that enhances the performance and contextual relevance of responses from LLM Applications, by combining pre-trained language models with external knowledge sources.Retrieval Augmentation uses vector
mechanisms and embedding.
Weaknesses in how vectors and embeddings are generated, stored, or retrieved can be exploited by malicious actions (intentional or unintentional) to inject harmful content, manipulate model outputs, or access sensitive information.
Threat: Malicious data embedded in vector databases (e.g., in RAG) can poison LLM outputs.
Risks: Hallucinations, misleading responses, or vector-based injections.
Mitigations: Vector sanitation, embedding validation, content provenance tracking.
LLM09: Misinformation and Hallucinations
One of the major causes of misinformation is hallucination — when the LLM generates content that seems accurate but is fabricated.
Hallucinations occur when LLMs fill gaps in their training data using statistical patterns, without truly understanding the content. As a result, the model may produce answers that sound correct but are completely unfounded.
Threat: LLMs confidently generate false or misleading outputs.
Examples: Fake citations, incorrect medical advice, or biased reasoning.
Mitigations: Grounding (e.g., RAG), factuality scoring, human oversight.
LLM10: Unbounded Resource Consumption
Unbounded Consumption occurs when a Large Language Model
(LLM) application allows users to conduct excessive and uncontrolled inferences, leading to risks such as denial of service (DoS), economic losses, model theft, and service degradation.
The high computational demands of LLMs, especially in cloud environments, make them vulnerable to resource exploitation and unauthorized usage.
Threat: LLMs consume excessive compute, storage, or network resources.
Risks: DoS conditions, increased cloud costs, API abuse.
Mitigations: Usage limits, rate limiting, cost-aware design, anomaly detection.
Strategic Takeaways for Security and Development Teams
Shift Left: Integrate security from the design phase, especially for prompt handling and plugin management.
Test Like an Attacker: Red team LLMs using indirect inputs, multilingual prompts, or multimodal attacks.
Least Privilege: Apply granular control to what the LLM can see, say, and do.
Human Oversight: Critical actions should always require validation.
Explainability: Log and document LLM decisions, especially for regulated environments.
Need Help?
The functionality discussed in this post, and so much more, are available via the SOCFortress platform. Let SOCFortress help you and your team keep your infrastructure secure.
Website: https://www.socfortress.co/
Contact Us: https://www.socfortress.co/contact_form.html
目录
最新
- Build Your Own SIEM: Why These Open-Source Tools Just Work
- Deploying an AI Honeypot with Beelzebub + OpenAI: Smarter Traps for Smarter Attackers
- Open Source SIEM Response Made Simple: Dynamic Endpoint Actions with SOCFortress CoPilot
- Introducing Wazuh SCA & Vulnerability Overview Dashboards in CoPilot
- SOCFortress CoPilot Update: Expanding Our AI Chatbot with Threat Intel, Cyber News, Knowledge Base…
- SonicWall urges admins to disable SSLVPN amid rising attacks
- MCP and A2A in AI Agent Protocols — Security considerations (III) — Man-in-the-Prompt Attacks
- Palo Alto Unit 42’s Attribution Framework