Retrieval-Augmented Generation (RAG) bridges the gap between static LLMs and dynamic data. The RAG layer includes vector databases, embedding models, and retrieval pipelines.
sequenceDiagram
participant A as Attacker
participant D as Vector DB
participant R as Retriever
participant L as LLM
A->>D: Data Poisoning (Inject Docs)
R->>D: Fetch Context
D-->>R: Returns Malicious Docs
R->>L: Context + Prompt
L-->>A: Malicious Output / Hijack
Attackers insert malicious documents into the knowledge base.
- Payload: Hidden instructions, misinformation, or sensitive data.
- Impact: Permanent alteration of the AI's "knowledge."
Attackers craft documents whose embeddings are mathematically close to common user queries.
- Payload: Documents that "win" the similarity search and get retrieved every time.
- Impact: Dominating the AI's response window.
Instructions embedded within retrieved text that override the system prompt.
- Impact: Turning a "Helpful Assistant" into a "Malicious Proxy."
| Metric | Ideal Score | Risk Indicator |
|---|---|---|
| Retrieval Precision | > 95% | Irrelevant data retrieved |
| Poisoning Resistance | High | Unvalidated document ingestion |
| Source Diversity | Balanced | One source dominates context |
| Delimiter Consistency | Strict | Model confuses data with instructions |
- Ingestion Validation: Can I upload a PDF with hidden instructions?
- Embedding Sensitivity: Does changing one word in a query drastically change retrieval?
- Context Isolation: If a retrieved doc says "IGNORE ALL RULES", does the model comply?
- Search Manipulation: Can I craft a document that is always retrieved for specific keywords?
Tip
Trust but Verify: Treat all retrieved context as "Untrusted User Input."
- Semantic Filtering: Use a secondary LLM to scan retrieved chunks for instructions before passing them to the main model.
- Metadata Validation: Ensure retrieval only occurs from trusted, authenticated sources.
- Strict Delimiting: Use clear, non-spoofable boundaries between "Retrieved Context" and "System Instructions."
- Ranker Tuning: Use a cross-encoder to re-rank results and filter out low-confidence or anomalous matches.