Fetches relevant documents from a knowledge base before generating an answer, grounding the LLM in real data instead of potentially hallucinated training memory.
When a user asks a question, RAG converts it into an embedding vector and searches a vector database for the most semantically similar documents. Those documents are injected into the LLM prompt as context. The LLM then generates an answer grounded in that real, retrieved data, not its potentially outdated training memory.
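Here is a minimal sketch of that flow in Python. Everything in it is illustrative: `embed()` is a toy hashed bag-of-words stand-in for a real embedding model, `llm()` just echoes its prompt instead of calling a real model, and the "vector database" is an in-memory array, so the example runs end to end but is not a production implementation.

```python
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy embedding: hashed bag-of-words, unit-normalized. Stand-in for a real embedding model."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    return vec / (np.linalg.norm(vec) or 1.0)

def llm(prompt: str) -> str:
    """Stand-in for a real LLM call; it just returns the prompt it was handed."""
    return f"[LLM would answer from]\n{prompt}"

# Knowledge base: documents with precomputed embeddings (the "vector database").
documents = [
    "Refunds are processed within 5 business days of the return being received.",
    "Enterprise plans include 24/7 phone support and a dedicated account manager.",
    "The API rate limit is 600 requests per minute per organization.",
]
doc_vectors = np.stack([embed(d) for d in documents])

def answer(question: str, k: int = 2) -> str:
    q = embed(question)              # 1. Convert the question into an embedding vector.
    sims = doc_vectors @ q           # 2. Cosine similarity (vectors are unit-normalized).
    top = np.argsort(sims)[::-1][:k] #    Keep the k most semantically similar documents.
    context = "\n".join(documents[i] for i in top)
    prompt = (                       # 3. Inject the retrieved documents as context.
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm(prompt)               # 4. Generate an answer grounded in that context.

print(answer("How long do refunds take?"))
```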
This is the most important pattern for enterprise AI because it separates the knowledge store from the reasoning engine. You can update your documents without retraining the model. The LLM stays the same; your data changes freely.
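Continuing the same toy setup, updating the knowledge base is just indexing a new document into the store; the model itself is never retrained or modified. (`add_document` is a hypothetical helper for this sketch, not part of any library.)

```python
def add_document(text: str) -> None:
    """Index a new document; only the knowledge store changes, not the model."""
    global doc_vectors
    documents.append(text)
    doc_vectors = np.vstack([doc_vectors, embed(text)])

add_document("Starting June 1, refunds are processed within 3 business days.")
print(answer("How long do refunds take?"))  # Retrieval now surfaces the updated policy.
```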
Think of it as giving the LLM a "cheat sheet" before every answer. Without RAG, the LLM guesses from memory. With RAG, it reads the relevant facts first.