Retrieval and grounding: how AI uses your data
Retrieval-augmented generation feeds a model the relevant documents at answer time so it responds from your sources instead of memory — the dominant enterprise pattern, and what "AI over our knowledge base" almost always means.
A general-purpose model arrives knowing a great deal about the world and nothing specific about your organization. It has never seen your contracts, your policy manual, last quarter's board deck, or the support ticket a customer opened this morning. Worse, even the public facts it did absorb during training are stored as statistical patterns rather than as entries in a lookup table, so asking it to recall a precise number or a clause from memory is unreliable. Retrieval-augmented generation, almost always shortened to RAG, is the dominant way enterprises close that gap without retraining the model: when a question comes in, the system first searches a collection of your documents, pulls back the few passages that actually bear on the question, places them in front of the model, and instructs it to answer from that supplied material. The image to keep in mind is an open-book exam, not a closed-book one. This single architecture is what is meant, the vast majority of the time, by 'AI over our knowledge base' or 'an assistant trained on our docs', even though in practice almost nothing is trained: the documents are retrieved at the moment the question is asked.
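As a minimal sketch of the open-book step, here is how retrieved passages might be placed in front of the model, in Python. It assumes retrieval has already happened; `Passage`, `build_grounded_prompt`, the instruction wording, and the policy text are all hypothetical, and the call that sends the prompt to an actual model is left out because it depends on the vendor.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    source: str  # document title or path the passage came from
    text: str    # the retrieved passage itself


def build_grounded_prompt(question: str, passages: list[Passage]) -> str:
    """Put the retrieved passages in front of the question, open-book style."""
    lines = [
        "Answer the question using ONLY the numbered sources below.",
        "Cite the sources you rely on by number, for example [1].",
        "If the sources do not contain the answer, say so rather than guessing.",
        "",
    ]
    # Number each passage so the model (and a reviewer) can refer back to it.
    for i, passage in enumerate(passages, start=1):
        lines.append(f"[{i}] ({passage.source}) {passage.text}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


if __name__ == "__main__":
    retrieved = [
        Passage("Leave Policy v3", "Full-time staff accrue 20 days of paid leave per year."),
        Passage("Leave Policy v3", "Up to 5 unused days carry over to the next year."),
    ]
    prompt = build_grounded_prompt("How many days of paid leave do I get?", retrieved)
    print(prompt)  # this string is what would be sent to the model
```

Numbering the passages is what later makes citations possible: the model can answer with [1], and a reviewer can look up exactly which passage that was.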
It helps to recognize the moving parts by name, because vendors will use these words and you want to know what they are describing. First, your documents are split into chunks — passages of a few hundred words rather than whole files — so that retrieval can return a tight, relevant slice instead of a 90-page PDF. Each chunk is converted into an embedding, a long list of numbers that encodes the chunk's meaning, and those numbers are stored in a vector database, a system built to find the numerically closest items fast. When a question arrives it is embedded too, and the database returns the chunks whose meaning sits nearest to the question's meaning. This is called semantic search: it matches on meaning rather than exact keywords, so a question about 'time off' can surface a policy titled 'paid leave' even though the words never overlap. The retrieved chunks are pasted into the model's working context along with the question, and only then does the model write the answer. Because the source passages are right there, a well-built system can also show you which documents it leaned on — the citations that make the answer checkable.
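The chunk, embed, and search steps can be sketched in a few dozen lines of Python. This is an illustration under loud assumptions: `chunk`, `embed`, `cosine`, and `VectorIndex` are hypothetical names, the 200-word window with 40 words of overlap is an arbitrary choice, and `embed` is a word-count stand-in for a real embedding model, so it matches shared words rather than meaning. The nearest-neighbour search itself (cosine similarity, highest scores first) is the same idea a vector database performs, minus the indexing structures that make it fast at scale.

```python
import math
import re
from collections import Counter


def chunk(text: str, max_words: int = 200, overlap: int = 40) -> list[str]:
    """Split a document into overlapping windows of at most max_words words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    pieces = []
    for start in range(0, len(words), step):
        pieces.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the document
    return pieces


def embed(text: str) -> Counter:
    # STAND-IN for a real embedding model: a sparse word-count vector.
    # It only matches shared words, so unlike a learned embedding it would
    # NOT place "time off" near "paid leave".
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity: 1.0 for identical direction, 0.0 for nothing shared."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class VectorIndex:
    """In-memory stand-in for a vector database (brute-force nearest neighbours)."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, str, Counter]] = []  # (source, chunk, vector)

    def add_document(self, source: str, text: str) -> None:
        for piece in chunk(text):
            self.entries.append((source, piece, embed(piece)))

    def search(self, query: str, k: int = 3) -> list[tuple[float, str, str]]:
        """Return up to k (score, source, chunk) triples, most similar first."""
        q = embed(query)
        scored = [(cosine(q, vec), source, piece) for source, piece, vec in self.entries]
        scored.sort(key=lambda item: item[0], reverse=True)
        return [item for item in scored[:k] if item[0] > 0]


if __name__ == "__main__":
    index = VectorIndex()
    index.add_document("Leave Policy", "Full-time staff accrue 20 days of paid leave per year.")
    index.add_document("Expense Policy", "Meals while travelling are reimbursed up to 50 dollars per day.")
    for score, source, piece in index.search("how many days of paid leave"):
        print(f"{score:.2f} [{source}] {piece}")
```

Overlapping the windows is a common hedge against a relevant sentence being split across two chunks; swapping `embed` for a learned model is what would turn this keyword matcher into the semantic search described here.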
The reason this pattern won out in the enterprise, rather than the obvious-seeming alternative of training the model on your data, is mechanical and worth understanding. A model's knowledge lives in its weights as diffuse patterns, not as discrete facts you can edit: teaching it something new means an expensive retraining run, even then the new fact is not reliably retrievable on demand, and you cannot point to where the answer came from. Retrieval keeps the knowledge in a store you can inspect, update, and audit. Change a policy document this morning and a RAG system reflects the change this afternoon, with no model work at all. The mature division of labor that most serious teams have settled on is to use training or fine-tuning to shape how a model behaves and formats its answers, and to use retrieval for the facts themselves, keeping them somewhere current, citable, and governed rather than dissolving them into parameters where none of that is possible.
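To make the "update this morning, reflected this afternoon" point concrete, here is a minimal sketch of a store where facts live as editable records. `KnowledgeStore` and its methods are hypothetical, and shared-word counting stands in for embedding search; the point is only that changing a fact is an overwrite, with no model involved.

```python
import re
from typing import Optional


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class KnowledgeStore:
    """Hypothetical document store keyed by document ID.

    The facts live here rather than in model weights, so changing a policy
    is a data operation: overwrite the entry and the next query sees it.
    """

    def __init__(self) -> None:
        self.docs: dict[str, str] = {}

    def upsert(self, doc_id: str, text: str) -> None:
        # Overwriting drops the stale version, so it can no longer be retrieved.
        # (A real pipeline would also re-chunk and re-embed the new text.)
        self.docs[doc_id] = text

    def delete(self, doc_id: str) -> None:
        self.docs.pop(doc_id, None)

    def retrieve(self, query: str) -> Optional[tuple[str, str]]:
        # Toy relevance score: number of shared words. A real system would
        # compare embeddings instead.
        q = tokens(query)
        scored = [(len(q & tokens(text)), doc_id, text) for doc_id, text in self.docs.items()]
        scored = [item for item in scored if item[0] > 0]
        if not scored:
            return None
        _, doc_id, text = max(scored)
        return doc_id, text


if __name__ == "__main__":
    store = KnowledgeStore()
    store.upsert("travel-policy", "Economy class is required for all flights.")
    print(store.retrieve("which class for flights"))
    # The policy changes this morning; no model is retrained.
    store.upsert("travel-policy", "Business class is allowed for flights over 6 hours.")
    print(store.retrieve("which class for flights"))
```

Because the answer always comes from whatever record is current, the same mechanism that makes updates cheap also makes the source of every answer traceable.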
None of this makes the output automatically trustworthy, and this is the part that most often surprises leaders the first time a pilot disappoints. 'We use RAG' is the beginning of a diligence conversation, not the end of one. Quality turns on two distinct things that fail independently. The first is retrieval quality: did the search actually surface the passages that contain the answer? If the right document never makes it into the model's view, the model will answer anyway, smoothly and wrongly, because it has no way to say 'the relevant page wasn't retrieved.' The second is faithfulness: given the passages it did receive, did the answer stay true to them, or did the model blend in something plausible from its general training that the sources never said? Grounding lowers the rate of invented answers; that is its central justification, and the RAG survey literature (Gao et al., for example) names hallucination, outdated knowledge, and untraceable reasoning as the problems retrieval was introduced to address. Grounding does not drive that rate to zero, however, which is why the sibling concept on why models make things up matters here too.
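The two failure modes can be told apart when a reviewer knows which document holds the answer. The function below is a hypothetical triage sketch, not an established metric: `diagnose` and its labels are invented for illustration, and its faithfulness test is deliberately crude (it only flags numbers in the answer that appear in none of the retrieved passages).

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")


def diagnose(answer: str, retrieved_ids: list[str], retrieved_texts: list[str],
             gold_id: str) -> str:
    """Hypothetical triage of one grounded answer into the two failure modes.

    gold_id is the document a human reviewer knows contains the answer.
    The faithfulness check is a crude heuristic: any number in the answer
    that appears in none of the retrieved passages counts as unsupported.
    """
    if gold_id not in retrieved_ids:
        # Failure mode 1: the right page never reached the model.
        return "retrieval_miss"
    context_numbers = set(NUMBER.findall(" ".join(retrieved_texts)))
    answer_numbers = set(NUMBER.findall(answer))
    if answer_numbers - context_numbers:
        # Failure mode 2: the answer asserts figures the sources never gave.
        return "unsupported_claim"
    return "no_problem_detected"


if __name__ == "__main__":
    passages = ["Full-time staff accrue 20 days of paid leave per year."]
    print(diagnose("You get 25 days.", ["hr-leave"], passages, "hr-leave"))
    print(diagnose("You get 20 days.", ["hr-expenses"], passages, "hr-leave"))
```

A result of `no_problem_detected` means only that this heuristic found nothing; claims that contain no numbers pass straight through it, which is why faithfulness is usually judged by people or by a separate grading step.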
A third question sits underneath both and is the one most likely to become a headline if it is ignored: governance. A retrieval index is, by construction, a fast way to surface any document inside it. If the index does not enforce who is allowed to see what, then a file that was merely forgotten in a shared drive becomes an instant, fluent answer for someone who should never have seen it. Security guidance for these systems, such as the OWASP Top 10 for Large Language Model Applications, flags sensitive-information disclosure and prompt injection (text hidden inside a retrieved document that tries to hijack the model's instructions) as first-order risks, and retrieval widens exactly that surface. The well-run version respects existing access controls at retrieval time, so the model can only draw on what the asker is already entitled to read. This connects directly to the data-boundaries concept, where what an AI system is allowed to see is treated as a governance decision rather than a technical default.
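Enforcing permissions at retrieval time amounts to filtering before ranking. Here is a minimal sketch, assuming each chunk carries a copy of its source document's access list; `Chunk`, `retrieve_for_user`, and the group names are hypothetical, and production vector databases typically offer metadata filters for this, with APIs that vary by product.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    source: str
    text: str
    # Groups allowed to read the source document, copied from the source
    # system's access controls when the chunk was indexed.
    allowed_groups: frozenset[str]


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_for_user(query: str, user_groups: set[str],
                      chunks: list[Chunk], k: int = 3) -> list[Chunk]:
    """Return the top-k chunks the user is entitled to read.

    The permission filter runs BEFORE ranking, so text the user may not see
    is never scored, never returned, and never reaches the model's context.
    """
    visible = [c for c in chunks if c.allowed_groups & user_groups]
    q = words(query)
    # Toy relevance: shared-word count (a real system would use embeddings).
    ranked = sorted(visible, key=lambda c: len(q & words(c.text)), reverse=True)
    return [c for c in ranked[:k] if q & words(c.text)]


if __name__ == "__main__":
    index = [
        Chunk("Staff Handbook", "Office hours are 9 to 5.", frozenset({"all-staff"})),
        Chunk("Salary Review", "Salary bands for next year are attached.", frozenset({"hr"})),
    ]
    print(retrieve_for_user("salary bands", {"all-staff"}, index))        # nothing visible matches
    print(retrieve_for_user("salary bands", {"all-staff", "hr"}, index))  # HR sees the review
```

The design choice that matters is the order: filtering after the model has written its answer would be too late, because the forbidden text would already have been in its context. The copied access lists also have to be kept in sync with the source systems, or a revoked permission keeps working in the index.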
The practical questions an executive can carry into any review of a grounded system are short and revealing. What is the source corpus, and how is it kept fresh? A knowledge base that quietly stops updating is the most common way these systems decay. Does the system cite, and can a reviewer actually click through and confirm that the cited passage says what the answer claims? Does retrieval honor the access permissions that already exist elsewhere in the business, or was the index built by dumping everything into one pool? And is 'the right answer' being measured against a real test set of questions with known answers, or merely assumed because the demo went well? None of these questions requires technical depth to ask, and the quality of the replies tells you a great deal about whether you are looking at a governed system or a promising prototype.
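Measuring against a test set can start very simply. The sketch below, built around a hypothetical `retrieval_hit_rate` function, assumes a hand-written list of questions each paired with the document known to contain the answer, and reports how often retrieval surfaces that document. It checks only the first failure mode; whether the final answers are correct and faithful needs its own grading.

```python
from typing import Callable


def retrieval_hit_rate(test_set: list[tuple[str, str]],
                       retrieve: Callable[[str], list[str]]) -> float:
    """Share of test questions whose known source document gets retrieved.

    test_set: (question, expected_source_id) pairs written by people who
    already know the answers.
    retrieve: the system under test; returns the IDs of the documents it
    would hand to the model for a given question.
    """
    if not test_set:
        raise ValueError("an empty test set measures nothing")
    hits = sum(1 for question, expected in test_set if expected in retrieve(question))
    return hits / len(test_set)


if __name__ == "__main__":
    # Hypothetical canned results standing in for a real retriever.
    canned = {
        "How many days of paid leave?": ["leave-policy"],
        "What is the meal allowance?": ["expense-policy", "travel-policy"],
        "Who approves overtime?": ["leave-policy"],  # wrong document retrieved
    }
    test_set = [
        ("How many days of paid leave?", "leave-policy"),
        ("What is the meal allowance?", "expense-policy"),
        ("Who approves overtime?", "overtime-policy"),
    ]
    rate = retrieval_hit_rate(test_set, lambda q: canned.get(q, []))
    print(f"retrieval hit rate: {rate:.0%}")
```

Rerunning the same test set after every corpus or configuration change is what turns "the demo went well" into a number that can be tracked over time.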
Where this comes from.
- Lewis et al., "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks", 2020
- Gao et al., "Retrieval-Augmented Generation for Large Language Models: A Survey", 2023-2024
- Mikolov et al., "Efficient Estimation of Word Representations in Vector Space" (word2vec), 2013
- Liu et al., "Lost in the Middle: How Language Models Use Long Contexts", 2023
- OWASP, "Top 10 for Large Language Model Applications"
- NIST, "Artificial Intelligence Risk Management Framework (AI RMF 1.0)", 2023