Large language models (LLMs) can generate coherent answers, but when you need factual, up-to-date information, they may hallucinate (make up data) and give unreliable answers. To prevent this, we use grounding to provide the models with specialized, use-case-specific, context-relevant information that goes beyond the LLM’s training.
Grounding is the process of connecting specific data sources to a model so it is “grounded” in truthful content rather than relying only on the patterns learned during training, which yields more reliable and accurate answers.
Grounding helps reduce model hallucinations, generates responses based on your data sources, and lets you verify the answers through the citations it provides.
Retrieval Augmented Generation (RAG) is a grounding technique where you use search algorithms to retrieve relevant information from external sources, then provide that information as context to the LLM; finally, the model combines the augmented context with its original training to generate an answer.
Flow diagram of how RAG works:

RAG allows you to easily scale by updating or expanding the external data sources the model has access to. It is also a cost-effective alternative to fine-tuning LLMs since you can just add data without extensive customization.
And since RAG can access and use up-to-date information, it is ideal for use cases where having the latest information is key.
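Before moving on, here is a minimal sketch of that retrieve-augment-generate loop using the Elasticsearch Python client and an OpenAI-compatible chat model. The index name, field name, and model are illustrative assumptions, not fixed values:

```python
from elasticsearch import Elasticsearch
from openai import OpenAI

es = Elasticsearch("http://localhost:9200")  # adjust to your deployment
llm = OpenAI()

question = "Who is the author who won the Chilean National Literature Prize in 1932?"

# 1. Retrieve: semantic search over an index with a semantic_text field
#    (index and field names here are illustrative).
results = es.search(
    index="national-prize",
    query={"semantic": {"field": "content", "query": question}},
    size=3,
)
# Note: how the original text appears in _source can vary with your mapping and version.
context = "\n\n".join(
    str(hit["_source"].get("content", "")) for hit in results["hits"]["hits"]
)

# 2. Augment: add the retrieved fragments to the prompt.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# 3. Generate: the LLM answers grounded in the retrieved context.
completion = llm.chat.completions.create(
    model="gpt-4o-mini",  # any chat model you have access to
    messages=[{"role": "user", "content": prompt}],
)
print(completion.choices[0].message.content)
```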
Hallucination example
For this example, we’ll use DeepSeek and ask, “Who is the author who won the Chilean National Literature Prize in 1932?” This is a tricky question since the prize was created in 1942. Let’s see how the model answers:

As you can see, since the AI did not have all the information, it hallucinated and provided a made-up answer. Although the author and the work it cites do exist, the rest of the answer is wrong.
Now, let’s see how the model does when we ground it using RAG. For this, we will upload the Wikipedia page about the Chilean National Prize for Literature:

Now, let’s ask the same question and check the answer:

As you can see, with RAG we got the right answer. It says there was no prize in 1932 and asks for clarification from the user.
Using RAG in Playground
By using Elasticsearch, you can scale easily, with your cluster capacity as the only limit. You can use different data sources and connectors to access the data you need. Additionally, you retain full ownership of your data since it stays in your infrastructure and is not uploaded to a third-party service; if you run a local LLM, your data won’t even leave your network. Finally, you control retrieval by designing the queries and filtering data based on access control (RBAC).
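As an illustration of that last point, a retrieval query can combine semantic relevance with a document-level permissions filter. The index, field, and group names below are hypothetical:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

question = "Who won the prize in 1945?"
user_groups = ["literature-readers"]  # e.g. taken from your authentication layer

# Only documents whose allowed_groups overlap with the user's groups are
# eligible; among those, semantic relevance decides what gets retrieved.
results = es.search(
    index="private-docs",
    query={
        "bool": {
            "must": {"semantic": {"field": "content", "query": question}},
            "filter": {"terms": {"allowed_groups": user_groups}},
        }
    },
    size=3,
)
```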
We will use Playground, our low-code platform that allows you to quickly and simply create a RAG application using your Elasticsearch content.
Here’s a step-by-step guide on how to upload your PDFs or other documents into Playground. You can also read more about it here and try Playground here.
Upload the PDF
We’ll use Kibana to index the same PDF file we gave DeepSeek. If you followed the instructions in the article above and created the semantic_text field, you’ll end up with a vector database and its corresponding embeddings, ready to use.
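If you’d rather create the index programmatically instead of through the Kibana UI, a mapping along these lines gives you the same kind of semantic_text field (index and field names are assumptions):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# A semantic_text field chunks the text and generates embeddings at ingest
# time, so the index doubles as a vector database.
es.indices.create(
    index="national-prize",
    mappings={"properties": {"content": {"type": "semantic_text"}}},
)

# Index the text extracted from the PDF (content shortened here).
es.index(
    index="national-prize",
    document={"content": "The Chilean National Prize for Literature was created in 1942..."},
)
```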

Ask the question
Ask the following question:
“Who is the author who won the Chilean National Literature Prize in 1932?”

Playground sends this query to Elasticsearch, which in turn runs a semantic search and locates the fragments relevant to the question. These fragments are then included as context in the prompt sent to the LLM, grounding the answer in the information source we provided.
Finally, Playground generates an answer saying that there was no prize in 1932 and provides citations for the relevant fragments as evidence.
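One simple way to get that kind of citation behavior in your own code is to number the retrieved fragments and ask the model to reference them. This is just an illustrative approach, not necessarily how Playground does it internally, and the fragments here are stand-ins for whatever your search returns:

```python
# Assume `fragments` holds the text fragments retrieved from Elasticsearch.
fragments = [
    "The Chilean National Prize for Literature was created in 1942.",
    "The first laureate, in 1942, was Augusto d'Halmar.",
]

numbered = "\n".join(f"[{i + 1}] {text}" for i, text in enumerate(fragments))
prompt = (
    "Answer the question using only the numbered context below, and cite the "
    "fragment numbers you used, e.g. [1].\n\n"
    f"Context:\n{numbered}\n\n"
    "Question: Who is the author who won the Chilean National Literature Prize in 1932?"
)
print(prompt)  # send this prompt to your LLM of choice
```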
Playground also offers two very useful features to help you understand the RAG system’s underlying components:
Query
You can see the query Elasticsearch is running to retrieve the relevant documents, and you can enable/disable fields based on your needs.
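As a rough sketch, with one semantic field and one regular text field enabled, the retrieval query ends up shaped something like this (index and field names are illustrative; disabling a field in Playground removes its clause):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

question = "Who is the author who won the Chilean National Literature Prize in 1932?"

# Each enabled field contributes one clause to the retrieval query.
results = es.search(
    index="national-prize",
    query={
        "bool": {
            "should": [
                {"semantic": {"field": "content", "query": question}},
                {"match": {"title": question}},
            ],
            "minimum_should_match": 1,
        }
    },
    size=3,
)
```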

View code
If you want to deploy your RAG application, Playground has you covered. Under the View Code tab, you can see the code used under the hood to create the entire RAG workflow. You can choose between two Python alternatives: the Elasticsearch client with OpenAI, or a LangChain-based implementation.
If you want to customize the experience and deploy the code elsewhere, you can use this snippet as a starting point.
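As a rough sketch of what the LangChain route can look like (not the exact exported code; index, field, and model names are assumptions):

```python
from langchain_elasticsearch import ElasticsearchRetriever
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

def semantic_body(question: str) -> dict:
    # Retrieval body: semantic search against the illustrative "content" field.
    return {"query": {"semantic": {"field": "content", "query": question}}}

retriever = ElasticsearchRetriever.from_es_params(
    index_name="national-prize",
    body_func=semantic_body,
    content_field="content",
    url="http://localhost:9200",
)

prompt = ChatPromptTemplate.from_template(
    "Answer using only this context:\n{context}\n\nQuestion: {question}"
)

def format_docs(docs) -> str:
    return "\n\n".join(doc.page_content for doc in docs)

# Retrieve, stuff the fragments into the prompt, and generate the answer.
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(model="gpt-4o-mini")
    | StrOutputParser()
)

print(chain.invoke("Who is the author who won the Chilean National Literature Prize in 1932?"))
```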

Conclusion
Grounding is a process that connects LLMs to external data sources so they can go beyond their training to provide more accurate and trustworthy answers. Retrieval Augmented Generation (RAG) is a grounding method that is scalable, cost-effective, and ensures access to up-to-date information.
Tools like Playground simplify RAG implementation by enabling large-scale indexing, customized searches, and responses with citations, which allow you to easily verify an answer and make sure you’re getting accurate and trustworthy results.
If you want to read more in-depth articles about RAG features, you can start with this one to get a more technical definition of RAG. You can also check RAG vs. fine-tuning: When RAG is the best decision, How to leverage document security using RAG, and RAG systems in production.