How to ingest data to Elasticsearch through LlamaIndex

A step-by-step guide on how to ingest data and search it using RAG with LlamaIndex.

In this article, we will implement a search engine for FAQs using LlamaIndex to index the data. Elasticsearch will serve as our vector database, enabling vector search, while RAG (Retrieval-Augmented Generation) will enrich the context, providing more accurate responses.

What is LlamaIndex?

LlamaIndex is a framework that facilitates the creation of agents and workflows powered by Large Language Models (LLMs) to interact with specific or private data. It allows the integration of data from various sources (APIs, PDFs, databases) with LLMs, enabling tasks such as research, information extraction, and generation of contextualized responses.

Key concepts:

  • Agents: Intelligent assistants that use LLMs to perform tasks, ranging from simple responses to complex actions.
  • Workflows: Multi-step processes that combine agents, data connectors, and tools for advanced tasks.
  • Context augmentation: A technique that enriches the LLM with external data, overcoming its training limitations.

LlamaIndex integration with Elasticsearch:

Elasticsearch can be used in various ways with LlamaIndex:

  • Data source: Use the Elasticsearch Reader to extract documents.
  • Embeddings model: Encode data into vectors for semantic searches.
  • Vector storage: Use Elasticsearch as a repository for searching vectorized documents.
  • Advanced storage: Configure structures such as document summaries or knowledge graphs.

Building an FAQ search with LlamaIndex and Elasticsearch

Data Preparation

We will use the Elasticsearch Service FAQ as an example. Each question was extracted from the website and saved in an individual text file. You can use any approach to organize the data; in this example, we chose to save the files locally.

Example file:
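
For illustration, a saved file could look like the sketch below, with the question on the first line and the answer underneath (the wording paraphrases the Elasticsearch Service FAQ and may not match it exactly):

```
What is Elasticsearch Service?

Elasticsearch Service is the hosted and managed Elasticsearch and Kibana
offering from Elastic, available on AWS, Google Cloud, and Microsoft Azure.
```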

After saving all the questions, the directory will look like this:
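
(The file names below are hypothetical; there is one text file per question.)

```
faq/
├── can-i-try-elasticsearch-service-for-free.txt
├── what-is-elasticsearch-service.txt
├── what-is-the-elasticsearch-service-pricing.txt
└── which-cloud-providers-are-supported.txt
```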

Installation of Dependencies

We will implement the ingestion and search in Python; the version used here was 3.9. As a prerequisite, a few dependencies need to be installed.
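
The exact packages may differ in your environment; a minimal set for this walkthrough (LlamaIndex core plus the Elasticsearch vector store and OpenAI integrations) can be installed with pip:

```bash
pip install llama-index llama-index-vector-stores-elasticsearch \
    llama-index-embeddings-openai llama-index-llms-openai
```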

Elasticsearch and Kibana (version 8.16.2) will run locally in Docker, configured through a docker-compose.yml file, which makes it easy to spin up the local environment.

Document Ingestion

The documents will be indexed into Elasticsearch using LlamaIndex. First, we load the files with SimpleDirectoryReader, which reads all files from a local directory; the loaded documents are then indexed with a VectorStoreIndex.

Vector Stores in LlamaIndex are responsible for storing and managing document embeddings. LlamaIndex supports several Vector Store implementations, and in this case we will use Elasticsearch. In the StorageContext, we configure the Elasticsearch instance; since everything runs locally, no additional parameters are required. For other environments, refer to the documentation for the necessary parameters: ElasticsearchStore Configuration.

By default, LlamaIndex uses the OpenAI text-embedding-ada-002 model to generate embeddings. However, in this example, we will use the text-embedding-3-small model. It is important to note that an OpenAI API key will be required to use the model.

Below is the complete code for document ingestion.
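
A sketch along the following lines, assuming the FAQ files live in a local ./faq directory, Elasticsearch is reachable at http://localhost:9200, and the OpenAI key is exposed via the OPENAI_API_KEY environment variable:

```python
from llama_index.core import (
    Settings,
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
)
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.vector_stores.elasticsearch import ElasticsearchStore

# Use text-embedding-3-small instead of the default text-embedding-ada-002.
# The OpenAI key is read from the OPENAI_API_KEY environment variable.
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

# Load every FAQ text file from the local directory (the path is illustrative).
documents = SimpleDirectoryReader("./faq").load_data()

# Point the vector store at the local Elasticsearch instance and the "faq" index.
vector_store = ElasticsearchStore(
    index_name="faq",
    es_url="http://localhost:9200",
)

# The StorageContext tells LlamaIndex where to persist the embeddings.
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Generate embeddings for each document and index them into Elasticsearch.
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
```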

After execution, the documents will be indexed in the faq index.
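
One quick way to confirm it (a minimal sketch, assuming the official elasticsearch Python client and the local instance used above):

```python
from elasticsearch import Elasticsearch

# Connect to the local instance and count the documents in the "faq" index.
es = Elasticsearch("http://localhost:9200")
print(es.count(index="faq"))
```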

Search with RAG

To perform searches, we configure the ElasticsearchStore client, setting index_name to the target index and es_url to the Elasticsearch URL. In retrieval_strategy, we define AsyncDenseVectorStrategy for vector searches. Other strategies, such as AsyncBM25Strategy (keyword search) and AsyncSparseVectorStrategy (sparse vectors), are also available. More details can be found in the official documentation.

Next, a VectorStoreIndex object is created on top of the ElasticsearchStore. With the as_retriever method, we retrieve the documents most relevant to a query, limiting the number of results to 5 through the similarity_top_k parameter.

The next step is RAG. The results of the vector search are incorporated into a formatted prompt for the LLM, enabling a contextualized response based on the retrieved information.

In the PromptTemplate, we define the prompt format, which includes:

  • Context ({context_str}): documents retrieved by the retriever.
  • Query ({query_str}): the user's question.
  • Instructions: guidelines for the model to respond based on the context, without relying on external knowledge.

Finally, the LLM processes the prompt and returns a precise and contextual response.

The complete code is below:
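
A sketch along these lines, assuming the faq index created above, a local Elasticsearch at http://localhost:9200, and an OpenAI key in the environment (the prompt wording and the gpt-4o-mini model choice are illustrative):

```python
from llama_index.core import PromptTemplate, Settings, VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.vector_stores.elasticsearch import (
    AsyncDenseVectorStrategy,
    ElasticsearchStore,
)

# Use the same embedding model as at ingestion time so query vectors are comparable.
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

# Connect to the existing "faq" index and use dense vector retrieval.
vector_store = ElasticsearchStore(
    index_name="faq",
    es_url="http://localhost:9200",
    retrieval_strategy=AsyncDenseVectorStrategy(),
)

# Build an index object on top of the existing vector store (no re-ingestion).
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)

# Retrieve the 5 most relevant FAQ documents for the question.
retriever = index.as_retriever(similarity_top_k=5)
query = "Elastic services are free?"
nodes = retriever.retrieve(query)

# Concatenate the retrieved documents into a single context string.
context_str = "\n\n".join(node.get_content() for node in nodes)

# Prompt that instructs the model to answer only from the retrieved context.
template = PromptTemplate(
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)

# Send the formatted prompt to the LLM (the model choice here is illustrative).
llm = OpenAI(model="gpt-4o-mini")
response = llm.complete(template.format(context_str=context_str, query_str=query))
print(response.text)
```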

Now we can perform our search, for example, "Elastic services are free?", and get a contextualized response grounded in the FAQ data itself.

To generate this response, the LLM relied on the FAQ documents retrieved in the previous step as context.

Conclusion

Using LlamaIndex, we demonstrated how to create an efficient FAQ search system with support for Elasticsearch as a vector database. Documents are ingested and indexed using embeddings, enabling vector searches. Through a PromptTemplate, the search results are incorporated into the context and sent to the LLM, which generates precise and contextualized responses based on the retrieved documents.

This workflow integrates information retrieval with contextualized response generation to deliver accurate and relevant results.

References

https://www.elastic.co/guide/en/cloud/current/ec-faq-getting-started.html

https://docs.llamaindex.ai/en/stable/api_reference/readers/elasticsearch/

https://docs.llamaindex.ai/en/stable/module_guides/indexing/vector_store_index/

https://docs.llamaindex.ai/en/stable/examples/query_engine/custom_query_engine/

https://www.elastic.co/search-labs/integrations/llama-index
