Filtering in ES|QL using full text search

Elasticsearch 8.17 added the match and qstr functions to ES|QL, which can be used to perform full text filtering. 8.18 removed limitations on their usage. This article describes what they do, how to use them, how they differ from the existing text filtering methods, their current limitations, and future improvements.

ES|QL now includes full text functions that can be used to filter your data using text queries. We will review the available text filtering methods and understand why these functions provide a better alternative. We will also look at the future improvements for full text functions in ES|QL.

Filtering text with ES|QL

Text data in logs is critical for understanding, monitoring, and troubleshooting systems and applications. The unstructured nature of text allows for flexibility in capturing all sorts of information.

Because text is unstructured, we need ways of isolating specific patterns, keywords, or phrases. Searching for an error message, narrowing down results using tags, or looking for a specific host name are things that we do all the time to refine our results and eventually obtain the information we're looking for.

ES|QL provides different methods to help you work with text. Elasticsearch 8.17 adds the full text functions match and qstr in tech preview to help tackle more complex search use cases.

Limitations of text filtering

ES|QL already provided text filtering capabilities, including pattern matching (LIKE) and regular expressions (RLIKE).
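As a rough sketch of what this kind of text filtering looks like, the query below combines a prefix check, a wildcard pattern, and a regular expression. The logs index name is an assumption made for illustration, and message is assumed to be a text or keyword field in it.

```esql
// Hypothetical index "logs" with a string field "message"
FROM logs
| WHERE STARTS_WITH(message, "Connection")     // prefix comparison
    OR message LIKE "*timeout*"                // wildcard pattern
    OR message RLIKE ".*(lost|refused).*"      // regular expression
```

All three conditions compare against the raw string value, so they are case sensitive.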

Text filtering is useful - but it can fall short on text oriented use cases:

Multivalued fields

Using ES|QL functions with multivalued fields can be tricky - unless documented otherwise, functions return null when applied to a multivalued field.

If you need to apply a function to a multivalued field, you first need to collapse its values into a single string using MV_CONCAT, and then match on that string.
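A minimal sketch of that workaround, assuming a hypothetical logs index with a multivalued keyword field named tags (both names are illustrative):

```esql
// "tags" is assumed to hold several values per document, e.g. ["network", "error"]
FROM logs
| EVAL all_tags = MV_CONCAT(tags, ",")   // join the values into one string
| WHERE all_tags LIKE "*network*"        // pattern match on the joined string
```

The extra EVAL step is needed only so that LIKE receives a single value.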

Analyzed text

Analyzers are incredibly useful for full text search as they allow transforming text. They allow us to extract and modify the indexed text, and modify the queries so we can maximize the possibility of finding what we're looking for.

Text is not analyzed when using text filtering. This means for example that you need to match the text case when searching, or create regexes / patterns that address possible case differences.

This can become more problematic when looking for multilingual text (so you can't use ASCII folding), trying to match on parts of paths (path hierarchy), or removing stopwords.
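To illustrate the case problem, here is a sketch of two common workarounds, again assuming a hypothetical logs index with a message field. The first lowercases the field before comparing:

```esql
// Normalize case on the fly so "Connection Lost" and "connection lost" both match
FROM logs
| WHERE TO_LOWER(message) LIKE "*connection lost*"
```

The second spells out the case variants in a regular expression:

```esql
// Accept an upper or lower case first letter in each word
FROM logs
| WHERE message RLIKE ".*[Cc]onnection [Ll]ost.*"
```

Neither approach handles stopwords, accents, or synonyms, which is where analyzers help.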

Performance

Pattern matching and regexes take time. Lucene can do a lot of the heavy lifting by creating finite automata to match using the indexed terms dictionary, but nonetheless it's a computationally intensive process.

As you can see in our 8.17 release blog, using regular expressions can be 50 to 1000 times slower than using full text functions for text filtering, depending on your data set.

Enter full text functions

Elasticsearch 8.17 and Serverless introduced two new functions as tech preview for text matching: MATCH and query string (abbreviated QSTR).

These functions address some of the limitations that existed for text filtering:

  • They can be used directly on multivalued fields. They will return results when any of the values in a multivalued field matches the query.
  • They use analyzers for text fields. The query will be analyzed using any existing analyzers for the target fields, which will allow matching regardless of case. This also unlocks ASCII folding, removing stopwords, and even using synonyms.
  • They are performant. Instead of relying on pattern matching or regular expressions, they can directly use Lucene index structures to locate specific terms in your data.

MATCH function

MATCH matches a query value against a specific field.

The MATCH function uses a match query under the hood. This means that it will create a boolean query when multiple terms are used, with OR as the default operator for combining them.
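As a minimal sketch, assuming a hypothetical logs index with a text field named message, a MATCH filter could look like this:

```esql
// The query text is analyzed and split into the terms "connection" and "lost"
FROM logs
| WHERE MATCH(message, "connection lost")
| LIMIT 10
```

Because OR is the default operator, this returns documents whose message contains either term, not only those that contain both.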

The MATCH function currently has some limitations:

  • It does not provide a way to specify parameters. It will use the defaults for the match query.
  • It can only be used in WHERE clauses.
  • It can't be used after a STATS or LIMIT command.

The following limitations apply only to version 8.17:

  • Only text or keyword fields can be used with MATCH.
  • MATCH can be combined with other conditions as part of an AND expression, but not as part of an OR expression. WHERE match(message, "connection lost") AND length(message) > 10 can be used, but not WHERE match(message, "connection lost") OR length(message) > 10.

These limitations have been removed in version 8.18 and in Elastic Cloud Serverless, which is continuously up to date with our new work.
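For instance, the disjunction mentioned above, which 8.17 rejects, would be written as follows on 8.18 or Serverless. The logs index name is an assumption for illustration.

```esql
// MATCH as one branch of an OR expression (not accepted in 8.17)
FROM logs
| WHERE MATCH(message, "connection lost") OR LENGTH(message) > 10
```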

Match operator

The match operator (:) is equivalent to the MATCH function above, but it offers a more succinct syntax.
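A sketch of the operator form, using the same hypothetical logs index and message field as before:

```esql
// Same filter as MATCH(message, "connection lost"), written with the : operator
FROM logs
| WHERE message:"connection lost"
```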

It is more convenient to use the match operator, but you can use whichever makes more sense to you.

The match operator has the same limitations as the MATCH function.

Query string function

The query string function (QSTR) uses the query string syntax to perform complex queries on one or several fields.

Query string syntax lets you specify powerful full text options and operations, including fuzzy search, proximity searches, and the use of boolean operators. Refer to the docs for more details.
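The sketch below shows how those options might be combined in a single QSTR call. It assumes a hypothetical logs index with a message text field and a tags keyword field. Inner double quotes are escaped with a backslash because the whole query is an ES|QL string.

```esql
// Proximity: "connection" and "lost" within 2 positions of each other,
// combined with a boolean AND on a second field
FROM logs
| WHERE QSTR("message:\"connection lost\"~2 AND tags:network")
```

A fuzzy search tolerates small misspellings in the query term:

```esql
// Fuzzy: matches terms within one edit of "conection", such as "connection"
FROM logs
| WHERE QSTR("message:conection~1")
```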

Query string is a very powerful tool, but it currently has some limitations, very similar to those of the MATCH function:

  • It does not provide a way to specify parameters, such as the match type or the default fields to search.
  • It can only be used in WHERE clauses.
  • It can't be used after STATS or LIMIT commands.
  • It can't be used after commands that modify columns, like SHOW, ROW, DISSECT, DROP, ENRICH, EVAL, GROK, KEEP, MV_EXPAND, or RENAME.

As with the MATCH function, QSTR can't be part of an OR condition in version 8.17. This limitation has been removed in 8.18 and in Elastic Cloud Serverless.

What's next

What's coming for full text search? Quite a few things have been introduced in 8.18:

  • Adding tuning options for the behaviour of MATCH and QSTR functions
  • An additional KQL function that can be used to port your existing Kibana queries to ES|QL

We also added scoring, so you can start using ES|QL for relevance matching and not just for filtering. This is quite exciting, as it will shape what the future of text search looks like in Elasticsearch!
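As a hedged sketch of what this could look like on 8.18, the first query below requests the _score metadata field and sorts by it. The logs index and message field are assumptions for illustration.

```esql
// Ask for the relevance score and return the best matches first
FROM logs METADATA _score
| WHERE MATCH(message, "connection lost")
| SORT _score DESC
| LIMIT 10
```

The KQL function takes a Kibana Query Language string, so an existing Kibana filter could be carried over roughly like this:

```esql
// The string passed to KQL uses Kibana query syntax
FROM logs
| WHERE KQL("message: \"connection lost\"")
```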

Check out ES|QL - Introducing scoring and semantic search for an overview of changes in 8.18 for text search.

Give it a try

MATCH and QSTR are available as tech preview on Elasticsearch 8.17, and of course they are always up to date in Serverless.

What are you looking for in terms of text filtering? Let us know your feedback!

Happy full text filtering!
