ELK Search Operations | Part 5

aditya goel
17 min read · Dec 29, 2021


In case you are landing here directly, it’s recommended to read through this documentation first.

The following topics are covered in this blog :-

  • QueryLite with ES.
  • Mandatory URL encoding while using QueryLite with ES.
  • RequestBody based Search with ES.
  • Boolean-Query to combine Search & Filter operations.
  • Types of Filters like term, range, exists, missing, bool.
  • Types of Queries like match, multi_match, match_all.
  • Text & English Analyser.
  • Match Phrase Search.
  • Slop Based Search using Match Phrase Search.

Question: What is QueryLite with ElasticSearch ?

Answer: You can actually issue a search request without having any request body at all. You can squish it all into a URL, which makes life a little bit easier when you’re just messing around with curl and stuff like that.

Question: Can you show some simple example of querying on “movies” Index using QueryLite ?

Answer: The query below searches the title field for the value trek.

curl --location --request GET 'localhost:9200/movies/_search?q=title:trek'
{
"took": 26,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.456388,
"hits": [
{
"_index": "movies",
"_type": "_doc",
"_id": "135569",
"_score": 1.456388,
"_source": {
"id": "135569",
"title": "Star Trek Beyond",
"year": 2016,
"genre": [
"Action",
"Adventure",
"Sci-Fi"
]
}
}
]
}
}

NOTE: Recall that we had specified the type of the ‘title’ attribute as ‘text’, which is why the full-text search works as shown above. The mapping below confirms this :-

curl --location --request GET 'http://localhost:9200/movies/_mappings'
{
"movies": {
"mappings": {
"properties": {
"genre": {
"type": "keyword"
},
"id": {
"type": "integer"
},
"title": {
"type": "text",
"analyzer": "english"
},
"year": {
"type": "date"
}
}
}
}
}

Question: Can you show a simple example of querying the “movies” Index using QueryLite, when the field was defined as type “keyword” in the Index mapping ?

Answer: When an attribute is of type ‘keyword’, only an exact match is allowed, i.e. even a difference in case prevents a match.

Recall from above that our Index “movies” has a movie 🍿 whose genre contains the value ‘Action’. In the query below we pass the genre as ‘action’, and therefore we see ZERO search results.

curl --location --request GET 'localhost:9200/movies/_search?q=genre:action'
{
"took": 25,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
}
}
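
The difference between the two field types can be imitated with a small Python sketch. This is only a toy model, not how ElasticSearch is implemented: the helper names (`toy_analyze`, `matches_text`, `matches_keyword`) are made up for illustration, and the toy analyser only lowercases and splits into words, whereas the real english analyser does more.

```python
import re

def toy_analyze(value):
    """Toy stand-in for a text analyser: lowercase, then split into word tokens."""
    return re.findall(r"[a-z0-9]+", value.lower())

def matches_text(field_value, search_term):
    """A 'text' field is analysed, so the search term is compared against tokens."""
    field_tokens = toy_analyze(field_value)
    return any(token in field_tokens for token in toy_analyze(search_term))

def matches_keyword(field_values, search_term):
    """A 'keyword' field is kept verbatim: only an exact, case-sensitive match counts."""
    return search_term in field_values

title = "Star Trek Beyond"
genre = ["Action", "Adventure", "Sci-Fi"]

print(matches_text(title, "trek"))       # True: 'trek' is one of the title tokens
print(matches_keyword(genre, "action"))  # False: case differs from 'Action'
print(matches_keyword(genre, "Action"))  # True: exact match
```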

Question: Can you show a slightly more complex query using QueryLite ?

Answer: The query below is meant to search for movies that both have a release year greater than 2010 AND have trek in the title field.

curl --location --request GET 'localhost:9200/movies/_search?q=+year:%3E2010+title:trek'
{
"took": 365,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": 2.456388,
"hits": [
{
"_index": "movies",
"_type": "_doc",
"_id": "135569",
"_score": 2.456388,
"_source": {
"id": "135569",
"title": "Star Trek Beyond",
"year": 2016,
"genre": [
"Action",
"Adventure",
"Sci-Fi"
]
}
},
{
"_index": "movies",
"_type": "_doc",
"_id": "122886",
"_score": 1.0,
"_source": {
"id": "122886",
"title": "Star Wars: Episode VII - The Force Awakens",
"year": 2015,
"genre": [
"Action",
"Adventure",
"Fantasy",
"Sci-Fi",
"IMAX"
]
}
},
{
"_index": "movies",
"_type": "_doc",
"_id": "109487",
"_score": 1.0,
"_source": {
"id": "109487",
"title": "Interstellar",
"year": 2014,
"genre": [
"Sci-Fi",
"IMAX"
]
}
}
]
}
}

Question: Note that the results we received above are not what we expected. We even got some records which do not have the word ‘trek’ in their title. How do you explain this situation ?

Answer: This observation is very important. The query returned incorrect results because this syntax contains special characters that must be URL encoded before they are sent. In a URL query string, an unencoded + is decoded as a space, so the two + (must) operators never reach ElasticSearch and the two conditions end up combined with OR instead of AND.

To get the intended behaviour, every special character has to be percent-encoded, which makes the query rather cryptic: + becomes %2B, : becomes %3A and > becomes %3E, while the space between the two clauses is written as +. With this encoding, the results look correct :-

curl --location --request GET 'localhost:9200/movies/_search?q=%2Byear%3A%3E2010+%2Btitle%3Atrek'
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 2.456388,
"hits": [
{
"_index": "movies",
"_type": "_doc",
"_id": "135569",
"_score": 2.456388,
"_source": {
"id": "135569",
"title": "Star Trek Beyond",
"year": 2016,
"genre": [
"Action",
"Adventure",
"Sci-Fi"
]
}
}
]
}
}
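
Rather than encoding such a query by hand, the encoding can be produced programmatically. Below is a minimal Python sketch using the standard library's `urllib.parse`; the variable names are only illustrative. It also decodes the earlier, unencoded attempt to show roughly what is left of it after form-decoding.

```python
from urllib.parse import quote_plus, unquote_plus

# The query we actually mean: both clauses are mandatory (+).
raw_query = "+year:>2010 +title:trek"

# quote_plus percent-encodes the special characters and writes the space as '+'.
encoded_query = quote_plus(raw_query)
print(encoded_query)  # %2Byear%3A%3E2010+%2Btitle%3Atrek
print("localhost:9200/movies/_search?q=" + encoded_query)

# The earlier attempt left the '+' operators unencoded.
# Form-decoding turns each '+' into a space, so the operators disappear.
unencoded_attempt = "+year:%3E2010+title:trek"
print(repr(unquote_plus(unencoded_attempt)))  # ' year:>2010 title:trek'
```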

Question: Should we be using the QueryLite based syntax with ElasticSearch ?

Answer: You really shouldn’t be using this in production. It is meant more for experimentation, and it is a holdover from earlier versions of ElasticSearch that is no longer encouraged these days.

Question: Is it OK to use QueryLite in a production environment ?

Answer: No. There are several more reasons why you should not use QueryLite in production :-

  • These queries can get cryptic and tough to debug. While QueryLite is powerful and you can cram just about any kind of query into that URL parameter, it gets ugly quickly. You are always better off with a structured JSON request, where you can see what is going on.
  • It can also be a security issue. If end users are allowed to supply these URLs, they can send arbitrary data to your server. Any user could very easily craft a query string that triggers an incredibly intensive operation and brings down your cluster. So make sure that you never open this up to end users.
  • It is also fragile. One wrong character and you are hosed, and it is often tough to figure out what went wrong, which really comes back to the first point.

Question: Let’s proceed to the RequestBody based Search. What does a sample request look like ?

Answer:- The request, with the query in its body, looks as below :-

curl --location --request GET 'localhost:9200/movies/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"match": {
"title": "star"
}
}
}'

And the response of the above query looks like :-

{
"took": 118,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 0.919734,
"hits": [
{
"_index": "movies",
"_type": "_doc",
"_id": "135569",
"_score": 0.919734,
"_source": {
"id": "135569",
"title": "Star Trek Beyond",
"year": 2016,
"genre": [
"Action",
"Adventure",
"Sci-Fi"
]
}
},
{
"_index": "movies",
"_type": "_doc",
"_id": "122886",
"_score": 0.666854,
"_source": {
"id": "122886",
"title": "Star Wars: Episode VII - The Force Awakens",
"year": 2015,
"genre": [
"Action",
"Adventure",
"Fantasy",
"Sci-Fi",
"IMAX"
]
}
}
]
}
}

Question: Wait..wait.. Can we have a body within a GET request as well ?

Answer: When we talk about GET requests over HTTP, we usually think of a web page being retrieved from a web server, and usually there is no body with such a request at all. However, you actually can send a body along with a GET request. That sometimes trips people up, but ElasticSearch accepts it and it is a legitimate thing to do.
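
To make this concrete, here is a small Python sketch that builds (but does not send) a GET request carrying a JSON body, using only the standard library. The function name `build_search_request` and the `http://localhost:9200` default are assumptions for illustration.

```python
import json
import urllib.request

def build_search_request(index, query_body, host="http://localhost:9200"):
    """Build a GET request for <host>/<index>/_search that carries a JSON body."""
    payload = json.dumps(query_body).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/{index}/_search",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="GET",  # without this, urllib would switch to POST because data is set
    )

request = build_search_request("movies", {"query": {"match": {"title": "star"}}})
print(request.get_method())          # GET
print(request.full_url)              # http://localhost:9200/movies/_search
print(request.data.decode("utf-8"))  # {"query": {"match": {"title": "star"}}}

# Sending it needs a running cluster, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

If an HTTP client refuses to send a body with GET, the same search body can be sent with POST to the `_search` endpoint instead.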

Question: What can be done with the help of a RequestBody based Search-Query ?

Answer: There are two different things you can do in a query :-

  • Queries → Queries are usually used for returning data in order of relevance. When we are searching for a term such as star, we use a query because we want the results ranked by how relevant the term star is to each document.
  • Filter → However, if you have a binary operation where the answer is basically yes or no, then you would want to use a filter instead of a query, because filters are much more efficient than queries. Not only are they faster, but the results can be cached by ElasticSearch, so that another query using the same filter gets its results back even faster.

Question: Can you show an example for a Boolean-Query and how it works ?

Answer:- A Boolean-Query (the bool query) lets you combine several clauses together.

  • By using must with a term clause on the title field having value as trek, we are saying that a document must contain the term trek within its title to be a valid result.
  • We then go further and filter that result with a range filter that requires the year to be greater than or equal to 2010.

curl --location --request GET 'localhost:9200/movies/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"bool": {
"must": {"term": {"title": "trek"}},
"filter": {"range": {"year": {"gte": 2010}}}
}
}
}'

Here is the response of aforesaid query to ElasticSearch :-

{
"took": 15,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.456388,
"hits": [
{
"_index": "movies",
"_type": "_doc",
"_id": "135569",
"_score": 1.456388,
"_source": {
"id": "135569",
"title": "Star Trek Beyond",
"year": 2016,
"genre": [
"Action",
"Adventure",
"Sci-Fi"
]
}
}
]
}
}

So if you have a look at it again, you can see we have a query that contains a boolean expression, where the document must have trek in the title AND must also pass the filter condition of the year being greater than or equal to 2010.
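
The same request body can be assembled in code. The Python sketch below builds the bool query as a plain dictionary and then runs a toy evaluation of it against three sample movies from this blog. The helpers `bool_query` and `toy_bool_search` are hypothetical, and the toy evaluation ignores scoring entirely; it only shows which documents satisfy both clauses.

```python
import json
import re

def bool_query(must, filter_clause):
    """Wrap a scoring clause (must) and a yes/no clause (filter) in one bool query."""
    return {"query": {"bool": {"must": must, "filter": filter_clause}}}

body = bool_query(
    must={"term": {"title": "trek"}},
    filter_clause={"range": {"year": {"gte": 2010}}},
)
print(json.dumps(body, indent=2))

movies = [
    {"title": "Star Trek Beyond", "year": 2016},
    {"title": "Star Wars: Episode VII - The Force Awakens", "year": 2015},
    {"title": "Interstellar", "year": 2014},
]

def toy_bool_search(docs, query_body):
    """Toy evaluation: keep documents that pass the filter AND contain the must term."""
    clause = query_body["query"]["bool"]
    term = clause["must"]["term"]["title"]
    min_year = clause["filter"]["range"]["year"]["gte"]
    return [
        doc["title"]
        for doc in docs
        if doc["year"] >= min_year
        and term in re.findall(r"[a-z0-9]+", doc["title"].lower())
    ]

print(toy_bool_search(movies, body))  # ['Star Trek Beyond']
```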

Question: What are the different kinds of filters available in ElasticSearch ?

Answer:- There are many different kinds of filters. Range is just one of them.

Example Term-filter :- If you need to filter by some exact value of a term, you can do that with the term filter. For example, a term filter on year with the value 2014 would keep only the documents whose year equals 2014. The query below uses the value 2016 :-

curl --location --request GET 'localhost:9200/movies/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"bool": {
"must": {"term": {"title": "trek"}},
"filter": {"term": {"year": "2016"}}
}
}
}'

And the response for the same looks as below. We get only those results for which the field ‘title’ contains the value ‘trek’ AND the year is equal to 2016 :-

{
"took": 15,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.456388,
"hits": [
{
"_index": "movies",
"_type": "_doc",
"_id": "135569",
"_score": 1.456388,
"_source": {
"id": "135569",
"title": "Star Trek Beyond",
"year": 2016,
"genre": [
"Action",
"Adventure",
"Sci-Fi"
]
}
}
]
}
}

Question: What are the different kinds of Queries, being available from ElasticSearch ?

Question: How do we remember the syntax for ElasticSearch ?

NOTE: Remember that whenever we want to combine two different aspects, such as a query and a filter, the “bool” block is the way forward.

**** ***** ********** PHRASE SEARCH ********* ***** ***** *****

Question: What is Phrase Search in ElasticSearch ?

Answer:- Sometimes you don’t want to search for individual search terms like star or trek or wars. Instead, you want to search for phrases, i.e. search terms that occur together in a certain order, like “Star Trek” or “star wars”.

Question: Let’s demonstrate how Phrase-Search differs from a normal search.

Part1:- Here is what the mapping for the Index “movies” looks like :-

curl --location --request GET 'http://localhost:9200/movies/_mappings'
{
"movies": {
"mappings": {
"properties": {
"genre": {
"type": "keyword"
},
"id": {
"type": "integer"
},
"title": {
"type": "text",
"analyzer": "english"
},
"year": {
"type": "date"
}
}
}
}
}

Part2:- What’s the meaning of the type & analyzer applied to the title attribute ?

Answer:- For attribute “title”, its type is text. A text field has an analyser applied to it, which lets us do things like :-

  • Partial matches, where only some of the search terms occur in the field.
  • Normalising for lowercase and uppercase.
  • Synonyms based search, provided a synonym filter is configured.

For attribute “title”, the analyser is english, which lets us do things like :-

  • Apply English stop words.
  • Apply English stemming, so that variants such as war and wars match each other.

Part3:- Can you demonstrate an example for Plain Simple Search on the “title” field ?

Answer:- Let’s perform a simple search on the title for “Star Trek”. This is the query we would issue to ES :-

curl --location --request GET 'localhost:9200/movies/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"match": {
"title": "Star Trek"
}
}
}'

Following are the results, we have obtained on this search :-

{
"took": 676,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 2.129195,
"hits": [
{
"_index": "movies",
"_type": "_doc",
"_id": "135569",
"_score": 2.129195,
"_source": {
"id": "135569",
"title": "Star Trek Beyond",
"year": 2016,
"genre": [
"Action",
"Adventure",
"Sci-Fi"
]
}
},
{
"_index": "movies",
"_type": "_doc",
"_id": "122886",
"_score": 0.5935682,
"_source": {
"id": "122886",
"title": "Star Wars: Episode VII - The Force Awakens",
"year": 2015,
"genre": [
"Action",
"Adventure",
"Fantasy",
"Sci-Fi",
"IMAX"
]
}
}
]
}
}

So we got back both the Star Trek and the Star Wars movies as a result of that query for Star Trek. The reason is that with an analysed text field, like our titles here, partial matches can come back.

Conclusion :- By searching for Star Trek, we got Star Trek Beyond as the top hit, but also Star Wars, because it is a partial hit on the search terms within Star Trek. Its score is noticeably lower though (0.59 versus 2.13), and that is a good thing at least.

Question: How does this searching work ? Why did the movie having “Star Wars” in its title appear at all upon a search for “Star Trek” ?

Answer:- The search phrase Star Trek got broken up into two separate search terms, star and trek. Each term is looked up in the inverted index of the Index, and those lookups map back to both Star Wars and Star Trek, because both titles match the term star at least.
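
The lookup described above can be imitated with a toy inverted index in Python. This is a simplified sketch under stated assumptions: the helper names are made up, the analyser only lowercases and splits into words, and the "score" is simply the number of matching query terms rather than the real relevance score of ElasticSearch.

```python
import re
from collections import defaultdict

def toy_analyze(text):
    """Toy analyser: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map every token to the set of document ids whose title contains it."""
    index = defaultdict(set)
    for doc_id, title in docs.items():
        for token in toy_analyze(title):
            index[token].add(doc_id)
    return index

def toy_match(index, query):
    """OR semantics: a document is a hit if it contains ANY of the query terms."""
    scores = defaultdict(int)
    for term in toy_analyze(query):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1  # toy score: how many query terms matched
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

docs = {
    "135569": "Star Trek Beyond",
    "122886": "Star Wars: Episode VII - The Force Awakens",
}
index = build_inverted_index(docs)

print(sorted(index["star"]))          # ['122886', '135569']: both titles contain 'star'
print(toy_match(index, "Star Trek"))  # [('135569', 2), ('122886', 1)]
```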

Part4:- Can you show-case yet another example for Plain Simple Search on the “title” field ?

Answer:- Let’s perform the simple-search on the title having “star wars”. This shall be the query, we would be issuing to the ES :-

curl --location --request GET 'localhost:9200/movies/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"match": {
"title": "star wars"
}
}
}'

And the results thus obtained are :-

{
"took": 137,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 1.7228094,
"hits": [
{
"_index": "movies",
"_type": "_doc",
"_id": "122886",
"_score": 1.7228094,
"_source": {
"id": "122886",
"title": "Star Wars: Episode VII - The Force Awakens",
"year": 2015,
"genre": [
"Action",
"Adventure",
"Fantasy",
"Sci-Fi",
"IMAX"
]
}
},
{
"_index": "movies",
"_type": "_doc",
"_id": "135569",
"_score": 0.919734,
"_source": {
"id": "135569",
"title": "Star Trek Beyond",
"year": 2016,
"genre": [
"Action",
"Adventure",
"Sci-Fi"
]
}
}
]
}
}

The explanation for these results is :-

  • We get back 2 results, the Star Wars and the Star Trek titles, because the terms star and wars in our match query are treated independently, and any title that has either the term star or the term wars is considered a potential hit.
  • The relevance favours the documents that have both star and wars in them, but we still get Star Trek even though we searched for Star Wars.

Part5:- Can you show-case an example for Phrase Search on the “title” field ?

Answer:- Let’s perform the phrase-search on the title having “star wars”. This shall be the query, we would be issuing to the ES :-

curl --location --request GET 'localhost:9200/movies/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"match_phrase": {
"title": "star wars"
}

}
}'

And the results thus obtained are :-

{
"took": 61,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.7228093,
"hits": [
{
"_index": "movies",
"_type": "_doc",
"_id": "122886",
"_score": 1.7228093,
"_source": {
"id": "122886",
"title": "Star Wars: Episode VII - The Force Awakens",
"year": 2015,
"genre": [
"Action",
"Adventure",
"Fantasy",
"Sci-Fi",
"IMAX"
]
}
}
]
}
}

The explanation for these results is :-

  • We did not get the Star Trek movie. We only got the Star Wars movie, because we are requiring the phrase Star Wars.
  • Those two terms need to occur right next to each other, in that order, to get a hit back on match_phrase. That is the difference between match and match_phrase: match treats the terms independently, whereas match_phrase requires that they occur together.

Part6:- Can you show-case yet another example for Phrase Search on the “title” field ?

Answer:- Let’s perform the phrase-search on the title having “star beyond”. This shall be the query, we would be issuing to the ES :-

curl --location --request GET 'localhost:9200/movies/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"match_phrase": {
"title": "star beyond"
}

}
}'

And the results thus obtained are :-

{
"took": 6,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
}
}

The explanation for these results is :- We did not get any result because match_phrase requires these 2 terms (“star” & “beyond”) to occur right next to each other, whereas in “Star Trek Beyond” the word trek sits between them.

Question: How does Phrase Search work exactly in ElasticSearch ?

Answer:- The inverted index doesn’t only store the fact that a given search term occurs inside a document, it also stores the position at which each term occurs. ElasticSearch can use that information about the ordering of the terms to piece together where phrases exist. That’s how it works under the hood.
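
To illustrate the idea, here is a toy positional index in Python. It is only a sketch of the principle, not the actual data structure used by ElasticSearch; the helper names are hypothetical and the analyser simply lowercases and splits into words.

```python
import re
from collections import defaultdict

def toy_analyze(text):
    """Toy analyser: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_positional_index(docs):
    """Map token -> document id -> list of positions where the token occurs."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, title in docs.items():
        for position, token in enumerate(toy_analyze(title)):
            index[token][doc_id].append(position)
    return index

def toy_match_phrase(index, phrase):
    """Return ids of documents where the phrase terms sit at consecutive positions."""
    terms = toy_analyze(phrase)
    if not terms:
        return []
    # Only documents that contain every term can contain the phrase.
    candidates = set(index.get(terms[0], {}))
    for term in terms[1:]:
        candidates &= set(index.get(term, {}))
    hits = []
    for doc_id in sorted(candidates):
        for start in index[terms[0]][doc_id]:
            # The i-th term of the phrase must be found at position start + i.
            if all(start + i in index[term][doc_id] for i, term in enumerate(terms)):
                hits.append(doc_id)
                break
    return hits

docs = {
    "135569": "Star Trek Beyond",
    "122886": "Star Wars: Episode VII - The Force Awakens",
}
index = build_positional_index(docs)

print(toy_match_phrase(index, "star wars"))    # ['122886']
print(toy_match_phrase(index, "star beyond"))  # []: 'trek' sits between the two terms
```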

Question: Can we perform Slop Based Search with ElasticSearch too ?

Answer: Let’s first understand what exactly Slop Based Search is. Say you care about the order of the search terms, but you don’t need them to be exactly right next to each other. For that there is the slop value, which represents how far you’re willing to let a term move, in either direction, to satisfy a phrase.

Question: Share some example for Slop Based Search with ElasticSearch ?

Answer: For example :-

  • Example #1: If I search for “quick fox” but still want the phrase “quick brown fox” to match, I can say quick fox with a slop of one.
  • Example #2: Saying “star beyond” with a slop of one would match “Star Trek Beyond”, but it would also match “Star Wars Beyond” if such a film existed.

Following is the sample query :-

curl --location --request GET 'localhost:9200/movies/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"match_phrase": {
"title": {
"query" : "star beyond",
"slop" : 1
}
}
}
}'

Following are the results thus obtained :-

{
"took": 26,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.5607002,
"hits": [
{
"_index": "movies",
"_type": "_doc",
"_id": "135569",
"_score": 1.5607002,
"_source": {
"id": "135569",
"title": "Star Trek Beyond",
"year": 2016,
"genre": [
"Action",
"Adventure",
"Sci-Fi"
]
}
}
]
}
}

The explanation for these results is :-

  • These 2 terms (“star” & “beyond”) don’t occur directly together, but they are separated by 1 unit, i.e. ONE word.
  • This is a phrase query on star beyond with a slop of one → so if there is another word in the middle, that’s okay, it will still match.

Question: Can Slop Based Search with ElasticSearch also help us with Reverse-Search ?

Answer: Yes, definitely. It can also allow a reversal of a phrase. Swapping two adjacent terms costs two moves, so “star beyond” would match “beyond star” if the slop was set to two (a slop of one is not enough for a transposition).
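
The slop arithmetic can be sketched in Python. The function below is a simplified model written for this blog, not the actual ElasticSearch code: `required_slop` is a hypothetical helper that measures how far the phrase terms have shifted relative to each other, it uses a toy analyser, and it ignores edge cases such as a term repeated within the phrase. In this model a transposition costs two moves.

```python
import re
from itertools import product

def toy_analyze(text):
    """Toy analyser: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def required_slop(title, phrase):
    """Smallest slop at which the phrase matches the title in this simplified model.

    Returns None when one of the phrase terms is missing from the title.
    """
    tokens = toy_analyze(title)
    terms = toy_analyze(phrase)
    positions = [[i for i, tok in enumerate(tokens) if tok == term] for term in terms]
    if not terms or any(not found for found in positions):
        return None
    best = None
    for combo in product(*positions):
        # shift = where a term sits in the title minus its slot in the phrase
        shifts = [doc_pos - query_pos for query_pos, doc_pos in enumerate(combo)]
        spread = max(shifts) - min(shifts)
        if best is None or spread < best:
            best = spread
    return best

print(required_slop("Star Trek Beyond", "star beyond"))  # 1: one word in between
print(required_slop("quick brown fox", "quick fox"))     # 1
print(required_slop("Beyond Star", "star beyond"))       # 2: reversed order
print(required_slop("Star Trek Beyond", "star trek"))    # 0: exact phrase
```

A phrase query with slop s then matches a title whenever `required_slop` returns a value that is not None and not greater than s.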

Question: Can Slop Based Search with ElasticSearch also help us to perform a Proximity Based Search (with a very high slop value, e.g. 100) ?

Answer: Slop Based Search can also come in handy for what is basically a proximity query, where you are not really searching for a phrase but want to give higher relevance to documents that have the terms close together.

Here is an example :- Let’s say that I just want to get back results in order of relevance where the words star and force appear close together. If I search for star force with a slop of 100, a really high number, I get back any document that has the terms star and force within 100 positions of each other, and documents that have them closer together are assigned a higher relevance score. Here is the query :-

curl --location --request GET 'localhost:9200/movies/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"match_phrase": {
"title": {
"query" : "star force",
"slop" : 100
}
}
}
}'

Response thus obtained is :-

{
"took": 6,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.47656298,
"hits": [
{
"_index": "movies",
"_type": "_doc",
"_id": "122886",
"_score": 0.47656298,
"_source": {
"id": "122886",
"title": "Star Wars: Episode VII - The Force Awakens",
"year": 2015,
"genre": [
"Action",
"Adventure",
"Fantasy",
"Sci-Fi",
"IMAX"
]
}
}
]
}
}

The explanation for these results is :- These 2 terms (“star” & “force”) don’t occur directly together. In this title “force” sits 5 positions after “star”, with 4 words (Wars, Episode, VII, The) in between, which is far less than the allowed slop of 100.

That’s all in this section. If you liked reading this blog, kindly press the clap button multiple times to indicate your appreciation. We will see you in the next part of this series.
