ELK Search Operations | Part 5

  • Query Lite (URI search) with ES.
  • Mandatory URL encoding when using Query Lite with ES.
  • Request-body based search with ES.
  • Bool query to combine search and filter operations.
  • Types of filters, such as term, range, exists, missing and bool.
  • Types of queries, such as match, multi_match and match_all.
  • Text fields and the English analyser.
  • Match phrase search.
  • Slop-based search using match phrase.
curl --location --request GET 'localhost:9200/movies/_search?q=title:trek'
{
"took": 26,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.456388,
"hits": [
{
"_index": "movies",
"_type": "_doc",
"_id": "135569",
"_score": 1.456388,
"_source": {
"id": "135569",
"title": "Star Trek Beyond",
"year": 2016,
"genre": [
"Action",
"Adventure",
"Sci-Fi"
]
}
}
]
}
}
curl --location --request GET 'http://localhost:9200/movies/_mappings'
{
"movies": {
"mappings": {
"properties": {
"genre": {
"type": "keyword"
},
"id": {
"type": "integer"
},
"title": {
"type": "text",
"analyzer": "english"
},
"year": {
"type": "date"
}
}
}
}
}
curl --location --request GET 'localhost:9200/movies/_search?q=genre:action'
{
"took": 25,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
}
}
curl --location --request GET 'localhost:9200/movies/_search?q=+year:%3E2010+title:trek'
{
"took": 365,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": 2.456388,
"hits": [
{
"_index": "movies",
"_type": "_doc",
"_id": "135569",
"_score": 2.456388,
"_source": {
"id": "135569",
"title": "Star Trek Beyond",
"year": 2016,
"genre": [
"Action",
"Adventure",
"Sci-Fi"
]
}
},
{
"_index": "movies",
"_type": "_doc",
"_id": "122886",
"_score": 1.0,
"_source": {
"id": "122886",
"title": "Star Wars: Episode VII - The Force Awakens",
"year": 2015,
"genre": [
"Action",
"Adventure",
"Fantasy",
"Sci-Fi",
"IMAX"
]
}
},
{
"_index": "movies",
"_type": "_doc",
"_id": "109487",
"_score": 1.0,
"_source": {
"id": "109487",
"title": "Interstellar",
"year": 2014,
"genre": [
"Sci-Fi",
"IMAX"
]
}
}
]
}
}
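Why did that return three movies? One way to see what Elasticsearch actually received is to decode the query string the way any URL parser does. This sketch only mimics URL decoding with Python's standard `urllib.parse`; it does not talk to Elasticsearch.

```python
from urllib.parse import parse_qs

# The query string exactly as typed in the curl command above (not fully encoded).
raw_query = "q=+year:%3E2010+title:trek"

# In a URL query string a literal '+' decodes to a space, so both '+'
# prefixes are lost; only %3E is decoded (to '>').
decoded = parse_qs(raw_query)["q"][0]
print(repr(decoded))  # ' year:>2010 title:trek'
```

With the `+` signs gone, year:>2010 and title:trek become optional clauses combined with OR, so any post-2010 movie matches. That is why Star Wars (2015) and Interstellar (2014) come back with a score of 1.0 from the year clause alone.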
curl --location --request GET 'localhost:9200/movies/_search?q=%2Byear%3A%3E2010+%2Btitle%3Atrek'
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 2.456388,
"hits": [
{
"_index": "movies",
"_type": "_doc",
"_id": "135569",
"_score": 2.456388,
"_source": {
"id": "135569",
"title": "Star Trek Beyond",
"year": 2016,
"genre": [
"Action",
"Adventure",
"Sci-Fi"
]
}
}
]
}
}
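Rather than hand-encoding the query, you can let a library do it. A minimal sketch using Python's standard `urlencode` (which applies `quote_plus`) reproduces the encoded string used in the request above:

```python
from urllib.parse import urlencode

# The Lucene query we actually want: both clauses are required ('+').
lucene_query = "+year:>2010 +title:trek"

# quote_plus encoding: '+' -> %2B, ':' -> %3A, '>' -> %3E, space -> '+'.
encoded = urlencode({"q": lucene_query})
url = "http://localhost:9200/movies/_search?" + encoded
print(encoded)  # q=%2Byear%3A%3E2010+%2Btitle%3Atrek
```

curl can perform the same encoding itself with `-G --data-urlencode 'q=+year:>2010 +title:trek'`.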
  • These queries can get cryptic and hard to debug. Query Lite is powerful, and you can cram almost any query you want into that URL parameter, but it gets ugly very quickly. You are almost always better off sending a structured JSON request body, where you can see what is going on and the query is laid out in a way that makes sense.
  • It can also be a security issue. If end users can influence these URLs in any way, you are giving them the ability to send arbitrary queries to your server. Anyone could craft a query string that triggers an extremely expensive operation and brings down your cluster, so make sure you never expose this to end users.
  • It is also fragile. The parameters become cryptic very quickly, and one wrong character (such as an unencoded +) silently changes the meaning of the query or breaks it outright, which makes problems hard to diagnose. This really ties back to the first point.
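As a sketch of the structured alternative, the intent of +year:>2010 +title:trek can be expressed as a bool query body, with each required clause under `must`. Treating `year:>2010` as a `range` with `gt` and `title:trek` as a `match` is an approximation of the query-string syntax, not a guaranteed one-to-one translation; Python is used here only to build and print the JSON.

```python
import json

# Roughly equivalent to the URI query "+year:>2010 +title:trek":
# every '+' (required) clause becomes an entry under "must".
body = {
    "query": {
        "bool": {
            "must": [
                {"range": {"year": {"gt": 2010}}},
                {"match": {"title": "trek"}},
            ]
        }
    }
}

# Send this as the --data-raw payload, with Content-Type: application/json.
payload = json.dumps(body, indent=2)
print(payload)
```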
curl --location --request GET 'localhost:9200/movies/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"match": {
"title": "star"
}
}
}'
{
"took": 118,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 0.919734,
"hits": [
{
"_index": "movies",
"_type": "_doc",
"_id": "135569",
"_score": 0.919734,
"_source": {
"id": "135569",
"title": "Star Trek Beyond",
"year": 2016,
"genre": [
"Action",
"Adventure",
"Sci-Fi"
]
}
},
{
"_index": "movies",
"_type": "_doc",
"_id": "122886",
"_score": 0.666854,
"_source": {
"id": "122886",
"title": "Star Wars: Episode VII - The Force Awakens",
"year": 2015,
"genre": [
"Action",
"Adventure",
"Fantasy",
"Sci-Fi",
"IMAX"
]
}
}
]
}
}
  • Queries → Queries are usually used to return data ranked by relevance. When searching for a term such as star, you use a query because you want the results ordered by how relevant the term star is to each document.
  • Filters → If the question is binary (does this document match, yes or no?), use a filter instead of a query. Filters are more efficient: they skip relevance scoring, and Elasticsearch can cache their results, so a later search using the same filter comes back even faster.
  • In the bool query below, the must clause with a term query for trek on the title field says that a document must contain the term trek in its title to be a valid result.
  • The filter clause then narrows those results with a range filter that keeps only movies whose year is greater than or equal to 2010.
curl --location --request GET 'localhost:9200/movies/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"bool": {
"must": {"term": {"title": "trek"}},
"filter": {"range": {"year": {"gte": 2010}}}
}
}
}'
{
"took": 15,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.456388,
"hits": [
{
"_index": "movies",
"_type": "_doc",
"_id": "135569",
"_score": 1.456388,
"_source": {
"id": "135569",
"title": "Star Trek Beyond",
"year": 2016,
"genre": [
"Action",
"Adventure",
"Sci-Fi"
]
}
}
]
}
}
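The scores in this article's own responses show the difference between query and filter context. Assuming default scoring, a bool query sums the scores of its scoring clauses, and a range clause in query context contributes a constant 1.0. The arithmetic below only checks the numbers copied from the responses above.

```python
import math

# Scores copied from the responses in this article.
title_only = 1.456388        # q=title:trek
title_plus_range = 2.456388  # q=+year:>2010 +title:trek (both clauses scored)
range_only = 1.0             # Star Wars / Interstellar matched only on year
bool_with_filter = 1.456388  # bool: must term on title + filter range on year

# In the URI query the range clause is in query context, so its constant
# score of 1.0 is added on top of the title score.
print(math.isclose(title_only + range_only, title_plus_range))  # True

# In the bool query the range sits in filter context and adds nothing.
print(math.isclose(bool_with_filter, title_only))  # True
```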
curl --location --request GET 'localhost:9200/movies/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"bool": {
"must": {"term": {"title": "trek"}},
"filter": {"term": {"year": "2016"}}
}
}
}'
{
"took": 15,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.456388,
"hits": [
{
"_index": "movies",
"_type": "_doc",
"_id": "135569",
"_score": 1.456388,
"_source": {
"id": "135569",
"title": "Star Trek Beyond",
"year": 2016,
"genre": [
"Action",
"Adventure",
"Sci-Fi"
]
}
}
]
}
}
curl --location --request GET 'http://localhost:9200/movies/_mappings'
{
"movies": {
"mappings": {
"properties": {
"genre": {
"type": "keyword"
},
"id": {
"type": "integer"
},
"title": {
"type": "text",
"analyzer": "english"
},
"year": {
"type": "date"
}
}
}
}
}
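  • genre is mapped as keyword, so it is matched exactly and case-sensitively; that is why q=genre:action found nothing, since the stored value is "Action". title is mapped as text with the english analyzer, and analysers are what make the following possible:
  • Partial matches.
  • Normalising lowercase and uppercase.
  • Synonym-based search.
  • Removing stop words.
  • Language-specific processing for English, such as stemming.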
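To make the keyword vs text distinction concrete, here is a deliberately simplified model. `toy_analyze`, its stop-word list and the two match helpers are hypothetical illustrations, not Elasticsearch APIs; the real english analyzer also stems words and uses its own stop-word list.

```python
import re

# Hypothetical, tiny stop-word list; the real english analyzer has its own.
STOP_WORDS = {"a", "an", "and", "of", "the"}


def toy_analyze(text: str) -> list:
    """Lowercase, split on non-alphanumeric characters, drop stop words.

    A simplified stand-in for a text-field analyser (no stemming).
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def keyword_matches(stored: str, query: str) -> bool:
    # keyword fields are not analysed: exact, case-sensitive comparison.
    return stored == query


def text_matches(stored: str, query: str) -> bool:
    # text fields: analyse both sides, then look for a shared token.
    return bool(set(toy_analyze(stored)) & set(toy_analyze(query)))


print(keyword_matches("Action", "action"))       # False, like q=genre:action
print(text_matches("Star Trek Beyond", "TREK"))  # True
print(toy_analyze("Star Wars: Episode VII - The Force Awakens"))
# ['star', 'wars', 'episode', 'vii', 'force', 'awakens']
```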
curl --location --request GET 'localhost:9200/movies/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"match": {
"title": "Star Trek"
}
}
}'
{
"took": 676,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 2.129195,
"hits": [
{
"_index": "movies",
"_type": "_doc",
"_id": "135569",
"_score": 2.129195,
"_source": {
"id": "135569",
"title": "Star Trek Beyond",
"year": 2016,
"genre": [
"Action",
"Adventure",
"Sci-Fi"
]
}
},
{
"_index": "movies",
"_type": "_doc",
"_id": "122886",
"_score": 0.5935682,
"_source": {
"id": "122886",
"title": "Star Wars: Episode VII - The Force Awakens",
"year": 2015,
"genre": [
"Action",
"Adventure",
"Fantasy",
"Sci-Fi",
"IMAX"
]
}
}
]
}
}
curl --location --request GET 'localhost:9200/movies/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"match": {
"title": "star wars"
}
}
}'
{
"took": 137,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 1.7228094,
"hits": [
{
"_index": "movies",
"_type": "_doc",
"_id": "122886",
"_score": 1.7228094,
"_source": {
"id": "122886",
"title": "Star Wars: Episode VII - The Force Awakens",
"year": 2015,
"genre": [
"Action",
"Adventure",
"Fantasy",
"Sci-Fi",
"IMAX"
]
}
},
{
"_index": "movies",
"_type": "_doc",
"_id": "135569",
"_score": 0.919734,
"_source": {
"id": "135569",
"title": "Star Trek Beyond",
"year": 2016,
"genre": [
"Action",
"Adventure",
"Sci-Fi"
]
}
}
]
}
}
  • We get back two results, Star Trek Beyond and Star Wars, because the terms star and wars in the match query are treated independently: any title containing either star or wars is a potential hit.
  • Relevance scoring favours documents that contain both star and wars, which is why Star Wars ranks first, but Star Trek still comes back even though we searched for Star Wars.
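The independent-terms behaviour of match can be sketched with a toy model: a title is a hit if it shares at least one term with the query, and sharing more terms ranks it higher. `tokens` and `toy_match` are hypothetical helpers, and counting shared terms is only a crude stand-in for real relevance scoring.

```python
import re
from typing import List, Set, Tuple

titles = [
    "Star Trek Beyond",
    "Star Wars: Episode VII - The Force Awakens",
    "Interstellar",
]


def tokens(text: str) -> Set[str]:
    # Simplified analysis: lowercase and split on non-alphanumerics (no stemming).
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_match(query: str) -> List[Tuple[str, int]]:
    """Titles sharing at least one term with the query, most shared terms first."""
    q = tokens(query)
    hits = [(t, len(q & tokens(t))) for t in titles]
    hits = [h for h in hits if h[1] > 0]
    return sorted(hits, key=lambda h: h[1], reverse=True)


print(toy_match("star wars"))
# [('Star Wars: Episode VII - The Force Awakens', 2), ('Star Trek Beyond', 1)]
```

Note that "Interstellar" is a single token, so it never matches star, just as in the match query for star earlier.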
curl --location --request GET 'localhost:9200/movies/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"match_phrase": {
"title": "star wars"
}

}
}'
{
"took": 61,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.7228093,
"hits": [
{
"_index": "movies",
"_type": "_doc",
"_id": "122886",
"_score": 1.7228093,
"_source": {
"id": "122886",
"title": "Star Wars: Episode VII - The Force Awakens",
"year": 2015,
"genre": [
"Action",
"Adventure",
"Fantasy",
"Sci-Fi",
"IMAX"
]
}
}
]
}
}
  • This time we did not get Star Trek, only Star Wars, because we are requiring the phrase star wars.
  • For match_phrase, the terms must appear next to each other, and in the same order, to produce a hit. That is the difference between the two: match treats the terms independently, whereas match_phrase requires them to occur together as a phrase.
curl --location --request GET 'localhost:9200/movies/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"match_phrase": {
"title": "star beyond"
}

}
}'
{
"took": 6,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
}
}
  • The slop parameter relaxes match_phrase by letting the terms sit a limited number of positions apart. Example #1: if I search for "quick fox" but still want the phrase "quick brown fox" to match, I can query quick fox with a slop of 1.
  • Example #2: "star beyond" with a slop of 1 matches "Star Trek Beyond", but it would also match "Star Wars Beyond" if such a film existed.
curl --location --request GET 'localhost:9200/movies/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"match_phrase": {
"title": {
"query" : "star beyond",
"slop" : 1
}
}
}
}'
{
"took": 26,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.5607002,
"hits": [
{
"_index": "movies",
"_type": "_doc",
"_id": "135569",
"_score": 1.5607002,
"_source": {
"id": "135569",
"title": "Star Trek Beyond",
"year": 2016,
"genre": [
"Action",
"Adventure",
"Sci-Fi"
]
}
}
]
}
}
  • The two terms ("star" and "beyond") don't occur directly next to each other, but they are only one position (one word) apart, which a slop of 1 allows.
  • So a phrase query for star beyond with a slop of 1 still matches when there is one extra word between the terms. Reversing the order (beyond star) costs more: it needs a slop of 2.
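The positional arithmetic behind slop can be sketched for the two-term case. `positions` and `min_slop` are hypothetical helpers with a simplified tokenizer (lowercase, split on non-alphanumerics, no stemming); Lucene's sloppy phrase matching generalises the same idea to longer phrases.

```python
import re
from typing import Dict, List, Optional


def positions(text: str) -> Dict[str, List[int]]:
    """Map each lowercased token to the positions where it occurs."""
    pos: Dict[str, List[int]] = {}
    for i, tok in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
        pos.setdefault(tok, []).append(i)
    return pos


def min_slop(text: str, first: str, second: str) -> Optional[int]:
    """Smallest slop the two-term phrase "first second" needs to match text.

    The phrase expects `second` exactly one position after `first`; the slop
    needed is how far the actual offset is from that, so a reversed pair
    costs 2. Returns None if either term is missing.
    """
    pos = positions(text)
    if first not in pos or second not in pos:
        return None
    return min(abs((b - a) - 1) for a in pos[first] for b in pos[second])


print(min_slop("Star Trek Beyond", "star", "beyond"))  # 1
print(min_slop("Beyond Star", "star", "beyond"))       # 2 (reversed order)
print(min_slop("Star Wars: Episode VII - The Force Awakens", "star", "force"))  # 4
```

A slop of 0 is plain match_phrase. On the Star Wars title, star and force sit at positions 0 and 5 (the english analyzer drops "the" but keeps its position gap), so "star force" needs a slop of at least 4; any larger slop also matches.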
curl --location --request GET 'localhost:9200/movies/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"match_phrase": {
"title": {
"query" : "star force",
"slop" : 100
}
}
}
}'
{
"took": 6,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.47656298,
"hits": [
{
"_index": "movies",
"_type": "_doc",
"_id": "122886",
"_score": 0.47656298,
"_source": {
"id": "122886",
"title": "Star Wars: Episode VII - The Force Awakens",
"year": 2015,
"genre": [
"Action",
"Adventure",
"Fantasy",
"Sci-Fi",
"IMAX"
]
}
}
]
}
}

Software Engineer for Big Data distributed systems

aditya goel