ELK Enhanced Search Operations | Part6

  • Boolean query example with ES.
  • Pagination with ES.
  • Sorting of results on fields.
  • Applying Filters on fields.
  • Query & Filter Examples
  • Fuzzy Search.
  • Prefix search.
  • Wildcard search.
  • Auto-suggestion.
curl --location --request GET 'localhost:9200/movies/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"bool": {
"must": {"match_phrase": {"title": "Star Wars"}},
"filter": {"range": {"year": {"gte": 1980}}}
}
}
}'
{
"took": 6,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.7228093,
"hits": [
{
"_index": "movies",
"_type": "_doc",
"_id": "122886",
"_score": 1.7228093,
"_source": {
"id": "122886",
"title": "Star Wars: Episode VII - The Force Awakens",
"year": 2015,

"genre": [
"Action",
"Adventure",
"Fantasy",
"Sci-Fi",
"IMAX"
]
}
}
]
}
}
curl --location --request GET 'localhost:9200/movies/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"from" : 0,
"size" : 2,

"query" : {
"match" : {
"genre" : "Sci-Fi"
}
}
}'
{
"took": 17,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 4,

"relation": "eq"
},
"max_score": 0.40025333,
"hits": [
{
"_index": "movies",
"_type": "_doc",
"_id": "135569",
"_score": 0.40025333,
"_source": {
"id": "135569",
"title": "Star Trek Beyond",
"year": 2016,
"genre": [
"Action",
"Adventure",
"Sci-Fi"
]
}
},
{
"_index": "movies",
"_type": "_doc",
"_id": "122886",
"_score": 0.40025333,
"_source": {
"id": "122886",
"title": "Star Wars: Episode VII - The Force Awakens",
"year": 2015,
"genre": [
"Action",
"Adventure",
"Fantasy",
"Sci-Fi",
"IMAX"
]
}
}
]
}
}
  • You should enforce an upper bound on how many results you return to your users; otherwise someone will abuse very large or very deep pages and bring your system to its knees.
  • Even websites like Google put an upper bound on how many results they return, for this reason.
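One way to enforce such a bound is to clamp the page size in the application before the request ever reaches Elasticsearch. The sketch below is only an illustration: `build_page_query` and `MAX_PAGE_SIZE` are hypothetical names, and the 10,000 figure assumes the index still uses the default `index.max_result_window`.

```python
MAX_PAGE_SIZE = 50          # hypothetical application-level cap on "size"
MAX_RESULT_WINDOW = 10_000  # assumes the default index.max_result_window


def build_page_query(genre, page, page_size):
    """Build a search body for a 1-based page number, clamping size and depth."""
    page = max(1, page)
    size = max(1, min(page_size, MAX_PAGE_SIZE))
    start = (page - 1) * size
    if start + size > MAX_RESULT_WINDOW:
        raise ValueError("requested page is beyond the allowed result window")
    return {"from": start, "size": size, "query": {"match": {"genre": genre}}}


# Second page of two results, i.e. "from": 2 and "size": 2.
body = build_page_query("Sci-Fi", page=2, page_size=2)
```

The returned dictionary can be serialised to JSON and sent as the request body of the `_search` call.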
curl --location --request GET 'localhost:9200/movies/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"from" : 2,
"size" : 2,

"query" : {
"match" : {
"genre" : "Sci-Fi"
}
}
}'
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 4,

"relation": "eq"
},
"max_score": 0.40025333,
"hits": [
{
"_index": "movies",
"_type": "_doc",
"_id": "109487",
"_score": 0.40025333,
"_source": {
"id": "109487",
"title": "Interstellar",
"year": 2014,
"genre": [
"Sci-Fi",
"IMAX"
]
}
},
{
"_index": "movies",
"_type": "_doc",
"_id": "1924",
"_score": 0.40025333,
"_source": {
"id": "1924",
"title": "Plan 9 from Outer Space",
"year": 1959,
"genre": [
"Horror",
"Sci-Fi"
]
}
}
]
}
}
curl --location --request GET 'localhost:9200/movies/_search?sort=year'
  • A text field, like the “title” in our movie dataset, is analysed for full-text search, so that you can do partial matches and fuzzy queries on it.
  • You can’t use such a field for sorting documents, because the inverted index only contains the individual terms of the title.
  • Partial matching works, but the entire string as a whole is not stored, so we can’t sort by the actual movie title itself.
  • Here, the title field itself remains a text type, which means it is still analysed for full-text search.
  • Along with that, we also create a sub-field within it called raw, of the keyword type, which, as you may recall, is not analysed: it just stores a straight copy of the title, so we can sort on title.raw.
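To see why the analysed field cannot drive a sort, here is a small in-memory sketch using the five titles from our dataset. The `analyze` function is only a rough stand-in for the real analyser (an assumption for illustration), and no Elasticsearch is involved.

```python
import re
from collections import defaultdict

titles = {
    "58559": "Dark Knight, The",
    "109487": "Interstellar",
    "1924": "Plan 9 from Outer Space",
    "135569": "Star Trek Beyond",
    "122886": "Star Wars: Episode VII - The Force Awakens",
}


def analyze(text):
    """Rough stand-in for an analyser: lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


# "text" view: an inverted index from each term to the ids of documents containing it.
inverted_index = defaultdict(set)
for doc_id, title in titles.items():
    for term in analyze(title):
        inverted_index[term].add(doc_id)

# "keyword" view (like title.raw): the untouched string, one value per document.
raw_values = dict(titles)

# Term lookups are served by the inverted index ...
star_docs = sorted(inverted_index["star"])
# ... but ordering by the whole title needs the raw strings.
sorted_titles = sorted(raw_values.values())
```

The inverted index knows that “star” occurs in two documents, but nothing in it says which complete title comes first alphabetically; only the raw copy can answer that.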
curl --location --request DELETE 'localhost:9200/movies'
curl --location --request PUT 'localhost:9200/movies' \
--header 'Content-Type: application/json' \
--data-raw '{
"mappings": {
"properties": {
"title" : {
"type" : "text",
"fields" : {"raw" : {"type" : "keyword"}}
}
}
}
}'
curl --location --request PUT 'localhost:9200/_bulk' \
--header 'Content-Type: application/json' \
--data-binary '@/Users/aditya/Documents/LEARNINGS/MEDIUM-BLOG/ElasticSearch/ml-latest-small/movies.json'
{
"movies": {
"mappings": {
"properties": {
"genre": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"title": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},

"year": {
"type": "long"
}
}
}
}
}
curl --location --request GET 'localhost:9200/movies/_search?sort=title.raw'
{
"took": 803,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 5,
"relation": "eq"
},
"max_score": null,
"hits": [
{
"_index": "movies",
"_type": "_doc",
"_id": "58559",
"_score": null,
"_source": {
"id": "58559",
"title": "Dark Knight, The",
"year": 2008,
"genre": [
"Action",
"Crime",
"Drama",
"IMAX"
]
},
"sort": [
"Dark Knight, The"
]
},
{
"_index": "movies",
"_type": "_doc",
"_id": "109487",
"_score": null,
"_source": {
"id": "109487",
"title": "Interstellar",
"year": 2014,
"genre": [
"Sci-Fi",
"IMAX"
]
},
"sort": [
"Interstellar"
]
},
{
"_index": "movies",
"_type": "_doc",
"_id": "1924",
"_score": null,
"_source": {
"id": "1924",
"title": "Plan 9 from Outer Space",
"year": 1959,
"genre": [
"Horror",
"Sci-Fi"
]
},
"sort": [
"Plan 9 from Outer Space"
]
},
{
"_index": "movies",
"_type": "_doc",
"_id": "135569",
"_score": null,
"_source": {
"id": "135569",
"title": "Star Trek Beyond",
"year": 2016,
"genre": [
"Action",
"Adventure",
"Sci-Fi"
]
},
"sort": [
"Star Trek Beyond"
]
},
{
"_index": "movies",
"_type": "_doc",
"_id": "122886",
"_score": null,
"_source": {
"id": "122886",
"title": "Star Wars: Episode VII - The Force Awakens",
"year": 2015,
"genre": [
"Action",
"Adventure",
"Fantasy",
"Sci-Fi",
"IMAX"
]
},
"sort": [
"Star Wars: Episode VII - The Force Awakens"
]
}
]
}
}
  • Here we have a must clause: the query must match the genre “Sci-Fi”, AND
  • a must_not clause: it must not match the title term trek, AND
  • a range filter: the year must be from 2010 (inclusive) up to, but not including, 2015.
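The way the three clauses combine can be modelled in a few lines over the five movies of our dataset. This is only a sketch of the boolean logic: `bool_search` is a hypothetical helper, the predicates are simplified stand-ins for the real match queries, and relevance scoring is ignored.

```python
movies = [
    {"title": "Star Trek Beyond", "year": 2016,
     "genre": ["Action", "Adventure", "Sci-Fi"]},
    {"title": "Star Wars: Episode VII - The Force Awakens", "year": 2015,
     "genre": ["Action", "Adventure", "Fantasy", "Sci-Fi", "IMAX"]},
    {"title": "Interstellar", "year": 2014, "genre": ["Sci-Fi", "IMAX"]},
    {"title": "Plan 9 from Outer Space", "year": 1959, "genre": ["Horror", "Sci-Fi"]},
    {"title": "Dark Knight, The", "year": 2008,
     "genre": ["Action", "Crime", "Drama", "IMAX"]},
]


def bool_search(docs, must, must_not, doc_filter):
    """Keep documents that satisfy must AND NOT must_not AND doc_filter."""
    return [
        doc for doc in docs
        if must(doc) and not must_not(doc) and doc_filter(doc)
    ]


hits = bool_search(
    movies,
    must=lambda d: "Sci-Fi" in d["genre"],                       # genre matches Sci-Fi
    must_not=lambda d: "trek" in d["title"].lower().split(),     # title contains trek
    doc_filter=lambda d: 2010 <= d["year"] < 2015,               # gte 2010, lt 2015
)
hit_titles = [d["title"] for d in hits]
```

Star Trek Beyond is dropped by must_not, and The Force Awakens (2015) falls just outside the range because the upper bound is exclusive, which leaves only Interstellar.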
curl --location --request GET 'localhost:9200/movies/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"bool": {
"must": {
"match": {
"genre": "Sci-Fi"
}
},
"must_not": {
"match": {
"title": "trek"
}
},
"filter": {
"range": {
"year": {
"gte": 2010,
"lt": 2015
}
}
}
}
}
}'
{
"took": 36,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.640912,
"hits": [
{
"_index": "movies",
"_type": "_doc",
"_id": "109487",
"_score": 0.640912,
"_source": {
"id": "109487",
"title": "Interstellar",
"year": 2014,
"genre": [
"Sci-Fi",

"IMAX"
]
}
}
]
}
}
curl --location --request GET 'localhost:9200/movies/_search?sort=title.raw' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"bool": {
"must": {
"match": {
"genre": "Sci-Fi"
}
},
"filter": {
"range": {
"year": {
"lt": 1960
}
}
}
}
}
}'
{
"took": 53,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": null,
"hits": [
{
"_index": "movies",
"_type": "_doc",
"_id": "1924",
"_score": null,
"_source": {
"id": "1924",
"title": "Plan 9 from Outer Space",
"year": 1959,
"genre": [
"Horror",
"Sci-Fi"
]
},
"sort": [
"Plan 9 from Outer Space"
]
}
]
}
}
  • Substitution of characters → This catches cases where someone typed the wrong character by mistake. For example, if someone misspelled interstellar as intersteller, with an ‘e’ instead of an ‘a’, it would still match if we tolerate a Levenshtein edit distance of one, because one character was substituted for what it should have been.
  • Insertion of characters → An extra character that shouldn’t be there. If I typed insterstellar instead of interstellar, with an extra ‘s’, it would still match if we tolerate a Levenshtein edit distance of one, because one extra character was inserted.
  • Deletion of characters → Deletions work the same way. If I misspelled interstellar with one ‘l’ instead of two, that would also match, because it too has a Levenshtein edit distance of one.
  • If the input string is at most 2 characters long, no mistakes are tolerated.
  • If the input string is 3 to 5 characters long, a mistake of up to 1 character is tolerated.
  • If the input string is longer than 5 characters, mistakes of up to 2 characters are tolerated.
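Both ideas above fit in a short sketch: a classic dynamic-programming Levenshtein distance, and the length-based tolerance rule, which is how the "AUTO" setting of the fuzziness parameter behaves by default. The function names are made up for illustration. Note also that the fuzzy query by default counts a swap of two adjacent characters as a single edit, which this plain Levenshtein version does not.

```python
def levenshtein(a, b):
    """Edit distance counting insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def auto_fuzziness(term):
    """Allowed number of edits for a term, following the length rule above."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


for typo in ("intersteller", "insterstellar", "interstelar", "intursteller"):
    print(typo, levenshtein("interstellar", typo))
```

The first three typos are one edit away from interstellar, while intursteller needs two, which is why it only matches once the fuzziness is raised to 2.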
curl --location --request GET 'localhost:9200/movies/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"match": {
"title": "intersteller"
}
}
}'
{
"took": 29,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,

"relation": "eq"
},
"max_score": null,
"hits": []
}
}
curl --location --request GET 'localhost:9200/movies/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"fuzzy": {
"title": {
"value": "intersteller",
"fuzziness": 1
}
}
}
}'
{
"took": 307,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,

"relation": "eq"
},
"max_score": 1.8191156,
"hits": [
{
"_index": "movies",
"_type": "_doc",
"_id": "109487",
"_score": 1.8191156,
"_source": {
"id": "109487",
"title": "Interstellar",
"year": 2014,
"genre": [
"Sci-Fi",
"IMAX"
]
}
}
]
}
}
curl --location --request GET 'localhost:9200/movies/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"fuzzy": {
"title": {
"value": "intursteller",

"fuzziness": 1
}
}
}
}'
{
"took": 53,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,

"relation": "eq"
},
"max_score": null,
"hits": []
}
}
curl --location --request GET 'localhost:9200/movies/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"fuzzy": {
"title": {
"value": "intursteller",
"fuzziness": 2

}
}
}
}'
{
"took": 65,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,

"relation": "eq"
},
"max_score": 1.6537415,
"hits": [
{
"_index": "movies",
"_type": "_doc",
"_id": "109487",
"_score": 1.6537415,
"_source": {
"id": "109487",
"title": "Interstellar",
"year": 2014,
"genre": [
"Sci-Fi",
"IMAX"
]
}
}
]
}
}
curl --location --request GET 'localhost:9200/movies/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"fuzzy": {
"title": {
"value": "warz",
"fuzziness": 1

}
}
}
}'
{
"took": 11,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,

"relation": "eq"
},
"max_score": 0.77331555,
"hits": [
{
"_index": "movies",
"_type": "_doc",
"_id": "122886",
"_score": 0.77331555,
"_source": {
"id": "122886",
"title": "Star Wars: Episode VII - The Force Awakens",
"year": 2015,
"genre": [
"Action",
"Adventure",
"Fantasy",
"Sci-Fi",
"IMAX"
]
}
}
]
}
}
curl --location --request GET 'localhost:9200/movies/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"fuzzy": {
"title": {
"value": "warz",
"fuzziness": 2

}
}
}
}'
{
"took": 26,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,

"relation": "eq"
},
"max_score": 0.77331555,
"hits": [
{
"_index": "movies",
"_type": "_doc",
"_id": "122886",
"_score": 0.77331555,
"_source": {
"id": "122886",
"title": "Star Wars: Episode VII - The Force Awakens",
"year": 2015,
"genre": [
"Action",
"Adventure",
"Fantasy",
"Sci-Fi",
"IMAX"
]
}
},
{
"_index": "movies",
"_type": "_doc",
"_id": "58559",
"_score": 0.75846994,
"_source": {
"id": "58559",
"title": "Dark Knight, The",
"year": 2008,
"genre": [
"Action",
"Crime",
"Drama",
"IMAX"
]
}
}
]
}
}
curl --location --request GET 'http://localhost:9200/movies/_mappings'
{
"movies": {
"mappings": {
"properties": {
"genre": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"title": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"year": {
"type": "text"
}

}
}
}
}
curl --location --request GET 'localhost:9200/movies/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"prefix": {
"year": "201"
}
}
}'
{
"took": 114,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "movies",
"_type": "_doc",
"_id": "135569",
"_score": 1.0,
"_source": {
"id": "135569",
"title": "Star Trek Beyond",
"year": 2016,
"genre": [
"Action",
"Adventure",
"Sci-Fi"
]
}
},
{
"_index": "movies",
"_type": "_doc",
"_id": "122886",
"_score": 1.0,
"_source": {
"id": "122886",
"title": "Star Wars: Episode VII - The Force Awakens",
"year": 2015,
"genre": [
"Action",
"Adventure",
"Fantasy",
"Sci-Fi",
"IMAX"
]
}
},
{
"_index": "movies",
"_type": "_doc",
"_id": "109487",
"_score": 1.0,
"_source": {
"id": "109487",
"title": "Interstellar",
"year": 2014,
"genre": [
"Sci-Fi",
"IMAX"
]
}
}
]
}
}
curl --location --request GET 'localhost:9200/movies/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"wildcard": {
"year": "19*"
}
}
}'
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "movies",
"_type": "_doc",
"_id": "1924",
"_score": 1.0,
"_source": {
"id": "1924",
"title": "Plan 9 from Outer Space",
"year": 1959,
"genre": [
"Horror",
"Sci-Fi"
]
}
}
]
}
}
  • In this example, imagine that the user has typed in Star Trek. You can use a specialised query called match_phrase_prefix. It is just like the prefix query that we looked at before, but it works at the phrase level: it searches for titles containing the phrase Star Trek, with the last word treated as a prefix.
  • You can also specify a slop value with that query. If you want more flexibility in the ordering of the words in the phrase, a slop lets you get back results for people who searched for Trek star, or titles that don’t match the phrase exactly and have other words in between the terms.
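The effect of slop on a phrase-prefix search can be approximated with a deliberately simplified model. In this sketch every query word except the last must match a title word exactly, the last one is treated as a prefix, and slop is the total number of extra words allowed in between. It only handles words in their original order, so it does not reproduce the reordering case (Trek star), and `phrase_prefix_match` is a hypothetical helper, not how Elasticsearch implements it.

```python
import re


def tokenize(text):
    """Rough stand-in for the analyser: lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_prefix_match(title, query, slop=0):
    """In-order phrase match where the last query term is a prefix."""
    tokens = tokenize(title)
    terms = tokenize(query)
    if not terms:
        return False

    def search(term_idx, prev_pos, used_slop):
        if term_idx == len(terms):
            return True
        is_last = term_idx == len(terms) - 1
        start = 0 if prev_pos is None else prev_pos + 1
        for pos in range(start, len(tokens)):
            token = tokens[pos]
            term = terms[term_idx]
            hit = token.startswith(term) if is_last else token == term
            if not hit:
                continue
            # Words skipped between the previous match and this one.
            gap = 0 if prev_pos is None else pos - prev_pos - 1
            if used_slop + gap <= slop and search(term_idx + 1, pos, used_slop + gap):
                return True
        return False

    return search(0, None, 0)


title = "Star Wars: Episode VII - The Force Awakens"
without_slop = phrase_prefix_match(title, "star fo")
with_slop = phrase_prefix_match(title, "star fo", slop=5)
```

In this title there are four words between star and force, so the query star fo only matches once the slop is at least 4.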
curl --location --request GET 'localhost:9200/movies/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"match": {
"title": "star fo"
}
}
}'
{
"took": 7,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 0.9579736,
"hits": [
{
"_index": "movies",
"_type": "_doc",
"_id": "135569",
"_score": 0.9579736,
"_source": {
"id": "135569",
"title": "Star Trek Beyond",
"year": 2016,
"genre": [
"Action",
"Adventure",
"Sci-Fi"
]
}
},
{
"_index": "movies",
"_type": "_doc",
"_id": "122886",
"_score": 0.6511494,
"_source": {
"id": "122886",
"title": "Star Wars: Episode VII - The Force Awakens",
"year": 2015,
"genre": [
"Action",
"Adventure",
"Fantasy",
"Sci-Fi",
"IMAX"
]
}
}
]
}
}
curl --location --request GET 'localhost:9200/movies/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"prefix": {
"title": "star fo"
}
}
}'
{
"took": 34,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
}
}
curl --location --request GET 'localhost:9200/movies/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"match_phrase": {
"title": {
"query" : "star fo"
}
}
}
}'
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
}
}
curl --location --request GET 'localhost:9200/movies/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"match_phrase_prefix": {
"title": {
"query" : "star fo"
}
}
}
}'
{
"took": 7,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,

"relation": "eq"
},
"max_score": null,
"hits": []
}
}
curl --location --request GET 'localhost:9200/movies/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"match_phrase_prefix": {
"title": {

"query" : "star fo",
"slop" : 5
}
}
}
}'
{
"took": 19,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,

"relation": "eq"
},
"max_score": 0.46117926,
"hits": [
{
"_index": "movies",
"_type": "_doc",
"_id": "122886",
"_score": 0.46117926,
"_source": {
"id": "122886",
"title": "Star Wars: Episode VII - The Force Awakens",
"year": 2015,
"genre": [
"Action",
"Adventure",
"Fantasy",
"Sci-Fi",
"IMAX"
]
}
}
]
}
}

aditya goel
