ELK Concurrency, Analysers and Data-Modelling | Part 3

  • Optimistic concurrency control.
  • Retry on conflicts.
  • Simple query-based search.
  • Text & English analysers.
  • Data modelling with Elasticsearch.
  • Every document carries a sequence number and a primary term, which identifies the primary shard that owned the document when it was written. Taken together, the sequence number and primary term give us a unique chronological record of a given document.
  • Here, two different clients try to retrieve the current view count for a given page document from Elasticsearch, and both get the number ten back. When you request a document from Elasticsearch, it also returns the current sequence number for that document.
  • So I now know that the view count of ten is explicitly associated with a given sequence number of that document, and that sequence number in turn is associated with a primary term. Let’s say that sequence number is nine, for the sake of argument.
  • So now, when either client wants to write a new value for that view count, it can state that it is basing the write on what it saw at sequence number nine from primary term one.
  • When you issue an update, you can explicitly specify the sequence number and primary term the update is conditional on. If two clients try to update the same document concurrently, only one of them will succeed; say the first one successfully writes a count of eleven against sequence number nine.
  • The other client’s update is rejected with a conflict, so it retries: it re-reads the current view count for that page, gets back sequence number ten of that document (which now contains eleven), increments that to twelve, and writes it again, hopefully successfully this time.
  • _seq_no is 12.
  • _primary_term is 1.
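The conditional write described above maps onto the `if_seq_no` and `if_primary_term` parameters of the update API; Elasticsearch rejects the write with a 409 conflict if the document has moved past that point. A sketch, assuming a hypothetical `pages` index and document id `1` (the index name, id, and `views` field are illustrative, not from the original example):

```shell
# Conditional update: fails with a 409 conflict if the document
# is no longer at sequence number 9 / primary term 1.
curl --location --request POST 'localhost:9200/pages/_update/1?if_seq_no=9&if_primary_term=1' \
--header 'Content-Type: application/json' \
--data-raw '{ "doc": { "views": 11 } }'

# Alternatively, retry_on_conflict lets Elasticsearch retry the
# read-modify-write loop for you; a scripted increment pairs well with it.
curl --location --request POST 'localhost:9200/pages/_update/1?retry_on_conflict=5' \
--header 'Content-Type: application/json' \
--data-raw '{ "script": { "source": "ctx._source.views += 1" } }'
```

The second form is usually simpler for counters, since the increment is recomputed server-side on each retry instead of the client replaying the whole read-modify-write cycle.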
curl --location --request GET 'localhost:9200/movies/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
  "query" : {
    "match" : {
      "title" : "Star Trek"
    }
  }
}'
{
  "took": 676,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 2.129195,
    "hits": [
      {
        "_index": "movies",
        "_type": "_doc",
        "_id": "135569",
        "_score": 2.129195,
        "_source": {
          "id": "135569",
          "title": "Star Trek Beyond",
          "year": 2016,
          "genre": [
            "Action",
            "Adventure",
            "Sci-Fi"
          ]
        }
      },
      {
        "_index": "movies",
        "_type": "_doc",
        "_id": "122886",
        "_score": 0.5935682,
        "_source": {
          "id": "122886",
          "title": "Star Wars: Episode VII - The Force Awakens",
          "year": 2015,
          "genre": [
            "Action",
            "Adventure",
            "Fantasy",
            "Sci-Fi",
            "IMAX"
          ]
        }
      }
    ]
  }
}
  • For the attribute “genre”, we declare the type keyword, which means only exact matches are performed on that field; no analyser is run on it at all. To get search results on genre, the query must match exactly, case-sensitive, the whole works.
  • For the attribute “title”, we use the type text, which does have an analyser applied, so we can do things like partial matches, normalising for lowercase and uppercase, synonyms, and so on.
  • On the “title” field, we can also specify the particular analyser we want to run. With the “english” analyser, stop words, stemming and synonyms specific to the English language are applied.
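A mapping along these lines might look as follows. This is a sketch; it assumes you are creating the `movies` index fresh, since field types cannot be changed on an existing index without reindexing:

```shell
# Create the movies index with an exact-match genre and an
# English-analysed title. "analyzer" is the API's spelling.
curl --location --request PUT 'localhost:9200/movies' \
--header 'Content-Type: application/json' \
--data-raw '{
  "mappings": {
    "properties": {
      "genre": { "type": "keyword" },
      "title": { "type": "text", "analyzer": "english" }
    }
  }
}'
```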
curl --location --request GET 'localhost:9200/movies/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
  "query" : {
    "match" : {
      "genre" : "sci"
    }
  }
}'
curl --location --request GET 'localhost:9200/movies/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
  "query" : {
    "match" : {
      "genre" : "sci-fi"
    }
  }
}'
  • If you want a field to be analysed, make sure it’s a text field and that will allow you to do partial matching and be a little bit more forgiving on your search results.
  • But if you do want exact matches for search terms make sure you make your text fields a keyword field instead.
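You can inspect what the english analyser actually does to a string with the `_analyze` API. A sketch; the sample text here is arbitrary:

```shell
# Show the tokens the english analyser produces for a phrase.
curl --location --request GET 'localhost:9200/_analyze' \
--header 'Content-Type: application/json' \
--data-raw '{
  "analyzer": "english",
  "text": "The Force Awakens"
}'
```

The response lists the tokens after stop-word removal (“the” disappears) and stemming, which is exactly why a match query on a text field is more forgiving than an exact keyword comparison.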
  • Normalised way of storing the data: (Movie-Id, Movie-Title) are stored in one index, while (Movie-Id, User-Id, Rating) are stored in a second, separate index, with ratings referring to movies by Movie-Id.
  • De-normalised way of storing the data: all fields, i.e. (Movie-Id, Movie-Title, User-Id and Rating), are stored together in one single index, duplicating the title on every rating.
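A third option, used by the `series` documents below, is a parent/child relationship inside a single index via a join field. A sketch of the mapping those documents assume, where `franchise` is the parent relation and `film` the child:

```shell
# Create the series index with a join field relating
# franchise (parent) documents to film (child) documents.
curl --location --request PUT 'localhost:9200/series' \
--header 'Content-Type: application/json' \
--data-raw '{
  "mappings": {
    "properties": {
      "film_to_franchise": {
        "type": "join",
        "relations": { "franchise": "film" }
      }
    }
  }
}'
```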
Bulk-index a franchise parent and a film child. Note that the `_bulk` API requires each action line and each source document to sit on a single line, and that children must be routed to the same shard as their parent:
{ "create" : { "_index" : "series", "_id" : "1", "routing" : 1} }
{ "id": "1", "film_to_franchise": {"name": "franchise"}, "title" : "Star Wars"}
{ "create" : { "_index" : "series", "_id" : "260", "routing" : 1} }
{ "id": "260", "film_to_franchise": {"name": "film", "parent": "1"}, "title" : "Star Wars: Episode IV - A New Hope", "year":"1977", "genre":["Action", "Adventure", "Sci-Fi"]}

aditya goel

Software Engineer for Big Data distributed systems