In Conversation with ELK Stack | Part 1

ElasticSearch fundamentals
  • It can be used for much more than full-text search now.
  • It can handle structured data very well.
  • It can aggregate data very quickly.
Kibana Introduction
  • It can do very complex aggregations of data.
  • It can graph your data and create charts.
  • It's often used for things like log analysis. If you're familiar with tools like Google Analytics, the combination of Elasticsearch and Kibana can be used as a way to roll your own Google Analytics at very large scale.
  • We can also visualise things like: where the hits on my website are coming from, how the error response codes break down, what my distribution of URLs looks like, whatever you can dream up.
Complex data analysis with Kibana
  • Kibana also provides a monitoring framework that lets you quickly visualize what's going on with your cluster.
  • What's my CPU utilisation and system load?
  • How much memory is available? (A command-line sketch for pulling the same stats follows this list.)
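Kibana presents this health data graphically, but the underlying numbers can also be pulled straight from Elasticsearch. A minimal sketch, assuming a local node on the default port, using the standard _cluster/health and _nodes/stats APIs:

```bash
# Overall cluster status: green, yellow, or red
curl '127.0.0.1:9200/_cluster/health?pretty'

# Per-node statistics, including CPU, system load, and JVM memory
curl '127.0.0.1:9200/_nodes/stats?pretty'
```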
First Querying with ES
  • We're saying curl -H "Content-Type: application/json"; that sends an HTTP header saying the data in the body is going to be in JSON format.
  • -XGET means we're using the GET method, or the GET verb, depending on your terminology, meaning we just want to retrieve information back from Elasticsearch; we're not asking it to change anything.
  • The URL includes the host we're talking to, in this case 127.0.0.1, which is the local loopback address for your local host. Elasticsearch runs on port 9200 by default. That's followed by the index name, shakespeare, and then by _search, meaning we want to process a search query as part of this request.
  • The ?pretty query-string parameter means we want the results back in a nicely formatted, human-readable form, because we're going to be looking at them on the command line.
  • Finally, we have the request body itself, specified after -d and between single quotes. If you've never seen JSON before, this is what it looks like: a structured data format where each level is contained within curly brackets, starting with curly brackets at the top level.
  • Then we're saying we have a query level, and within those brackets a match_phrase clause that matches the phrase 'to be or not to be' against the text_entry field. The assembled command is shown below.
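Putting those pieces together, the full request looks like this. A sketch assuming the standard Shakespeare sample index, where each line of dialogue is stored in a field named text_entry:

```bash
curl -H "Content-Type: application/json" -XGET '127.0.0.1:9200/shakespeare/_search?pretty' -d '
{
  "query": {
    "match_phrase": {
      "text_entry": "to be or not to be"
    }
  }
}'
```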
  • In this one we're using a PUT verb, again to 127.0.0.1 on port 9200.
  • This time we're talking to an index called movies and a data type called movie, using a unique identifier for this new entry: 109487.
  • Under movie ID 109487 we're including the following information in the message body. The genre is actually a list of genres; in JSON that's a comma-delimited list enclosed in square brackets. This particular movie is in both the IMAX and sci-fi categories, its title is Interstellar, and it came out in the year 2014. The full request is sketched below. So that's what some real HTTP requests look like when you're dealing with Elasticsearch.
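Assembled, the indexing request would look roughly like this; the exact field names are a sketch based on the description above:

```bash
curl -H "Content-Type: application/json" -XPUT '127.0.0.1:9200/movies/movie/109487' -d '
{
  "genre": ["IMAX", "Sci-Fi"],
  "title": "Interstellar",
  "year": 2014
}'
```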
  • Term frequency is just how often a given search term appears within a given document. So if the word space occurs very frequently in a given document, it would have a high term frequency.
  • Now document frequency is just how often a term appears across all of the documents in your entire index. The word 'space' probably doesn't occur very often across the entire index, so it would have a low document frequency. However, a word like 'the' appears in pretty much all documents, so it would have a very high document frequency.
  • Next, if we divide term frequency by document frequency, mathematically we get a measure of relevance: how special this term is to the document. It measures not only how often the term occurs within the document, but how that compares to how often the term occurs in documents across the entire index.
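A hypothetical worked example: suppose 'space' appears 10 times in a document but in only 100 of the 1,000,000 documents in the index. Its term frequency is high while its document frequency is low, so TF divided by DF is large, and the document scores as very relevant for 'space'. The word 'the' might also appear 10 times in that document, but it appears in nearly all 1,000,000 documents, so its TF divided by DF collapses toward zero and it contributes almost nothing to relevance.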
  • Elasticsearch's main scaling trick is that an index is split into what we call shards, and every shard is basically a self-contained instance of Lucene.
  • The idea is that if you have a cluster of computers, you can spread these shards out across multiple machines. As you need more capacity, you can just throw more machines into your cluster and add more shards to the entire index so that it can spread the load out more efficiently.
  • So that's the basic idea: we just distribute our index among many different shards, and different shards can live on different computers within your cluster.
  • For example, in a three-node cluster with two primary shards, one replica of primary shard 1 sits on node 2.
  • Another replica of primary shard 1 sits on node 3.
  • A replica of primary shard 0 sits on node 1.
  • Another replica of primary shard 0 sits on node 2. (The command below shows how to inspect such a layout on a real cluster.)
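If you want to see this kind of layout for yourself, the standard _cat shards API lists every primary and replica shard along with the node it lives on. A minimal sketch against a local node:

```bash
curl '127.0.0.1:9200/_cat/shards?v'
```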
  • Case of writing to ES: Let's say you're indexing a new document into Elasticsearch; that's going to be a write request. Whatever node you talk to will say, OK, here's where the primary shard for the document you're trying to index lives, and redirect you there. You write the data, it gets indexed into the primary shard on whichever node holds it, and then it automatically gets replicated to any replicas of that shard.
  • Case of reading from ES: Reads are a little quicker, because they can be routed to the primary shard or to any replica of that shard. That spreads the read load out even more efficiently, so the more replicas you have, the more you increase the read capacity of the entire cluster.
  • So we're saying that we want three primary shards and one replica of each of those primary shards, and you can see how that adds up: three primaries times one replica per primary is three replicas, plus the three original primaries, which gives us six shards in total.
  • If we asked for two replicas instead, we would end up with nine total shards: three primaries and then a total of six replicas, two replica shards for each primary shard. So that's how the math works out. It can be a little confusing sometimes, but that's the idea. The index-creation sketch below shows where these numbers are set.
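As a sketch of where those numbers are set, this is a standard index-creation request using the number_of_shards and number_of_replicas settings; the movies index name is reused from the earlier example:

```bash
curl -H "Content-Type: application/json" -XPUT '127.0.0.1:9200/movies' -d '
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}'
```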

aditya goel
Software Engineer for Big Data distributed systems