ELK stack working and a sneak peek into the internals

  • Analysis of Documents.
  • Indexing of Documents.
  • Deletion of Documents.
  • Retrieval of Documents.
  • Search of Documents.
  • Say we have only one machine in the cluster: both shards would then be present on the same node, and if this machine goes down, we are in trouble.
  • Similarly, if we don’t have any replica shards and the two primary shards are deployed on different machines, then if either node goes down, we end up losing half of our data.
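The two failure scenarios above can be sketched with a toy allocation model (the node names and shard layout here are illustrative, not Elasticsearch API):

```python
# Toy model: each node holds a set of shard copies, ("P", n) for the primary
# of shard n and ("R", n) for its replica.
def surviving_shards(assignments, failed_node):
    """Return the shard numbers still available after one node fails."""
    alive = set()
    for node, shards in assignments.items():
        if node != failed_node:
            alive |= {num for _kind, num in shards}
    return alive

# No replicas, one primary per node: losing node-1 loses shard 0 for good.
no_replicas = {"node-1": {("P", 0)}, "node-2": {("P", 1)}}
print(surviving_shards(no_replicas, "node-1"))    # → {1}

# One replica each, placed on the opposite node: the full data set survives.
with_replicas = {"node-1": {("P", 0), ("R", 1)},
                 "node-2": {("P", 1), ("R", 0)}}
print(surviving_shards(with_replicas, "node-1"))  # → {0, 1}
```

This mirrors why Elasticsearch never places a replica on the same node as its primary.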
  • Master Node → The supervisor node for all other nodes in the same cluster. It is responsible for actions like creating and deleting an index, tracking which nodes are part of the cluster, and allocating shards to other nodes.
  • Master-Eligible Node → There is a property called “node.master” in the elasticsearch.yml file. If this property is set to true (the default), the node is eligible to become a master node. Let’s take an example: we have a multi-node cluster with one master node. If the server acting as master fails, the master-eligible nodes compete through a process called the Master-Election-Process, and a new master is elected.
  • Data Node → This node holds the data and performs operations such as CRUD, search and aggregations. To make a node a data node, the property “node.data” in the elasticsearch.yml file should be set to true (the default).
  • Ingest Node → This node pre-processes documents before they are actually indexed into Elasticsearch. To make a node an ingest node, the property “node.ingest” in the elasticsearch.yml file should be set to true (the default).
  • Tribe Node → This node is used for coordination purposes.
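The three role flags mentioned above live in each node’s configuration file; a minimal sketch of an elasticsearch.yml fragment (all three default to true; in newer Elasticsearch versions these flags are replaced by a single node.roles list):

```
# elasticsearch.yml — node role flags
node.master: true   # eligible to take part in master election
node.data: true     # holds shards; serves CRUD, search and aggregations
node.ingest: true   # runs ingest pipelines before documents are indexed
```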
  • Say we supply a document with the intent to index it into Elasticsearch.
  • Elasticsearch first breaks this document into words and tokenises it. Each token is also called a Term.
  • It gets rid of any extraneous suffixes and prefixes: stop-words are removed, white-spaces eliminated and punctuation stripped.
  • Elasticsearch then lower-cases all the terms.
  • Elasticsearch then performs stemming, i.e. it analyses a word down to its root and trims it. E.g. for the two words ‘swimming’ and ‘swimmers’, the root (trimmed) word is ‘swim’.
  • It then does synonym-matching. E.g. the words ‘thin’ and ‘skinny’ mean almost the same.
  • In Document-1, ‘Field1’ has the value: “The thin lifeguard was swimming in the lake”.
  • In Document-2, ‘Field1’ has the value: “Swimmers race with the skinny lifeguard in lake”.
  • Removal of stop-words.
  • Lowercasing.
  • Stemming.
  • Synonym-match.
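The four steps above can be sketched end-to-end in a few lines of Python (the stop-word list, suffix list and synonym map below are toy stand-ins for what a real analyser ships with):

```python
import re

STOP_WORDS = {"the", "a", "an", "in", "was", "with"}
SYNONYMS = {"skinny": "thin"}  # map synonyms onto one canonical term

def stem(term):
    # Toy stemmer: strips a few hard-coded suffixes (a real analyser
    # would use something like the Porter stemming algorithm).
    for suffix in ("ming", "mers", "ing", "ers", "s"):
        if term.endswith(suffix) and len(term) > len(suffix) + 2:
            return term[: -len(suffix)]
    return term

def analyze(text):
    tokens = re.findall(r"[A-Za-z]+", text)              # tokenise, drop punctuation
    tokens = [t.lower() for t in tokens]                 # lowercasing
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop-word removal
    tokens = [stem(t) for t in tokens]                   # stemming
    return [SYNONYMS.get(t, t) for t in tokens]          # synonym-match

print(analyze("The thin lifeguard was swimming in the lake"))
# → ['thin', 'lifeguard', 'swim', 'lake']
print(analyze("Swimmers race with the skinny lifeguard in lake"))
# → ['swim', 'race', 'thin', 'lifeguard', 'lake']
```

Both documents end up sharing the terms ‘thin’, ‘lifeguard’, ‘swim’ and ‘lake’, which is exactly what lets a search for either phrasing match both.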
  • Let’s see a real example of how a particular document gets analysed by the ‘Standard Analyser’ :-
  • Let’s see a real example of how a particular document gets analysed by the ‘Whitespace Analyser’ :-
  • Let’s see a real example of how a particular document gets analysed by the ‘Simple Analyser’. The ‘simple’ analyser gets rid of both punctuation and digits from a particular word. See in the example below how different terms get tokenised by the Simple Analyser :-
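As a rough approximation of the three analysers (the real ones are more sophisticated, e.g. the standard analyser follows Unicode text-segmentation rules), here is how the same sample string would split:

```python
import re

def standard_analyzer(text):
    # splits on word boundaries, keeps digits inside tokens, lowercases
    return [t.lower() for t in re.findall(r"\w+", text)]

def whitespace_analyzer(text):
    # splits on whitespace only; punctuation and case are preserved
    return text.split()

def simple_analyzer(text):
    # splits on anything that is not a letter, so digits and punctuation vanish
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]

text = "Spring-Boot is fun2learn!"
print(standard_analyzer(text))    # → ['spring', 'boot', 'is', 'fun2learn']
print(whitespace_analyzer(text))  # → ['Spring-Boot', 'is', 'fun2learn!']
print(simple_analyzer(text))      # → ['spring', 'boot', 'is', 'fun', 'learn']
```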
  • Say we want that, in case someone tries to index a document into Elasticsearch and that document has an additional field (other than the fields pre-specified at index-creation time), the extra field is simply ignored; then we set the mapping property ‘dynamic’ to ‘false’. Let’s see the example below :-
  • Say we want that, in case someone tries to index a document into Elasticsearch and that document has an additional field (other than the fields pre-specified at index-creation time), the document is strictly not ingested at all; then we can set the mapping property ‘dynamic’ to ‘strict’. Let’s see the example below :-
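Both behaviours hang off Elasticsearch’s `dynamic` mapping parameter; a minimal sketch of an index created with it (the index name `my-index` and the field `name` are illustrative):

```
PUT /my-index
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "name": { "type": "text" }
    }
  }
}
```

With `"dynamic": false`, a document carrying unknown fields is still indexed but the extra fields are ignored; with `"dynamic": "strict"`, such a document is rejected outright.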
  • A filtering query doesn’t return a relevancy score for the documents it matches. It simply keeps the documents that match the search criteria and returns them.
  • A filtering query is faster than a plain matching query, as the former involves no extra computation of a relevancy score (and Elasticsearch can cache frequently used filter results).
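A minimal sketch of a filter-only search (index and field names are illustrative); the same `term` clause placed under `must` instead of `filter` would be scored:

```
GET /my-index/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "status": "published" } }
      ]
    }
  }
}
```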

aditya goel, Software Engineer for Big Data distributed systems