In Conversation with ELK Hands-On | Part 2

aditya goel
Dec 18, 2021

If you have landed here directly, it’s advisable to first check out this page.

The following are the topics that we shall touch upon in this blog :-

  • Installation of ES.
  • Mappings in ES.
  • Field Analysers.
  • Bulk Data Ingestion.
  • Document immutability behaviour in ES.
  • Ingest documents into ES.
  • Partial document update on the basis of documentId, using the POST verb.
  • Full document update on the basis of documentId, using the PUT verb.
  • Delete a document from ES.

Question: From where do I download ElasticSearch ?

Question: Which ElasticSearch version would we be doing hands-on with ?

Question: How do I start ElasticSearch on my local Mac machine ?

Question: Is it so simple ? Didn’t we face any challenges with this ?

Answer: Yes, we did face challenges.

Question: How do we solve the aforesaid SSLHandshakeException ?

Answer: It’s a real issue. The ES team has opened #76586 to track it and shall work on a fix. For now, a workaround is to use the cluster settings API instead of elasticsearch.yml:

curl --location --request PUT 'http://localhost:9200/_cluster/settings' \
--header 'Content-Type: application/json' \
--data-raw '{
"persistent": {
"ingest.geoip.downloader.enabled": true
}
}'

And then hit :-

curl --location --request PUT 'http://localhost:9200/_cluster/settings' \
--header 'Content-Type: application/json' \
--data-raw '{
"persistent": {
"ingest.geoip.downloader.enabled": false
}
}'

The setting has to be enabled first so that the cluster notices the change. There is no need to restart between or after these calls.

Question: What is a Mapping?

Answer: A mapping in ElasticSearch is a schema definition. It tells ElasticSearch what format to store your data in, how to index it and how to analyse it. ElasticSearch usually has reasonable defaults and, more often than not, infers the right thing from the nature of your data: it can figure out whether you are trying to store strings, floating point numbers, integers and so on. Sometimes, though, you need to give it a little bit of a hint.

Question: How do we create a Mapping?

Answer: Here’s an example where we’re going to import some data and we want the release year (the year field) to be explicitly interpreted as a date type field.

In the snippet below, we are sending an HTTP request to the server that’s running ElasticSearch.

curl --location --request PUT 'localhost:9200/movies' \
--header 'Content-Type: application/json' \
--data-raw '{
  "mappings": {
    "properties": {
      "year": {
        "type": "date"
      }
    }
  }
}'

Question: What all can a Mapping do ?

Note-1: Mappings can do a lot more than that. They can define field types, as we just discussed. Besides date, the other field types include strings, bytes, short integers, integers, long integers, floating point numbers, double-precision floating point numbers and boolean values.

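Note-2: As part of your mapping, you can also specify whether or not a field should be indexed for full text search. In older ElasticSearch versions this was written as "index": "not_analyzed". From version 5.0 onwards, the index parameter only accepts true or false, and a string that should be matched exactly, rather than analysed for full text search, is mapped with the keyword type.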

Note-3: More interesting are the field analysers. Analysers can do several different things. They can have character filters, which, for example, remove HTML encoding or convert an ampersand to the word "and". They can also tokenise the text with a tokeniser, and then post-process the resulting tokens with token filters.
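As a minimal sketch of the notes above, the mapping below declares one exact-match field and one full-text field. It assumes ElasticSearch 5.0 or later, where keyword fields are stored without analysis and text fields go through an analyser. The choice of the genre and title fields, and of the built-in english analyser, is only an illustration, and the helper name create_movies_index is hypothetical. Nothing is sent to the cluster until the helper is called.

```shell
# Assumed local cluster address, the same one used throughout this blog.
ES_URL="localhost:9200"

# Mapping body: genre is matched exactly (keyword), title is full-text
# searchable and analysed with the built-in english analyser.
mapping_body='{
  "mappings": {
    "properties": {
      "genre": { "type": "keyword" },
      "title": { "type": "text", "analyzer": "english" },
      "year":  { "type": "date" }
    }
  }
}'

# Hypothetical helper: creates the movies index with the mapping above.
create_movies_index() {
  curl --location --request PUT "$ES_URL/movies" \
    --header 'Content-Type: application/json' \
    --data-raw "$mapping_body"
}
```

Keeping the body in a variable makes it easy to inspect it before running create_movies_index against a fresh cluster.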

Question: What are the different ways of splitting text into tokens to choose from ?

Answer:

  • standard splits on word boundaries.
  • simple splits on anything that isn’t a letter and converts to lowercase.
  • whitespace just splits on whitespace but doesn’t convert to lowercase.
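The three behaviours above can be imitated locally with tr, which makes the differences easy to see. This is only an approximation and not how ElasticSearch implements them: the real standard tokeniser follows Unicode word-boundary rules, and the lowercasing in the standard analyser comes from a separate lowercase token filter. The function names are hypothetical.

```shell
# whitespace: split on whitespace only; case and punctuation are kept.
whitespace_tokens() {
  printf '%s\n' "$1" | tr -s '[:space:]' '\n'
}

# simple: split on anything that is not a letter, then lowercase.
simple_tokens() {
  printf '%s\n' "$1" | tr -cs '[:alpha:]' '\n' | tr '[:upper:]' '[:lower:]'
}

# standard (rough approximation): split on anything that is not a letter
# or a digit, then lowercase.
standard_like_tokens() {
  printf '%s\n' "$1" | tr -cs '[:alnum:]' '\n' | tr '[:upper:]' '[:lower:]'
}

text="Star Trek: Beyond 2016"
for splitter in whitespace_tokens simple_tokens standard_like_tokens; do
  echo "== $splitter"
  "$splitter" "$text"
done
```

With this input, the whitespace variant keeps the colon and the capital letters, the simple variant drops the number entirely, and the standard-like variant keeps the number but lowercases everything.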

Question: What all can an Analyser do ?

Question: Should we always ignore stop-words ?

Answer: Consider a search for the phrase "to be or not to be". Every word in it may be a stop word. If you filtered out all the stop words, such as "to", "be", "or" and "not", nothing would be left and you could not search for that phrase at all. So you have to think carefully about whether you want to use stop words or not. Don’t enable stop words lightly; sometimes they have side effects that you don’t really want.
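The effect can be reproduced with a few lines of shell. This is a sketch with a made-up four-word stop list, not the list that any ElasticSearch analyser actually ships with, and remove_stop_words is a hypothetical helper.

```shell
# Illustrative stop-word list, one word per line (an assumption for this sketch).
stop_words='to
be
or
not'

remove_stop_words() {
  # One lowercase word per line, then drop every line that is exactly a stop word.
  # grep exits non-zero when nothing survives, so the status is normalised.
  printf '%s\n' "$1" \
    | tr -cs '[:alpha:]' '\n' \
    | tr '[:upper:]' '[:lower:]' \
    | grep -v -x -F -e "$stop_words" || true
}

echo "Tokens left from the famous phrase:"
remove_stop_words "To be, or not to be"

echo "Tokens left from a movie title:"
remove_stop_words "Plan 9 from Outer Space"
```

The first call prints no tokens at all, which is exactly the problem described above, while the movie title still leaves searchable words behind.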

Question: What are various choices of analysers ?

Answer: For analysers themselves, there are several choices :-
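The built-in analyser names include standard, simple and whitespace, as well as language analysers such as english. One way to compare them yourself is the _analyze API, which returns the tokens that an analyser produces for a piece of text. The sketch below only builds the request body and wraps the call in a helper. The names analyze_body and analyze_with are hypothetical, and the text passed in is assumed to contain no double quotes or backslashes, because no JSON escaping is done.

```shell
# Assumed local cluster address, the same one used throughout this blog.
ES_URL="localhost:9200"

analyze_body() {
  # $1 = analyser name, $2 = text to analyse (must not need JSON escaping)
  printf '{ "analyzer": "%s", "text": "%s" }' "$1" "$2"
}

analyze_with() {
  # Sends the body to the _analyze endpoint; only runs when called.
  curl --silent --location --request POST "$ES_URL/_analyze?pretty" \
    --header 'Content-Type: application/json' \
    --data-raw "$(analyze_body "$1" "$2")"
}

# Against a running cluster, for example:
#   analyze_with standard   "Star Trek: Beyond"
#   analyze_with whitespace "Star Trek: Beyond"
```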

Question: How do we see the mapping for any particular Index?
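A minimal sketch, assuming the movies index created earlier and the same local cluster: the _mapping endpoint of an index returns its current mapping. The helper name show_mapping is hypothetical, and the request is only sent when the helper is called.

```shell
# Assumed local cluster address and index name from this blog.
ES_URL="localhost:9200"
index="movies"

# ?pretty asks ElasticSearch to indent the JSON it returns.
mapping_url="$ES_URL/$index/_mapping?pretty"

show_mapping() {
  curl --location --request GET "$mapping_url"
}
```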

Question: How do we Ingest a single document into ElasticSearch ?

The command looks like :-

curl --location --request PUT 'localhost:9200/movies/_doc/109487' \
--header 'Content-Type: application/json' \
--data-raw '{
"genre" : ["Sci-Fi", "IMAX"],
"title" : "Interstellar",
"year" : 2014
}'

Question: How do we see all the documents that we have in ElasticSearch at any given moment ?

The command looks like :-

curl --location --request GET 'localhost:9200/movies/_search'

The response would look like :-

  • hits.total.value of 1 means that 1 record in total was found in our index.
  • The hits array holds the records found in ElasticSearch.
{
  "took": 121,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "movies",
        "_type": "_doc",
        "_id": "109487",
        "_score": 1.0,
        "_source": {
          "genre": [
            "Sci-Fi",
            "IMAX"
          ],
          "title": "Interstellar",
          "year": 2014
        }
      }
    ]
  }
}

Question: How do we ingest multiple documents inside the ElasticSearch Index ?

Question: Why is the above format the way it is ?

Answer:- The reason we have all these individual lines, instead of one giant self-contained JSON request, is that ElasticSearch hashes every document to a particular shard. So ElasticSearch needs to be able to deal with these documents one at a time. It goes through each pair of lines and asks: I’ve got Movie-Id: 135569 here, which shard should that map to? It then hands that document off to the correct shard in our cluster, moves on to the next pair, decides that this one goes to another shard on another node, and so on.

Note: So, this format, where things are broken up into groups of two lines, allows whichever ElasticSearch server receives the request to pick the right shard for each document and forward it there, one document at a time, instead of trying to parse the whole request at once.
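The routing idea can be sketched in a few lines. ElasticSearch picks the shard from a hash of the routing value (the document id by default) modulo the number of primary shards, and its real hash function is Murmur3. The sketch below swaps in the CRC printed by cksum purely as a stand-in, so the shard numbers it prints will not match a real cluster. The shard count of 3 and the helper name shard_for are assumptions.

```shell
# Assumed number of primary shards, for illustration only.
num_shards=3

shard_for() {
  # cksum prints "<crc> <byte count>"; keep only the CRC.
  # This CRC is a stand-in for the Murmur3 hash that ElasticSearch uses.
  crc=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
  echo $(( crc % num_shards ))
}

# Route each document id from the sample bulk records, one at a time.
for id in 135569 122886 109487 58559 1924; do
  echo "doc $id -> shard $(shard_for "$id")"
done
```

Because the shard depends only on the id in the action line, the receiving node can decide where a document goes before it has looked at any other line of the request.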

Question: What are the sample records that we shall ingest in bulk ?

{ "create" : { "_index" : "movies", "_id" : "135569" } }
{ "id": "135569", "title" : "Star Trek Beyond", "year":2016 , "genre":["Action", "Adventure", "Sci-Fi"] }
{ "create" : { "_index" : "movies", "_id" : "122886" } }
{ "id": "122886", "title" : "Star Wars: Episode VII - The Force Awakens", "year":2015 , "genre":["Action", "Adventure", "Fantasy", "Sci-Fi", "IMAX"] }
{ "create" : { "_index" : "movies", "_id" : "109487" } }
{ "id": "109487", "title" : "Interstellar", "year":2014 , "genre":["Sci-Fi", "IMAX"] }
{ "create" : { "_index" : "movies", "_id" : "58559" } }
{ "id": "58559", "title" : "Dark Knight, The", "year":2008 , "genre":["Action", "Crime", "Drama", "IMAX"] }
{ "create" : { "_index" : "movies", "_id" : "1924" } }
{ "id": "1924", "title" : "Plan 9 from Outer Space", "year":1959 , "genre":["Horror", "Sci-Fi"] }

Question: How do we bulk-ingest the documents into the ElasticSearch index using Curl / Postman ?

curl --location --request PUT 'localhost:9200/_bulk' \
--header 'Content-Type: application/json' \
--data-binary '@/Users/B0218162/Documents/LEARNINGS/MEDIUM-BLOG/ElasticSearch/ml-latest-small/movies.json'
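If you would rather generate the bulk file than write it by hand, a sketch like the following could do it. Each document contributes an action line followed by a source line, and the body must end with a newline, which is why the file is written with printf and sent with --data-binary (that option keeps the newlines in the file intact). The helper names bulk_pair and send_bulk and the temporary file are assumptions for this sketch, and nothing is sent until send_bulk is called.

```shell
# Assumed local cluster address, the same one used throughout this blog.
ES_URL="localhost:9200"

bulk_pair() {
  # $1 = document id, $2 = the document itself as single-line JSON
  printf '{ "create" : { "_index" : "movies", "_id" : "%s" } }\n' "$1"
  printf '%s\n' "$2"
}

# Write two of the sample records into a temporary bulk file.
bulk_file=$(mktemp)
{
  bulk_pair 58559 '{ "id": "58559", "title" : "Dark Knight, The", "year":2008 , "genre":["Action", "Crime", "Drama", "IMAX"] }'
  bulk_pair 1924 '{ "id": "1924", "title" : "Plan 9 from Outer Space", "year":1959 , "genre":["Horror", "Sci-Fi"] }'
} > "$bulk_file"

send_bulk() {
  # application/x-ndjson describes the line-by-line format of the bulk body.
  curl --location --request PUT "$ES_URL/_bulk" \
    --header 'Content-Type: application/x-ndjson' \
    --data-binary "@$bulk_file"
}
```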

Question: In the bulk file, we observed that there is a duplicate document, which we are trying to insert again. What shall happen to it ?

Answer: Below is the expected behaviour: we can’t insert a given document twice with the create action, so we run into a version_conflict_engine_exception.
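The conflict comes from the create action, which refuses to touch an id that already exists. The bulk API also offers an index action, which adds the document or replaces an existing one. As a sketch, assuming you would rather overwrite the duplicate than have it rejected, the action lines could be rewritten like this. The helper name to_index_actions is hypothetical.

```shell
to_index_actions() {
  # Rewrite only the lines that start with a create action;
  # the document lines pass through unchanged.
  sed 's/^{ *"create" *:/{ "index" :/'
}

printf '%s\n' \
  '{ "create" : { "_index" : "movies", "_id" : "109487" } }' \
  '{ "id": "109487", "title" : "Interstellar", "year":2014 , "genre":["Sci-Fi", "IMAX"] }' \
  | to_index_actions
```

Whether overwriting is the right choice depends on the data: create is the safer option when an existing document must never be replaced silently.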

Question: Can we really update an existing document in ElasticSearch ?

Answer: Well, documents in ES are immutable, i.e. they can’t be modified once ingested.

Question: What’s the workaround in such a case, then ? How do we really modify a particular attribute of a document in ElasticSearch ? We are talking about the Partial-Update here.

Answer:- We can use the POST verb to update a document in ElasticSearch. Behind the scenes, ElasticSearch fetches the existing document, applies the change and indexes the result as a new version of the document; the old version is marked as deleted and cleaned up later.

curl --location --request POST 'localhost:9200/movies/_update/109487' \
--header 'Content-Type: application/json' \
--data-raw '{
"doc":{
"title":" Space Interstellar"
}
}'

Question: Let’s see the document that we just modified in ElasticSearch.

Answer:- Yes, the document with id: 109487 has been modified. Look at the snapshot below: the title of the document has now been changed.

Question: What if we want to completely re-insert a particular document-id into ElasticSearch ? We are talking about the Full-Update here.

Answer:- Please note that we are freshly re-inserting a new document into ElasticSearch. The older document shall be overwritten.
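A minimal sketch of such a full update, assuming the same movies index: a PUT to the document’s id with a complete body replaces whatever was stored under that id, so any field left out of the new body is gone afterwards. The replacement values below are made up for illustration, and replace_movie is a hypothetical helper that only contacts the cluster when called.

```shell
# Assumed local cluster address, the same one used throughout this blog.
ES_URL="localhost:9200"

# Complete replacement document; the values are invented for this sketch.
full_doc='{
  "genre" : ["Sci-Fi", "IMAX"],
  "title" : "Interstellar (Re-inserted)",
  "year" : 2014
}'

replace_movie() {
  # $1 = id of the document to overwrite
  curl --location --request PUT "$ES_URL/movies/_doc/$1" \
    --header 'Content-Type: application/json' \
    --data-raw "$full_doc"
}

# Against a running cluster: replace_movie 109487
```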

Question: Can we even change attributes for this document in ElasticSearch ?

Question: Let’s see the document that we just modified in ElasticSearch. Also, can we fetch a single document from ElasticSearch on the basis of its id ?

Answer:- Yes, we can even fetch a single document from ElasticSearch by its id. Observe that the document has been totally revamped :-
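Fetching by id is a GET on the same _doc path that was used for ingestion. The sketch below assumes the movies index and the local cluster from this blog; doc_url and get_movie are hypothetical helper names, and the request is only sent when get_movie is called.

```shell
# Assumed local cluster address, the same one used throughout this blog.
ES_URL="localhost:9200"

doc_url() {
  # $1 = document id; ?pretty asks for indented JSON in the response.
  printf '%s/movies/_doc/%s?pretty' "$ES_URL" "$1"
}

get_movie() {
  curl --location --request GET "$(doc_url "$1")"
}

# Against a running cluster: get_movie 109487
```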

Question: Let’s now search for a document which contains keyword ‘Trek’ ?

curl --location --request GET 'localhost:9200/movies/_search?q=trek'

Question: Let’s now go ahead and delete the document explicitly from ElasticSearch on the basis of its ‘id’ ?

curl --location --request DELETE 'localhost:9200/movies/_doc/135569'


aditya goel

Software Engineer for Big Data distributed systems