NLP for Search Use-case (Zomato) | Part-2

aditya goel
Jan 15, 2024

If you are landing here directly, it’s advisable to first read through this blog.

Question :- What is Zomato ?

Question :- What are the things that people can search on Zomato ?

Question :- What’s the usual approach adopted by a Search-Engine ?

Answer → Here is the approach taken :-

  • We first tokenise the search-query, then match the tokens against the documents, and finally rank the results.
  • If the search-query is present in the Title, that document is ranked higher than a document where the search-query is present only in the Description.
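
The tokenise, match and rank flow above can be sketched in a few lines of Python. This is only a minimal illustration: the sample documents, the field names (`title`, `description`) and the weights are assumptions made for this example, not Zomato’s actual data or scoring.

```python
import re

# Hypothetical weights: a hit in the title counts more than a hit in the description.
TITLE_WEIGHT = 3.0
DESCRIPTION_WEIGHT = 1.0


def tokenise(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, doc):
    """Count query tokens found in each field and weight title hits higher."""
    query_tokens = set(tokenise(query))
    title_hits = len(query_tokens & set(tokenise(doc["title"])))
    description_hits = len(query_tokens & set(tokenise(doc["description"])))
    return TITLE_WEIGHT * title_hits + DESCRIPTION_WEIGHT * description_hits


def search(query, docs):
    """Return the titles of matching documents, best score first."""
    scored = [(score(query, doc), doc) for doc in docs]
    matched = [(s, doc) for s, doc in scored if s > 0]
    matched.sort(key=lambda pair: pair[0], reverse=True)
    return [doc["title"] for _, doc in matched]


# Made-up sample documents, purely for illustration.
docs = [
    {"title": "Cafe Aroma", "description": "Serves pizza and coffee"},
    {"title": "Pizza Point", "description": "Italian food"},
    {"title": "Sweet Corner", "description": "Desserts and shakes"},
]

print(search("pizza", docs))
```

Here “Pizza Point” ranks above “Cafe Aroma” because the query token appears in its Title rather than only in its Description.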

Question :- Does Lexical-Match always work OR does it also have some disadvantages ?

Answer → Lexical-Match may not always work, as the query can sometimes be quite complex. For example → Say the query contains : “Best coffee near me”.

  • The result-document which contains “Best Bliss” would also be fetched, even though this document may not be suitable/appropriate.
  • Similarly, the result-document which contains “Bar Best” would also be fetched, even though this document may not be suitable/appropriate.

Thus, we need “Natural Language Understanding” here.
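
To see the problem concretely, here is a small sketch of a naive lexical matcher that fetches any title sharing at least one token with the query. The titles “Corner Coffee House” and “Pasta Palace” are made-up names added for this illustration, and real engines match in more refined ways than this.

```python
def lexical_matches(query, titles):
    """Return every title that shares at least one token with the query."""
    query_tokens = set(query.lower().split())
    return [
        title
        for title in titles
        if query_tokens & set(title.lower().split())
    ]


titles = ["Best Bliss", "Bar Best", "Corner Coffee House", "Pasta Palace"]

# "Best Bliss" and "Bar Best" are fetched only because they contain the token "best".
print(lexical_matches("Best coffee near me", titles))
```

The matcher has no idea that “best” is a qualifier, “coffee” is the thing being searched for and “near me” is a location hint.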

Question :- What are the other queries that we need to handle and parse ?

Single-intent queries are easy to understand, but complex queries require a deeper understanding of the domain & natural language.

Question :- Showcase an example of understanding the Intent ?

Question :- Under which categories can we group the Search-Query ?

Question :- What are the various challenges in understanding the Query ?

Answer → Since we don’t have labelled training data, we can’t apply Supervised Learning algos in this scenario, and we shall be using something like the Word2Vec algo instead.

Question :- Explain something about Word2Vec Algo ?

Answer → A machine understands numbers, not words, which is why we first need to convert the words into numbers so that the machine can understand & interpret our query.

  • Word2Vec is a Neural-Network Model that learns word associations from the context in which words appear in the training data that we supply.
  • Word2Vec helps in converting a given word into a Vector. A Vector is simply a list of numbers.
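
Once words are vectors, related words can be compared numerically, commonly with cosine similarity. The sketch below uses tiny hand-written 3-dimensional vectors purely for illustration; they are not the output of a trained Word2Vec model, whose vectors typically have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made toy vectors (an assumption for this example, not learned embeddings).
toy_vectors = {
    "coffee": [0.9, 0.1, 0.0],
    "latte": [0.8, 0.2, 0.1],
    "biryani": [0.1, 0.9, 0.2],
}

coffee_vs_latte = cosine_similarity(toy_vectors["coffee"], toy_vectors["latte"])
coffee_vs_biryani = cosine_similarity(toy_vectors["coffee"], toy_vectors["biryani"])

print(round(coffee_vs_latte, 3), round(coffee_vs_biryani, 3))
```

With these toy numbers, “coffee” sits much closer to “latte” than to “biryani”, which is the kind of association a trained model is expected to capture.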

Question :- How does the Word2Vec Algo get trained ?

Step #1.) First, we generate the tokens from the text, using the BPE (Byte-Pair-Encoding) approach.

Step #2.) Then, the Word2Vec Model is trained on the data about the restaurants, food-menus and locations, which provides us the Word-Embeddings.

Step #3.) Next, these word-embeddings are used by the Bidirectional LSTM (Long Short-Term Memory) + CRF (Conditional Random Field) Neural-Net Model to do the Sequence-Tagging, i.e. to perform the Named-Entity-Recognition.
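
To give a feel for Step #1, here is a toy sketch of how BPE learns its merges: start from single characters and repeatedly merge the most frequent adjacent pair of symbols. The corpus below is made up, the function names are hypothetical, and production tokenisers add details (end-of-word markers, byte-level handling) that are left out here.

```python
from collections import Counter


def count_pairs(words):
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merges; each word starts as a tuple of characters."""
    words = dict(Counter(tuple(word) for word in corpus))
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(words)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        words = merge_pair(words, best)
        merges.append(best)
    return merges, words


# Made-up mini corpus, purely for illustration.
merges, words = learn_bpe(["pizza", "pizza", "pizzeria", "pasta"], num_merges=3)
print(merges)
print(words)
```

Frequent character sequences such as the shared prefix of “pizza” and “pizzeria” end up as single tokens, which is what lets the tokeniser handle spelling variants and unseen words.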

Question :- What is Sequence-Tagging ?

Answer → The process of tokenising the input-query and then identifying the following things from the given input-query is called Sequence-Tagging :-

  • What is the Location ?
  • What is the Restaurant ?
  • What is the Dish ?

That’s what a Bidirectional-LSTM would do i.e. it would help to identify the Dish, Location and Restaurant. This process is known as Sequence-Tagging OR Named-Entity-Recognition.
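
To make the output of Sequence-Tagging concrete, the sketch below groups per-token tags into entities. It does not train or run a BiLSTM + CRF. The tags are hard-coded as if a model had produced them, and the BIO scheme (`B-` begins an entity, `I-` continues it, `O` is outside any entity) and the label names are assumptions made for this example.

```python
def group_entities(tokens, tags):
    """Collapse BIO tags into a list of (entity_type, text) pairs."""
    entities = []
    current_type, current_tokens = None, []

    def flush():
        # Store the entity collected so far, if any.
        if current_tokens:
            entities.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            # "O" tag, or an "I-" tag that does not continue the open entity.
            flush()
            current_type, current_tokens = None, []
    flush()
    return entities


# Hypothetical model output for the query "butter chicken in connaught place".
tokens = ["butter", "chicken", "in", "connaught", "place"]
tags = ["B-DISH", "I-DISH", "O", "B-LOCATION", "I-LOCATION"]

print(group_entities(tokens, tags))
```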

Question :- How does the overall architecture of Search look like ?

Step #1.) First, the customer types some query in the search-bar.

Step #2.) In parallel, we have a Neural-Net model which is trained on the data of restaurants, food-menus and locations. This model would be deployed on EC2 OR some ECS-cluster and exposed through some API-Gateway.

Step #3.) Next, the Search-Service would talk to the API (backed by the DS Model), which would parse the query, fetch the Named-Entities from the query and finally create a specialised query.

Step #4.) Finally, this specialised query is fired on the Elastic-Search indices, in order to find the relevant results for the input-query.
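
As a rough idea of what the “specialised query” in Step #3 could look like, the sketch below turns recognised entities into an Elasticsearch `bool` query body. The index field names (`menu_items`, `restaurant_name`, `locality`) and the entity labels are hypothetical and chosen only for this example; the actual index mapping is not described in this blog.

```python
import json

# Hypothetical mapping from entity type to index field (an assumption for this sketch).
FIELD_FOR_ENTITY = {
    "DISH": "menu_items",
    "RESTAURANT": "restaurant_name",
    "LOCATION": "locality",
}


def build_search_body(entities):
    """Build a query body with one `match` clause per recognised entity."""
    must = [
        {"match": {FIELD_FOR_ENTITY[entity_type]: text}}
        for entity_type, text in entities
        if entity_type in FIELD_FOR_ENTITY
    ]
    if not must:
        # Nothing recognised: fall back to matching all documents.
        return {"query": {"match_all": {}}}
    return {"query": {"bool": {"must": must}}}


entities = [("DISH", "butter chicken"), ("LOCATION", "connaught place")]
print(json.dumps(build_search_body(entities), indent=2))
```

The resulting JSON body is what the Search-Service would send to the search endpoint of the relevant index, so that each entity is matched against the field it belongs to rather than against every field.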

That’s all in this blog. If you liked it, please do clap.
