NLP for Search Use-case (Zomato) | Part-2
If you are landing here directly, it’s advisable to first go through Part-1 of this blog.
Question :- What is Zomato ?
Question :- What are the things that people can search on Zomato ?
Question :- What’s the usual approach adopted by Search-Engine ?
Answer → Here is the approach taken :-
- We adopt the approach of tokenisation, then we perform a matching operation and finally we rank the results.
- If the search-query is present in the Title, that document’s preference would be higher than in the scenario where the search-query is present only in the Description (a field-boosting sketch follows right after this list).
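For instance, with Elasticsearch (which the overall architecture later in this blog relies on), this Title-over-Description preference is commonly expressed through field boosts in a multi_match query. Here is a minimal sketch, assuming the 8.x Python client; the index and field names (“restaurants”, “title”, “description”) are illustrative assumptions, not Zomato’s actual schema :-

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # hypothetical local cluster

# Lexical query where a match in "title" counts 3x more than a match in "description".
query = {
    "multi_match": {
        "query": "best coffee near me",
        "fields": ["title^3", "description"],  # the ^3 boost encodes the Title preference
    }
}

response = es.search(index="restaurants", query=query, size=10)
for hit in response["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("title"))
```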
Question :- Does Lexical-Match always work, OR does it also have some disadvantages ?
Answer → Lexical-Match may not always work, as the query can sometimes be quite complex. For example → Say the query contains : “Best coffee near me”.
- A result-document which contains “Best Bliss” would also be fetched, whereas this document may not be suitable/appropriate.
- Similarly, a result-document which contains “Bar Best” would also be fetched, whereas this document may not be suitable/appropriate.
Thus, we need “Natural Language Understanding” here (the toy scorer below shows how naive token-overlap goes wrong).
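To see this failure mode concretely, here is a toy token-overlap scorer (a deliberate simplification of lexical matching, not the production logic): any document sharing the token “best” scores above zero, regardless of whether it has anything to do with coffee :-

```python
def lexical_score(query: str, document: str) -> int:
    """Count how many query tokens appear in the document (naive lexical matching)."""
    query_tokens = set(query.lower().split())
    doc_tokens = set(document.lower().split())
    return len(query_tokens & doc_tokens)

query = "Best coffee near me"
documents = ["Best Bliss", "Bar Best", "Third Wave Coffee Roasters"]

for doc in documents:
    print(doc, "->", lexical_score(query, doc))

# "Best Bliss" and "Bar Best" both score 1 on the token "best",
# even though neither of them is a coffee result.
```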
Question :- What are the other queries that we need to handle and parse ?
Answer → Single-intent queries are easy to understand, but such complex queries require a deeper understanding of the domain & natural language.
Question :- Showcase an example of understanding the Intent ?
Question :- Under what all categories, can we group the Search-Query ?
Question :- What are the various challenges in understanding the Query ?
Answer → Since we don’t have labelled training data, we can’t apply Supervised-Learning algorithms in this scenario, and we shall instead be using something like the Word2Vec algorithm :-
Question :- Explain something about Word2Vec Algo ?
Answer → A machine understands numbers and doesn’t understand words, and that’s why we need to convert words into numbers first, so that the machine can understand & interpret our query.
- Word2Vec is a Neural-Network model to learn word associations, in the context of the training data that we input/supply.
- Word2Vec helps in converting a given word into a Vector. A Vector is simply a list of numbers (a minimal sketch follows right after this list).
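As a minimal sketch (using the gensim library, which is an assumption on my part; the blog does not name a particular implementation), Word2Vec can be trained on tokenised restaurant/menu text and then queried for a word’s vector and its nearest neighbours :-

```python
from gensim.models import Word2Vec

# Tiny, made-up corpus of tokenised restaurant/menu/location phrases.
sentences = [
    ["best", "coffee", "near", "indiranagar"],
    ["filter", "coffee", "south", "indian", "breakfast"],
    ["pizza", "delivery", "koramangala"],
    ["veg", "pizza", "with", "extra", "cheese"],
]

model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=100)

vector = model.wv["coffee"]                     # the learned embedding: 50 numbers
print(vector[:5])                               # first few components of the vector
print(model.wv.most_similar("coffee", topn=3))  # words that appear in similar contexts
```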
Question :- How does the Word2Vec Algo get trained ?
Step #1.) First, we generate the tokens from the text, using the BPE (Byte-Pair-Encoding) approach (see the tokenisation sketch after this list).
Step #2.) Then, the data about the restaurants, food-menus and locations is fed to the Word2Vec model, which provides us the word-embeddings.
Step #3.) Next, these word-embeddings are used by the (Bidirectional-LSTM (Long Short-Term Memory) + CRF) Neural-Net model, to do the Sequence-Tagging, i.e. to perform Named-Entity-Recognition.
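A quick sketch of Step #1, using the Hugging Face tokenizers library as one possible BPE implementation (an assumption; the blog does not specify the tooling): we train a small BPE vocabulary on restaurant/menu text and then tokenise an incoming query with it :-

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# Train a tiny BPE vocabulary on a made-up restaurant/menu corpus.
corpus = [
    "best coffee near indiranagar",
    "filter coffee and masala dosa",
    "margherita pizza in koramangala",
    "paneer butter masala with naan",
]

tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(vocab_size=200, special_tokens=["[UNK]"])
tokenizer.train_from_iterator(corpus, trainer)

# Tokenise a search-query into sub-word units.
encoding = tokenizer.encode("best masala dosa near me")
print(encoding.tokens)
```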
Question :- What is Sequence-Tagging ?
Answer → Sequence-Tagging is the process of tokenising the input-query and then identifying the following things from the given input-query :-
- What is Location ?
- What is Restaurant ?
- What is Dish ?
That’s what the Bidirectional-LSTM would do, i.e. it would help to identify the Dish, Location and Restaurant. This process is known as Sequence-Tagging OR Named-Entity-Recognition (a model sketch follows below).
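Here is a minimal sketch of such a tagger in Keras (my own illustrative assumption; for brevity the CRF layer mentioned above is omitted and a plain per-token softmax is used instead). It assigns one of the tags O / LOCATION / RESTAURANT / DISH to every token of the query :-

```python
import numpy as np
from tensorflow.keras import layers, models

VOCAB_SIZE = 5000   # size of the BPE vocabulary (illustrative)
EMBED_DIM = 50      # matches the Word2Vec vector size used above
MAX_LEN = 12        # maximum query length, in tokens
NUM_TAGS = 4        # O, LOCATION, RESTAURANT, DISH

model = models.Sequential([
    # In practice this layer would be initialised with the Word2Vec embeddings.
    layers.Embedding(input_dim=VOCAB_SIZE, output_dim=EMBED_DIM, mask_zero=True),
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
    # A CRF layer would normally sit here; a per-token softmax is used for brevity.
    layers.TimeDistributed(layers.Dense(NUM_TAGS, activation="softmax")),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Dummy training data: token-id sequences and their per-token tag-ids.
x = np.random.randint(1, VOCAB_SIZE, size=(32, MAX_LEN))
y = np.random.randint(0, NUM_TAGS, size=(32, MAX_LEN))
model.fit(x, y, epochs=1, verbose=0)

# Predict a tag for every token of one (dummy) query.
tags = model.predict(x[:1]).argmax(axis=-1)
print(tags)
```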
Question :- What does the overall architecture of Search look like ?
Step #1.) First, the customer searches some query in the search-bar.
Step #2.) In parallel, we have a Neural-Net model which has been trained on the data of restaurants, food-menus and locations. This model is exposed through some API-Gateway, running on EC2 OR some ECS-cluster.
Step #3.) Next, the Search-Service talks to the API (backed by the DS model), which parses the query, fetches the Named-Entities from the query and finally creates a specialised query.
Step #4.) Finally, the query is fired on the Elastic-Search indices, in order to find the relevant results for the input-query (a sketch of Steps #3 and #4 follows after this list).
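Here is a rough sketch of Steps #3 and #4 from the Search-Service side (the NER endpoint, the index name and the field names are all hypothetical assumptions, not Zomato’s actual APIs or schema): the service asks the NER API for the entities in the query and then builds a targeted Elastic-Search query from them :-

```python
import requests
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # hypothetical cluster

def search(raw_query: str):
    # Step #3: ask the (hypothetical) NER API to tag the raw query.
    # Expected response shape, e.g.: {"dish": "coffee", "location": "indiranagar"}
    entities = requests.post(
        "http://ner-service.internal/parse",   # illustrative endpoint, not a real URL
        json={"query": raw_query},
        timeout=1.0,
    ).json()

    # Build a specialised query: match the dish/restaurant, filter by the location.
    must, filters = [], []
    if entities.get("dish"):
        must.append({"match": {"dishes": entities["dish"]}})
    if entities.get("restaurant"):
        must.append({"match": {"title": entities["restaurant"]}})
    if entities.get("location"):
        filters.append({"term": {"location": entities["location"]}})

    # Step #4: fire the specialised query against the (illustrative) "restaurants" index.
    return es.search(
        index="restaurants",
        query={"bool": {"must": must or [{"match_all": {}}], "filter": filters}},
        size=10,
    )

print(search("best coffee near indiranagar")["hits"]["total"])
```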
That’s all in this blog. If you liked it, please do clap.
References :-
- https://medium.com/@adityagoel123/nlp-for-machine-learning-part-1-11990459fbf1
- https://adityagoel123.medium.com/handson-with-nlp-using-tensorflow-part-1-427bbad2497f
- https://medium.com/@adityagoel123/sentiment-classification-using-tensorflow-nlp-70d90ffd0fe
- https://blog.zomato.com/how-we-make-our-search-more-conversational-and-inclusive
- https://arxiv.org/abs/1508.01991
- https://en.wikipedia.org/wiki/Byte_pair_encoding
- https://en.wikipedia.org/wiki/Word2vec