Vector Similarity using an SBERT-based Method

aditya goel
5 min read · Oct 23, 2023

Note: If you have landed here directly, it is suggested that you first have a look at the following blogs :-

  1. Blog about TF-IDF.
  2. Blog about BM25.

Question → What is sBERT? How does it compare with the TF-IDF and BM25 based approaches?

Answer → sBERT (Sentence-BERT) is an example of a dense-vector approach. Dense-Vectors are quite interesting, as they allow us to take semantics (the meaning of the text) into account.

  • These are dense representations of language, which means that most of the values in the vector are non-zero.
  • In the case of BM25 and TF-IDF based approaches, the generated vectors are Sparse-vectors: they have one entry per vocabulary word, so most entries are zero, with only a few non-zero values for the words that actually occur in the text.
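To make the sparse/dense contrast concrete, here is a minimal sketch that builds a plain term-count vector (the raw ingredient of TF-IDF and BM25, without their weighting) over a small vocabulary and measures how many entries are zero. The vocabulary and the `dense_example` values are made up for illustration; they are not real sBERT output.

```python
def count_vector(text: str, vocabulary: list[str]) -> list[int]:
    """Term-count vector: one slot per vocabulary word (no TF-IDF/BM25 weighting)."""
    tokens = text.lower().split()
    return [tokens.count(word) for word in vocabulary]


def zero_fraction(vector: list[float]) -> float:
    """Share of entries that are exactly zero."""
    return sum(1 for v in vector if v == 0) / len(vector)


# Hypothetical 10-word vocabulary; a real corpus vocabulary has tens of thousands
# of words, which makes real sparse vectors far emptier than this toy one.
vocabulary = ["hello", "hi", "world", "banana", "street", "road",
              "fruit", "yellow", "throwing", "soggy"]

sparse = count_vector("hello world", vocabulary)
# Made-up stand-in for a dense embedding: every dimension carries some value.
dense_example = [0.12, -0.48, 0.33, 0.05, -0.91, 0.27, 0.64, -0.15, 0.08, -0.39]

print(sparse)                        # [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(zero_fraction(sparse))         # 0.8
print(zero_fraction(dense_example))  # 0.0
```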

Question → What is the advantage of sBERT?

Answer → The advantages of the sBERT based approach are →

  • We can represent language in a more meaningful manner.
  • For example, the word “hi” would lie in a very similar region of the vector space to the word “hello”.

Question → How does sBERT work?

Answer → sBERT works as follows →

  • We have a Transformer model. Our words (or our query) are processed by a stack of BERT encoder layers, and the output is pooled into a single dense vector.
  • Similarly, all of our documents are processed through the same Encoder-Network, which produces a dense-vector for each of them.
  • Once we have the dense-vectors for the query (q) and for all of our documents, we compute the Cosine-Similarity between the query vector and each document vector to measure how similar they are, i.e. how close these vectors are to each other.
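The query-versus-documents flow above can be sketched in a few lines. This is a minimal, dependency-free illustration: `encode` stands for any function that maps a list of texts to dense vectors, and the `toy_encode` function with its hand-picked 3-dimensional vectors is a hypothetical stand-in so the example runs without downloading a model. It shows the mechanics only; the "hi"/"hello" closeness here comes from the chosen toy numbers, not from a real encoder.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank_documents(query: str, documents: list[str],
                   encode: Callable[[list[str]], Sequence[Vector]]) -> list[tuple[float, str]]:
    """Encode query and documents with the SAME encoder, then sort by cosine similarity."""
    query_vec = encode([query])[0]
    doc_vecs = encode(documents)
    scored = [(cosine_similarity(query_vec, v), doc) for v, doc in zip(doc_vecs, documents)]
    return sorted(scored, reverse=True)  # most similar document first


# Hypothetical toy encoder: fixed 3-d vectors instead of a real BERT model.
TOY_VECTORS = {
    "hi": [0.9, 0.1, 0.0],
    "hello there": [0.8, 0.2, 0.1],
    "bananas on the street": [0.0, 0.3, 0.9],
}


def toy_encode(texts: list[str]) -> list[Vector]:
    return [TOY_VECTORS[t] for t in texts]


for score, doc in rank_documents("hi", ["bananas on the street", "hello there"], toy_encode):
    print(f"{score:.3f}  {doc}")
```

With the Sentence-Transformers library, a loaded model's `encode` method could presumably be passed in place of `toy_encode`, since it also takes a list of sentences and returns one vector per sentence.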

Question → Can you illustrate Vector Similarity with sBERT using an example?

Answer → Let’s consider this example →

  • We have a lone vector (marked in blue), and two other vectors (marked in red and green) that are much more similar to the blue vector, or at least point in the same direction.
  • Cosine-Similarity measures the cosine of the angle between two vectors. The smaller the angle, the more similar the vectors are (cosine close to 1); the larger the angle, the less similar they are.
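For reference, the standard definition of the Cosine-Similarity of two n-dimensional vectors a and b, and of the angle θ between them, is:

```latex
\cos(\theta) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
             = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}},
\qquad
\theta = \arccos\left(\cos(\theta)\right)
```

A value of 1 means the vectors point in the same direction (θ = 0°), 0 means they are orthogonal (θ = 90°), and -1 means they point in opposite directions (θ = 180°). Because only the angle matters, the lengths of the vectors do not affect the score.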

Question → Which library are we going to use to work with sBERT?

Answer → We are going to use the Sentence-Transformers library :-

  • It is a very good library that uses Hugging Face Transformers under the hood and makes Sentence-Transformer models very easy to use.
  • We are going to use the “bert-base-nli-mean-tokens” Sentence-Transformer model.

Question → Can you show a Python implementation using sBERT?

Answer → Following are the steps involved :-

1.) Here are the sentences we are working with :-

2.) The first thing we need to do is initialise our model, i.e. the “bert-base-nli-mean-tokens” Sentence-Transformer.

3.) Next, we encode all of our sentences with “model.encode”. This step produces the Sentence-Embeddings :-

  • First, each sentence is broken down into Tokens.
  • Next, we get an embedding for each of the tokens.
  • These token embeddings are then combined into a single Sentence-Embedding; for “bert-base-nli-mean-tokens”, as the name suggests, this is done by averaging (mean-pooling) the token embeddings.
  • The resulting Sentence-Embeddings are the vectors that represent the full sentence (or the full document) that we input.
  • We get seven embeddings because we have 7 different sentences in total, as shown above, and each embedding-vector has 768 values, i.e. the output has the shape (7, 768).
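The tokens-to-sentence step can be illustrated with mean pooling, which is what the “mean-tokens” in “bert-base-nli-mean-tokens” refers to: the sentence embedding is the average of the token embeddings, skipping padding positions. This is a minimal pure-Python sketch, not the library's own code; the 4-dimensional token vectors are made up (the real model produces 768-dimensional ones), and the `attention_mask` convention (1 = real token, 0 = padding) follows the usual Hugging Face Transformers format.

```python
def mean_pool(token_embeddings: list[list[float]], attention_mask: list[int]) -> list[float]:
    """Average the token vectors whose mask is 1; padding tokens (mask 0) are ignored."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i in range(dim):
                total[i] += vector[i]
            count += 1
    return [value / max(count, 1) for value in total]  # max() guards an all-padding input


# Made-up token embeddings for a 3-token sentence padded to length 4.
tokens = [
    [1.0, 2.0, 0.0, 4.0],
    [3.0, 0.0, 1.0, 2.0],
    [2.0, 1.0, 2.0, 0.0],
    [9.0, 9.0, 9.0, 9.0],  # padding position: must not affect the result
]
mask = [1, 1, 1, 0]

print(mean_pool(tokens, mask))  # [2.0, 1.0, 1.0, 2.0]
```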

4.) Once we have the sentence-embeddings, we use a Cosine-Similarity function to compute the cosine-similarity between each sentence and all of the other sentences :-

Note: Here, we are just running a plain Python loop over every embedding. This is slow, and it gets much slower as the number of sentences grows.
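One common way to avoid recomputing vector lengths for every pair is to normalise each embedding to unit length once; the cosine-similarity of two unit vectors is then just their dot product, and since the score matrix is symmetric only half of it needs computing. The sketch below shows this in plain Python with made-up 3-d vectors (real ones would be the (7, 768) output of “model.encode”). With NumPy, the same idea becomes a single matrix product of the normalised embedding matrix with its transpose, which is where the real speed-up over a Python loop comes from.

```python
import math


def normalise(vector: list[float]) -> list[float]:
    """Scale a (non-zero) vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def similarity_matrix(embeddings: list[list[float]]) -> list[list[float]]:
    """Pairwise cosine-similarity for every pair of embeddings (n x n, symmetric)."""
    units = [normalise(v) for v in embeddings]  # each length computed only once
    n = len(units)
    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        scores[i][i] = 1.0  # every non-zero vector is identical in direction to itself
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(units[i], units[j]))
            scores[i][j] = scores[j][i] = dot  # fill both halves at once
    return scores


# Made-up embeddings, just to show the shape of the result.
embeddings = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [3.0, 3.0, 0.0]]
for row in similarity_matrix(embeddings):
    print([round(score, 3) for score in row])
```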

5.) Below are the scores between all of our sentences, i.e. for every possible pair.
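To read off the most similar pairs from such a score matrix (as done in steps 7 and 8 below), one option is to walk every unordered pair with `itertools.combinations` and sort by score. This is a minimal sketch; the labels "a", "b", "c", ... follow the lettering used in this post, while the 3 x 3 matrix is made up and is not the post's actual score table.

```python
from itertools import combinations
from string import ascii_lowercase


def top_pairs(scores: list[list[float]], k: int = 3) -> list[tuple[str, str, float]]:
    """Return the k most similar (label, label, score) pairs, ignoring the diagonal."""
    labels = ascii_lowercase[: len(scores)]  # "a", "b", "c", ...
    pairs = [(labels[i], labels[j], scores[i][j])
             for i, j in combinations(range(len(scores)), 2)]  # each unordered pair once
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)[:k]


# Made-up symmetric score matrix for three sentences.
scores = [
    [1.00, 0.20, 0.55],
    [0.20, 1.00, 0.80],
    [0.55, 0.80, 1.00],
]
print(top_pairs(scores, k=2))  # [('b', 'c', 0.8), ('a', 'c', 0.55)]
```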

6.) Let’s visualise the scores for each sentence compared against all the other sentences :-

The scores can be shown as a heatmap :-
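If a plotting library is not at hand, a rough text version of the heatmap can still be produced. The sketch below prints a score matrix as a labelled grid (labels "a", "b", ... as in this post) and appends a shade character to each cell according to its score band; the five bands and their characters are arbitrary choices for illustration. With matplotlib or seaborn installed, the same matrix could be passed to a heatmap plot instead.

```python
from string import ascii_lowercase

SHADES = " .:*#"  # five levels, from low to high similarity (arbitrary scale)


def shade(score: float) -> str:
    """Map a score to a shade character; scores are clipped to [0, 1] first."""
    clipped = min(max(score, 0.0), 1.0)
    return SHADES[min(int(clipped * len(SHADES)), len(SHADES) - 1)]


def text_heatmap(scores: list[list[float]]) -> str:
    """Render a square score matrix as a labelled text grid."""
    labels = ascii_lowercase[: len(scores)]
    lines = ["      " + "".join(f"{label:>6} " for label in labels)]
    for label, row in zip(labels, scores):
        lines.append(f"{label:>6}" + "".join(f"{s:6.2f}{shade(s)}" for s in row))
    return "\n".join(lines)


# Made-up 3 x 3 score matrix, just to show the layout.
print(text_heatmap([[1.00, 0.20, 0.55],
                    [0.20, 1.00, 0.80],
                    [0.55, 0.80, 1.00]]))
```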

7.) We can see here that vectors “b” and “c” are highly similar, with a score of 0.72 :-

  • “there is an art to getting your way and throwing bananas on to the street is not it”.
  • “it is not often you find soggy bananas on the street”.

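8.) The vectors “b” and “g” are also very similar in meaning, even though they express it with different (synonymous) words, and their score is also fairly high at 0.66 (the 2nd highest score). TF-IDF and BM25 based approaches would struggle here, because the two sentences share almost no meaningful words (e.g. “bananas” vs “yellow fruit”, “street” vs “road”).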

  • “there is an art to getting your way and throwing bananas on to the street is not it”.
  • “to get your way you must not bombard the road with yellow fruit”.
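The claim that TF-IDF and BM25 would struggle with “b” and “g” can be checked with a quick word-overlap count on the three quoted sentences. This sketch looks only at exact word matches (a simplification of what lexical methods reward, with no weighting at all) and reports the shared words and the Jaccard overlap.

```python
def words(sentence: str) -> set[str]:
    """Unique lower-cased words, split on whitespace."""
    return set(sentence.lower().split())


b = "there is an art to getting your way and throwing bananas on to the street is not it"
c = "it is not often you find soggy bananas on the street"
g = "to get your way you must not bombard the road with yellow fruit"

for name, other in (("c", c), ("g", g)):
    shared = words(b) & words(other)
    jaccard = len(shared) / len(words(b) | words(other))  # shared / all distinct words
    print(f"b vs {name}: {sorted(shared)}  Jaccard = {jaccard:.2f}")

# b vs c: ['bananas', 'is', 'it', 'not', 'on', 'street', 'the']  Jaccard = 0.35
# b vs g: ['not', 'the', 'to', 'way', 'your']  Jaccard = 0.21
```

“b” and “c” share content words such as “bananas” and “street”, while “b” and “g” share mostly function words like “the”, “to” and “not”, which TF-IDF and BM25 tend to weight down because they are so common. sBERT still scored “b” and “g” at 0.66, because it compares meaning rather than exact words.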

Next Steps → For Vector Similarity, another model worth exploring is distilbert-base-uncased.

Conclusion → Sentence-Transformers don’t require two texts to use the same words in order to match them; they rely on the semantic meaning of those words instead.


