Sentiment Classification using TensorFlow || NLP with DL

In case you are landing here directly, it’s recommended to read through this blog first.

In this blog, we shall use the concept of word embeddings in order to perform Sentiment Analysis. We will train our own word embeddings using a simple Keras model for a sentiment classification task, with the following steps :-

  • Downloading data from TensorFlow Datasets.

Question: Demonstrate the entire process of performing the Sentiment Analysis task.

Phase-1 : Data Downloading and Understanding :-

Step #1.) Let’s first import the necessary libraries :-

Here is the version of the tensorflow, that we shall be using :-

Step #2.) Let’s now download the ready-made dataset of imdb movie-reviews :-

Step #3.) Let’s first understand the data that we have downloaded. This is a dictionary-type dataset. There are two major components to this dataset :-

  • Training dataset.
  • Test dataset.

Step #4.) Let’s now segregate the training and test data first :-

Step #5.) Let’s investigate any one example in the training dataset :-

Note that the dataset (that we have downloaded) contains the sentence as well as its true label.

  • Label value of ZERO (0) means that the sentiment of the sentence is negative.
  • Label value of ONE (1) means that the sentiment of the sentence is positive.

Step #6.) We now create empty lists in order to store the sentences and labels :-

Step #7.) Let’s iterate over the train-data and test-data to extract the sentences and labels, and append them to the empty lists declared above.

Step #8.) We now verify that all the sentences have been appended to the lists declared above.

Step #9.) Now, we convert the training-labels list into a numpy array :-

Step #10.) Similarly, we convert the test-labels list into a numpy array :-

Phase-2 : Text Tokenisation and Word-Encodings :-

Step #1.) We now instantiate an object of the Tokenizer class and train it on the corpus of training_sentences.

Note that a vocabulary size of 10,000 means that, when we obtain the train_seqs of our sentences, only the ids of the 10,000 most frequent words shall be returned.

Step #2.) Let’s understand our word_index. This is nothing other than our dictionary, mapping every word to a unique integer id :-

Step #3.) Let’s now check the size of our word_index; note that our dictionary contains 88,583 words.
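Steps #1 to #3 of this phase can be sketched as below, assuming the legacy tf.keras preprocessing API (Tokenizer) that was current when this blog was written. For brevity a tiny toy corpus stands in for the 25,000 IMDB training sentences; fitting on the full corpus is what yields the 88,583-word dictionary:

```python
from tensorflow.keras.preprocessing.text import Tokenizer

# Toy stand-in for training_sentences (the blog uses the 25,000 IMDB reviews).
training_sentences = [
    "the movie was a fantastic watch",
    "the movie was a terrible waste of time",
]

# num_words=10000 caps the vocabulary at the 10,000 most frequent words;
# everything outside that set maps to the out-of-vocabulary token.
tokenizer = Tokenizer(num_words=10000, oov_token="<OOV>")
tokenizer.fit_on_texts(training_sentences)

# word_index is our dictionary: every word seen during fitting gets a unique id.
word_index = tokenizer.word_index
print(dict(list(word_index.items())[:5]))
print(len(word_index))
```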

Step #4.) Next, let’s convert our training corpus of 25,000 sentences into the corresponding word-encodings :-

Note that the length of training_sentences was 25,000 and the length of train_seqs is also 25,000.

Step #5.) Let’s investigate the first training sentence from our corpus and its corresponding word-encoded version :-
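Steps #4 and #5 can be sketched as below, again on a toy corpus standing in for the 25,000-sentence training set:

```python
from tensorflow.keras.preprocessing.text import Tokenizer

# Toy stand-in for the 25,000-sentence training corpus.
training_sentences = [
    "the movie was a fantastic watch",
    "the movie was a terrible waste of time",
]
tokenizer = Tokenizer(num_words=10000, oov_token="<OOV>")
tokenizer.fit_on_texts(training_sentences)

# Convert every sentence into its list of word ids.
train_seqs = tokenizer.texts_to_sequences(training_sentences)

# One encoding per sentence, so the two lengths always match.
print(len(training_sentences), len(train_seqs))

# The first training sentence next to its word-encoded version.
print(training_sentences[0])
print(train_seqs[0])
```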

Phase-3 : Model preparation :-

Step #1.) We first create a Sequential Keras based Model :-

  • The input layer is an Embedding layer, to which we supply our word-encodings as input. One thing to note here is that we have defined a 16-dimensional embedding.

Step #2.) We now proceed ahead with compiling this model :-

  • We have used Binary Cross-Entropy as the loss function, because ours is a binary classification problem.
  • We use the ADAM optimiser, a variant of SGD (Stochastic Gradient Descent), in order to minimise the cost-function.
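A sketch of Steps #1 and #2. Only the 16-dimensional Embedding input layer, the binary cross-entropy loss and the ADAM optimiser are stated in the post, so the pooling layer, the Dense layer sizes, and the padded sequence length of 120 below are assumptions typical of this style of tutorial:

```python
import tensorflow as tf

vocab_size = 10000     # matches the tokenizer's num_words
embedding_dim = 16     # each word is embedded in 16 dimensions

model = tf.keras.Sequential([
    # Input layer: learns a 16-dimensional embedding per word id.
    tf.keras.layers.Embedding(vocab_size, embedding_dim),
    # Assumed: average the word vectors into one fixed-size sentence vector.
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(6, activation="relu"),
    # Single sigmoid unit, since this is a binary classification problem.
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Build with an assumed padded sequence length so summary() can be printed.
model.build(input_shape=(None, 120))

# Binary cross-entropy loss + ADAM (a variant of stochastic gradient descent).
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()
```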

Usually, the below default values are adopted for the ADAM hyper-parameters Beta-1 and Beta-2 :- β₁ = 0.9 and β₂ = 0.999.

Here is a ready comparative analysis for various types of Optimisers :-

Step #3.) Let’s proceed with the training of the model. Given the 25K sequences in our dataset, we now perform the model training :-

Let’s understand a few things about an Epoch :-

  • One Epoch is when the ENTIRE dataset is passed forward and backward through the neural network exactly ONCE. We usually need many epochs in order to arrive at an optimal learning curve: the weights are iteratively tuned, the loss is gradually reduced, and the model accuracy is gradually improved.
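The training step can be sketched as below. To keep the sketch self-contained and quick to run, a tiny toy corpus stands in for the 25K IMDB sequences, and the epoch count of 10 is illustrative since the blog’s exact value is not shown:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Toy stand-in for the 25K IMDB sequences, just to make the sketch runnable.
sentences = ["a fantastic watch", "an absolute delight",
             "a terrible waste", "truly awful film"]
labels = np.array([1, 1, 0, 0])

tokenizer = Tokenizer(num_words=10000, oov_token="<OOV>")
tokenizer.fit_on_texts(sentences)
padded = pad_sequences(tokenizer.texts_to_sequences(sentences), maxlen=10)

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(10000, 16),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(6, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

# Each epoch passes the entire dataset through the network once, forward and
# backward. validation_data here reuses the toy data purely for illustration;
# the blog validates on the 25K-sentence test split.
history = model.fit(padded, labels, epochs=10,
                    validation_data=(padded, labels), verbose=0)
print(history.history["accuracy"][-1])
```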

Step #4.) We now extract the learned weights of the embedding layer. Each word is represented in 16 dimensions.
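In the blog this is done on the trained model, typically via `model.layers[0].get_weights()[0]`. A standalone sketch of the same idea (the weights here are untrained, but the shape is what matters):

```python
import tensorflow as tf

# The embedding layer's weight matrix holds one 16-dimensional vector per word.
embedding_layer = tf.keras.layers.Embedding(10000, 16)
embedding_layer.build(input_shape=(None,))

weights = embedding_layer.get_weights()[0]
print(weights.shape)  # (10000, 16): each of the 10,000 words lives in 16 dimensions
```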

Phase-4 : Model Evaluation :-

  • We now plot the graph of accuracy, which shows that our model suffers from overfitting, because there is a large gap between the training accuracy and the testing accuracy.
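The accuracy plot can be sketched as below. A tiny random stand-in training run is used here just to obtain a `history` object; in the blog the curves come from the real IMDB training run, and a widening gap between the two curves is the signature of overfitting:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

# Tiny stand-in training run purely to produce a `history` object to plot.
x = np.random.randint(1, 100, size=(32, 10))
y = np.random.randint(0, 2, size=(32,))
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(100, 16),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
history = model.fit(x, y, epochs=3, validation_data=(x, y), verbose=0)

# Training vs validation accuracy per epoch.
plt.plot(history.history["accuracy"], label="Training accuracy")
plt.plot(history.history["val_accuracy"], label="Validation accuracy")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.legend()
plt.savefig("accuracy.png")
```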

Phase-5 : Model Usage :-

Let’s now use the model we have constructed to perform the classification task, i.e. we shall use our model to detect whether a particular review is positive or negative.

Example #1.) We supply a sentence (a review of the movie shershah), and we can see that the output value is close to 1, which indicates that it’s a strongly positive review.

Example #2.) We supply a sentence (a review of the movie Lal Singh Chaddha), and we can see that the output value is 5.45e-10, which indicates that it’s a strongly negative review.
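The usage step follows the same pipeline: encode the new review with the same tokenizer, pad it, and call predict. A self-contained toy sketch, with a hypothetical review standing in for the blog’s Shershah / Lal Singh Chaddha examples:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Tiny stand-in for the trained IMDB model and tokenizer.
sentences = ["a fantastic watch", "an absolute delight",
             "a terrible waste", "truly awful film"]
labels = np.array([1, 1, 0, 0])

tokenizer = Tokenizer(num_words=1000, oov_token="<OOV>")
tokenizer.fit_on_texts(sentences)
padded = pad_sequences(tokenizer.texts_to_sequences(sentences), maxlen=10)

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(1000, 16),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(padded, labels, epochs=20, verbose=0)

# Classify an unseen review: values near 1 mean positive, near 0 mean negative.
review = ["what a fantastic film, an absolute delight"]
review_padded = pad_sequences(tokenizer.texts_to_sequences(review), maxlen=10)
score = float(model.predict(review_padded, verbose=0)[0][0])
print(score)
```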

That’s all in this blog, and thanks for reading till here. If you liked it, please do clap on this page. We shall see you in the next blog.
