Multi-class Classification of Mathematical-Numbers | DNN+Regularisation | Part3

If you are landing here directly, it is highly recommended to have a look at this story first.

In this blog, we shall be using the Keras framework with a TensorFlow backend to create a Deep Neural Network and apply Regularisation techniques. We shall be using the MNIST dataset, which consists of 28*28 grayscale images of hand-written numeric digits, 0 to 9. The dataset is partitioned into 2 parts :-

  • Training dataset of 60,000 images.

This is a classic example of multi-class classification, where we classify the input image into 1 of 10 classes of decimal digits (0 to 9). Our Deep Neural Network has 2 hidden layers, in addition to the input & output layers, as demonstrated below :-

First, we import the important libraries that we shall be using throughout our demonstration. We shall be using TensorFlow V1 throughout the setup, therefore we also explicitly disable the TF V2 behaviour.

Next, we initialise the random-number generator & set the seed, so that we keep getting the same sequence of random numbers in every run of the program.
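Here is a minimal sketch of the seeding step, assuming it is NumPy's generator that is being seeded. The seed value 42 is an arbitrary choice for illustration and is not taken from the original code; in a TensorFlow setup, the framework's own seed would have to be set separately.

```python
import numpy as np

SEED = 42  # assumed value, chosen only for illustration

# Seeding makes the sequence of "random" numbers repeatable.
np.random.seed(SEED)
first_run = np.random.rand(3)

# Re-seeding with the same value reproduces exactly the same numbers.
np.random.seed(SEED)
second_run = np.random.rand(3)

print(np.array_equal(first_run, second_run))
```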

We shall be using the MNIST hand-written dataset, which is already provided as a built-in dataset in the Keras framework. We therefore first create the instance of the MNIST dataset.

Next, we load the data (from the MNIST dataset) into the current environment.

Let’s first understand the meaning of the 4 variables created above :-

The training set is a subset of the dataset used to train a model.

  • Xtrain is the training data set.

The test set is a subset of our dataset that we shall be using to test our model, after the model has gone through initial vetting by the validation set.

  • Xtest is the test data set.

Coming back to the MNIST dataset, let’s go ahead and print the shape of all of the above 4 datasets. We have the following configuration :-

  • Training dataset of 60,000 images, stored in Xtrain.

Also, please note that this is a set of hand-written numbers, stored as grayscale images. Each image is stored as a matrix of size 28 * 28, with each pixel taking a value in the range 0 to 255.

Let’s go ahead and print a few initial sample images (from this dataset) :

As we have learnt so far, there are 60,000 images in total in this training dataset. We are printing a randomly picked one, the 49,001st image from the training dataset, and its corresponding label :-

Understand that each image is of size 28 * 28. Each pixel can vary within the range 0 to 255, and each value represents the intensity (shade of grey) of that pixel. Note that each image is a 2d array of size (28 * 28) :-

Let’s go ahead and see what the pixel values are for the 49,001st image :-

Similarly, as we have also learnt that there are only 10,000 images in the test dataset, we are printing one image from the test dataset, i.e. the 5,678th image, and its corresponding label :-

Data-Pre-Processing :- Next, we need to perform the pre-processing on the images. Remember from our previous screenshots above that each image was originally a 2d matrix of size 28*28. Let’s now go ahead and convert each input image from 2d into a 1d vector of size 784, using the “reshape()” function. Also note that, after this operation, we can no longer view or plot the image using pyplot, unless we reshape it back to 28*28.

Further, we shall be normalising each image (from the training set), in order to convert the pixel values from the range (0 to 255) to the range (0 to 1). We shall also be converting the values into float type.

Therefore, each pixel value now becomes a float :-
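The reshape and normalise steps can be sketched with plain NumPy as follows. Since this sketch does not load MNIST, a small synthetic array stands in for Xtrain; the array contents and the names Xtrain_flat and Xtrain_norm are assumptions made only for this illustration.

```python
import numpy as np

# Stand-in for Xtrain: 5 synthetic grayscale "images" of 28 * 28 pixels (0 to 255).
rng = np.random.default_rng(0)
Xtrain = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Flatten each 2d image (28 * 28) into a 1d vector of 784 values.
Xtrain_flat = Xtrain.reshape(Xtrain.shape[0], 28 * 28)

# Convert to float and scale the pixel values from (0 to 255) to (0 to 1).
Xtrain_norm = Xtrain_flat.astype("float32") / 255.0

print(Xtrain_flat.shape, Xtrain_norm.dtype, Xtrain_norm.min(), Xtrain_norm.max())
```

The same two steps would be applied unchanged to the test images.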

Next, let’s also perform the same operations on the Xtest dataset :-

  • Reshape → We shall be reshaping each image in the test dataset from 2d (each image of size 28 * 28) into 1d (each image of size 784).

Note that the output values in the dataset are categorical in nature, with values in the range (0 to 9). Therefore, this categorical data is first converted into a vector using the One-Hot-Encoding approach :-

The Keras library provides us with an out-of-the-box method for this, “to_categorical”.
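As a rough illustration of what One-Hot-Encoding produces, here is a NumPy-only sketch. It mimics the effect of “to_categorical” but is not the Keras implementation; the helper name one_hot and the sample labels are made up for this example.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return one row per label, with a 1 at the label's index and 0 elsewhere."""
    labels = np.asarray(labels, dtype=int)
    return np.eye(num_classes, dtype="float32")[labels]

Ytrain_sample = [5, 0, 9]            # hypothetical digit labels
encoded = one_hot(Ytrain_sample)     # shape: (3, 10)
print(encoded)
```

Each label thus becomes a vector of 10 values, which matches the 10 neurons of the output layer.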

The above step completes the pre-processing for both input images and output categories. Recall that :-

  • Xtrain contains the 60,000 images, which we shall be using for training.

Let’s now build the simplest sequential DNN using Keras. We are planning to use 4 layers in this Neural Network.

The First Layer is an input layer, which expects an input tensor of size 784 (i.e. each input image to our model is of size 784*1). Recall that, earlier above, we had converted (reshaped) all of our input images from 2d to 1d.

Note that only the input layer specifies the input size. The input layer is defined as a dense layer with 50 neurons and the ‘relu’ activation function. Dense layer implies that all neurons of one layer are connected to all neurons of the next layer.

The Second Layer is defined as a hidden layer with 60 neurons and the ‘relu’ activation function. This is also a dense layer, but here we added an L2-based Regulariser with a Lambda value of 0.01.
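To make the effect of this regulariser concrete, here is a small NumPy sketch of the penalty that an L2 regulariser with Lambda = 0.01 adds to the loss: Lambda multiplied by the sum of the squared weights of the layer. The helper name l2_penalty and the weight values are invented for illustration.

```python
import numpy as np

def l2_penalty(weights, lam=0.01):
    """L2 penalty: lambda multiplied by the sum of the squared weights."""
    return lam * np.sum(np.square(weights))

W = np.array([[0.5, -1.0],
              [2.0,  0.0]])      # made-up weights of a tiny layer

penalty = l2_penalty(W)          # 0.01 * (0.25 + 1.0 + 4.0 + 0.0)
print(penalty)
```

Larger weights are penalised more heavily, so the optimiser is nudged towards smaller weight values.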

Question: What is our usual goal in an overall machine-learning model?

Question: What actually is Regularisation?

Question: What is the purpose of Regularisation, and how can we build a “Network with Reasonable Generalisation”?

Question: Why is Regularisation required at all for production data?

Question: What does L2-Regularisation look like?

Question: What does L2 / RIDGE Regularisation look like (when represented as a Hard-Constraint-Problem)?

Question: What does L2 / RIDGE Regularisation look like (when represented as a Soft-Constraint-Problem)?

Question: What does L1 / LASSO Regularisation look like (when represented as a Hard-Constraint-Problem)?

Question: What does L1 / LASSO Regularisation look like (when represented as a Soft-Constraint-Problem)?

Question: Why is the “Soft-Constraint-format” for problems even required?

Problems are usually hard to solve when represented in hard-constraint fashion; therefore, it’s advisable to represent the problem in “soft-constraint fashion”.
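For reference, the two formats are commonly written as below. This is the standard textbook formulation, not necessarily the exact notation used in the original figures: here L(w) is the un-regularised loss over the weights w, c is a budget on the size of the weights, and λ is the Regularisation-parameter (Lambda).

```latex
\begin{align*}
\text{L2 / RIDGE, hard constraint:} \quad & \min_{w} \; L(w) \quad \text{subject to} \quad \sum_{j} w_j^{2} \le c \\
\text{L2 / RIDGE, soft constraint:} \quad & \min_{w} \; L(w) + \lambda \sum_{j} w_j^{2} \\
\text{L1 / LASSO, hard constraint:} \quad & \min_{w} \; L(w) \quad \text{subject to} \quad \sum_{j} |w_j| \le c \\
\text{L1 / LASSO, soft constraint:} \quad & \min_{w} \; L(w) + \lambda \sum_{j} |w_j|
\end{align*}
```

In the soft-constraint format the constraint becomes a penalty term inside the objective, so the problem can be minimised with ordinary gradient-based methods.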

Question: Lambda is a Hyper-Parameter, also called the Regularisation-parameter. How do we determine its right value?

Question: What impact do Regularisation approaches have on the values of the various parameters involved in the overall network?

Question: Which Regularisation approach should be used under what circumstances?

Coming back to our network, the Third Layer is again defined as a hidden layer with 30 neurons and the ‘relu’ activation function. This is also a dense layer.

The Fourth Layer is the last layer, which is defined as the output layer with 10 neurons (note that each neuron represents a specific category). It uses the ‘softmax’ activation function, which turns the outputs of these 10 neurons into a probability for each of the 10 classes.
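A minimal NumPy sketch of what ‘softmax’ does to the outputs of the 10 neurons is shown here. The raw scores are made-up numbers, and this is an illustration of the function itself rather than of the Keras internals.

```python
import numpy as np

def softmax(z):
    """Turn a vector of raw scores into probabilities that sum to 1."""
    shifted = z - np.max(z)      # subtracting the max avoids overflow in exp
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# Made-up raw outputs of the 10 output neurons (one per digit, 0 to 9).
scores = np.array([1.0, 2.0, 0.5, 0.1, 3.0, 0.0, 0.2, 0.3, 1.5, 0.7])

probs = softmax(scores)
predicted_digit = int(np.argmax(probs))   # the class with the highest probability
print(predicted_digit, probs.sum())
```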

Summary of the Model :- Here is how our entire model looks :-

Note that we have a total of 44,450 parameters to be trained.
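The figure of 44,450 can be cross-checked by hand: every dense layer has one weight per (input, neuron) pair plus one bias per neuron. A small sketch of that arithmetic, using the layer sizes described above:

```python
# Layer sizes as described in the text: 784 inputs, then 50, 60, 30 and 10 neurons.
layer_sizes = [784, 50, 60, 30, 10]

total_params = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    layer_params = n_in * n_out + n_out   # weights + one bias per neuron
    print(f"{n_in} -> {n_out}: {layer_params} parameters")
    total_params += layer_params

print("Total:", total_params)
```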

Configuring Model :- Let’s now configure our recently created DNN model for training.

  • We shall be defining “Categorical Cross-Entropy” as the Cost-Function / Loss-Function.

Question: What should be our choice of Cost/Loss/Divergence function?
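Assuming that the loss meant here is categorical cross-entropy, the following NumPy sketch shows how it is computed from one-hot labels and predicted probabilities. The function name, the eps guard and the sample values are choices made for this illustration only.

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean over the samples of -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0)   # guard against log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

# Two made-up samples with 3 classes each, to keep the example small.
y_true = np.array([[0.0, 1.0, 0.0],
                   [1.0, 0.0, 0.0]])
y_pred = np.array([[0.1, 0.8, 0.1],
                   [0.6, 0.3, 0.1]])

loss = categorical_cross_entropy(y_true, y_pred)
print(loss)
```

Only the probability assigned to the correct class contributes to the loss, so the loss shrinks as that probability approaches 1.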

Once the configuration of our model is complete, we can proceed to training.

Training of Model :- Given the large (60K) dataset that we have, we shall not be using “Full-Batch-Gradient-Descent” because of the expensive computation involved; instead, we plan to use the “Mini-Batch-Gradient-Descent” approach to minimise the loss.

  • So we have divided our training dataset into small chunks of 64 images each. Therefore, around 938 batches in total are formed in this process.
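The figure of 938 batches follows from simple arithmetic, sketched here under the assumption that all 60,000 training images are fed to the network (i.e. none are held back for validation):

```python
import math

num_training_images = 60_000   # size of the training set, as stated above
batch_size = 64

# 60,000 is not a multiple of 64, so the last batch is smaller and we round up.
num_batches = math.ceil(num_training_images / batch_size)
last_batch_size = num_training_images - (num_batches - 1) * batch_size

print(num_batches, last_batch_size)
```

So each epoch consists of 937 full batches of 64 images and one final, smaller batch.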

Let’s understand a few things from the above step :-

  • Epoch :- One Epoch is when the ENTIRE dataset is passed forward and backward through the neural network exactly ONCE. We usually need many epochs in order to arrive at an optimal learning curve.

As training of the DNN model progresses, both the accuracy and the loss are stored as lists in the history object. The final training accuracy and loss can be obtained from the history object.

  • The last element of the loss list in the history object (model_fit.history[‘loss’]) gives us the final loss after the training process.

Next, we can also plot the graph for Validation-Accuracy. The graph below signifies that, as the number of epochs progresses, the Training-Accuracy also increases.

Next, we can also plot the graph for Validation-Loss. The graph below signifies that, as the number of epochs progresses, the Training-Loss value decreases.

Next, we can evaluate the model we have built with the help of the testing dataset of size 10,000. The “evaluate” function gives us the testing loss and testing accuracy as the output.

From the above snapshot, we can observe that our DNN model with Regularisation gives a testing accuracy of 97.22%. We can try to improve (i.e. tune) this model’s accuracy by playing with the following parameters :-

  • Changing the number of hidden layers itself. (Note that we have used 4 layers in total in the aforesaid demonstration).

Thanks for reading through this and we shall meet you in another article.



Software Engineer for Big Data distributed systems
