Multi-class Classification of Mathematical-Numbers | CNN | Part 4

  • Training dataset of 60,000 images, stored in Xtrain.
  • Testing dataset of 10,000 images, stored in Xtest.
  • Ytrain is the set of labels for all the images in Xtrain.
  • Ytest is the set of labels for all the images in Xtest.
  • Pre-processing involves two steps: reshaping and normalisation (see the sketch below).
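A minimal sketch of these preparation steps, assuming the standard Keras MNIST loader and utilities (the variable names mirror the bullets above):

```python
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

# Load the 60,000 training and 10,000 testing images.
(Xtrain, Ytrain), (Xtest, Ytest) = mnist.load_data()

# Reshaping: add an explicit depth-1 channel, so each image is (28, 28, 1).
Xtrain = Xtrain.reshape(60000, 28, 28, 1).astype("float32")
Xtest = Xtest.reshape(10000, 28, 28, 1).astype("float32")

# Normalisation: scale pixel intensities from [0, 255] down to [0, 1].
Xtrain /= 255.0
Xtest /= 255.0

# One-hot encode the labels into 10-element vectors (digits 0-9).
Ytrain = to_categorical(Ytrain, 10)
Ytest = to_categorical(Ytest, 10)
```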
  • Conv2D is a layer which performs 2D convolutions. Here we specify 32 filters (aka kernels), each of size 3*3, and a stride of 1 is used for this convolution operation. All 32 filters are randomly initialised.
  • The input_shape argument gives the size of the input image to this CNN model: each image has dimensions (28*28) with a depth of 1.
  • The size of each output feature map is determined by four quantities: the input size (W), the kernel size (K), the stride (S) and the padding (P), via the formula O = (W - K + 2*P)/S + 1.
  • Here W=28, K=3, P=0, S=1, so O = (28 - 3 + 2*0)/1 + 1 = 26: each output feature map is of size (26*26) with a depth of 1.
  • There are 32 output feature maps, because we specified 32 filters for Layer-1.
  • Note that the input to the 1st convolutional layer is a single image with 1 feature map (i.e. a depth of 1).
  • There are 32 kernels, each of size 3*3, so each filter holds 9 weights. The total number of weights is therefore (1 * 32 * 9) = 288.
  • There is also one bias per filter, giving 32 biases.
  • Hence, the total number of parameters is 288 + 32 = 320.
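Layer-1 might be declared as follows. This is a sketch; the 'relu' activation is an assumption, since only the filter count, kernel size and stride are stated above:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D

model = Sequential()

# Layer 1: 32 filters of size 3x3, stride 1, no padding.
# Output: (26, 26, 32); parameters: (1 * 32 * 9) weights + 32 biases = 320.
model.add(Conv2D(32, kernel_size=(3, 3), strides=(1, 1),
                 activation="relu", input_shape=(28, 28, 1)))
```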
  • Sub-sampling can be performed either as a max-pooling operation (keep the largest value in each window) or a mean-pooling operation (average the values in each window).
  • The outputs of the max-pooling and mean-pooling operations are illustrated in the sketch below.
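In Keras these two variants correspond to MaxPooling2D and AveragePooling2D. A 2*2 max-pooling step is what takes the (26, 26, 32) output of Layer-1 down to the (13, 13, 32) input of Layer-2:

```python
from tensorflow.keras.layers import MaxPooling2D, AveragePooling2D

# Max-pooling: keep the largest value in each 2x2 window.
# (26, 26, 32) -> (13, 13, 32); pooling adds no trainable parameters.
model.add(MaxPooling2D(pool_size=(2, 2)))

# Mean-pooling alternative: average each 2x2 window instead.
# model.add(AveragePooling2D(pool_size=(2, 2)))
```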
  • Conv2D again performs 2D convolutions, this time with 64 filters (aka kernels) of size 3*3. Each kernel spans the full depth of the incoming input, so applying the 64 kernels generates 64 output feature maps.
  • Note that the input to Layer-2 is of size (13*13*32), so the output feature map size of this 2nd layer follows from the same computation: with W=13, K=3, P=0, S=1, we get O = (13 - 3 + 2*0)/1 + 1 = 11. Each of the 64 output feature maps generated by this 2nd conv layer is therefore of size (11*11), giving an output volume of (11*11*64).
  • Here the input to the 2nd convolutional layer has 32 feature maps (i.e. a depth of 32).
  • There are 64 kernels, each of size 3*3 across all 32 input channels, so the total number of weights is (32 * 64 * 9) = 18,432.
  • There is also one bias per filter, giving 64 biases.
  • Hence, the total number of parameters is 18,432 + 64 = 18,496.
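Layer-2 can be added in the same fashion (again assuming 'relu'); the (5*5*64) input quoted for Layer-3 below implies another 2*2 max-pooling step after it:

```python
# Layer 2: 64 filters of 3x3 over the (13, 13, 32) volume.
# Output: (11, 11, 64); parameters: (32 * 64 * 9) + 64 = 18,496.
model.add(Conv2D(64, kernel_size=(3, 3), activation="relu"))

# (11, 11, 64) -> (5, 5, 64), matching the Layer-3 input size below.
model.add(MaxPooling2D(pool_size=(2, 2)))
```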
  • Conv2D performs 2D convolutions once more, with another 64 new filters (aka kernels) of size 3*3, again generating 64 output feature maps.
  • Note that the input to Layer-3 is of size (5*5*64), after the intervening 2*2 pooling step, so with W=5, K=3, P=0, S=1 we get O = (5 - 3 + 2*0)/1 + 1 = 3. Each of the 64 output feature maps generated by this 3rd conv layer is therefore of size (3*3), giving an output volume of (3*3*64).
  • Here the input to the 3rd convolutional layer has 64 feature maps (i.e. a depth of 64).
  • There are 64 kernels, each of size 3*3 across all 64 input channels, so the total number of weights is (64 * 64 * 9) = 36,864.
  • There is also one bias per filter, giving 64 biases.
  • Hence, the total number of parameters is 36,864 + 64 = 36,928.
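And Layer-3, under the same assumptions:

```python
# Layer 3: 64 filters of 3x3 over the (5, 5, 64) volume.
# Output: (3, 3, 64); parameters: (64 * 64 * 9) + 64 = 36,928.
model.add(Conv2D(64, kernel_size=(3, 3), activation="relu"))
```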
  • This layer also acts as the input layer of the fully-connected part, with an input tensor of size (576*1): each input to our fully-connected model is a vector of 576 values.
  • Recall that, earlier above, we converted (reshaped/flattened) the output volume from 3D (dimensions 3*3*64) to 1D (dimension 576).
  • This 4th layer is defined as a dense layer with 64 neurons and the 'relu' activation function.
  • Dense layer implies that every neuron of one layer is connected to every neuron of the next layer.
  • There are 64 outputs from this 4th layer, because it contains 64 neurons. Below is the summary after applying this layer to our model (see the sketch after this list):
  • The input to the first hidden layer (aka the 4th layer of the CNN) is a flattened feature map of size (576*1).
  • With 64 neurons in the first hidden layer, the total number of weights is 576 * 64 = 36,864.
  • There are also 64 biases, one per neuron, so the total number of parameters for this layer is 36,864 + 64 = 36,928.
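The flattening step and this first dense layer, as a sketch:

```python
from tensorflow.keras.layers import Flatten, Dense

# Flatten the (3, 3, 64) volume into a 576-element vector.
model.add(Flatten())

# Layer 4: fully connected, 64 neurons, ReLU.
# Parameters: (576 * 64) weights + 64 biases = 36,928.
model.add(Dense(64, activation="relu"))
```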
  • This layer acts as a hidden layer, with an input tensor of size (64*1). Recall that the output of the first hidden layer is a vector of size 64*1.
  • This 5th layer is defined as a dense layer with 32 neurons and the 'relu' activation function.
  • There are 32 outputs from this 5th layer, because it contains 32 neurons.
  • The input to the second hidden layer (aka the 5th layer of the CNN) is a vector of size (64*1).
  • With 32 neurons in the second hidden layer, the total number of weights is 64 * 32 = 2,048.
  • There are also 32 biases, one per neuron, so the total number of parameters for this layer is 2,048 + 32 = 2,080 (see the sketch after this list).
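The fifth layer, continuing the same sketch:

```python
# Layer 5: fully connected, 32 neurons, ReLU.
# Parameters: (64 * 32) weights + 32 biases = 2,080.
model.add(Dense(32, activation="relu"))
```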
  • This layer acts as the output layer, with an input tensor of size (32*1). Recall that the output of the second hidden layer is a vector of size 32*1.
  • This 6th layer is defined as a dense layer with 10 neurons and the 'softmax' activation function.
  • There are 10 outputs from this 6th layer, one per digit class, because it contains 10 neurons.
  • Note that we use the softmax activation function so the 10 outputs form a probability distribution over the classes. Below is the summary after applying this 6th layer to our model (see the sketch after this list):
  • The input to the 6th layer of the CNN is a vector of size (32*1).
  • With 10 neurons in the final output layer, the total number of weights is 32 * 10 = 320.
  • There are also 10 biases, one per neuron, so the total number of parameters for this layer is 320 + 10 = 330.
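And the softmax output layer:

```python
# Layer 6 (output): 10 neurons with softmax, one per digit class.
# Parameters: (32 * 10) weights + 10 biases = 330.
model.add(Dense(10, activation="softmax"))
```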

Finally, our six-layered CNN model, thus formed, can also be visualised as below:
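Calling model.summary() reproduces the layer-by-layer shapes and parameter counts derived above (the layer names below are Keras defaults and may differ in your run):

```python
model.summary()
# Layer (type)                   Output Shape          Param #
# conv2d (Conv2D)                (None, 26, 26, 32)    320
# max_pooling2d (MaxPooling2D)   (None, 13, 13, 32)    0
# conv2d_1 (Conv2D)              (None, 11, 11, 64)    18496
# max_pooling2d_1 (MaxPooling2D) (None, 5, 5, 64)      0
# conv2d_2 (Conv2D)              (None, 3, 3, 64)      36928
# flatten (Flatten)              (None, 576)           0
# dense (Dense)                  (None, 64)            36928
# dense_1 (Dense)                (None, 32)            2080
# dense_2 (Dense)                (None, 10)            330
# Total params: 75,082
```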

  • The cost function is defined via the loss function; in this case we use categorical cross-entropy as the loss function.
  • The metric used for evaluating the model is accuracy.
  • We use the Adam optimiser, an adaptive variant of stochastic gradient descent (SGD), to minimise the cost function.
  • Epoch: one epoch is when the ENTIRE dataset is passed forward and backward through the neural network exactly once. We usually need many epochs to arrive at an optimal learning curve. Here, in one full epoch, all 938 batches are passed (one forward and one backward pass per batch); weights are iteratively tuned, the loss gradually drops and model accuracy gradually improves.
  • Batch size: since memory is limited, we cannot process all 60,000 training instances in a single forward pass. So the training instances are split into subsets (batches); the network performs one pass over each batch and is then optimised through back-propagation. We divided our training dataset into chunks of 64 each, giving roughly 60,000 / 64 ≈ 938 batches in total.
  • Validation split of 0.1: out of the training dataset, 10% of the data is set aside as the cross-validation set. Its value is a float between 0 and 1, standing for the fraction of training data to be used as validation data. The model sets apart this fraction, does not train on it, and evaluates the loss and any model metrics on it at the end of each epoch.
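Compilation and training with these settings might look like the following sketch, mirroring the bullets above:

```python
# Categorical cross-entropy loss, Adam optimiser, accuracy metric.
model.compile(loss="categorical_crossentropy",
              optimizer="adam",
              metrics=["accuracy"])

# Batches of 64, 10% of the training data held out for validation,
# and 3 epochs to match the results reported below.
model_fit = model.fit(Xtrain, Ytrain,
                      batch_size=64,
                      epochs=3,
                      validation_split=0.1)
```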
  • After the 1st epoch, the validation loss of our model was 6.34%.
  • After the 2nd epoch, the validation loss dropped to 4.37%, a considerable improvement.
  • After the 3rd epoch, the validation loss dropped further to 4.13%, another respectable improvement.
  • After the 1st epoch, the overall accuracy of our model was 98.02%.
  • After the 2nd epoch, the accuracy increased to 98.85%.
  • After the 3rd epoch, the accuracy increased to 98.90%.
  • The last element of the history object's loss list (model_fit.history['loss'][-1]) gives us the final loss after the training process.
  • The last element of the history object's accuracy list (model_fit.history['acc'][-1]) gives us the final accuracy after the training process, as sketched below.
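For example (note that the history key may be 'accuracy' instead of 'acc', depending on the Keras version):

```python
final_loss = model_fit.history["loss"][-1]
final_acc = model_fit.history["acc"][-1]
print(f"Final training loss: {final_loss:.4f}, accuracy: {final_acc:.4f}")
```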
  • Note that, earlier above, we converted Ytest as well into vector representation using the one-hot encoding approach. Therefore, the label for a randomly sampled image appears in vector format.
  • In the one-hot encoding format, each position represents a digit: index 0 represents 0, index 1 represents 1, index 2 represents 2, and so on.
  • Note that in the example above we randomly chose the 7349th image from the testing dataset; its one-hot vector has a ONE in the 3rd position (index 2) and ZEROs everywhere else.
  • The high value at index 2 (the 3rd position) indicates that our model is very confident that this particular digit is a 2.
  • The small values at all other indices indicate that the model has very low confidence in the digit being a 0, 1, 3, 4, 5, 6, 7, 8 or 9 (see the sketch after this list).
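A sketch of this check, using the same arbitrarily chosen test index:

```python
import numpy as np

i = 7349

# One-hot label: a ONE at index 2 means the true digit is 2.
print(Ytest[i])

# Softmax output: a probability distribution over the 10 digits.
probs = model.predict(Xtest[i:i + 1])[0]
print(probs)
print("Predicted digit:", np.argmax(probs))  # expected: 2
```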
We can experiment further with this model in several ways:

  • By changing the number of kernels applied in the initial layers.
  • By changing the size of each kernel applied to the input image.
  • By changing the type of sub-sampling operation, e.g. max-pooling vs mean-pooling.
  • By changing the number of neurons in each of the hidden/output layers.
  • By changing the number of hidden layers itself (note that we used 3 convolutional layers and 3 dense layers in the demonstration above).
  • By trying different optimisers, such as RMSProp or Adam.
  • By tuning the value of eta, i.e. the learning rate.
  • By changing the choice of activation function at each layer.
  • By testing on the CIFAR dataset.
  • By training for more epochs for better learning curves.
