# Multi-class Classification of Mathematical-Numbers | DNN+Regularisation | Part 3

• Training dataset of 60,000 images, stored in `Xtrain`.
• Testing dataset of 10,000 images, stored in `Xtest`.
• `Ytrain` is the set of labels for all the images in `Xtrain`.
• `Ytest` is the set of labels for all the images in `Xtest`.
• Reshape → We shall be reshaping each image in the dataset from 2-D (each image of size 28 × 28) into 1-D (each image a vector of size 784).
• Normalisation → We then perform normalisation, where each pixel value is scaled from the range (0 to 255) down to the range (0 to 1) by dividing by 255.
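The reshape and normalisation steps above can be sketched as follows. This is a minimal NumPy sketch: it assumes the images arrive as a `uint8` array of shape `(N, 28, 28)` (as `keras.datasets.mnist` returns them), and uses a small synthetic stand-in array so the snippet is self-contained.

```python
import numpy as np

# Synthetic stand-in for Xtrain; the real array would have shape (60000, 28, 28).
Xtrain = np.random.randint(0, 256, size=(100, 28, 28), dtype=np.uint8)

# Reshape: flatten each 28x28 image into a 784-dimensional vector.
Xtrain = Xtrain.reshape(Xtrain.shape[0], 28 * 28)

# Normalisation: scale pixel values from [0, 255] to [0, 1].
Xtrain = Xtrain.astype("float32") / 255.0

print(Xtrain.shape)  # (100, 784)
```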
• We shall be defining “Categorical Cross-Entropy” as the Cost-Function / Loss-Function.
• The metric used for evaluating the model is “accuracy”.
• We are planning to use the ADAM optimiser, an extension of SGD (Stochastic Gradient Descent), in order to minimise the cost-function more robustly.
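In Keras, the loss, metric, and optimiser above are wired together in `model.compile`. A minimal sketch follows; the layer sizes here are illustrative assumptions, not the values used in the article.

```python
from tensorflow import keras

# Illustrative architecture: 784-dim flattened input, one hidden layer,
# 10-way softmax output (one class per digit).
model = keras.Sequential([
    keras.layers.Input(shape=(784,)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])

# Categorical cross-entropy loss, "accuracy" metric, ADAM optimiser.
model.compile(loss="categorical_crossentropy",
              optimizer="adam",
              metrics=["accuracy"])
```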
• So we have divided our Training-Data-Set into small chunks (mini-batches) of 64 images each. Therefore, around 938 batches in total are formed in this process (⌈60,000 / 64⌉ = 938).
• In one Full-Epoch (i.e. a forward + backward pass over the entire dataset), all 938 batches are passed through the network. Weights are iteratively tuned, the loss is gradually reduced, and model accuracy gradually improves.
• Epoch :- One Epoch is when an ENTIRE dataset is passed forward and backward through the neural network only ONCE. We usually need many epochs in order to arrive at an optimal learning curve.
• Batch size: Since memory is limited, we usually cannot process all of the training instances (i.e. 60,000 instances) in one forward pass. So, what is commonly done is splitting the training instances into subsets (i.e. batches), performing one pass over the selected subset (i.e. batch), and then optimising the network through back-propagation.
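The batch count quoted above follows directly from the dataset size and batch size; the last batch is a partial one, hence the ceiling:

```python
import math

# 60,000 training images split into mini-batches of 64:
# 60000 / 64 = 937.5, so the final (partial) batch brings the total to 938.
num_batches = math.ceil(60000 / 64)
print(num_batches)  # 938
```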
• Validation Split: It means that out of the total dataset, 2% of the data is set aside as the cross-validation set. Its value is a float between 0 and 1, and it stands for the fraction of the training data to be used as validation data. The model will set apart this fraction of the training data, will not train on it, and will evaluate the loss and any model metrics on this data at the end of each epoch.
• The last element of the history list `model_fit.history['loss']` gives us the final loss after the training process.
• The last element of the history list `model_fit.history['acc']` gives us the final accuracy after the training process (in recent Keras versions this key is spelled `'accuracy'`).
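The `History` object returned by `model.fit` records one value per epoch for each loss and metric, so the last element is the post-training value. A toy sketch: the data here is random noise purely so the snippet is self-contained (the article uses MNIST), and the tiny model is an assumption for illustration.

```python
import numpy as np
from tensorflow import keras

# Random stand-in data: 64 samples of 784 features, 10 one-hot classes.
X = np.random.rand(64, 784).astype("float32")
Y = keras.utils.to_categorical(np.random.randint(0, 10, size=64), 10)

model = keras.Sequential([
    keras.layers.Input(shape=(784,)),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam",
              metrics=["accuracy"])

model_fit = model.fit(X, Y, epochs=2, batch_size=32, verbose=0)

# One entry per epoch; the last element is the value after training.
final_loss = model_fit.history["loss"][-1]
final_acc = model_fit.history["accuracy"][-1]  # key is "acc" in old Keras
```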
• The model can be tuned further by:
• Changing the number of hidden layers. (Note that we have used 4 layers in total in the aforesaid demonstration.)
• Changing the number of neurons in each of the involved layers (input/dense/output layer).
• Tuning the value of eta (η), i.e. the Learning-Rate.
• Tuning the value of lambda (λ), i.e. the Regularisation-Rate.
• Changing the choice of activation functions at each layer.
• Trying the various other optimisers, like RMSProp, etc.
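The learning-rate and regularisation-rate knobs can be set explicitly when building the model. A hedged sketch: the values of `eta` and `lam` below, and the layer sizes, are illustrative assumptions, not the article's chosen values.

```python
from tensorflow import keras
from tensorflow.keras import regularizers

eta = 1e-3   # learning rate (eta) -- assumed value for illustration
lam = 1e-4   # L2 regularisation rate (lambda) -- assumed value

model = keras.Sequential([
    keras.layers.Input(shape=(784,)),
    # L2 weight penalty on the hidden layer, controlled by lambda.
    keras.layers.Dense(128, activation="relu",
                       kernel_regularizer=regularizers.l2(lam)),
    keras.layers.Dense(10, activation="softmax"),
])

# Explicit ADAM optimiser with a chosen learning rate, instead of
# the string shorthand optimizer="adam".
model.compile(loss="categorical_crossentropy",
              optimizer=keras.optimizers.Adam(learning_rate=eta),
              metrics=["accuracy"])
```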
