Multi-class Classification of Mathematical-Numbers | Vanilla DNN | Part1

  • Training dataset of 60,000 images.
  • Testing dataset of 10,000 images.
  • Reshape → We reshape each image in the dataset from 2-D (each image of size 28 × 28) into 1-D (each image a vector of size 784).
  • Normalisation → We then normalise each pixel value, converting it from the range (0 to 255) to the range (0 to 1).
  • Xtrain contains the 60,000 images, which we shall be using for training.
  • Ytrain contains the corresponding labels for the 60,000 images in the Xtrain dataset.
  • Xtest contains the 10,000 images, which we shall be using for testing/validation.
  • Ytest contains the corresponding labels for the 10,000 images in the Xtest dataset.
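The reshape and normalisation steps above can be sketched with NumPy. Here a small random array stands in for the MNIST images (the real data would come from `keras.datasets.mnist.load_data()` with shape (60000, 28, 28)):

```python
import numpy as np

# Stand-in for the MNIST training images (100 samples instead of 60,000).
Xtrain = np.random.randint(0, 256, size=(100, 28, 28), dtype=np.uint8)

# Reshape: flatten each 28x28 image into a 784-element vector.
Xtrain = Xtrain.reshape(Xtrain.shape[0], 784)

# Normalisation: scale pixel values from [0, 255] down to [0, 1].
Xtrain = Xtrain.astype('float32') / 255.0

print(Xtrain.shape)   # (100, 784)
```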
  • The cost function is defined by the loss function; in this case, we are using "categorical cross-entropy" as the loss function.
  • The metric used for evaluating the model is "accuracy".
  • We are planning to use the ADAM optimiser, an adaptive variant of SGD (Stochastic Gradient Descent), to minimise the cost function.
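A minimal Keras sketch of this setup — the hidden-layer size here is an illustrative assumption, not necessarily the article's exact architecture:

```python
from tensorflow import keras

# Vanilla DNN: 784 flattened pixels in, 10 digit classes out.
model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(64, activation='relu'),    # hidden layer (size assumed)
    keras.layers.Dense(10, activation='softmax'), # one neuron per class
])

# Categorical cross-entropy loss, Adam optimiser, accuracy metric.
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```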
  • Epoch :- One epoch is one complete forward-and-backward pass of the ENTIRE training dataset through the neural network. Note that a single epoch is usually not enough, as it may leave the model under-fitted. As the number of epochs increases, the weights are updated more times, and the model typically moves from under-fitting to optimal, and eventually to over-fitting.
  • Batch size: Since memory is limited, we usually cannot process the entire set of training instances (i.e., 60,000 instances) in a single forward pass. So, what is commonly done is splitting the training instances into subsets (i.e., batches), performing one pass over each batch, and then optimising the network through back-propagation. The number of training instances within a batch is called the batch_size. The higher the batch size, the more memory we need; the batch_size is usually specified as a power of 2.
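The epoch and batch-size hyperparameters are passed directly to `model.fit`. A small runnable sketch on random stand-in data (sizes and layer widths are illustrative, not the article's values):

```python
import numpy as np
from tensorflow import keras

# Tiny random stand-in dataset: 256 samples of 784 features, 10 classes.
X = np.random.rand(256, 784).astype('float32')
Y = keras.utils.to_categorical(np.random.randint(0, 10, size=256), num_classes=10)

model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])

# epochs=3 -> 3 full passes over the data;
# batch_size=32 -> 256 / 32 = 8 weight updates per epoch.
model_fit = model.fit(X, Y, epochs=3, batch_size=32, verbose=0)
print(len(model_fit.history['loss']))   # one loss value recorded per epoch
```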
  • The last element of the history object (model_fit.history['loss']) gives us the final loss after the training process.
  • The last element of the history object (model_fit.history['acc']) gives us the final accuracy after the training process.
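Concretely, the History object exposes a plain dict mapping each metric name to a per-epoch list, so the final values are just the last elements. The numbers below are made up for illustration:

```python
# Stand-in for model_fit.history after a hypothetical 3-epoch run.
history = {'loss': [1.20, 0.55, 0.30], 'acc': [0.58, 0.82, 0.91]}

final_loss = history['loss'][-1]
final_acc = history['acc'][-1]
print(final_loss, final_acc)   # 0.3 0.91
```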
  • Changing the number of neurons in each of the hidden layers (the output layer stays at 10 neurons, one per class).
  • Changing the number of hidden layers as well.
  • Trying different optimisers, such as RMSProp.
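These variations can be captured in a small helper that rebuilds the model with different hidden-layer sizes and optimisers. This is a hypothetical sketch, not the article's code:

```python
from tensorflow import keras

def build_model(hidden_layers=(64,), optimizer='adam'):
    # Hypothetical helper: vary hidden-layer count/sizes and the optimiser.
    layer_list = [keras.Input(shape=(784,))]
    for units in hidden_layers:
        layer_list.append(keras.layers.Dense(units, activation='relu'))
    layer_list.append(keras.layers.Dense(10, activation='softmax'))
    model = keras.Sequential(layer_list)
    model.compile(optimizer=optimizer,
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# Example: two hidden layers and RMSProp instead of Adam.
wider = build_model(hidden_layers=(128, 64), optimizer='rmsprop')
```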




aditya goel

Software Engineer for Big Data distributed systems