Multi-class Classification using Decision-Tree Model

  • criterion: The function used to measure the quality of a split. Scikit-learn supports “gini” for Gini impurity and “entropy” for information gain (releases from 1.1 onward also accept “log_loss”, which is equivalent to “entropy”). The default is “gini”.
  • splitter: The strategy used to choose the split at each node. Supports “best” to choose the best split and “random” to choose the best random split. The default is “best”.
  • max_features: The number of features to consider when looking for the best split. It accepts an integer, a float, a string or None. An integer is used directly as the number of features considered at each split. A float is treated as a fraction, so max(1, int(max_features * n_features)) features are considered at each split. If “sqrt”, then max_features=sqrt(n_features). If “log2”, then max_features=log2(n_features). If None, then max_features=n_features. Older releases also accepted “auto” as an alias for “sqrt”; it was deprecated in version 1.1 and removed in 1.3. The default is None.
  • max_depth: The maximum depth of the tree. It accepts an integer or None. If None, nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples. The default is None.
  • min_samples_split: The minimum number of samples required to split an internal node. An integer is used directly as that minimum. A float is treated as a fraction, so the minimum is ceil(min_samples_split * n_samples). The default is 2.
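  • min_samples_leaf: The minimum number of samples required to be at a leaf node. A split is only considered if it leaves at least this many training samples in each of the left and right branches. An integer is used directly as that minimum. A float is treated as a fraction, so the minimum is ceil(min_samples_leaf * n_samples). The default is 1.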
  • max_leaf_nodes: The maximum number of leaf nodes. When set, the tree is grown in best-first fashion, where the best nodes are those with the largest relative reduction in impurity. If None, the number of leaf nodes is unlimited. The default is None.
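  • min_impurity_split: The threshold for stopping tree growth early. A node is split if its impurity is above the threshold; otherwise it becomes a leaf. Note that this parameter was deprecated in scikit-learn 0.19 in favour of min_impurity_decrease and removed in version 1.0, so it is only available in older releases.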
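
To make the parameter descriptions above more concrete, here is a minimal pure-Python sketch of the arithmetic behind them: the two impurity measures selected by criterion, and the way integer, float and string settings translate into actual counts. The helper names (gini, entropy, resolve_max_features, resolve_min_samples) are hypothetical and written only for illustration; this is not scikit-learn's own code, just the formulas as its documentation describes them.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity of a node: 1 - sum(p_k^2) over class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy of a node in bits: sum(p_k * log2(1 / p_k))."""
    n = len(labels)
    return sum((count / n) * math.log2(n / count) for count in Counter(labels).values())


def resolve_max_features(max_features, n_features):
    """Hypothetical helper: turn a max_features setting into a feature count."""
    if max_features is None:
        return n_features
    if max_features == "sqrt":
        return max(1, int(math.sqrt(n_features)))
    if max_features == "log2":
        return max(1, int(math.log2(n_features)))
    if isinstance(max_features, int):
        return max_features
    # A float is read as a fraction of the available features.
    return max(1, int(max_features * n_features))


def resolve_min_samples(value, n_samples):
    """Hypothetical helper for min_samples_split / min_samples_leaf.

    Integers are used as-is; floats are fractions of n_samples, rounded up.
    """
    if isinstance(value, int):
        return value
    return math.ceil(value * n_samples)


if __name__ == "__main__":
    node = ["a", "a", "b", "c"]               # a node holding three classes
    print(gini(node))                          # 0.625
    print(entropy(node))                       # 1.5
    print(resolve_max_features("sqrt", 16))    # 4
    print(resolve_max_features(0.5, 16))       # 8
    print(resolve_min_samples(0.25, 150))      # 38
```

A pure node (all samples from one class) scores 0 under both measures, which is why an unrestricted tree keeps splitting until its leaves are pure or too small to split.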

Software Engineer for Big Data distributed systems

aditya goel
