Random forest is a well-known algorithm that performs well and is relatively easy to explain.
But before Tom introduces the random forest model, he first wants to cover the concept of a decision tree.
A decision tree is a tree-like classification model of decisions and their possible consequences. Here is a decision tree for distinguishing Apple products.
Sklearn provides a simple way to generate a decision tree.
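Since the original data and code are not shown, here is a minimal sketch of fitting a decision tree with scikit-learn, using the toy iris dataset as stand-in data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data: iris, in place of the dataset used in the text
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Fit a single decision tree on the training split
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)

print("test accuracy:", tree.score(X_test, y_test))
```

The fitted tree can also be visualized with `sklearn.tree.plot_tree` to produce the kind of diagram shown above.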
A single decision tree is simplistic and very sensitive to biases in the training data. To reduce this, Tom can use multiple decision trees, which together form an ensemble: the random forest model. A random forest builds each decision tree on a randomly chosen subset of the features, and as the number of trees grows, the biases of individual trees tend to cancel out.
Sklearn also provides a simple way to use the random forest model directly.
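A minimal sketch, again on stand-in iris data, of using `RandomForestClassifier` (the `n_estimators` and `max_features` values here are illustrative defaults, not the settings from the text):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in data: iris, in place of the dataset used in the text
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# max_features controls the random feature subset each tree sees;
# increasing n_estimators lets individual trees' errors average out.
forest = RandomForestClassifier(
    n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X_train, y_train)

print("test accuracy:", forest.score(X_test, y_test))
```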
So Tom decided to use a random forest directly to generate a baseline.
Result of the classifier:

| | precision | recall | f1-score | support |
|---|---|---|---|---|
| avg / total | 0.59 | 0.61 | 0.58 | 1000000 |
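A report with these columns (precision, recall, f1-score, support per class, plus the averaged bottom row) comes from scikit-learn's `classification_report`; a sketch on stand-in iris data:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Stand-in data: iris, in place of the dataset used in the text
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Prints per-class precision, recall, f1-score and support,
# followed by the averaged summary row
print(classification_report(y_test, forest.predict(X_test)))
```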
From the statistics shown above, we can see that the model did a poor job classifying everything except class 0 and class 1. In fact, no class other than class 0 or class 1 appears in the prediction result at all. The result was bad, but Tom decided to start from here.