The decision tree chooses the split point that maximizes the reduction in impurity, i.e. the information gain (which can be calculated directly).
- Impurity is measured with criteria such as the misclassification error, entropy, and Gini impurity.
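To make these measures concrete, here is a minimal sketch that computes each one from a node's class-probability vector (the function names are illustrative, not from any library):

```python
# Three impurity measures for a node, computed from class probabilities.
import numpy as np

def gini(p):
    """Gini impurity: 1 - sum(p_i^2)."""
    p = np.asarray(p)
    return 1.0 - np.sum(p ** 2)

def entropy(p):
    """Entropy: -sum(p_i * log2(p_i)), treating 0*log(0) as 0."""
    p = np.asarray(p)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def error_rate(p):
    """Misclassification error: 1 - max(p_i)."""
    return 1.0 - np.max(np.asarray(p))

# A 50/50 node is maximally impure; a pure node scores 0 on every measure.
for measure in (gini, entropy, error_rate):
    print(measure.__name__, measure([0.5, 0.5]), measure([1.0, 0.0]))
```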
Pruning is done to prevent overfitting: pre-pruning limits the tree's growth in advance, while post-pruning cuts branches back from a fully grown tree.
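As an illustration, the following sketch (assuming scikit-learn; the dataset and parameter values are chosen only for demonstration) contrasts an unpruned tree with one that is pre-pruned via `max_depth` and post-pruned via cost-complexity `ccp_alpha`:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

unpruned = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
pruned = DecisionTreeClassifier(max_depth=4,     # pre-pruning: cap the depth
                                ccp_alpha=0.01,  # post-pruning: cost-complexity
                                random_state=0).fit(X_train, y_train)

# The unpruned tree typically hits ~1.0 training accuracy (overfitting),
# while the pruned tree tends to generalize better on the test set.
print("unpruned:", unpruned.score(X_train, y_train), unpruned.score(X_test, y_test))
print("pruned:  ", pruned.score(X_train, y_train), pruned.score(X_test, y_test))
```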
Advantages:
- It can handle features of different scales and types without preprocessing such as normalization.
- It is versatile, applying to both classification and regression tasks.
- The resulting trees are easy to visualize and understand, even for non-experts.
Disadvantages:
- Trees cannot extrapolate: predictions are limited to the range of the training data, so anything outside that range cannot be handled (see the sketch after this list).
- It is prone to overfitting.
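The extrapolation limit is easy to demonstrate. The sketch below (assuming scikit-learn) fits a regression tree to a simple linear trend and then queries a point far outside the training range:

```python
# Trees cannot extrapolate: outside the training range the prediction
# is frozen at the value of the nearest leaf.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X_train = np.linspace(0, 3, 50).reshape(-1, 1)
y_train = 2.0 * X_train.ravel()          # a simple linear trend

tree = DecisionTreeRegressor(max_depth=3).fit(X_train, y_train)

X_test = np.array([[1.5], [3.0], [10.0]])
print(tree.predict(X_test))
# The prediction at x=10 is roughly the same as at x=3: the tree returns
# the last leaf's value instead of continuing the upward trend.
```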
In random forests, many trees are built, each on a bootstrap sample of the data and with a random subset of features considered at each split, and their predictions are combined (majority vote for classification, average for regression) to achieve high accuracy (a minimal sketch follows the list below).
- This also helps to avoid overfitting.
- It is one of the most widely used methods for both regression and classification.
- However, the interpretability that is an advantage of a single tree is largely lost.
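A minimal random-forest sketch, assuming scikit-learn (the dataset and parameter values are illustrative):

```python
# Each tree sees a bootstrap sample of the rows and a random subset of
# features at every split; the ensemble votes on the final class.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,      # number of trees to vote over
    max_features="sqrt",   # size of the random feature subset per split
    random_state=0,
).fit(X_train, y_train)

print("test accuracy:", forest.score(X_test, y_test))
# Some interpretability survives as aggregate feature importances,
# even though no single tree explains the whole model.
print("top feature importance:", forest.feature_importances_.max())
```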
There are also gradient-boosted regression trees (GBRT), which have more parameters to tune but often deliver higher performance.
- It sequentially combines many small, pre-pruned trees, each one correcting the errors of the trees before it.
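A hedged sketch of gradient boosting using scikit-learn's `GradientBoostingClassifier`; the parameter values are illustrative, not recommendations:

```python
# Many shallow, pre-pruned trees are added one at a time, each fitting the
# residual errors of the ensemble so far; learning_rate shrinks each update.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gbrt = GradientBoostingClassifier(
    n_estimators=100,    # number of boosting stages (small trees)
    max_depth=1,         # strong pre-pruning: decision stumps
    learning_rate=0.1,   # shrinkage applied to each tree's contribution
    random_state=0,
).fit(X_train, y_train)

print("test accuracy:", gbrt.score(X_test, y_test))
```

In practice, `n_estimators`, `max_depth`, and `learning_rate` trade off against each other, which is why boosting has more parameters to tune than a random forest.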