Purity Metrics

2 minute read

Decision trees recursively partition the data based on feature values.

images/machine_learning/supervised/decision_trees/purity_metrics/slide_02_01.tif

Pure Leaf Node: Terminal node where every single data point belongs to the same class.

💡Zero Uncertainty.

So, what should be the logic to partition the data at each step or each node ?

The goal of a decision tree algorithm is to find the split that maximizes information gain, meaning it removes the most uncertainty from the data.

So, what is information gain ?
How do we reduce uncertainty ?

Let’s understand few terms first, before we understand information gain.

Measure of uncertainty, randomness, or impurity in a data.

\[H(S)=-\sum _{i=1}^{n}p_{i}\log(p_{i})\]

Binary Entropy:

images/machine_learning/supervised/decision_trees/purity_metrics/slide_04_01.png

💡Entropy can also be viewed as the ‘average surprise'.

A highly certain event provides little information when it occurs (low surprise).
An unlikely event provides a lot of information (high surprise).

⭐️ Measures the reduction in entropy (uncertainty) achieved by splitting a dataset based on a specific attribute.

\[IG=Entropy(Parent)-\left[\frac{N_{left}}{N_{parent}}Entropy(Child_{left})+\frac{N_{right}}{N_{parent}}Entropy(Child_{right})\right] \]

Note: The goal of a decision tree algorithm is to find the split that maximizes information gain, meaning it removes the most uncertainty from the data.

⭐️ Measures the probability of an element being incorrectly classified if it were randomly labeled according to the distribution of labels in a node.

\[Gini(S)=1-\sum_{i=1}^{n}(p_{i})^{2}\]

Range: 0 (Pure) - 0.5 (Maximum impurity)

Note: Gini is used in libraries like Scikit-Learn (as the default), because it avoids the computationally expensive log function.

Gini Impurity is a first-order approximation of Entropy.
For most of the real-world cases, choosing one over the other results in the exact same tree structure or negligible differences in accuracy.
When we plot the two functions, they follow nearly identical shapes.

Video Decision Trees Purity Metrics | Information Gain | Gini Impurity | Explained with Examples

Previous: Decision Trees Introduction Next: Decision Trees For Regression