Some implementations of decision trees do not include:
Answer:
Parent node: Of any two connected nodes, the one higher in the hierarchy is the parent node.
Child node: Of any two connected nodes, the one lower in the hierarchy is the child node.
Root node: The node from which the tree starts. It has only child nodes and no parent node. (dark blue node in the above image)
Leaf node/leaf: Nodes at the end of the tree that do not have any children are leaf nodes, or simply leaves. (green nodes in the above image)
Internal nodes/nodes: All nodes between the root node and the leaf nodes are internal nodes, or simply nodes. An internal node has both a parent and at least one child. (red nodes in the above image)
Splitting: Dividing a node into two or more sub-nodes, i.e., adding two or more children to a node.
Decision node: When a node splits into two or more child nodes, it is called a decision node.
Pruning: Removing the sub-nodes of a decision node is called pruning. It can be understood as the opposite of splitting.
Branch/sub-tree: A subsection of the entire tree is called a branch or sub-tree.
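The terminology above maps directly onto a simple data structure. Below is a minimal sketch of a tree node in Python; the class and attribute names are illustrative, not from any particular library:

```python
# Minimal sketch of the node terminology above; names are illustrative.
class Node:
    def __init__(self, parent=None):
        self.parent = parent    # None for the root node
        self.children = []      # empty list for a leaf node

    def split(self, n_children=2):
        # Splitting: add two or more child nodes, making this a decision node.
        self.children = [Node(parent=self) for _ in range(n_children)]

    def prune(self):
        # Pruning: remove this node's sub-nodes (the opposite of splitting).
        self.children = []

    @property
    def is_root(self):
        return self.parent is None

    @property
    def is_leaf(self):
        return len(self.children) == 0


root = Node()   # root node: no parent
root.split()    # root becomes a decision node with two children
print(root.is_root, root.children[0].is_leaf)  # True True
```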
Types of Decision Tree
Regression Tree
A regression tree is used when the dependent variable is continuous. The value assigned to a leaf node is the mean response of the training observations falling in that region, so any unseen observation that lands in that region is predicted with that mean. This means that even though the dependent variable in the training data is continuous, the predictions can only take a finite set of values, one per leaf. A regression tree follows a top-down greedy approach.
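To see this concretely, here is a small sketch using scikit-learn (for illustration only; it is not the from-scratch implementation the linked article builds). A tree with at most four leaves can produce at most four distinct predictions, no matter how many test points it scores:

```python
# Sketch: a regression tree predicts the leaf mean, so its predictions
# take only a finite set of values even for a continuous target.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)  # continuous target

tree = DecisionTreeRegressor(max_depth=2).fit(X, y)  # at most 4 leaves
preds = tree.predict(rng.uniform(0, 10, size=(1000, 1)))
print(np.unique(preds))  # at most 4 distinct values: one mean per leaf
```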
Classification Tree
A classification tree is used when the dependent variable is categorical. The value assigned to a leaf node is the mode (most frequent class) of the training observations falling in that region. It also follows a top-down greedy approach.
Together they are referred to as CART (Classification and Regression Trees).
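A classification tree can be sketched the same way (again with scikit-learn, purely for illustration): each leaf stores the majority class of the training observations that reach it, and prediction returns that class.

```python
# Sketch: a classification tree predicts the mode (majority class)
# of the training observations in each leaf region.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=2, criterion="gini").fit(X, y)
print(clf.predict(X[:5]))  # e.g. [0 0 0 0 0], the majority class per leaf
```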
Building a Decision Tree from Data
Decision tree structure
How do we create a tree from tabular data? Which feature should be selected as the root node? On what basis should a node be split? This section answers all of these questions.
The choice of splits heavily affects a tree’s accuracy: the purity of the nodes with respect to the target variable should increase after each split. A decision tree considers splits on all available variables and selects the split that results in the most homogeneous sub-nodes.
The following are the most commonly used criteria for splitting:
1. Gini impurity
Gini impurity is based on the following idea: if we select two items from a population at random, they must be of the same class, and the probability of this is 1 if the population is pure.
It works with a categorical target variable such as “Success” or “Failure”.
It performs only binary splits.
The lower the Gini impurity, the higher the homogeneity; a pure node has a Gini impurity of 0.
CART (Classification and Regression Tree) uses the Gini method to create binary splits.
Steps to Calculate Gini impurity for a split
1. Calculate the Gini impurity for each sub-node by subtracting the sum of the squared class probabilities from one:
Gini impurity = 1 - (p² + q²)
where p = P(Success) and q = P(Failure).
2. Calculate the Gini impurity of the split as the weighted average of the sub-nodes’ Gini impurities, weighted by the number of observations in each sub-node.
3. Select the feature whose split has the lowest weighted Gini impurity.
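Here is a worked sketch of these steps in Python; the node counts below are hypothetical, chosen only to make the arithmetic easy to follow:

```python
# Worked example of the three steps above on a hypothetical binary split.
def gini_impurity(successes, failures):
    """Step 1: Gini impurity of a node, 1 - (p**2 + q**2)."""
    n = successes + failures
    p, q = successes / n, failures / n
    return 1 - (p**2 + q**2)

# Hypothetical split of 30 observations into two sub-nodes.
left = gini_impurity(successes=8, failures=2)    # 10 observations
right = gini_impurity(successes=4, failures=16)  # 20 observations

# Step 2: weighted Gini impurity of the split.
split_gini = (10 / 30) * left + (20 / 30) * right

# Step 3: compare split_gini across candidate features; pick the lowest.
print(round(left, 3), round(right, 3), round(split_gini, 3))  # 0.32 0.32 0.32
```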
Explanation: All About Decision Tree from Scratch with Python Implementation