CSCI567 Problem 2: Decision Tree

Recall from lecture that decision trees can be used to solve both classification and regression problems; here we focus on classification. In Problem 1 we used KNN for classification, and a decision tree can do the same job. For the decision tree, you will be asked to implement the ID3 algorithm. It is guaranteed that all features in this assignment are discrete. After finishing the implementation, you can use dt_test_script.py to test the correctness of your functions.

2.1.1 Information Gain

  • In the ID3 algorithm, we use entropy to measure the uncertainty in a data set and information gain to measure the quality of a split.
  • Entropy: $H(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$, where $p_i$ is the proportion of examples in $S$ that belong to class $i$.
  • Information gain: $IG(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} H(S_v)$, where $S_v$ is the subset of $S$ in which attribute $A$ takes value $v$.
  • See the ID3 algorithm reference for more detail. In this section, you need to implement the Information_Gain function in utils.py; a sketch of the computation follows this list.
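For concreteness, here is a minimal sketch of the computation. The assumed signature (parent entropy plus a list of per-branch class counts) and the entropy helper are modeled on common starter-code conventions and may differ from the actual utils.py:

```python
import numpy as np

def entropy(class_counts):
    """Entropy of a node, given the number of examples in each class."""
    counts = np.asarray(list(class_counts), dtype=float)
    total = counts.sum()
    if total == 0:
        return 0.0
    probs = counts[counts > 0] / total  # skip zero counts: 0 * log2(0) = 0
    return float(-np.sum(probs * np.log2(probs)))

def Information_Gain(S, branches):
    """Information gain of a split.

    S        -- entropy of the parent node
    branches -- per-branch class counts, e.g. [[2, 0], [1, 3]] means the
                first branch holds 2 examples of class 0, and the second
                holds 1 example of class 0 and 3 of class 1.
    """
    total = sum(sum(b) for b in branches)
    weighted = sum((sum(b) / total) * entropy(b) for b in branches)
    return S - weighted
```

For example, splitting four examples with labels [0, 0, 1, 1] (parent entropy 1.0) into the pure branches [[2, 0], [0, 2]] yields an information gain of 1.0.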

2.1.2 Grow Decision Tree and Predict (25 points)

  • In the ID3 algorithm, we split the set S on the attribute with the largest information gain. Please consult page 23 of the Lecture 2 notes.
  • Implement the TreeNode split function and the TreeNode predict function in hw1_dt.py (a sketch of both appears after this list):
    • TreeNode.split
      • In the TreeNode class, the variable features holds all data points in the current TreeNode, labels holds the corresponding label of each data point, and children is the list of TreeNodes produced by splitting the current node on the best attribute. Splitting is a recursive process: once split is called, the TreeNode keeps splitting until the whole decision tree is built.
      • When there is a tie in information gain among the attributes of the current TreeNode, always choose the attribute with more attribute values. If the tied attributes also have the same number of attribute values, choose the one with the smaller index.
      • Build the child list of each TreeNode in increasing order of attribute values. The order matters because it is used when comparing two decision trees.
    • TreeNode.predict
      • This function is called once you have built the decision tree. It takes a single data point as a parameter; your code should route that data point through the tree to a leaf node and make a prediction. This function should return the predicted label.
  • You don’t need to implement the DecisionTree predict and train functions in hw1_dt.py; we provide both in the starter code. Reading the train and predict functions will help you understand the functions you do need to implement.
    • DecisionTree.train
    • DecisionTree.predict
  • The comparison logic we will use to check your decision tree is sketched below.
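To make the expected behavior concrete, here is a minimal sketch of what TreeNode.split and TreeNode.predict could look like. The constructor signature and the helper attributes (num_cls, cls_max, splittable, dim_split, feature_uniq_split) are assumptions modeled on common starter-code conventions, and features are assumed to be plain Python lists; match the names to your actual hw1_dt.py. It reuses the entropy and Information_Gain helpers sketched in 2.1.1.

```python
from collections import Counter

from utils import Information_Gain, entropy  # as sketched in 2.1.1

class TreeNode(object):
    def __init__(self, features, labels, num_cls):
        self.features = features   # data points in this node
        self.labels = labels       # corresponding labels
        self.children = []         # child TreeNodes, filled in by split()
        self.num_cls = num_cls
        self.cls_max = Counter(labels).most_common(1)[0][0]  # majority class
        # splittable if the node is impure and attributes remain
        self.splittable = len(set(labels)) > 1 and len(features[0]) > 0
        self.dim_split = None           # attribute index chosen for the split
        self.feature_uniq_split = None  # its values, in increasing order

    def split(self):
        parent_entropy = entropy(list(Counter(self.labels).values()))
        best_gain, best_dim, best_vals = -1.0, None, None
        for dim in range(len(self.features[0])):
            vals = sorted(set(f[dim] for f in self.features))
            # per-branch class counts, one branch per attribute value
            branches = [list(Counter(l for f, l in zip(self.features, self.labels)
                                     if f[dim] == v).values())
                        for v in vals]
            gain = Information_Gain(parent_entropy, branches)
            # tie-break: prefer more attribute values, then the smaller index
            if gain > best_gain or (gain == best_gain and
                                    len(vals) > len(best_vals)):
                best_gain, best_dim, best_vals = gain, dim, vals
        self.dim_split, self.feature_uniq_split = best_dim, best_vals
        for v in best_vals:  # children in increasing attribute-value order
            feats = [f[:best_dim] + f[best_dim + 1:]
                     for f in self.features if f[best_dim] == v]
            labs = [l for f, l in zip(self.features, self.labels)
                    if f[best_dim] == v]
            child = TreeNode(feats, labs, self.num_cls)
            self.children.append(child)
            if child.splittable:
                child.split()  # recurse until the whole tree is grown

    def predict(self, feature):
        if not self.children:  # leaf: predict the majority class
            return self.cls_max
        if feature[self.dim_split] not in self.feature_uniq_split:
            return self.cls_max  # unseen attribute value: fall back to majority
        idx = self.feature_uniq_split.index(feature[self.dim_split])
        # drop the consumed attribute, as the children's features did
        rest = feature[:self.dim_split] + feature[self.dim_split + 1:]
        return self.children[idx].predict(rest)
```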
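A plausible sketch of that node-by-node comparison (the function name compare_trees and the exact fields checked are assumptions, not the actual grader) is a recursive structural check, which is why the child ordering above matters:

```python
def compare_trees(node_a, node_b):
    """Recursively check that two trees split on the same attribute,
    predict the same majority class, and have identical children
    in the same order."""
    if node_a.dim_split != node_b.dim_split:
        return False
    if node_a.cls_max != node_b.cls_max:
        return False
    if len(node_a.children) != len(node_b.children):
        return False
    return all(compare_trees(a, b)
               for a, b in zip(node_a.children, node_b.children))
```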

Part 2.2 Pruning Decision Tree

Sometimes, in order to prevent overfitting, we need to prune our decision tree. There are two main approaches to avoiding overfitting when building decision trees:

  • Pre-pruning: stop growing the tree earlier, before it perfectly classifies the training set.
  • Post-pruning: grow the decision tree completely on the training set, and then prune it.

In practice, the second approach, post-pruning, is more useful, because with pre-pruning it is hard to estimate precisely when to stop growing the tree. In this assignment we will use Reduced Error Pruning, one of the post-pruning methods.

For pruning the decision tree, you can refer to Reduced Error Pruning and p. 69 of the textbook Machine Learning by Tom Mitchell.

2.2.1 Reduced Error Pruning

Hint: in this part, you may add other parameters or functions to the TreeNode and DecisionTree classes to help you prune the decision tree, but your changes must not affect the results of the previous parts. Implement the reduced_error_pruning function in utils.py; a sketch follows below.
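As a starting point, here is a minimal sketch of reduced error pruning under the TreeNode sketch from 2.1.2. The function signature, the validation-set arguments, and the decision_tree.root_node attribute are assumptions; adapt them to the actual starter code.

```python
def accuracy(tree, X_val, y_val):
    """Fraction of validation points the tree classifies correctly."""
    preds = [tree.root_node.predict(x) for x in X_val]
    return sum(p == y for p, y in zip(preds, y_val)) / len(y_val)

def reduced_error_pruning(decision_tree, X_val, y_val):
    """Bottom-up reduced error pruning: turn an internal node into a
    leaf (predicting its majority class) whenever doing so does not
    decrease accuracy on the validation set."""
    def prune(node):
        for child in node.children:
            prune(child)  # prune subtrees first (bottom-up)
        if not node.children:
            return
        before = accuracy(decision_tree, X_val, y_val)
        saved_children = node.children
        node.children = []  # tentatively make this node a leaf
        after = accuracy(decision_tree, X_val, y_val)
        if after < before:  # pruning hurt: restore the subtree
            node.children = saved_children
    prune(decision_tree.root_node)
```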

Good Luck! 🙂
