Description
- Construct the root and the first level of a decision tree for the contact lenses data. Use the ID3 algorithm. Show the details of your construction. Then, check your solution with Weka (the data file is included with Weka).
- Construct two rules using PRISM for the weather data. Show the details of your construction. Then, check your solution with Weka (the data file is included with Weka).
- Using the Naïve Bayes method (on the contact lenses data), classify the data item: pre-presbyopic, hypermetrope, yes, reduced, ? Then, check your solution with Weka (the data file is included with Weka).
Note: You can download Weka from: http://www.cs.waikato.ac.nz/ml/weka
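As a reminder of what ID3 computes when it picks the root, the following is a minimal sketch of entropy and information gain. The function names and the toy rows are hypothetical and are not the contact lenses data; the sketch only illustrates the calculation you are asked to show by hand.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy reduction obtained by splitting `rows` (a list of dicts) on `attribute`."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Hypothetical toy rows for illustration only (NOT the contact lenses data).
toy = [
    {"a": "x", "b": "p", "cls": "yes"},
    {"a": "x", "b": "q", "cls": "yes"},
    {"a": "y", "b": "p", "cls": "no"},
    {"a": "y", "b": "q", "cls": "no"},
]

# ID3 places the attribute with the highest information gain at the root.
best = max(["a", "b"], key=lambda attr: information_gain(toy, attr, "cls"))
```

In this toy example, attribute `a` separates the classes perfectly, so it would be chosen as the root; the same comparison is then repeated within each branch to build the next level.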
Implement a Naive Bayes classifier for text classification. This classifier will be used to classify fortune cookie messages into two classes: messages that predict what will happen in the future and messages that just contain a wise saying. We will label messages that predict what will happen in the future as class 1 and messages that contain a wise saying as class 0. For example,
- “Never go in against a Sicilian when death is on the line” would be a message in class 0.
- “You will get an A in SENG 474” would be a message in class 1.
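One possible shape for such a classifier is sketched below, assuming a multinomial model over whitespace-separated lowercase words with add-one (Laplace) smoothing. These choices, the function names `train` and `predict`, and the toy messages are assumptions for illustration; the assignment does not prescribe them. The sketch also assumes both classes appear in the training data.

```python
from collections import Counter
from math import log

def train(messages, labels):
    """Count words per class and messages per class (labels are 0 or 1)."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter(labels)
    for text, label in zip(messages, labels):
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab

def predict(text, model):
    """Return the class with the highest log-posterior for `text`."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for c in (0, 1):
        # Log prior; assumes class c occurs at least once in training.
        score = log(class_counts[c] / total_docs)
        # Add-one smoothing: every vocabulary word gets one extra count.
        denom = sum(word_counts[c].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are skipped
                score += log((word_counts[c][word] + 1) / denom)
        scores[c] = score
    return max(scores, key=scores.get)

# Hypothetical toy training set (punctuation removed); not the provided data files.
toy_messages = [
    "you will get an a in seng 474",
    "you will travel far next year",
    "never go in against a sicilian when death is on the line",
    "patience is a virtue",
]
toy_labels = [1, 1, 0, 0]
model = train(toy_messages, toy_labels)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied. With this toy model, `predict("you will win", model)` returns 1 and `predict("patience is the key", model)` returns 0.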
You can use any language you wish. There are two sets of data files provided:
- The training data:
- txt: This is the training data consisting of fortune cookie messages.
- txt: This file contains the class labels for the training data.
- The testing data:
- txt: This is the testing data consisting of fortune cookie messages.
- txt: This file contains the class labels for the testing data. These are only used to determine the accuracy of the classifier.
Your results must be stored in a file called results.txt.
- Run your classifier by training on traindata.txt and trainlabels.txt, then testing on traindata.txt and trainlabels.txt. Report the accuracy in results.txt (along with a comment saying what files you used for the training and testing data). In this situation, you are training and testing on the same data. This is a sanity check: your accuracy should be very high (above 90%).
- Run your classifier by training on traindata.txt and trainlabels.txt then testing on testdata.txt and testlabels.txt. Report the accuracy in results.txt (along with a comment saying what files you used for the training and testing data). We will not be letting you know beforehand what your performance on the test set should be.
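The accuracy reported in both runs is the fraction of messages whose predicted label matches the true label. A minimal sketch follows; the helper names and the exact wording of the line written to results.txt are assumptions, since the assignment only asks for the accuracy plus a comment naming the files used.

```python
def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

def format_result(train_file, test_file, acc):
    """Build one results line naming the files used (layout is an assumption)."""
    return f"Trained on {train_file}, tested on {test_file}: accuracy = {acc:.2%}"

def append_result(path, line):
    """Append one line to the results file, e.g. path='results.txt'."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(line + "\n")

# Hypothetical labels for illustration only.
example_line = format_result("traindata.txt", "testdata.txt",
                             accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```

Calling `append_result("results.txt", example_line)` once per run would leave both accuracies in the same file.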