CSE508 Assignment 5

Download the 20_newsgroup dataset. For text classification, use only the documents from these five classes: comp.graphics, sci.med, talk.politics.misc, rec.sport.hockey, and sci.space.
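
As a starting point, the five classes could be loaded as in the minimal sketch below. It assumes the archive has been extracted into a folder with one sub-directory per newsgroup and one file per document; that layout, the `latin-1` encoding, and the helper names `tokenize` and `load_documents` are assumptions for illustration, not part of the assignment. The tokenizer is a deliberately simple stand-in for the pre-processing steps.

```python
import re
from pathlib import Path

# The five newsgroups named in the assignment.
CLASSES = ["comp.graphics", "sci.med", "talk.politics.misc",
           "rec.sport.hockey", "sci.space"]


def tokenize(text):
    """Lowercase the text and keep runs of letters only (assumed pre-processing)."""
    return re.findall(r"[a-z]+", text.lower())


def load_documents(root, classes=CLASSES):
    """Return parallel lists: one token list and one class label per document.

    Assumes `root/<class name>/<document file>` (hypothetical layout).
    """
    docs, labels = [], []
    for cls in classes:
        for path in sorted(Path(root, cls).iterdir()):
            if path.is_file():
                docs.append(tokenize(path.read_text(encoding="latin-1")))
                labels.append(cls)
    return docs, labels
```

Stop-word removal, stemming, or header stripping can be added inside `tokenize` if the chosen pre-processing calls for them.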

Implement the following algorithms for text classification:

  1. Naive Bayes
  2. kNN (vary k=1,3,5)
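
The two classifiers above can be sketched as follows. This is one possible reading, not the required implementation: it assumes multinomial Naive Bayes with add-one smoothing, and kNN with cosine similarity over sparse term-weight dictionaries. The names `NaiveBayes`, `cosine`, and `knn_predict` are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        # docs: list of token lists; labels: one class label per document.
        self.vocab = {t for d in docs for t in d}
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for d, y in zip(docs, labels):
            self.term_counts[y].update(d)
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        best_class, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            # Smoothed denominator: tokens in class c plus vocabulary size.
            total = sum(self.term_counts[c].values()) + len(self.vocab)
            score = math.log(n_c / n_docs)  # log prior
            for t in doc:
                if t in self.vocab:  # skip terms never seen in training
                    score += math.log((self.term_counts[c][t] + 1) / total)
            if score > best_score:
                best_class, best_score = c, score
        return best_class


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def knn_predict(train_vectors, train_labels, query, k):
    """Majority vote among the k training vectors most similar to `query`."""
    ranked = sorted(zip(train_vectors, train_labels),
                    key=lambda pair: cosine(pair[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Calling `knn_predict` with k set to 1, 3, and 5 on the same test vectors gives the three kNN variants the assignment asks to compare.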

Feature selection techniques to be used with both algorithms:

  • TF-IDF
  • Mutual Information

Implementation Points:

  • Perform the data pre-processing steps.
  • Split your dataset randomly into train and test sets at the given ratio. Select the documents for each set at random; do not split them in sequential order. For example, for an 80:20 train:test ratio, do not put the first 800 documents in the train set and the last 200 in the test set.
  • Implement the TF-IDF scoring technique and mutual information technique for efficient feature selection.
  • For each class – train your Naive Bayes Classifier and kNN on the training data.
  • Test your classifiers on the testing data and report the confusion matrix and overall accuracy.
  • Repeat the above steps for train:test split ratios of 50:50, 70:30, and 80:20.
  • Compare and analyze the performance of the two classification algorithms with both feature selection techniques across the different train:test ratios. Use graphs to report the performance comparison, and state the inferences you draw from them. An example of a graph you can report: the performance of kNN for different values of k.
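
The splitting, feature-scoring, and evaluation points above can be sketched as below. The assignment does not fix the exact formulas, so this sketch assumes raw term counts with idf = log(N/df) for TF-IDF, and the document-count form of mutual information between a term's presence and a class. All function names are hypothetical, and documents are assumed to be lists of tokens.

```python
import math
import random
from collections import Counter


def random_split(docs, labels, train_fraction, seed=0):
    """Shuffle document indices, then cut at the requested train fraction."""
    order = list(range(len(docs)))
    random.Random(seed).shuffle(order)
    cut = int(len(order) * train_fraction)
    train, test = order[:cut], order[cut:]
    return ([docs[i] for i in train], [labels[i] for i in train],
            [docs[i] for i in test], [labels[i] for i in test])


def tfidf_scores(docs):
    """Score each term by its tf * idf summed over all documents (assumed variant)."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    scores = Counter()
    for d in docs:
        for t, tf in Counter(d).items():
            scores[t] += tf * math.log(n / df[t])
    return scores


def mutual_information(docs, labels, term, cls):
    """MI (in bits) between presence of `term` and membership in `cls`."""
    n = len(docs)
    counts = Counter((term in d, y == cls) for d, y in zip(docs, labels))
    mi = 0.0
    for et in (True, False):
        for ec in (True, False):
            n_joint = counts[(et, ec)]
            if n_joint == 0:
                continue  # the 0 * log 0 term is treated as 0
            n_t = counts[(et, True)] + counts[(et, False)]
            n_c = counts[(True, ec)] + counts[(False, ec)]
            mi += (n_joint / n) * math.log2(n * n_joint / (n_t * n_c))
    return mi


def confusion_matrix(true_labels, predicted, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Overall accuracy: diagonal sum divided by the total count."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total
```

Feature selection then amounts to keeping the top-scoring terms from `tfidf_scores` or `mutual_information` on the training set only. Running the whole pipeline with `train_fraction` set to 0.5, 0.7, and 0.8 yields the accuracies to plot.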