Description
Download the 20 Newsgroups dataset. Select the documents belonging to five classes (comp.graphics, sci.med, talk.politics.misc, rec.sport.hockey, sci.space) for text classification.
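As a starting point, here is a minimal loader sketch. It assumes scikit-learn is available; its `fetch_20newsgroups` helper downloads the corpus on first use and can restrict it to the five required classes.

```python
# Sketch of one way to obtain the five required classes, assuming
# scikit-learn's built-in downloader for the 20 Newsgroups corpus.
from sklearn.datasets import fetch_20newsgroups

CATEGORIES = ["comp.graphics", "sci.med", "talk.politics.misc",
              "rec.sport.hockey", "sci.space"]

def load_documents(subset="all"):
    # Downloads the corpus on first use; keeps only the five classes and
    # strips headers/footers/quotes so the classifier sees body text only.
    data = fetch_20newsgroups(subset=subset, categories=CATEGORIES,
                              remove=("headers", "footers", "quotes"))
    return data.data, data.target
```

You can also download the raw corpus manually and parse the per-group directories yourself; the helper above simply saves that step.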
Implement the following algorithms for text classification:
- Naive Bayes
- kNN (vary k = 1, 3, 5)
Feature selection techniques to be used with both algorithms:
- TF-IDF
- Mutual Information

Implementation Points:
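The two feature-selection scores can be implemented from scratch. The sketch below is one possible from-scratch version (function names and the toy tokenized documents are illustrative, not prescribed): TF-IDF as term frequency times log inverse document frequency, and mutual information computed per (term, class) pair from the 2x2 document contingency counts.

```python
import math
from collections import Counter

def tfidf(docs):
    """docs: list of token lists -> one {term: tf-idf score} dict per doc."""
    n = len(docs)
    df = Counter()                      # document frequency of each term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        # tf-idf = (term count / doc length) * log(N / df)
        scores.append({t: (c / len(d)) * math.log(n / df[t])
                       for t, c in tf.items()})
    return scores

def mutual_information(docs, labels, term, cls):
    """MI between term presence and class membership, over documents."""
    n = len(docs)
    n11 = sum(1 for d, y in zip(docs, labels) if term in d and y == cls)
    n10 = sum(1 for d, y in zip(docs, labels) if term in d and y != cls)
    n01 = sum(1 for d, y in zip(docs, labels) if term not in d and y == cls)
    n00 = n - n11 - n10 - n01
    mi = 0.0
    # Sum N_ij/N * log2(N * N_ij / (row_total * col_total)) over the 2x2 table.
    for nij, row, col in ((n11, n11 + n10, n11 + n01),
                          (n10, n11 + n10, n10 + n00),
                          (n01, n01 + n00, n11 + n01),
                          (n00, n01 + n00, n10 + n00)):
        if nij:
            mi += (nij / n) * math.log2(n * nij / (row * col))
    return mi
```

For feature selection, rank the vocabulary by these scores (e.g., the top-k terms by mutual information per class) and keep only the top-scoring terms as features.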
- Perform the data pre-processing steps.
- Split your dataset randomly into training and test sets. The documents must be selected at random for splitting; you are not supposed to split them in sequential order, for instance, putting the first 800 documents in the training set and the last 200 in the test set for a train:test ratio of 80:20.
- Implement the TF-IDF scoring technique and mutual information technique for efficient feature selection.
- For each class, train your Naive Bayes classifier and kNN on the training data.
- Test your classifiers on the test data and report the confusion matrix and overall accuracy.
- Perform the above steps on 50:50, 70:30, and 80:20 training and testing split ratios.
- Compare and analyze the performance of the two classification algorithms above with both feature selection techniques across the different train:test ratios. Use graphs to report the performance comparison, and mention your inferences from the graphs. Example of a graph you can report: the performance of kNN for different values of k.
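The overall evaluation loop for the steps above might look like the following sketch. It assumes scikit-learn is installed and substitutes a tiny inline corpus for the real newsgroup documents; the corpus, labels, and k value shown are illustrative placeholders.

```python
# Sketch of the split/train/evaluate loop across the three required ratios,
# assuming scikit-learn; the inline docs stand in for the 20 Newsgroups data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

docs = ["the goalie stopped the puck", "hockey players skate fast",
        "the rocket reached orbit", "nasa launched a space probe",
        "render the 3d graphics mesh", "the gpu draws each polygon"] * 5
labels = ["hockey", "hockey", "space", "space",
          "graphics", "graphics"] * 5

for test_size in (0.5, 0.3, 0.2):           # 50:50, 70:30, 80:20 splits
    # Random (not sequential) stratified split, as the assignment requires.
    X_tr, X_te, y_tr, y_te = train_test_split(
        docs, labels, test_size=test_size, random_state=0, stratify=labels)
    vec = TfidfVectorizer()
    X_tr_v, X_te_v = vec.fit_transform(X_tr), vec.transform(X_te)
    for name, clf in [("NB", MultinomialNB()),
                      ("kNN k=3", KNeighborsClassifier(n_neighbors=3))]:
        y_pred = clf.fit(X_tr_v, y_tr).predict(X_te_v)
        print(name, test_size, accuracy_score(y_te, y_pred))
        print(confusion_matrix(y_te, y_pred))
```

Collect the accuracies from a loop like this into arrays and plot them (e.g., with matplotlib) to produce the required comparison graphs, such as kNN accuracy versus k for each split ratio.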