Description


In this assignment you are required to build a Naïve Bayes email spam filter.

Data Description

Name: CS 534-Artificial Intelligence - Assignment 5 Solved

The data can be downloaded from here.

This dataset was created from 64 emails collected from the DBWorld mailing list. Please note, the actual emails are not given to you, and the emails have already been processed using NLP.

There are two datasets, dbworld_bodies_stemmed and dbworld_subjects_stemmed, corresponding to the email body and the email subject respectively.

The data is currently represented as a binary stemmed bag-of-words and requires no additional NLP.

Each dataset is a table with 64 rows and n columns:
The 1st column is “id” and has values from 1 to 64, one for each of the 64 emails (this column can be removed).
Columns 2 through n-1 are the unique words found across all the emails. Their values are binary: 0 means the word did not appear in the email and 1 means it did.
The nth column is CLASS: 0 means discard the email and 1 means keep it.
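The column layout above can be sketched as follows. This is only an illustration under stated assumptions: the table is a tiny in-memory stand-in (the real tables have 64 rows), the word columns "call", "paper" and "job" are made up, and `split_table` is a hypothetical helper name, not part of the assignment.

```python
# Toy stand-in for one dataset: a header row plus 4 emails.
# The word columns "call", "paper", "job" are invented for illustration.
header = ["id", "call", "paper", "job", "CLASS"]
rows = [
    [1, 1, 1, 0, 1],
    [2, 0, 0, 1, 0],
    [3, 1, 0, 0, 1],
    [4, 0, 1, 1, 0],
]


def split_table(header, rows):
    """Drop the id column and separate the word features from the CLASS label."""
    assert header[0] == "id" and header[-1] == "CLASS"
    vocab = header[1:-1]                # columns 2 .. n-1: the words
    X = [row[1:-1] for row in rows]     # binary bag-of-words vectors
    y = [row[-1] for row in rows]       # 0 = discard, 1 = keep
    return vocab, X, y


vocab, X, y = split_table(header, rows)
```

After this step `X` holds one binary vector per email and `y` the matching labels, which is the shape the later steps work with.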

Naïve Bayes Classifier

You should implement from scratch a Naïve Bayes classifier (using the spam filter example discussed in class).

Also implement Laplacian smoothing to handle words not in the dictionary. (40 points)
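A minimal from-scratch sketch of such a classifier is shown below. It assumes a Bernoulli event model (each word is a present/absent feature), which fits the binary data but may differ from the spam filter example used in class. The names `train_nb` and `predict_nb`, the parameter `alpha`, and the toy data are all hypothetical.

```python
import math


def train_nb(X, y, alpha=1.0):
    """Estimate log priors and smoothed P(word present | class) for each class."""
    n_words = len(X[0])
    model = {}
    for c in sorted(set(y)):
        docs = [x for x, label in zip(X, y) if label == c]
        log_prior = math.log(len(docs) / len(X))
        # Add-alpha (Laplace) smoothing: the 2 in the denominator counts the
        # two possible values (present/absent) of a binary feature, so no
        # probability is ever exactly 0 or 1.
        probs = [
            (sum(d[j] for d in docs) + alpha) / (len(docs) + 2 * alpha)
            for j in range(n_words)
        ]
        model[c] = (log_prior, probs)
    return model


def predict_nb(model, x):
    """Return the class with the highest log posterior for one email vector."""
    best_class, best_score = None, -math.inf
    for c, (log_prior, probs) in model.items():
        score = log_prior
        for xj, p in zip(x, probs):
            score += math.log(p if xj else 1.0 - p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Toy data: 3 made-up words, 2 emails per class.
X_toy = [[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]]
y_toy = [1, 1, 0, 0]
nb_model = train_nb(X_toy, y_toy)
pred_keep = predict_nb(nb_model, [1, 0, 0])
pred_discard = predict_nb(nb_model, [0, 0, 1])
```

Working in log space avoids numerical underflow when the vocabulary is large. Without smoothing, a word never seen in one class would give that class a probability of zero for any email containing it.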

Using the implemented algorithm, train and test the model for each dataset.

Use 80% of each class data to train your classifier and the remaining 20% to test it. Which dataset provides better classification i.e. email body or email subject? (20 points)
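One way to take 80% of each class for training is a per-class (stratified) split, sketched below. The function name `stratified_split`, the fixed `seed`, and the rounding rule for class sizes that are not multiples of five are assumptions rather than requirements from the assignment.

```python
import random


def stratified_split(X, y, train_frac=0.8, seed=0):
    """Shuffle each class separately and send train_frac of it to the training set."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx, test_idx = [], []
    for c in sorted(set(y)):
        idx = [i for i, label in enumerate(y) if label == c]
        rng.shuffle(idx)
        cut = int(round(train_frac * len(idx)))  # assumed rounding rule
        train_idx += idx[:cut]
        test_idx += idx[cut:]
    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, y_train, X_test, y_test


# Toy data: 10 emails of class 1 and 5 of class 0, one dummy feature each.
X_all = [[i] for i in range(15)]
y_all = [1] * 10 + [0] * 5
X_train, y_train, X_test, y_test = stratified_split(X_all, y_all)
```

Splitting per class keeps the keep/discard ratio the same in both parts, which matters with only 64 emails.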

f -measure= 2PreRec Pre+ Rec

TP TP

where Pre= ; Rec= ; TP+ FP TP + FN

and TP is the number of true positives (class 1 members predicted as class 1), TN is the number of true negatives (class 2 members predicted as class 2), FP is the number of false positives (class 2 members predicted as class 1), and FN is the number of false negatives (class 1 members predicted as class 2).
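The definitions above translate directly into code. In this sketch the function name `f_measure` is hypothetical, and returning 0.0 when a denominator is zero is an assumed convention that the assignment does not specify.

```python
def f_measure(y_true, y_pred, positive=1):
    """Return (precision, recall, f-measure), treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Assumed convention: a ratio with a zero denominator is reported as 0.0.
    pre = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
    return pre, rec, f


# Toy labels: TP = 2, FP = 1, FN = 1, TN = 1.
labels_true = [1, 1, 1, 0, 0]
labels_pred = [1, 1, 0, 1, 0]
pre, rec, f = f_measure(labels_true, labels_pred)
```

Note that TN does not appear in any of the three quantities, so the f-measure only reflects how well the keep class (class 1) is recovered.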

Compare your classifier with the scikit-learn implementation (sklearn.naive_bayes.MultinomialNB).

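Repeat the analysis from (b), i.e. train and test on each dataset with the same 80%/20% split and compare the email body and email subject results. (20 points)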
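The scikit-learn side of the comparison might look like the sketch below. The toy matrix is invented for illustration and stands in for either dataset. `train_test_split` with `stratify` is used here as a convenient substitute for a hand-written per-class split, and `alpha=1.0` is assumed to match the Laplace smoothing of the from-scratch classifier.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Toy binary bag-of-words: 5 "keep" emails (class 1) and 5 "discard" emails (class 0).
X_sk = np.array([
    [1, 1, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0],
    [0, 0, 1, 1], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 1, 1], [0, 0, 0, 1],
])
y_sk = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])

# stratify=y_sk keeps 80% of each class in the training set.
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X_sk, y_sk, test_size=0.2, stratify=y_sk, random_state=0
)

clf = MultinomialNB(alpha=1.0)  # alpha=1.0 is add-one (Laplace) smoothing
clf.fit(Xs_train, ys_train)
ys_pred = clf.predict(Xs_test)

sk_f = f1_score(ys_test, ys_pred, pos_label=1)
print(f"scikit-learn f-measure: {sk_f:.3f}")
```

For a fair comparison, both classifiers should be scored on the same test emails. MultinomialNB treats the 0/1 values as word counts, so its probabilities can differ from those of a Bernoulli-style implementation even with identical smoothing.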
