ECE448 Assignment 3: Naive Bayes/Logistic Regression Classification


Programming language

You may only use modules from the Python standard library and numpy.

In this assignment you will apply machine learning techniques to image and text classification tasks, and apply a logistic regression classifier to perform binary classification.

Contents

  •   Part 1: Image Classification
  •   Part 2: Text Classification
  •   Part 3: Linear Classifier
  •   Extra Credit
  •   Provided Code Skeleton
  •   Deliverables
  •   Report checklist

    Part 1: Digit Image Classification


Data: You are provided with part of the MNIST digit dataset. There are 55000 training examples and 10000 test examples. The labels range from 0 to 9, each representing the corresponding digit. In this section, you will apply a Naive Bayes model to this task.

Naive Bayes model

  •   Features: Each image consists of 28*28 pixels which we represent as a flattened array of size 784, where each feature/pixel Fi takes on intensity values from 0 to 255 (8 bit grayscale).
  •   Training: The goal of the training stage is to estimate the likelihoods P(Fi | class) for every pixel location i and for every digit class (0, 1, 2, 3, 4, 5, 6, 7, 8, 9). The likelihood estimate is defined as

    P(Fi = f | class) = (# of times pixel i has value f in training examples from this class) / (Total # of training examples from this class)

    In addition, as discussed in the lecture, you have to smooth the likelihoods to ensure that there are no zero counts. Laplace smoothing is a very simple method that increases the observation count of every value f by some constant k. This corresponds to adding k to the numerator above, and k*V to the denominator (where V is the number of possible values the feature can take on). The higher the value of k, the stronger the smoothing. Experiment with different values of k (say, from 0.1 to 10) and find the one that gives the highest classification accuracy.

    You should also estimate the priors P(class) by the empirical frequencies of different classes in the training set.
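As a rough illustration of this training stage, here is a minimal numpy sketch. It assumes `x_train` is the (N, 784) integer array from x_train.npy and `y_train` the matching label array; the helper itself is hypothetical, not part of the provided skeleton:

    import numpy as np

    def train_naive_bayes(x_train, y_train, k=1.0, num_classes=10, num_values=256):
        """Estimate Laplace-smoothed likelihoods P(Fi = f | class) and priors P(class)."""
        n_features = x_train.shape[1]                 # 784 flattened pixels
        likelihoods = np.zeros((num_classes, n_features, num_values))
        priors = np.zeros(num_classes)
        for c in range(num_classes):
            xc = x_train[y_train == c]                # training images of class c
            priors[c] = len(xc) / len(x_train)        # empirical class frequency
            for f in range(num_values):
                counts = (xc == f).sum(axis=0)        # per-pixel count of intensity f
                # Laplace smoothing: add k to the numerator, k*V to the denominator
                likelihoods[c, :, f] = (counts + k) / (len(xc) + k * num_values)
        return likelihoods, priors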

  •   Testing: You will perform maximum a posteriori (MAP) classification of the test digits according to the learned Naive Bayes model. Suppose a test image has feature values f1, f2, … , f784. According to this model, the posterior probability (up to scale) of each class given the image is

    P(class) ⋅ P(f1 | class) ⋅ P(f2 | class) ⋅ … ⋅ P(f784 | class)

    Note that in order to avoid underflow, it is standard to work with the log of the above quantity:

    log P(class) + log P(f1 | class) + log P(f2 | class) + … + log P(f784 | class)

    After you compute the above decision function values for all ten classes for every test image, you will use them for MAP classification.
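A matching sketch of the MAP decision rule, reusing the hypothetical `likelihoods` and `priors` arrays from the training sketch above:

    import numpy as np

    def predict(x_test, likelihoods, priors):
        """Classify each test image as the argmax over classes of the log posterior."""
        pixel_idx = np.arange(x_test.shape[1])        # pixel locations 0..783
        num_classes = len(priors)
        log_post = np.zeros((len(x_test), num_classes))
        for c in range(num_classes):
            # x_test must hold integer intensities so it can index the value axis;
            # this looks up log P(Fi = fi | c) for every pixel of every test image
            log_lik = np.log(likelihoods[c, pixel_idx, x_test])   # shape (N, 784)
            log_post[:, c] = np.log(priors[c]) + log_lik.sum(axis=1)
        return log_post.argmax(axis=1), log_post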

  •   Evaluation: Report your performance in terms of the average classification rate and the classification rate for each digit (percentage of all test images of a given digit correctly classified). Also report your confusion matrix. This is a 10×10 matrix whose entry in row r and column c is the percentage of test images from class r that are classified as class c. In addition, for each class, show the test examples from that class that have the highest and the lowest posterior probabilities according to your classifier. You can think of these as the most and least “prototypical” instances of each digit class (and the least “prototypical” one is probably misclassified).
  •   Likelihood visualization: When using classifiers in real domains, it is important to be able to inspect what they have learned. One way to inspect a naive Bayes model is to look at the most likely features for a given label. Another tool for understanding the parameters is to visualize the feature likelihoods for the high-intensity pixels of each class. Here, high intensities refer to pixel values from 128 to 255. Therefore, the likelihood for the high-intensity pixel feature Fi of class c1 is the sum of the probabilities of the top 128 intensities at pixel location i for class c1:

    feature likelihood(Fi, c1) = Σ (k = 128 to 255) P(Fi = k | c1)

For each of the ten classes, plot the trained likelihoods of the high-intensity pixel features to see what the model has learned.
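The quantity to plot is a short numpy computation; the actual plotting (e.g. as a 28×28 heatmap) is left to whatever tooling you use. This again reuses the hypothetical `likelihoods` array from the sketches above:

    def high_intensity_likelihood(likelihoods, c):
        """Sum P(Fi = k | c) over k = 128..255 and reshape to the 28x28 image grid."""
        return likelihoods[c, :, 128:256].sum(axis=1).reshape(28, 28)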

Part 2: Text Classification

You are given a dataset consisting of texts that belong to 14 different classes. We have split the dataset into a training set and a development set. The training set consists of 3865 texts and their corresponding class labels from 1-14, with instances from each of the classes; the development set consists of 483 test instances and their corresponding labels. We have already preprocessed the dataset and extracted it into a Python list structure in text_main.py. Using the training set, you will learn a Naive Bayes classifier that will predict the right class label given an unseen text. Use the development set to test the accuracy of your learned model. Report the accuracy, recall, and F1 score that you get on your development set. We will have a separate (unseen) train/test set that we will use to run your code after you turn it in. No outside non-standard Python libraries can be used.

Unigram Model

The bag of words model in NLP is a simple unigram model which considers a text to be represented as a bag of independent words. That is, we ignore the position the words appear in, and only pay attention to their frequency in the text. Here each text consists of a group of words. Using Bayes theorem, you need to compute the probability of a text belonging to one of the 14 classes given the words in the text. Thus you need to estimate the posterior probabilities:

    P(Class = Ci | Words) = [ P(Class = Ci) ∏ (All words) P(Word | Class = Ci) ] / P(Words)

It is standard practice to use the log probabilities so as to avoid underflow. Also, P(Words) is just a constant, so it will not affect your computation.

Training and Development

  •   Training: To train the algorithm you are going to need to build a bag of words model using the texts. After you build the model, you will need to estimate the log-likelihoods log P(Word | Class = Ci). The variable Ci can only take on 14 values, 1-14. Additionally, you will need to make sure you smooth the likelihoods to prevent zero probabilities. To accomplish this, use Laplace smoothing, as in Part 1.
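As a rough sketch of this training step (assuming `train_set` is a list of word lists and `train_labels` the matching 1-14 labels, which is an assumption about the lists text_main.py produces, not a documented interface):

    import math
    from collections import Counter, defaultdict

    def fit(train_set, train_labels, k=1.0):
        """Estimate log priors and Laplace-smoothed log-likelihoods per class."""
        word_counts = defaultdict(Counter)            # class -> word -> count
        vocab = set()
        for words, label in zip(train_set, train_labels):
            word_counts[label].update(words)
            vocab.update(words)
        V = len(vocab)
        n = len(train_labels)
        log_prior = {c: math.log(cnt / n) for c, cnt in Counter(train_labels).items()}
        log_lik, unseen = {}, {}
        for c, counts in word_counts.items():
            total = sum(counts.values())
            log_lik[c] = {w: math.log((counts[w] + k) / (total + k * V)) for w in vocab}
            # fallback score for development words outside the training vocabulary
            unseen[c] = math.log(k / (total + k * V))
        return log_prior, log_lik, unseen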

  •   Development: After you have computed the log-likelihoods, have your model predict class labels for the texts in the development set. To do this, perform MAP classification using the equation shown above.
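And a matching sketch of the MAP prediction step, reusing the hypothetical outputs of `fit` above:

    def predict(texts, log_prior, log_lik, unseen):
        """argmax over classes of log P(c) + sum of log P(w | c) for each text."""
        predictions = []
        for words in texts:
            scores = {c: log_prior[c] + sum(log_lik[c].get(w, unseen[c]) for w in words)
                      for c in log_prior}
            predictions.append(max(scores, key=scores.get))
        return predictions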

Use only the training set to learn the individual probabilities. The following results should be put in your report:

  1. Plot your confusion matrix. This is a 14×14 matrix whose entry in row r and column c is the percentage of test texts from class r that are classified as class c.
  2. Accuracy, recall, and F1 scores for each of the classes on the development set (a sketch for computing these metrics follows this list).
  3. Top 20 feature words of each of the classes.
  4. Calculate your accuracy without including the class prior in the Naive Bayes equation, i.e., computing only the ML inference for each instance. Report the change in accuracy numbers, if any, and state your reasoning for this observation. Is including the class prior always beneficial? Change your class prior to a uniform distribution. What is the change in the result?
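A minimal numpy sketch for these metrics, assuming integer labels 1-14 (the `evaluate` helper is illustrative, not part of the skeleton):

    import numpy as np

    def evaluate(y_true, y_pred, num_classes=14):
        """Row-normalized confusion matrix (percent) plus per-class precision/recall/F1."""
        cm = np.zeros((num_classes, num_classes))
        for t, p in zip(y_true, y_pred):
            cm[t - 1, p - 1] += 1                     # shift labels 1..14 to rows 0..13
        confusion_pct = 100 * cm / np.maximum(cm.sum(axis=1, keepdims=True), 1)
        tp = np.diag(cm)
        precision = tp / np.maximum(cm.sum(axis=0), 1)
        recall = tp / np.maximum(cm.sum(axis=1), 1)
        f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
        return confusion_pct, precision, recall, f1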

Part 3: Linear Classifier

[Figure: 2-D points with the true boundary (solid line) and the boundary learned by logistic regression (dashed line). Image by TA]

There are some points on the 2-D plane, some of which are labeled as 1 and others as 0. Your task is to find the boundary line that correctly separates these two categories of points. In the image shown above, the solid line is the true boundary, and the dashed line is the boundary found by a logistic regression classifier.

Note:

  a)  You need to implement a logistic regression classifier for this task. logistic.py is the only file you need to modify in this section.
  b)  Although we only do classification on 2-D points in this task, your code should work in arbitrary dimensions.

Logistic regression model

The logistic regression model, also known as the differentiable perceptron, is as follows:

    f(w^T f) = sigmoid(w^T f) = 1 / (1 + e^(-w^T f))

Note:

  a)  This logistic regression model is different from the one in the lecture slides. You should implement this one in this task, NOT the one in the slides.
  b)  The derivative of the sigmoid function is f'(x) = f(x) × (1 − f(x)).

  •   Features: The coordinates of every point. Denoting the number of points as N and the dimension of the coordinates as P, the feature matrix should be P*N.
  •   Training: Implement the training process of the logistic regression model. Recall the loss function of logistic regression from the lecture slides, shown below. (Note: a better measure would be the logistic loss, which is not required in this MP. If you are interested, see Logistic regression here.)

    L(y1, …, yn, f1, …, fn) = Σ (i = 1 to n) (yi − sigmoid(w^T fi))^2

  •   Testing: The provided code already implements the testing process for you. You do NOT need to implement it, but do not forget to report the test results in your report.
  •   Evaluation: We repeat the training and testing process many times and take the average training error and average testing error as the evaluation of the model. This is also implemented in the skeleton code for you.
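For illustration, here is a minimal gradient-descent training loop for the squared-error loss above; the gradient uses the sigmoid-derivative identity from note b). The interface (a P*N `features` matrix, 0/1 `labels`, a fixed learning rate and epoch count) is an assumption rather than the skeleton's actual API, and a bias term can be handled by appending a constant-1 row to the features:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def train_logistic(features, labels, lr=0.1, epochs=1000):
        """features: P*N matrix (one column per point); labels: length-N array of 0/1."""
        P, N = features.shape
        w = np.zeros(P)
        for _ in range(epochs):
            s = sigmoid(w @ features)             # predictions for all N points
            # gradient of sum_i (y_i - sigmoid(w^T f_i))^2,
            # using sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x))
            grad = -2.0 * (features @ ((labels - s) * s * (1.0 - s)))
            w -= lr * grad
        return w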

Extra Credit Suggestion

Implement the naive Bayes algorithm over a bigram model as opposed to the unigram model. The bigram model is defined as follows:

    P(w1 … wn) = P(w1) P(w2|w1) … P(wn|wn−1)

Then combine the bigram model and the unigram model into a mixture model defined with parameter λ:

    (1 − λ) P(Y) ∏ (i = 1 to n) P(wi|Y) + λ P(Y) ∏ (i = 1 to m) P(bi|Y)

where the wi are the n unigrams of a text and the bi are its m bigrams.
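Because the mixture adds probabilities rather than log probabilities, the two per-class log scores have to be combined carefully to avoid underflow. A sketch, assuming `log_uni` and `log_bi` are the per-class unigram and bigram log posteriors (priors included) and 0 < λ < 1:

    import math

    def bigrams(words):
        """Adjacent word pairs, used as the bigram features b_i."""
        return list(zip(words, words[1:]))

    def mixture_log_score(log_uni, log_bi, lam):
        """log of (1 - lam) * exp(log_uni) + lam * exp(log_bi), without underflow."""
        a = math.log(1.0 - lam) + log_uni
        b = math.log(lam) + log_bi
        m = max(a, b)                             # factor out the larger term
        return m + math.log(math.exp(a - m) + math.exp(b - m))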

Did the bigram model help improve accuracy? Find the parameter λ that gives the highest classification accuracy. Report the optimal parameter λ and your results (accuracy numbers) on the bigram model and the optimal mixture model, and answer the following questions:

  1. Running naive Bayes on the bigram model relaxes the naive assumption of the model a bit. However, is this always a good thing? Why or why not?
  2. What would happen if we did an N-gram model where N was a really large number?

Provided Code Skeleton

We have provided (zip file) all the code to get you started on your MP.

For part 1, you are provided the following. The doc strings in the python files explain the purpose of each function.

  •   image_main.py- This is the main file which loads the dataset and calls your Naive Bayes algorithms.
  •   naive_bayes.py- This is the only file that needs to be modified.
  •   x_train.npy, y_train.npy, x_test.npy and y_test.npy- These files contain the training and testing examples.

    For part 2, you are provided the following. The doc strings in the python files explain the purpose of each function.

  •   text_main.py- This is the main file which loads the dataset and calls your Naive Bayes Algorithm.
  •   TextClassifier.py- This is the only file that needs to be modified.
  •   train_text.csv- This file contains the training examples.
  •   dev_text.csv- This file contains the development examples for testing your model.
  •   stop_words.csv- This file contains the stop words which are required for preprocessing the dataset.

    For part 3, you are provided the following. The doc strings in the python files explain the purpose of each function.

  • linear_classifier_main.py- This is the main file which loads the dataset and calls your Perceptron and Logistic Regression Algorithm.
  • logistic.py- This is the only file that needs to be modified to achieve your Logistic Regression Algorithm.
  • mkdata.py – This is the file to make synthetic data for your algorithm.
  •   plotdata.py – This file is to plot the experiment result of your perceptron model.
  •   plotdata_log_reg.py- This file is to plot the experiment result of your Logistic Regression model.

Deliverables

    This MP will be submitted via blackboard.
    Please upload only the following files to blackboard.

    1. naive_bayes.py – your solution python file to part 1
    2. TextClassifier.py – your solution python file to part 2
    3. logistic.py – your solution python file to part 3
    4. report.pdf – your project report in pdf format

Report Checklist

Your report should briefly describe your implemented solution and fully answer the questions for every part of the assignment. Your description should focus on the most “interesting” aspects of your solution, i.e., any non-obvious implementation choices and parameter settings, and what you have found to be especially important for getting good performance. Feel free to include pseudocode or figures if they are needed to clarify your approach. Your report should be self-contained and it should (ideally) make it possible for us to understand your solution without having to run your source code.

Kindly structure the report as follows:

  1. Title Page:
    List of all team members, course number and section for which each member is registered, date on which the report was written
  2. Section I:
    Image Classification. Report the average classification rate, the classification rate for each class, and the confusion matrix. For each class, show the test examples from that class that have the highest and lowest posterior probabilities according to your classifier. Show the ten visualization plots of the feature likelihoods.
  3. Section II:
    Text Classification. Report all your results: the confusion matrix and the recall, precision, and F1 score for all 14 classes. Include the top feature words for each of the classes. Also report the change in accuracy when the class prior is changed to a uniform distribution and when it is removed, and provide the reasoning for these observations.
  4. Section III:
    Linear Classifier. Report the average error rates on the training and test sets for your Logistic Regression model. Show the visual results of the model.
  5. Extra Credit:
    If you have done any work which you think should get extra credit, describe it here.
  6. Statement of Contribution:
    Specify which team-member performed which task. You are encouraged to make this a many-to-many mapping, if applicable. e.g., You can say that “Rahul and Jason both implemented the BFS function, their results were compared for debugging and Rahul’s code was submitted. Jason and Mark both implemented the DFS function, Mark’s code never ran successfully, so Jason’s code was submitted. Section I of the report was written by all 3 team members. Section II by Mark and Jason, Section III by Rahul and Jason.”… and so on.

Only attach files that are the required deliverables in blackboard. Your report must be a formatted pdf document. Pictures and example outputs should be incorporated into the document. Exception: items which are very large or unsuitable for inclusion in a pdf document (e.g. videos or animated gifs) may be put on the web and a URL included in your report. You can write your report either in English or Chinese.

Extra credit:

We reserve the right to give bonus points for any advanced exploration or especially challenging or creative solutions that you implement. This includes, but is not restricted to, the extra credit suggestion given above.
