ISYE 6740 Homework 5


For the SVM, the optimality conditions imply that

$$w = \sum_{i=1}^m \alpha_i y_i x_i,$$

where αi ≥ 0 are the dual variables. What does this imply in terms of how to relate the data to w?

Explain why only the data points on the “margin” will contribute to the sum above, i.e., play a role in defining w.
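For reference, a short LaTeX sketch of the complementary slackness condition this question turns on, written in standard hard-margin SVM notation (with bias term b), which may differ slightly from the lecture slides:

```latex
% Complementary slackness for the hard-margin SVM (standard notation,
% not necessarily the exact form used in the lecture slides).
\[
  \alpha_i \bigl( y_i (w^\top x_i + b) - 1 \bigr) = 0, \qquad i = 1, \dots, m.
\]
% So alpha_i > 0 forces y_i (w^T x_i + b) = 1: the point x_i lies exactly
% on the margin. Points strictly outside the margin satisfy
% y_i (w^T x_i + b) > 1 and therefore have alpha_i = 0, dropping out of
% the sum w = sum_i alpha_i y_i x_i.
```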

Suppose we only have four training examples in two dimensions, as shown in the figure: positive samples at x1 = (0,0) and x2 = (2,2), and negative samples at x3 = (h,1) and x4 = (0,3).

i. For what range of the parameter h > 0 are the training points still linearly separable?

ii. (10 points) Does the orientation of the maximum-margin decision boundary change as h changes, while the points remain separable?
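Not part of the assignment, but a quick numerical sketch for building intuition: a linear SVM with a very large C approximates the hard-margin classifier, so fitting it for several illustrative values of h shows when the four points stop being separable and how w (the boundary orientation) behaves. The h values below are arbitrary examples.

```python
# Hypothetical check of linear separability as h varies, using the four
# points from the problem statement. A very large C approximates the
# hard-margin SVM; if the data are inseparable, some training points
# will be misclassified.
import numpy as np
from sklearn.svm import SVC

for h in [0.5, 1.0, 2.0, 3.0, 5.0]:
    X = np.array([[0, 0], [2, 2], [h, 1], [0, 3]], dtype=float)
    y = np.array([1, 1, -1, -1])
    clf = SVC(kernel="linear", C=1e10)  # large C ~ hard margin
    clf.fit(X, y)
    separable = bool(np.all(clf.predict(X) == y))
    w, b = clf.coef_[0], clf.intercept_[0]
    print(f"h={h}: separable={separable}, w={w}, b={b:.3f}")
```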

  1. Multi-class classification for the MNIST data set: comparison of classifiers.

This question compares different classifiers and their performance for multi-class classification on the complete MNIST dataset at http://yann.lecun.com/exdb/mnist/. You can find the data file mnist_10digits.mat in the homework folder. The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. We will compare KNN, logistic regression, SVM, kernel SVM, and neural networks. We suggest using Scikit-learn, a commonly used and powerful Python library with various machine learning tools, but you can also use similar libraries in other programming languages of your choice to perform the tasks. Below are some tips.

  • We suggest that you “standardize” the features before training the classifiers by dividing the feature values by 255 (thus mapping the range of the features from [0, 255] to [0, 1]).
  • You may adjust the number of neighbors K used in KNN to obtain a reasonable result (you may use cross-validation, but it is not required; any reasonable tuning that gives a good result is acceptable).
  • You may use a neural network (e.g., scikit-learn’s neural_network module) with hidden_layer_sizes = (20, 10).
  • For kernel SVM, you may use the radial basis function kernel

$$K(x, x') = \exp\{-\|x - x'\|^2 / (2\sigma^2)\},$$

together with a heuristic called the “median trick” to choose the kernel parameter: set the bandwidth to $\sigma = \sqrt{M/2}$, where M is the median of $\{\|x_i - x_j\|^2,\ 1 \le i, j \le m',\ i \ne j\}$ over pairs of training samples. Here you can randomly choose m' = 1000 samples from the training data to use for the “median trick” [1]. A sketch of this computation appears after this list.

  • For KNN and SVM, you can randomly downsample the training data to size m = 5000 to improve computational efficiency.
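Sketched below is one way to implement the “median trick” described in the list above. The placeholder Xtrain is random data standing in only so the snippet runs; in the assignment it would be the scaled MNIST training matrix. The translation to scikit-learn’s gamma parameter is the one extra step.

```python
# A minimal sketch of the "median trick" for the RBF kernel bandwidth.
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Placeholder for the real training matrix; in the assignment this would
# come from mnist_10digits.mat, with features already scaled to [0, 1].
Xtrain = rng.random((5000, 784))

idx = rng.choice(len(Xtrain), size=1000, replace=False)  # m' = 1000 samples
M = np.median(pdist(Xtrain[idx]) ** 2)  # median of {||xi - xj||^2, i != j}
sigma = np.sqrt(M / 2.0)                # bandwidth from the median trick

# sklearn parameterizes the RBF kernel as exp(-gamma * ||x - x'||^2),
# so the sigma above corresponds to gamma = 1 / (2 * sigma^2).
kernel_svm = SVC(kernel="rbf", gamma=1.0 / (2.0 * sigma**2))
```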

Train the classifiers on the training dataset and evaluate them on the test dataset.

  • Report the confusion matrix, precision, recall, and F-1 score for each of the classifiers. Precision, recall, and F-1 score should be reported for each of the digits, so you can create a table. Do this for each of the five classifiers: KNN, logistic regression, SVM, kernel SVM, and neural networks (see the sketch after this list).
  • Comment on the performance of the classifiers and give your explanation of why some of them perform better than the others.
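A minimal sketch of the training-and-evaluation loop referenced above, assuming Xtrain, ytrain, Xtest, ytest hold the (scaled) MNIST arrays. Random placeholder data is used here only so the snippet runs, and the hyperparameters are illustrative, not prescribed by the assignment.

```python
# Fit the five classifiers and print per-digit precision/recall/F-1
# via sklearn's classification_report, plus the confusion matrix.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import LinearSVC, SVC
from sklearn.metrics import classification_report, confusion_matrix

rng = np.random.default_rng(0)
# Placeholder data; replace with the real MNIST arrays from the .mat file.
Xtrain, ytrain = rng.random((1000, 784)), rng.integers(0, 10, 1000)
Xtest, ytest = rng.random((200, 784)), rng.integers(0, 10, 200)

classifiers = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "logistic regression": LogisticRegression(max_iter=1000),
    "SVM": LinearSVC(),
    "kernel SVM": SVC(kernel="rbf"),
    "neural network": MLPClassifier(hidden_layer_sizes=(20, 10), max_iter=500),
}

for name, clf in classifiers.items():
    clf.fit(Xtrain, ytrain)
    pred = clf.predict(Xtest)
    print(name)
    print(confusion_matrix(ytest, pred))
    print(classification_report(ytest, pred))  # per-digit P/R/F-1 table
```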
  2. Neural networks.

    • Consider a neural network for binary classification using the sigmoid function for each unit. If the network has no hidden layer, explain why the model is equivalent to logistic regression.
    • Consider the simple two-layer network in the lecture slides. Given m training data (xi, yi), i = 1, …, m, the cost function used to train the neural network is

$$\ell(w, \alpha, \beta) = \sum_{i=1}^m \bigl(y_i - \sigma(w^\top z_i)\bigr)^2,$$

where $\sigma(x) = 1/(1 + e^{-x})$ is the sigmoid function and $z_i$ is a two-dimensional vector such that $z_i^1 = \sigma(\alpha^\top x_i)$ and $z_i^2 = \sigma(\beta^\top x_i)$. Show that the gradient is given by

$$\frac{\partial \ell(w, \alpha, \beta)}{\partial w} = -\sum_{i=1}^m 2\bigl(y_i - \sigma(u_i)\bigr)\,\sigma(u_i)\bigl(1 - \sigma(u_i)\bigr)\, z_i,$$

where $u_i = w^\top z_i$. Also find the gradients of $\ell(w, \alpha, \beta)$ with respect to α and β and write down their expressions.
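To double-check a derivation like this, it helps to compare the analytic gradient against finite differences. Below is a minimal numpy sketch under the assumptions above (squared loss, z_i = (σ(αᵀx_i), σ(βᵀx_i))); the names a and b stand in for α and β and are illustrative, not from the assignment.

```python
# Finite-difference check of the gradient with respect to w, assuming the
# squared-loss cost written above. Names (a, b for alpha, beta) are
# illustrative only.
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def loss(w, a, b, X, y):
    # Rows of Z are z_i = (sigmoid(a^T x_i), sigmoid(b^T x_i)).
    Z = np.column_stack([sigmoid(X @ a), sigmoid(X @ b)])
    return np.sum((y - sigmoid(Z @ w)) ** 2)

def grad_w(w, a, b, X, y):
    Z = np.column_stack([sigmoid(X @ a), sigmoid(X @ b)])
    u = Z @ w          # u_i = w^T z_i
    s = sigmoid(u)
    # -sum_i 2 (y_i - sigma(u_i)) sigma(u_i) (1 - sigma(u_i)) z_i
    return -(2.0 * (y - s) * s * (1.0 - s)) @ Z

rng = np.random.default_rng(0)
m, d = 5, 3
X = rng.normal(size=(m, d))
y = rng.integers(0, 2, size=m).astype(float)
w, a, b = rng.normal(size=2), rng.normal(size=d), rng.normal(size=d)

# Central finite differences over the two components of w.
eps = 1e-6
num = np.zeros(2)
for k in range(2):
    e = np.zeros(2)
    e[k] = eps
    num[k] = (loss(w + e, a, b, X, y) - loss(w - e, a, b, X, y)) / (2 * eps)

print(np.allclose(grad_w(w, a, b, X, y), num, atol=1e-5))  # expect True
```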
