ISYE6740-Homework 5 Solved

  1. House price dataset.

The HOUSES dataset contains a collection of recent real estate listings in and around San Luis Obispo county. The dataset is provided in RealEstate.csv. You may use one-hot encoding ("one-hot-keying") to expand the categorical variables.

The dataset contains the following useful fields (you may exclude the Location and MLS fields from your linear regression model).

You can use any package for this question.

  • Price: the most recent listing price of the house (in dollars).
  • Bedrooms: number of bedrooms.
  • Bathrooms: number of bathrooms.
  • Size: size of the house in square feet.
  • Price/SQ.ft: price of the house per square foot.
  • Status: Short Sale, Foreclosure and Regular.
  • Fit the Ridge regression model to predict Price from all variables. You can use one-hot keying to expand the categorical variable Status. Use 5-fold cross-validation to select the optimal regularization parameter, and show the CV curve. Report the fitted model (i.e., the parameters) and the residual sum of squares. You can use any package. The suggested search range for the regularization parameter is 80 to 150.
  • Use Lasso to select variables. Use 5-fold cross-validation to select the optimal regularization parameter, and show the CV curve. Report the fitted model (i.e., the selected variables and their coefficients). Show the Lasso solution path. You can use any package for this. The suggested search range for the regularization parameter is 6000 to 8000. A minimal workflow sketch for both parts is given after this list.
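
A minimal sketch of one possible scikit-learn workflow is shown below. The column names (Price, Bedrooms, Bathrooms, Size, Price/SQ.Ft, Status, MLS, Location) and the CV grids are assumptions based on the description above; adjust them to match the actual RealEstate.csv file.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge, Lasso
from sklearn.model_selection import KFold, cross_val_score

# Load the listings, drop the excluded fields, and one-hot encode Status.
df = pd.read_csv("RealEstate.csv")
df = df.drop(columns=["MLS", "Location"])        # excluded per the problem statement
df = pd.get_dummies(df, columns=["Status"])      # "one-hot-keying"

y = df["Price"].values
X = df.drop(columns=["Price"]).values.astype(float)

cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Ridge: 5-fold CV curve over the suggested range 80-150.
ridge_alphas = np.linspace(80, 150, 71)
ridge_cv_mse = [-cross_val_score(Ridge(alpha=a), X, y, cv=cv,
                                 scoring="neg_mean_squared_error").mean()
                for a in ridge_alphas]
best_ridge = ridge_alphas[int(np.argmin(ridge_cv_mse))]
ridge = Ridge(alpha=best_ridge).fit(X, y)
rss = np.sum((y - ridge.predict(X)) ** 2)
print("ridge: alpha =", best_ridge, "coef =", ridge.coef_, "RSS =", rss)

# Lasso: 5-fold CV curve over the suggested range 6000-8000.
lasso_alphas = np.linspace(6000, 8000, 41)
lasso_cv_mse = [-cross_val_score(Lasso(alpha=a, max_iter=50000), X, y, cv=cv,
                                 scoring="neg_mean_squared_error").mean()
                for a in lasso_alphas]
best_lasso = lasso_alphas[int(np.argmin(lasso_cv_mse))]
lasso = Lasso(alpha=best_lasso, max_iter=50000).fit(X, y)
print("lasso: alpha =", best_lasso, "coef =", lasso.coef_)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(ridge_alphas, ridge_cv_mse); ax1.set_title("Ridge 5-fold CV MSE")
ax2.plot(lasso_alphas, lasso_cv_mse); ax2.set_title("Lasso 5-fold CV MSE")
plt.show()
```

The Lasso solution path over the same design matrix can be traced with sklearn.linear_model.lasso_path (or by refitting Lasso over a decreasing grid of penalties and stacking the coefficients).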

  2. AdaBoost.

Consider the dataset listed below and plotted in the accompanying figure. The first two coordinates of each point are the feature values, and the third coordinate is the binary label.

X1 = (−1, 0, +1), X2 = (−0.5, 0.5, +1), X3 = (0, 1, −1), X4 = (0.5, 1, −1),
X5 = (1, 0, +1), X6 = (1, −1, +1), X7 = (0, −1, −1), X8 = (0, 0, −1).

In this problem, you will run through T = 3 iterations of AdaBoost with decision stumps (as explained in the lecture) as weak learners.

  • For each iteration t = 1, 2, 3, compute by hand (i.e., show the calculation steps) the quantities in Table 1 and draw the corresponding decision stumps on the figure (you can draw this by hand; a short script for checking the arithmetic appears after Table 1).
  • What is the training error of this AdaBoost? Give a short explanation for why AdaBoost outperforms a single decision stump.

Table 1: Values of AdaBoost parameters at each timestep.

t | εt | αt | Zt | Dt(1) | Dt(2) | Dt(3) | Dt(4) | Dt(5) | Dt(6) | Dt(7) | Dt(8)
1 |    |    |    |       |       |       |       |       |       |       |
2 |    |    |    |       |       |       |       |       |       |       |
3 |    |    |    |       |       |       |       |       |       |       |
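
Table 1 is meant to be filled in by hand, but a short script can be used to check the arithmetic. Below is a minimal sketch, assuming decision stumps of the form h(x) = s · sign(xj − θ) chosen to minimize weighted error, and the standard AdaBoost updates αt = ½ ln((1 − εt)/εt) and Dt+1(i) ∝ Dt(i) exp(−αt yi ht(xi)); it is a check under those assumptions, not a substitute for the hand calculation.

```python
import numpy as np

# The eight labeled points (x1, x2, y) from the problem statement.
X = np.array([[-1, 0], [-0.5, 0.5], [0, 1], [0.5, 1],
              [1, 0], [1, -1], [0, -1], [0, 0]], dtype=float)
y = np.array([+1, +1, -1, -1, +1, +1, -1, -1], dtype=float)
n = len(y)

def best_stump(D):
    """Exhaustively search axis-aligned stumps h(x) = s * sign(x[j] - theta)."""
    best = None
    for j in range(X.shape[1]):
        # Candidate thresholds halfway between sorted feature values (and outside them).
        vals = np.sort(np.unique(X[:, j]))
        thresholds = np.concatenate(([vals[0] - 0.5],
                                     (vals[:-1] + vals[1:]) / 2,
                                     [vals[-1] + 0.5]))
        for theta in thresholds:
            for s in (+1, -1):
                pred = s * np.where(X[:, j] > theta, 1.0, -1.0)
                err = np.sum(D[pred != y])                 # weighted error eps_t
                if best is None or err < best[0]:
                    best = (err, j, theta, s, pred)
    return best

D = np.full(n, 1.0 / n)          # D_1(i) = 1/8
F = np.zeros(n)                  # running weighted vote of the stumps
for t in range(1, 4):
    eps, j, theta, s, pred = best_stump(D)
    alpha = 0.5 * np.log((1 - eps) / eps)
    Z = np.sum(D * np.exp(-alpha * y * pred))              # normalization constant
    print(f"t={t}: feature x{j+1}, threshold {theta:+.2f}, sign {s:+d}, "
          f"eps={eps:.3f}, alpha={alpha:.3f}, Z={Z:.3f}, D={np.round(D, 3)}")
    F += alpha * pred
    D = D * np.exp(-alpha * y * pred) / Z

print("training error of the combined classifier:", np.mean(np.sign(F) != y))
```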

  3. Random forest and one-class SVM for email spam classification

Your task for this question is to build a spam classifier using the UCI email spam dataset (https://archive.ics.uci.edu/ml/datasets/Spambase). The spam emails in this collection came from the postmaster and from individuals who had filed spam; please download the data from that website. The collection of non-spam emails came from filed work and personal emails, and hence the word 'george' and the area code '650' are indicators of non-spam. These are useful when constructing a personalized spam filter. You are free to choose any package for this homework. Note: there may be some missing values; you can just fill them in with zero.

  • Build a CART model and visualize the fitted classification tree.
  • Now also build a random forest model. Randomly shuffle the data and partition to use 80% for training and the remaining 20% for testing. Compare and report the test error for your classification tree and random forest models on testing data. Plot the curve of test error (total misclassification error rate) versus the number of trees for the random forest, and plot the test error for the CART model (which should be a constant with respect to the number of trees).
  • Now we will use a one-class SVM approach for spam filtering. Randomly shuffle the data and partition it to use 80% for training and the remaining 20% for testing. Extract all non-spam emails from the training block (the 80% of data you selected) to build a one-class kernel SVM with an RBF kernel (you can tune the kernel bandwidth to achieve good performance). Then apply it to the 20% of data reserved for testing (thus this is a novelty-detection situation), and report the total misclassification error rate on these testing data. A code sketch covering all three parts follows this list.
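
One possible end-to-end sketch with scikit-learn is shown below. The file name spambase.data, the tree depth, the tree-count grid, and the RBF bandwidth (gamma) are assumptions to be tuned, not prescribed values.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import OneClassSVM
from sklearn.model_selection import train_test_split

# Spambase: 57 numeric features per email, last column is the label (1 = spam).
data = pd.read_csv("spambase.data", header=None).fillna(0)
X, y = data.iloc[:, :-1].values, data.iloc[:, -1].values

# Random shuffle and 80/20 split.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          shuffle=True, random_state=0)

# CART model and its (constant) test error.
cart = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_tr, y_tr)
cart_err = np.mean(cart.predict(X_te) != y_te)
plot_tree(cart, filled=True, max_depth=2)   # visualize the top of the fitted tree
plt.show()

# Random forest test error as a function of the number of trees.
n_trees = list(range(10, 310, 10))
rf_err = []
for m in n_trees:
    rf = RandomForestClassifier(n_estimators=m, random_state=0).fit(X_tr, y_tr)
    rf_err.append(np.mean(rf.predict(X_te) != y_te))

plt.plot(n_trees, rf_err, label="random forest")
plt.axhline(cart_err, color="r", linestyle="--", label="CART")
plt.xlabel("number of trees"); plt.ylabel("test error"); plt.legend(); plt.show()

# One-class SVM trained on non-spam training emails only (novelty detection).
oc = OneClassSVM(kernel="rbf", gamma=1e-4, nu=0.1).fit(X_tr[y_tr == 0])
pred = np.where(oc.predict(X_te) == 1, 0, 1)   # +1 = inlier (non-spam), -1 = outlier (spam)
print("one-class SVM test error:", np.mean(pred != y_te))
```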
  4. Locally weighted linear regression and bias-variance tradeoff.

Consider a dataset with n data points (xi, yi), xi ∈ Rp, following the linear model

yi = (β*)ᵀ xi + εi,  i = 1, …, n,

where the εi are independent (but not identically distributed) Gaussian noise with zero mean and variance σi².

  • Show that ridge regression, which introduces a squared ℓ2-norm penalty on the parameter in the maximum likelihood estimate of β, can be written as follows

β̂(λ) = argmin_β (Xβ − y)ᵀ W (Xβ − y) + λ‖β‖₂²

for a properly defined diagonal matrix W, matrix X, and vector y.

  • Find the closed-form solution for β̂(λ) and its distribution conditioned on {xi}.
  • Derive the bias as a function of λ and some fixed test point x.
  • Derive the variance term as a function of λ.
  • Now assume the data are one-dimensional, the training dataset consists of two samples x1 = 1.5 and x2 = 1, and the test sample is x = 0. The true parameter is β* = 1, and the noise variance is given by σi² = 1. Plot the MSE (squared bias plus variance) as a function of the regularization parameter λ. A numeric sketch follows this list.
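
As a sanity check for the last part, the sketch below evaluates the bias-variance decomposition numerically, assuming the weighted closed form β̂(λ) = (XᵀWX + λI)⁻¹XᵀWy with W = diag(1/σi²) and a one-dimensional model without an intercept; the training points, noise variance, and test point are taken directly from the problem statement, and the derived bias/variance formulas are under those assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

# One-dimensional setup from the problem statement.
x_train = np.array([1.5, 1.0])       # training inputs x1, x2
sigma2 = np.array([1.0, 1.0])        # noise variances sigma_i^2
beta_star = 1.0                      # true parameter
x_test = 0.0                         # test point given in the problem statement

w = 1.0 / sigma2                     # diagonal of W
sxx = np.sum(w * x_train ** 2)       # scalar X^T W X

lambdas = np.linspace(0.0, 10.0, 500)
shrink = sxx / (sxx + lambdas)       # E[beta_hat(lambda)] = shrink * beta_star

# Prediction bias and variance at the test point.
bias2 = (x_test * (shrink - 1.0) * beta_star) ** 2
var = x_test ** 2 * np.sum(w ** 2 * x_train ** 2 * sigma2) / (sxx + lambdas) ** 2
mse = bias2 + var
# Note: at x_test = 0 both terms vanish identically; the tradeoff becomes visible
# for nonzero test points.

plt.plot(lambdas, bias2, label="bias$^2$")
plt.plot(lambdas, var, label="variance")
plt.plot(lambdas, mse, label="MSE")
plt.xlabel(r"$\lambda$"); plt.legend(); plt.show()
```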
  5. Medical imaging reconstruction.

In this problem, you will consider an example that resembles medical image reconstruction in MRI. We begin with a true image of dimension 50 × 50 (i.e., there are 2500 pixels in total). The data are in cs.mat; you can plot the image first. This image is truly sparse, in the sense that 2084 of its pixels have a value of 0, while 416 pixels have a value of 1. You can think of this image as a toy version of an MRI image that we are interested in collecting.

Because of the nature of the machine that collects the MRI image, it takes a long time to measure each pixel value individually, but it’s faster to measure a linear combination of pixel values. We measure n = 1300 linear combinations, with the weights in the linear combination being random, in fact, independently distributed as N(0,1). Because the machine is not perfect, we don’t get to observe this directly, but we observe a noisy version. These measurements are given by the entries of the vector

y = Ax + n,

where y ∈ R1300, A ∈ R1300×2500, and n ∼ N(0, 25 × I1300), where Im denotes the identity matrix of size m × m. In this homework, you can generate the data y using this model.
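
A minimal sketch of generating the measurements under this model is shown below; the variable name img inside cs.mat is an assumption (inspect loadmat("cs.mat").keys() for the actual key).

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import loadmat

# Load the 50x50 sparse true image (variable name assumed; check the .mat keys).
img = loadmat("cs.mat")["img"]
x_true = img.reshape(-1)              # p = 2500 pixel vector

n, p = 1300, x_true.size
rng = np.random.default_rng(0)

# Random Gaussian measurement matrix and noisy linear measurements y = A x + noise.
A = rng.standard_normal((n, p))
noise = rng.normal(0.0, 5.0, size=n)  # variance 25 => standard deviation 5
y = A @ x_true + noise

plt.imshow(img, cmap="gray"); plt.title("true image"); plt.show()
```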

Now the question is: can we model y as a linear combination of the columns of A to recover some coefficient vector that is close to the true image? Roughly speaking, the answer is yes.

The key point here is that although the number of measurements n = 1300 is smaller than the dimension p = 2500, the true image is sparse, so we can recover it from few measurements by exploiting its structure. This is the idea behind the field of compressed sensing.

The image recovery can be done using the lasso:

x̂ = argmin_x ½‖y − Ax‖₂² + λ‖x‖₁.

  • Now use lasso to recover the image and select λ using 10-fold cross-validation. Plot the cross-validation error curves, and show the recovered image.
  • To compare, also use ridge regression to recover the image:

x̂ = argmin_x ½‖y − Ax‖₂² + λ‖x‖₂².

Select λ using 10-fold cross-validation. Plot the cross-validation error curves, and show the recovered image. Which approach gives a better recovered image? A sketch of both recoveries is given below.
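
One way to carry out both recoveries, continuing the data-generation sketch above (A, y, and the 50 × 50 image shape assumed already defined), is with scikit-learn's LassoCV and RidgeCV, which perform the 10-fold cross-validation internally; the penalty grids below are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LassoCV, RidgeCV

# Lasso with 10-fold CV over an automatically chosen grid of penalties.
lasso = LassoCV(cv=10, n_alphas=50, max_iter=20000).fit(A, y)
x_lasso = lasso.coef_

# CV error curve: mse_path_ has shape (n_alphas, n_folds).
plt.plot(lasso.alphas_, lasso.mse_path_.mean(axis=1))
plt.xscale("log"); plt.xlabel("lambda"); plt.ylabel("10-fold CV MSE (lasso)")
plt.show()

# Ridge with 10-fold CV over a hand-chosen grid of penalties.
ridge = RidgeCV(alphas=np.logspace(-2, 4, 30), cv=10).fit(A, y)
x_ridge = ridge.coef_

fig, axes = plt.subplots(1, 2)
axes[0].imshow(x_lasso.reshape(50, 50), cmap="gray"); axes[0].set_title("lasso recovery")
axes[1].imshow(x_ridge.reshape(50, 50), cmap="gray"); axes[1].set_title("ridge recovery")
plt.show()
```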
