COMP9417 Homework 1: Linear Regression & Friends

Introduction

In this homework, we will explore Linear Regression and its regularized counterparts, LASSO and Ridge Regression, in more depth.

Points Allocation

There are a total of 25 marks. The available marks are:

  • Question 1 a): 4 marks
  • Question 1 b): 2 marks
  • Question 2 a): 1 mark
  • Question 2 b): 1 mark
  • Question 2 c): 3 marks
  • Question 2 d): 3 marks
  • Question 2 e): 2 marks
  • Question 2 f): 2 marks
  • Question 2 g): 1 mark
  • Question 3 a): 2 marks
  • Question 3 b): 2 marks
  • Question 3 c): 2 marks

    What to Submit

  • A single PDF file which contains solutions to each question. For each question, provide your solution in the form of text and requested plots. For some questions you will be requested to provide screen shots of code used to generate your answer — only include these when they are explicitly asked for.
  • .py file(s) containing all code you used for the project, which should be provided in a separate .zip file. This code must match the code provided in the report.
  • You may be deducted points for not following these instructions.
  • You may be deducted points for poorly presented/formatted work. Please be neat and make your solutions clear. Start each question on a new page if necessary.

  • You cannot submit a Jupyter notebook; this will receive a mark of zero. This does not prevent you from developing your code in a notebook and then copying it into a .py file, or from using a tool such as nbconvert or similar.
  • We will set up a Moodle forum for questions on this homework. Please read the existing questions before posting new questions. Please do some basic research online before posting questions. Please only post clarification questions. Any questions deemed to be fishing for answers will be ignored and/or deleted.
  • Please check the Moodle forum for updates to this spec. It is your responsibility to check for announcements about the spec.
  • Please complete your homework on your own, do not discuss your solution with other people in the course. General discussion of the problems is fine, but you must write out your own solution and acknowledge if you discussed any of the problems in your submission (including their name and zID).
  • As usual, we monitor all online forums such as Chegg, StackExchange, etc. Posting homework questions on these sites is equivalent to plagiarism and will result in a case of academic misconduct.

    When and Where to Submit

  • Due date: Week 3, Sunday June 20th, 2021 by 11:55pm.
  • Late submissions will incur a penalty of 20% per day (from the ceiling, i.e., total marks available for the homework) for the first 5 days. For example, if you submit 2 days late, the maximum possible mark is 60% of the available 25 marks.
  • Submission must be done through Moodle, no exceptions.


Question 1. Simple Linear Regression

(a) Consider a data set consisting of X values (features) $X_1, \dots, X_n$ and Y values (responses) $Y_1, \dots, Y_n$. Let $\hat{\beta}_0, \hat{\beta}_1, \hat{\sigma}$ be the output of running ordinary least squares (OLS) regression on the data. Now define the transformation

$$\tilde{X}_i = c(X_i + d),$$

for each $i = 1, \dots, n$, where $c \neq 0$ and $d$ are arbitrary real constants. Let $\tilde{\beta}_0, \tilde{\beta}_1, \tilde{\sigma}$ be the output of OLS on the data $\tilde{X}_1, \dots, \tilde{X}_n$ and $Y_1, \dots, Y_n$. Write equations for $\tilde{\beta}_0, \tilde{\beta}_1, \tilde{\sigma}$ in terms of $\hat{\beta}_0, \hat{\beta}_1, \hat{\sigma}$ (and in terms of $c$ and $d$), and be sure to justify your answers. Note that the estimate of error in OLS is taken to be

$$\hat{\sigma} = \sqrt{\frac{\hat{e}^T \hat{e}}{n - p}},$$

where $\hat{e}$ is the vector of residuals, i.e. with $i$-th element $\hat{e}_i = Y_i - \hat{Y}_i$, where $\hat{Y}_i$ is the $i$-th prediction made by the model, and $p$ is the number of estimated parameters, including the intercept (so in this case $p = 2$).

(b) Suppose you have a dataset where X takes only two values while Y can take arbitrary real values. To consider a concrete example, consider a clinical trial where $X_i = 1$ indicates that the $i$-th patient receives a dose of a particular drug (the treatment), and $X_i = 0$ indicates that they did not, and $Y_i$ is the real-valued outcome for the $i$-th patient, e.g. blood pressure. Let $\bar{Y}_T$ and $\bar{Y}_P$ denote the sample mean outcomes for the treatment group and non-treatment (placebo) group, respectively. What will be the value of the OLS coefficients $\hat{\beta}_0, \hat{\beta}_1$ in terms of the group means?

What to submit: For both parts of the question, present your solution neatly – photos of handwritten work or using a tablet to write the answers is fine. Please include all working and circle your final answers.

Question 2. LASSO vs. Ridge Regression

In this problem we will consider the dataset provided in data.csv, with response variable $Y$ and features $X_1, \dots, X_8$.

(a) Use a pairs plot to study the correlations between the features. In 3-4 sentences, describe what you see and how this might affect a linear regression model. What to submit: a single plot, some commentary.
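A minimal sketch of one way to generate the pairs plot, assuming data.csv contains feature columns named X1 through X8 (adjust the column names if the actual file differs):

```python
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix

# Assumed file and column names; check them against the actual data.csv.
df = pd.read_csv("data.csv")
features = df[[f"X{j}" for j in range(1, 9)]]

# Pairs plot: a scatter plot for every pair of features,
# with a histogram of each feature on the diagonal.
scatter_matrix(features, figsize=(10, 10), diagonal="hist")
plt.savefig("pairs_plot.png")
plt.show()
```

Strong linear patterns in the off-diagonal panels indicate correlated features, which is the behaviour the commentary should address.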

(b) In order for LASSO and Ridge to be run properly, we often rescale the features in the dataset. First, rescale each feature so that it has zero mean, and then rescale it so that $\sum_{i=1}^{n} X_{ij}^2 = n$, where $n$ denotes the total number of observations. What to submit: print out the sum of squared observations of each of the 8 (transformed) features, i.e. $\sum_{i=1}^{n} X_{ij}^2$ for $j = 1, \dots, 8$.
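A minimal sketch of the two-step rescaling, under the same assumed column names as above:

```python
import numpy as np
import pandas as pd

# Assumed file and column names, as in the part (a) sketch.
df = pd.read_csv("data.csv")
X = df[[f"X{j}" for j in range(1, 9)]].to_numpy(dtype=float)
n = X.shape[0]

# Step 1: centre each feature so it has zero mean.
X = X - X.mean(axis=0)

# Step 2: rescale each column j so that sum_i X_ij^2 = n.
X = X * np.sqrt(n / (X ** 2).sum(axis=0))

# Sanity check: each of the 8 printed values should equal n.
print((X ** 2).sum(axis=0))
```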

(c) Now we will apply ridge regression to this dataset. Recall that ridge regression is defined as the solution to the optimisation:

$$\hat{\beta} = \arg\min_{\beta} \left\{ \frac{1}{2} \|Y - X\beta\|_2^2 + \lambda \|\beta\|_2^2 \right\}.$$

Run ridge regression with $\lambda \in \{0.01, 0.1, 0.5, 1, 1.5, 2, 5, 10, 20, 30, 50, 100, 200, 300\}$. Create a plot with the x-axis representing $\log(\lambda)$ and the y-axis representing the value of the coefficient for each feature in each of the fitted ridge models. In other words, the plot should describe what happens to each of the coefficients in your model for the different choices of $\lambda$. For this problem you are permitted to use the sklearn implementation of Ridge regression to run the models and extract the coefficients, and base matplotlib/numpy to create the plots, but no other packages are to be used to generate this plot. In a few lines, comment on what you see; in particular, what do you observe for features 3, 4, 5?

What to submit: a single plot, some commentary, a screen shot of the code used for this section. Your plot must have a legend, and you must use the following colors: ['red', 'brown', 'green', 'blue', 'orange', 'pink', 'purple', 'grey'] for the features $X_1, \dots, X_8$ in the plot.
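A minimal sketch of the coefficient-path plot, assuming X and y are the rescaled feature matrix and response carried over from part (b). Note that sklearn's Ridge minimises $\|y - Xw\|_2^2 + \alpha \|w\|_2^2$, which differs from the objective above by a factor of 2 on the squared-error term ($\alpha = 2\lambda$ would be the exact match); the sketch passes $\lambda$ straight through as alpha, which appears to be what the spec intends:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge

# X (n, 8) and y (n,) are assumed to carry over from the part (b) sketch.
lams = [0.01, 0.1, 0.5, 1, 1.5, 2, 5, 10, 20, 30, 50, 100, 200, 300]
colors = ['red', 'brown', 'green', 'blue', 'orange',
          'pink', 'purple', 'grey']

# Fit one ridge model per lambda and record its 8 coefficients.
coef_path = np.array([Ridge(alpha=lam).fit(X, y).coef_ for lam in lams])

# One curve per feature, coloured as the spec requires.
for j in range(8):
    plt.plot(np.log(lams), coef_path[:, j], color=colors[j],
             label=f"X{j + 1}")
plt.xlabel("log(lambda)")
plt.ylabel("coefficient value")
plt.legend()
plt.show()
```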

(d) In this part, we will use Leave-One-Out Cross Validation (LOOCV) to find a good value of $\lambda$ for the ridge problem. Create a fine grid of $\lambda$ values running from 0 to 50 in increments of 0.1, so the grid would be: $0, 0.1, 0.2, \dots, 50$. For each data point $i = 1, \dots, n$, run ridge with each $\lambda$ value on the dataset with point $i$ removed, find $\hat{\beta}$, then compute the leave-one-out error for predicting $Y_i$. Average the squared error over all $n$ choices of $i$. Plot the leave-one-out error against $\lambda$ and find the best $\lambda$ value. Compare your results to standard Ordinary Least Squares (OLS): does ridge seem to give better prediction error based on your analysis? Note that for this question you are not permitted to use any existing packages that implement cross validation; you must write the code yourself from scratch. You must create the plot yourself from scratch using basic matplotlib functionality.

    What to submit: a single plot, some commentary, a screen shot of any code used for this section.
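A minimal sketch of the structure such a hand-written LOOCV loop could take, again assuming X and y from part (b); only the ridge fit itself uses sklearn (which the spec permits for the model), while the cross-validation is written from scratch:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge

# X (n, 8) and y (n,) are assumed to carry over from the part (b) sketch.
n = X.shape[0]
lams = np.linspace(0, 50, 501)  # grid 0, 0.1, ..., 50

loo_errors = []
for lam in lams:
    sq_errors = np.zeros(n)
    for i in range(n):
        mask = np.arange(n) != i  # leave point i out
        model = Ridge(alpha=lam).fit(X[mask], y[mask])
        pred = model.predict(X[i:i + 1])[0]
        sq_errors[i] = (y[i] - pred) ** 2
    loo_errors.append(sq_errors.mean())

best_lam = lams[int(np.argmin(loo_errors))]
print("best lambda:", best_lam)

plt.plot(lams, loo_errors)
plt.xlabel("lambda")
plt.ylabel("LOO mean squared error")
plt.show()
```

The $\lambda = 0$ entry of the grid corresponds to OLS, which gives a direct comparison point for the discussion.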

(e) Recall the LASSO problem:

$$\hat{\beta} = \arg\min_{\beta} \left\{ \frac{1}{2} \|Y - X\beta\|_2^2 + \lambda \|\beta\|_1 \right\}.$$

    Repeat part (c) for the LASSO. What to submit: a single plot, some commentary, a screen shot of the code used for this section. You must use the same color scheme as in part (c).
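One wrinkle worth noting when swapping in sklearn: its Lasso minimises $\frac{1}{2n} \|y - Xw\|_2^2 + \alpha \|w\|_1$, so matching the objective above requires rescaling the penalty. A hedged sketch of the correspondence (verify the scaling yourself before relying on it):

```python
from sklearn.linear_model import Lasso

# sklearn's Lasso objective: (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1.
# Dividing the spec's objective (1/2)||Y - Xb||^2 + lam * ||b||_1 by n
# shows that alpha = lam / n yields the same minimiser.
# X (n, 8) and y (n,) are assumed to carry over from part (b).
n = X.shape[0]
lam = 1.0  # example penalty value from the part (c) grid
model = Lasso(alpha=lam / n).fit(X, y)
print(model.coef_)  # LASSO typically zeroes out some of these entries
```

The rest of the code is the same as part (c), with Lasso in place of Ridge.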

(f) Repeat the leave-one-out analysis of part (d) for the LASSO and for a grid of $\lambda$ values $0, 0.1, \dots, 20$. Note that sklearn will throw some warnings for the $\lambda = 0$ case, which can be safely ignored for our purposes. What to submit: a single plot, some commentary, a screen shot of the code used for this section.
(g) Briefly comment on the differences you observed between the LASSO and Ridge. Which model do you prefer, and why? Provide reasonable justification here for full marks. What to submit: some commentary, and plots if your discussion requires them.

Question 3. Sparse Solutions with LASSO

In this question, we will try to understand why LASSO regression yields sparse solutions. Sparse means that the solution of the LASSO optimisation problem:

$$\hat{\beta} = \arg\min_{\beta} \left\{ \frac{1}{2} \|Y - X\beta\|_2^2 + \lambda \|\beta\|_1 \right\}$$

has most of its entries $\hat{\beta}_j = 0$, which you may have observed empirically in the previous question. To study this from a theoretical perspective, we will consider a somewhat extreme case in which we take the penalty parameter $\lambda$ to be very large, and show that the optimal LASSO solution is $\hat{\beta} = 0_p$, the vector of all zeroes. Assume that $X \in \mathbb{R}^{n \times p}$, $Y \in \mathbb{R}^n$, and the optimisation is over $\beta \in \mathbb{R}^p$.

(a) Consider the quantity $|\langle Y, X\beta \rangle|$. Show that $|\langle Y, X\beta \rangle| \leq \max_j |X_j^T Y| \sum_j |\beta_j|$, where $X_j$ denotes the $j$-th column of $X$.


(b) We will now assume that $\lambda$ is very large, such that it satisfies

$$\lambda \geq \max_j |X_j^T Y|.$$

Using the result of part (a), and the assumption on $\lambda$, prove that $\hat{\beta} = 0_p$ is a solution of the LASSO problem.

(c) In the previous part, we showed that $\hat{\beta} = 0_p$ is a minimizer of the LASSO objective $\ell(\beta)$. Prove that $\hat{\beta}$ is the unique minimizer of $\ell(\beta)$, i.e. if $\beta \neq 0_p$, then $\ell(\beta) > \ell(0_p)$. Hint: consider the two cases $\|X\beta\|_2 = 0$ and $\|X\beta\|_2 > 0$.

What to submit: For all parts of the question, present your solution neatly – photos of handwritten work or using a tablet to write the answers is fine. Please include all working and circle your final answers. Note that if you cannot do part (a), you are still free to use the result of part (a) to complete parts (b) and (c).
