ECE4710J Homework 3 - Geometry of Least Squares (Solved)


Geometry of Least Squares

  1. Suppose we have a dataset represented by the design matrix $X$ and response vector $Y$. We use linear regression to fit this data and obtain the optimal weights $\hat{\theta}$. Draw the geometric interpretation of the column space of the design matrix, $\mathrm{span}(X)$; the response vector $Y$; the residuals $Y - X\hat{\theta}$; and the predictions $X\hat{\theta}$ (using the optimal parameters) and $X\theta$ (using an arbitrary parameter vector $\theta$).
  • What is always true about the residuals in least squares regression? Select all that apply.

☐ A. They are orthogonal to the column space of the design matrix.
☐ B. They represent the errors of the predictions.
☐ C. Their sum is equal to the mean squared error.
☐ D. Their sum is equal to zero.
☐ E. None of the above.
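
These properties can be checked numerically. The following is a minimal NumPy sketch (not part of the original handout; the toy data and the symbol name `theta_hat` are made up for illustration): it fits least-squares weights on a small design matrix that includes an intercept column, then inspects $X^T e$ and $\sum_i e_i$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy design matrix: an intercept column of 1s plus two features (n = 6, p = 3).
X = np.column_stack([np.ones(6), rng.normal(size=(6, 2))])
Y = rng.normal(size=6)

# Least-squares weights from the normal equations: theta_hat solves (X^T X) theta = X^T Y.
theta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
residuals = Y - X @ theta_hat

# The residuals are orthogonal to every column of X, so X^T e is (numerically) zero.
print(X.T @ residuals)      # ~ [0. 0. 0.]

# Because X contains a column of all 1s, the residuals also sum to zero.
print(residuals.sum())      # ~ 0.0
```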


  • Which are true about the predictions made by OLS? Select all that apply.

☐ A. They are projections of the observations onto the column space of the design matrix.
☐ B. They are linear combinations of the features.
☐ C. They are orthogonal to the residuals.
☐ D. They are orthogonal to the column space of the features.
☐ E. None of the above.
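
The projection view of OLS can also be sketched in a few lines. The toy data below is hypothetical; the sketch forms the hat matrix $H = X(X^T X)^{-1}X^T$, checks that it is idempotent (i.e., a projection), and checks that the predictions $\hat{Y} = HY$ are orthogonal to the residuals.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(5), rng.normal(size=5)])   # n = 5, p = 2
Y = rng.normal(size=5)

# Hat matrix H = X (X^T X)^{-1} X^T projects any vector onto span(X).
H = X @ np.linalg.inv(X.T @ X) @ X.T
Y_hat = H @ Y                 # the OLS predictions are the projection of Y
residuals = Y - Y_hat

print(np.allclose(H @ H, H))               # True: H is idempotent (a projection)
print(np.isclose(Y_hat @ residuals, 0.0))  # True: predictions are orthogonal to residuals
```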

  • We fit a simple linear regression to our data $(x_i, y_i)$, $i = 1, 2, 3$, where $x_i$ is the independent variable and $y_i$ is the dependent variable. Our regression line is of the form $\hat{y} = \hat{\theta}_0 + \hat{\theta}_1 x$. Suppose we plot the residuals of the model against the fitted values $\hat{y}$ and find that there is a curve. What does this tell us about our model?

☐ A. The relationship between our dependent and independent variables is well represented by a line.
☐ B. The accuracy of the regression line varies with the size of the dependent variable.
☐ C. The variables need to be transformed, or additional independent variables are needed.
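
To see what such a residual plot looks like, here is a small illustrative sketch (synthetic data; matplotlib assumed available) that fits a straight line to data generated from a quadratic relationship and plots the residuals against the fitted values $\hat{y}$.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data where y depends on x quadratically, not linearly.
rng = np.random.default_rng(2)
x = np.linspace(0, 5, 30)
y = 1.0 + 2.0 * x ** 2 + rng.normal(scale=0.5, size=x.size)

# Fit a simple linear regression y_hat = theta0 + theta1 * x.
theta1, theta0 = np.polyfit(x, y, deg=1)
y_hat = theta0 + theta1 * x
residuals = y - y_hat

# A clear curve in this plot means the straight-line model is missing
# structure in x, e.g., a transformation or extra features are needed.
plt.scatter(y_hat, residuals)
plt.axhline(0, color="gray")
plt.xlabel("fitted values")
plt.ylabel("residuals")
plt.show()
```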


Understanding Dimensions

  1. In this exercise, we will examine many of the terms that we have been working with in regression (e.g., $\hat{\theta}$) and connect them to their dimensions and to the concepts that they represent.

First, we define some notation. The $n \times p$ design matrix $X$ has $n$ observations on $p$ features. (In lecture, we stated that we sometimes say $X$ corresponds to $p+1$ features, where the additional feature is a column of all 1s for the intercept term, but strictly speaking that column doesn't need to exist. In this problem, one of the $p$ columns may be a column of all 1s.) $Y$ is the response variable: a vector containing the true response for all observations. We assume in this problem that we use $X$ and $Y$ to compute the optimal parameters $\hat{\theta}$ for a linear model, and that this linear model generates predictions using $\hat{Y} = X\hat{\theta}$, as we saw in lecture and in Question 1 of this discussion. Each of the $n$ rows in our design matrix $X$ contains all features for a single observation. Each of the $p$ columns in our design matrix $X$ contains a single feature, for all observations. We denote the rows and columns of $X$ as follows:

$X_{:,j}$ is the $j$th column vector of $X$, $j = 1, \ldots, p$; $X_{i,:}$ is the $i$th row vector of $X$, $i = 1, \ldots, n$.

Below, on the left, we have several expressions, labelled (a) through (h), and on the right we have several terms, labelled 1 through 10. For each expression, determine its shape (e.g., $n \times p$) and match it to one of the given terms. Terms may be used more than once or not at all. If a specific expression is nonsensical because the dimensions don't line up for matrix multiplication, write "N/A" for both.

Expressions:

(a)    $X$

(b)    $\hat{\theta}$

(c)    $X_{:,j}$

(d)    $X_{1,:} \cdot \hat{\theta}$

(e)    $X_{:,1} \cdot \hat{\theta}$

(f)    $X\hat{\theta}$

(g)    $(X^T X)^{-1} X^T Y$

(h)    $(I - X(X^T X)^{-1} X^T)\, Y$

Terms:

1.    the residuals

2.    0

3.    1st response, $y_1$

4.    1st predicted value, $\hat{y}_1$

5.    1st residual, $e_1$

6.    the estimated coefficients

7.    the predicted values

8.    the features for a single observation

9.    the value of a specific feature for all observations

10.   the design matrix

As an example, for part (a), you would write: "(a) Dimension: $n \times p$, Term: 10".
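
One way to sanity-check your answers is to build small arrays with concrete, made-up values of $n$ and $p$ and print the NumPy shape of each expression. The sketch below (not part of the original handout) uses $n = 4$, $p = 3$; note that NumPy reports a scalar's shape as `()` and a length-$k$ vector's shape as `(k,)`.

```python
import numpy as np

n, p = 4, 3                              # hypothetical sizes for illustration
rng = np.random.default_rng(3)
X = rng.normal(size=(n, p))              # design matrix
Y = rng.normal(size=n)                   # response vector

theta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

print(X.shape)                           # (a): (n, p)
print(theta_hat.shape)                   # (b): (p,)
print(X[:, 0].shape)                     # (c) with j = 1: (n,)
print(np.shape(X[0, :] @ theta_hat))     # (d): () -- a scalar
# (e) X[:, 0] @ theta_hat mismatches: a length-n vector dotted with a
#     length-p vector only works when n == p, so it raises an error here.
print((X @ theta_hat).shape)             # (f): (n,)
print((np.linalg.inv(X.T @ X) @ X.T @ Y).shape)                     # (g): (p,)
print(((np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T) @ Y).shape)   # (h): (n,)
```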
