## Description

# 1 Linear Regression with Heterogeneous Noise

In the standard linear regression model, we assume that the observed response variable $y$ is the prediction perturbed by noise, namely

$$y = \mathbf{x}^\top \mathbf{w} + \epsilon$$

where $\epsilon$ is a Gaussian random variable with mean 0 and variance $\sigma^2$. Notably, we are assuming that for all observations in the training data, the corresponding noises are independently and identically distributed. In other words, for the $n$-th observation $\mathbf{x}_n$, the observed response is

$$y_n = \mathbf{x}_n^\top \mathbf{w} + \epsilon_n$$

where $\epsilon_n \sim N(0, \sigma^2)$.

This assumption is not applicable in some cases. For example, when predicting the sale prices of houses, the variances for larger houses (e.g., houses with larger square footage $\mathbf{x}_n$) tend to be bigger, as the sale prices of larger houses seem to be more variable. In this case, we can model the data in the following way:

$$y_n = \mathbf{x}_n^\top \mathbf{w} + \epsilon_n$$

where the $\epsilon_n$ are independently distributed but **do not have to be identically distributed**. In particular, each one could have a different variance, namely $\epsilon_n \sim N(0, \sigma_n^2)$.

- Suppose our training dataset contains $\{(\mathbf{x}_n, y_n),\ n = 1, 2, \ldots, N\}$ such observations. Write down the log-likelihood function of the data. This function should be a function of the data as well as $\mathbf{w}$ and all $\sigma_n$.
- Derive the maximum likelihood estimate of $\mathbf{w}$, and express it in terms of the data as well as all the $\sigma_n$.

You should assume each $\sigma_n$ is known to you; you do not need to estimate them from the data.
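As a numerical sanity check on the derivation (not a substitute for it), note that this MLE reduces to weighted least squares, $\hat{\mathbf{w}} = (X^\top \Sigma^{-1} X)^{-1} X^\top \Sigma^{-1} \mathbf{y}$ with $\Sigma = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_N^2)$. A minimal sketch assuming that closed form; the synthetic data and all names are illustrative, not part of the assignment:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic heteroscedastic data (made up for illustration).
N, p = 200, 3
X = rng.normal(size=(N, p))
w_true = np.array([1.0, -2.0, 0.5])
sigma = 0.1 + rng.uniform(size=N)            # known per-observation std devs
y = X @ w_true + sigma * rng.normal(size=N)

# MLE under eps_n ~ N(0, sigma_n^2): weighted least squares,
# w_hat = (X^T S^-1 X)^-1 X^T S^-1 y, with S = diag(sigma_n^2).
S_inv = 1.0 / sigma**2
w_hat = np.linalg.solve((X * S_inv[:, None]).T @ X, X.T @ (S_inv * y))
print(w_hat)   # should be close to w_true
```

Observations with small $\sigma_n$ get large weight $1/\sigma_n^2$, which is exactly what the log-likelihood dictates.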

# 2 Linear Regression with Smooth Coefficients

Consider a dataset with $n$ data points $(\mathbf{x}_i, y_i)$, $\mathbf{x}_i \in \mathbb{R}^{p \times 1}$, drawn from the following linear model:

$$y = \mathbf{x}^\top \mathbf{w} + \epsilon,$$

where $\epsilon$ is Gaussian noise. Suppose the features $x_{i1}, \ldots, x_{ip}$ for all $i = 1, \ldots, n$ have a natural ordering. Several examples have this ordering property; for example, in the study of the impact of proteins on certain types of cancer, the proteins are ordered sequentially on a line. Intuitively, we can encode the natural-ordering information by introducing a condition requiring that the difference $(w_i - w_{i+1})^2$ cannot be large, for $i = 1, \ldots, p - 1$.


- State the condition as a regularizer. Write the new optimization problem for finding $\mathbf{w}$ by combining both this regularization and $L_2$ regularization. (10 points)
- Find the optimal $\mathbf{w}$ by solving the problem in part (a). (5 points)
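For intuition only (the problem asks you to derive this yourself): the smoothness condition can be written as $\lambda_1 \|D\mathbf{w}\|_2^2$ for a first-difference matrix $D \in \mathbb{R}^{(p-1) \times p}$, and combining it with $L_2$ regularization keeps the objective quadratic, so a closed-form minimizer exists. A sketch assuming that form, with made-up data and regularization strengths:

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up data with smoothly varying true coefficients.
n, p = 100, 8
X = rng.normal(size=(n, p))
w_true = np.sin(np.linspace(0.0, np.pi, p))
y = X @ w_true + 0.1 * rng.normal(size=n)

# D has rows (..., 1, -1, ...) so that ||D w||^2 = sum_i (w_i - w_{i+1})^2.
D = np.eye(p - 1, p) - np.eye(p - 1, p, k=1)
lam1, lam2 = 1.0, 0.1   # illustrative choices, not given by the problem

# Minimizer of ||y - Xw||^2 + lam1 ||Dw||^2 + lam2 ||w||^2.
A = X.T @ X + lam1 * D.T @ D + lam2 * np.eye(p)
w_hat = np.linalg.solve(A, X.T @ y)
```

The extra term $\lambda_1 D^\top D$ penalizes jumps between neighboring coefficients, which is how the ordering information enters the solution.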

# 3 Linearly Constrained Linear Regression

Consider a dataset with $n$ data points $(\mathbf{x}_i, y_i)$, $\mathbf{x}_i \in \mathbb{R}^{p \times 1}$, drawn from the following linear model:

$$y = \mathbf{x}^\top \mathbf{w} + \epsilon,$$

where $\epsilon$ is Gaussian noise. Suppose we have additional information about $\mathbf{w}$ that requires $A\mathbf{w} = \mathbf{b}$, where $A \in \mathbb{R}^{q \times p}$ and $\mathbf{b} \in \mathbb{R}^{q \times 1}$. Suppose the constraint $A\mathbf{w} = \mathbf{b}$ has a non-empty set of solutions; thus the optimization has feasible solutions. Find the maximum likelihood estimate of $\mathbf{w}$ under this constraint.
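One standard route (stated here as an assumption about the intended approach, not the required derivation) is Lagrange multipliers: the Gaussian MLE is least squares, so the constrained estimate solves a linear KKT system. A numerical sketch with fabricated data:

```python
import numpy as np

rng = np.random.default_rng(2)

# Fabricated data and constraint, purely for illustration.
n, p, q = 50, 4, 2
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
A = rng.normal(size=(q, p))
b = rng.normal(size=q)

# Stationarity of L(w, mu) = ||y - Xw||^2 + 2 mu^T (Aw - b) gives the
# KKT system: [[X^T X, A^T], [A, 0]] [w; mu] = [X^T y; b].
K = np.block([[X.T @ X, A.T],
              [A, np.zeros((q, q))]])
rhs = np.concatenate([X.T @ y, b])
w_hat = np.linalg.solve(K, rhs)[:p]
```

At the solution the constraint $A\hat{\mathbf{w}} = \mathbf{b}$ holds exactly (up to floating-point error), which is a useful check on any closed form you derive.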

# 4 Online Learning

The perceptron algorithm often makes harsh updates, as it is strongly biased towards the current mistakenly labeled sample. Suppose at the $i$-th step the classifier is $\mathbf{w}_i$, and we want to make a more conservative update based on the observation $(\mathbf{x}_i, y_i)$ to a new classifier $\mathbf{w}_{i+1}$. Derive a new update method for the perceptron such that it makes the smallest difference from the previous model; that is, it minimizes $\|\mathbf{w}_{i+1} - \mathbf{w}_i\|_2$ while ensuring that $\mathbf{w}_{i+1}$ classifies the current sample correctly. You need to provide the closed-form analytical equation for the update rule.
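Without giving away the derivation: the resulting rule acts like a projection of $\mathbf{w}_i$ onto the set of classifiers that get $(\mathbf{x}_i, y_i)$ right. The sketch below assumes a unit-margin reading of "classifies correctly" ($y_i \mathbf{w}^\top \mathbf{x}_i \ge 1$, as in passive-aggressive updates); a strict-inequality reading gives a degenerate zero-margin boundary solution. The function name and data are my own:

```python
import numpy as np

def conservative_update(w, x, y):
    # Minimize ||w_new - w||_2 subject to y * w_new^T x >= 1.
    # KKT conditions give w_new = w + tau * y * x with tau below.
    loss = max(0.0, 1.0 - y * (w @ x))   # hinge loss at the current w
    tau = loss / (x @ x)                 # zero when already correct: no change
    return w + tau * y * x

w = np.zeros(3)
x = np.array([1.0, 2.0, 0.0])
y = 1.0
w_new = conservative_update(w, x, y)     # now y * w_new^T x hits the margin
```

Note the contrast with the standard perceptron step $\mathbf{w} + y\mathbf{x}$: the conservative step scales the same direction by exactly the amount needed to satisfy the constraint, and makes no update at all when the sample is already classified with sufficient margin.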