CSC411 Assignment 7: Representer Theorem

  1. Representer Theorem. In this question, you’ll prove and apply a simplified version of the Representer Theorem, which is the basis for a lot of kernelized algorithms. Consider a linear model:

$$z = \mathbf{w}^\top \psi(\mathbf{x}), \qquad y = g(z),$$

where ψ is a feature map and g is some function (e.g. identity, logistic, etc.). We are given a training set $\{(\mathbf{x}^{(i)}, t^{(i)})\}_{i=1}^N$. We are interested in minimizing the expected loss plus an L2 regularization term:

$$\mathcal{E}(\mathbf{w}) = \frac{1}{N} \sum_{i=1}^N L\big(y^{(i)}, t^{(i)}\big) + \frac{\lambda}{2} \|\mathbf{w}\|^2,$$

where L is some loss function. Let Ψ denote the feature matrix whose ith row is $\psi(\mathbf{x}^{(i)})^\top$:

$$\Psi = \begin{pmatrix} \psi(\mathbf{x}^{(1)})^\top \\ \vdots \\ \psi(\mathbf{x}^{(N)})^\top \end{pmatrix}.$$

Observe that this formulation captures a lot of the models we’ve covered in this course, including linear regression, logistic regression, and SVMs.
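To make the setup concrete, here is a minimal numerical sketch of this model, assuming an illustrative degree-2 feature map, the identity for g, and squared error for L; these specific choices and the variable names are not part of the handout, just one instantiation of the formulation above.

```python
import numpy as np

def psi(x):
    """Hypothetical feature map: degree-2 monomials of a 1-D input x."""
    return np.array([1.0, x, x**2])

def g(z):
    """Identity link, as in linear regression (one possible choice of g)."""
    return z

def L(y, t):
    """Squared-error loss (one of many possible choices for L)."""
    return 0.5 * (y - t) ** 2

def objective(w, xs, ts, lam):
    """(1/N) sum_i L(y_i, t_i) + (lam/2) ||w||^2 for the model y = g(w^T psi(x))."""
    ys = np.array([g(w @ psi(x)) for x in xs])
    return np.mean([L(y, t) for y, t in zip(ys, ts)]) + 0.5 * lam * w @ w

# Tiny synthetic training set, purely for illustration.
rng = np.random.default_rng(0)
xs = rng.uniform(-1, 1, size=5)
ts = 2 * xs**2 - xs + 0.1 * rng.standard_normal(5)
w = rng.standard_normal(3)
print(objective(w, xs, ts, lam=0.1))
```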

  (a) Show that the optimal weights must lie in the row space of Ψ.

Hint: Given a subspace S, a vector v can be decomposed as v = vS +v⊥, where vS is the projection of v onto S, and v⊥ is orthogonal to S. (You may assume this fact without proof, but you can review it here[1].) Apply this decomposition to w and see if you can show something about one of the two components.
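To build intuition for the hint, the sketch below decomposes a weight vector into its projection onto the row space of Ψ plus an orthogonal remainder, and checks two properties of that decomposition numerically. The matrix sizes, the random Ψ and w, and the variable names are all made up for illustration; this is not part of the required proof.

```python
import numpy as np

rng = np.random.default_rng(0)
Psi = rng.standard_normal((5, 8))   # N=5 examples, D=8 features (arbitrary sizes)
w = rng.standard_normal(8)

# Project w onto the row space of Psi: w_S = Psi^T a for the least-squares a.
a, *_ = np.linalg.lstsq(Psi.T, w, rcond=None)
w_S = Psi.T @ a          # component lying in the row space of Psi
w_perp = w - w_S         # component orthogonal to the row space

# Sanity checks on the decomposition w = w_S + w_perp:
print(np.allclose(Psi @ w_perp, 0))                      # every row of Psi is orthogonal to w_perp
print(np.allclose(w @ w, w_S @ w_S + w_perp @ w_perp))   # Pythagorean identity for the norms
```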

  (b) [3pts] Another way of stating the result from part (a) is that $\mathbf{w} = \Psi^\top \boldsymbol{\alpha}$ for some vector $\boldsymbol{\alpha}$. Hence, instead of solving for w, we can solve for α. Consider the vectorized form of the L2-regularized linear regression cost function:

$$\mathcal{J}(\mathbf{w}) = \frac{1}{2} \|\Psi \mathbf{w} - \mathbf{t}\|^2 + \frac{\lambda}{2} \|\mathbf{w}\|^2.$$

Substitute in $\mathbf{w} = \Psi^\top \boldsymbol{\alpha}$ to write the cost function as a function of α. Determine the optimal value of α. Your answer should be an expression involving λ, t, and the Gram matrix $K = \Psi \Psi^\top$. For simplicity, you may assume that K is positive definite. (The algorithm still works if K is merely PSD; it's just a bit more work to derive.)

Hint: the cost function $\mathcal{J}(\boldsymbol{\alpha})$ is a quadratic function of α. Simplify the formula into the following form:

$$\mathcal{J}(\boldsymbol{\alpha}) = \frac{1}{2} \boldsymbol{\alpha}^\top A \boldsymbol{\alpha} + \mathbf{b}^\top \boldsymbol{\alpha} + c$$

for some positive definite matrix A, vector b, and constant c (which can be ignored). You may assume without proof that the minimum of such a quadratic function is given by $\boldsymbol{\alpha} = -A^{-1}\mathbf{b}$.
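Before doing the algebra, it can be reassuring to verify numerically that, once $\mathbf{w} = \Psi^\top \boldsymbol{\alpha}$, the cost depends on the data only through $K = \Psi\Psi^\top$ and t. The sketch below checks this indirectly by evaluating the cost with two different feature matrices that share the same Gram matrix; all data, dimensions, and names are made up for illustration, and this is a sanity check rather than part of the required derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 6, 10                          # arbitrary sizes for illustration
Psi = rng.standard_normal((N, D))
t = rng.standard_normal(N)
alpha = rng.standard_normal(N)
lam = 0.3

def J(w, Psi, t, lam):
    """Vectorized L2-regularized linear regression cost from the handout."""
    r = Psi @ w - t
    return 0.5 * r @ r + 0.5 * lam * w @ w

# Build a second feature matrix with the *same* Gram matrix K = Psi Psi^T
# by rotating the feature space with a random orthogonal matrix Q.
Q, _ = np.linalg.qr(rng.standard_normal((D, D)))
Psi2 = Psi @ Q
print(np.allclose(Psi @ Psi.T, Psi2 @ Psi2.T))   # identical K

# With w = Psi^T alpha, both feature matrices give the same cost,
# consistent with J(alpha) depending on the data only through K and t.
print(np.isclose(J(Psi.T @ alpha, Psi, t, lam), J(Psi2.T @ alpha, Psi2, t, lam)))
```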

  2. Compositional Kernels. One of the most useful facts about kernels is that they can be composed using addition and multiplication. I.e., the sum of two kernels is a kernel, and the product of two kernels is a kernel. We'll show this in the case of kernels which represent dot products between finite feature vectors.
    (a) Suppose $k_1(\mathbf{x}, \mathbf{x}') = \psi_1(\mathbf{x})^\top \psi_1(\mathbf{x}')$ and $k_2(\mathbf{x}, \mathbf{x}') = \psi_2(\mathbf{x})^\top \psi_2(\mathbf{x}')$. Let $k_S$ be the sum kernel $k_S(\mathbf{x}, \mathbf{x}') = k_1(\mathbf{x}, \mathbf{x}') + k_2(\mathbf{x}, \mathbf{x}')$. Find a feature map $\psi_S$ such that $k_S(\mathbf{x}, \mathbf{x}') = \psi_S(\mathbf{x})^\top \psi_S(\mathbf{x}')$.
    (b) Suppose $k_1(\mathbf{x}, \mathbf{x}') = \psi_1(\mathbf{x})^\top \psi_1(\mathbf{x}')$ and $k_2(\mathbf{x}, \mathbf{x}') = \psi_2(\mathbf{x})^\top \psi_2(\mathbf{x}')$. Let $k_P$ be the product kernel $k_P(\mathbf{x}, \mathbf{x}') = k_1(\mathbf{x}, \mathbf{x}')\, k_2(\mathbf{x}, \mathbf{x}')$. Find a feature map $\psi_P$ such that $k_P(\mathbf{x}, \mathbf{x}') = \psi_P(\mathbf{x})^\top \psi_P(\mathbf{x}')$. (A numerical self-check is sketched after the hint below.)

Hint: For inspiration, consider the quadratic kernel from Lecture 20, Slide 11.
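If you want to sanity-check a candidate feature map before writing it up, comparing it against the target kernel on random inputs catches most algebra slips. In the sketch below, the component feature maps are arbitrary illustrations and my_psi_S is a hypothetical placeholder for your own candidate; a passing check is evidence, not a proof.

```python
import numpy as np

def check_feature_map(kernel, candidate_psi, dim, trials=100, seed=0):
    """Empirically check kernel(x, x') == candidate_psi(x) @ candidate_psi(x')
    on random inputs. Returns True if all trials match within tolerance."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x, xp = rng.standard_normal(dim), rng.standard_normal(dim)
        if not np.isclose(kernel(x, xp), candidate_psi(x) @ candidate_psi(xp)):
            return False
    return True

# Illustrative component kernels (any finite-dimensional feature maps work):
psi1 = lambda x: np.array([x[0], x[1]])          # k1(x, x') = x^T x' in 2-D
psi2 = lambda x: np.array([1.0, x[0] * x[1]])    # an arbitrary second feature map
k_sum = lambda x, xp: psi1(x) @ psi1(xp) + psi2(x) @ psi2(xp)

# Plug in *your* candidate psi_S (hypothetical name) and run the check:
# print(check_feature_map(k_sum, my_psi_S, dim=2))
```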


[1] https://metacademy.org/graphs/concepts/projection_onto_a_subspace
