CSC411 Assignment 7: Representer Theorem

  1. Representer Theorem. In this question, you’ll prove and apply a simplified version of the Representer Theorem, which is the basis for a lot of kernelized algorithms. Consider a linear model:

$$z = \mathbf{w}^\top \psi(\mathbf{x}), \qquad y = g(z),$$

where ψ is a feature map and g is some function (e.g. identity, logistic, etc.). We are given a training set $\{(\mathbf{x}^{(i)}, t^{(i)})\}_{i=1}^N$. We are interested in minimizing the expected loss plus an L2 regularization term:

$$\mathcal{E}(\mathbf{w}) = \frac{1}{N} \sum_{i=1}^N L\big(y^{(i)}, t^{(i)}\big) + \frac{\lambda}{2} \|\mathbf{w}\|^2,$$

where L is some loss function. Let Ψ denote the feature matrix whose ith row is $\psi(\mathbf{x}^{(i)})^\top$:

$$\Psi = \begin{pmatrix} \psi(\mathbf{x}^{(1)})^\top \\ \vdots \\ \psi(\mathbf{x}^{(N)})^\top \end{pmatrix}.$$

Observe that this formulation captures a lot of the models we’ve covered in this course, including linear regression, logistic regression, and SVMs.
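To make the setup concrete, here is a minimal numerical sketch of this model, assuming an illustrative degree-2 feature map, the identity for g, and squared error for L; these specific choices and the variable names are not part of the handout, just one instantiation of the formulation above.

```python
import numpy as np

def psi(x):
    """Hypothetical feature map: degree-2 monomials of a 1-D input x."""
    return np.array([1.0, x, x**2])

def g(z):
    """Identity link, as in linear regression (one possible choice of g)."""
    return z

def L(y, t):
    """Squared-error loss (one of many possible choices for L)."""
    return 0.5 * (y - t) ** 2

def objective(w, xs, ts, lam):
    """(1/N) sum_i L(y_i, t_i) + (lam/2) ||w||^2 for the model y = g(w^T psi(x))."""
    ys = np.array([g(w @ psi(x)) for x in xs])
    return np.mean([L(y, t) for y, t in zip(ys, ts)]) + 0.5 * lam * w @ w

# Tiny synthetic training set, purely for illustration.
rng = np.random.default_rng(0)
xs = rng.uniform(-1, 1, size=5)
ts = 2 * xs**2 - xs + 0.1 * rng.standard_normal(5)
w = rng.standard_normal(3)
print(objective(w, xs, ts, lam=0.1))
```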

  (a) Show that the optimal weights must lie in the row space of Ψ.

Hint: Given a subspace S, a vector v can be decomposed as v = vS +v⊥, where vS is the projection of v onto S, and v⊥ is orthogonal to S. (You may assume this fact without proof, but you can review it here[1].) Apply this decomposition to w and see if you can show something about one of the two components.
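To build intuition for the hint, the sketch below decomposes a weight vector into its projection onto the row space of Ψ plus an orthogonal remainder, and checks two properties of that decomposition numerically. The matrix sizes, the random Ψ and w, and the variable names are all made up for illustration; this is not part of the required proof.

```python
import numpy as np

rng = np.random.default_rng(0)
Psi = rng.standard_normal((5, 8))   # N=5 examples, D=8 features (arbitrary sizes)
w = rng.standard_normal(8)

# Project w onto the row space of Psi: w_S = Psi^T a for the least-squares a.
a, *_ = np.linalg.lstsq(Psi.T, w, rcond=None)
w_S = Psi.T @ a          # component lying in the row space of Psi
w_perp = w - w_S         # component orthogonal to the row space

# Sanity checks on the decomposition w = w_S + w_perp:
print(np.allclose(Psi @ w_perp, 0))                      # every row of Psi is orthogonal to w_perp
print(np.allclose(w @ w, w_S @ w_S + w_perp @ w_perp))   # Pythagorean identity for the norms
```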

  (b) [3pts] Another way of stating the result from part (a) is that $\mathbf{w} = \Psi^\top \boldsymbol{\alpha}$ for some vector $\boldsymbol{\alpha}$. Hence, instead of solving for w, we can solve for α. Consider the vectorized form of the L2-regularized linear regression cost function:

$$\mathcal{J}(\mathbf{w}) = \frac{1}{2} \|\Psi \mathbf{w} - \mathbf{t}\|^2 + \frac{\lambda}{2} \|\mathbf{w}\|^2.$$

Substitute in $\mathbf{w} = \Psi^\top \boldsymbol{\alpha}$ to write the cost function as a function of α. Determine the optimal value of α. Your answer should be an expression involving λ, t, and the Gram matrix $K = \Psi \Psi^\top$. For simplicity, you may assume that K is positive definite. (The algorithm still works if K is merely PSD; it's just a bit more work to derive.)

Hint: the cost function $\mathcal{J}(\boldsymbol{\alpha})$ is a quadratic function of α. Simplify the formula into the following form:

$$\mathcal{J}(\boldsymbol{\alpha}) = \frac{1}{2} \boldsymbol{\alpha}^\top A \boldsymbol{\alpha} + \mathbf{b}^\top \boldsymbol{\alpha} + c$$

for some positive definite matrix A, vector b, and constant c (which can be ignored). You may assume without proof that the minimum of such a quadratic function is given by $\boldsymbol{\alpha} = -A^{-1}\mathbf{b}$.
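Before doing the algebra, it can be reassuring to verify numerically that, once $\mathbf{w} = \Psi^\top \boldsymbol{\alpha}$, the cost depends on the data only through $K = \Psi\Psi^\top$ and t. The sketch below checks this indirectly by evaluating the cost with two different feature matrices that share the same Gram matrix; all data, dimensions, and names are made up for illustration, and this is a sanity check rather than part of the required derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 6, 10                          # arbitrary sizes for illustration
Psi = rng.standard_normal((N, D))
t = rng.standard_normal(N)
alpha = rng.standard_normal(N)
lam = 0.3

def J(w, Psi, t, lam):
    """Vectorized L2-regularized linear regression cost from the handout."""
    r = Psi @ w - t
    return 0.5 * r @ r + 0.5 * lam * w @ w

# Build a second feature matrix with the *same* Gram matrix K = Psi Psi^T
# by rotating the feature space with a random orthogonal matrix Q.
Q, _ = np.linalg.qr(rng.standard_normal((D, D)))
Psi2 = Psi @ Q
print(np.allclose(Psi @ Psi.T, Psi2 @ Psi2.T))   # identical K

# With w = Psi^T alpha, both feature matrices give the same cost,
# consistent with J(alpha) depending on the data only through K and t.
print(np.isclose(J(Psi.T @ alpha, Psi, t, lam), J(Psi2.T @ alpha, Psi2, t, lam)))
```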

  2. Compositional Kernels. One of the most useful facts about kernels is that they can be composed using addition and multiplication. I.e., the sum of two kernels is a kernel, and the product of two kernels is a kernel. We'll show this in the case of kernels which represent dot products between finite feature vectors.
    (a) Suppose $k_1(\mathbf{x}, \mathbf{x}') = \psi_1(\mathbf{x})^\top \psi_1(\mathbf{x}')$ and $k_2(\mathbf{x}, \mathbf{x}') = \psi_2(\mathbf{x})^\top \psi_2(\mathbf{x}')$. Let $k_S$ be the sum kernel $k_S(\mathbf{x}, \mathbf{x}') = k_1(\mathbf{x}, \mathbf{x}') + k_2(\mathbf{x}, \mathbf{x}')$. Find a feature map $\psi_S$ such that $k_S(\mathbf{x}, \mathbf{x}') = \psi_S(\mathbf{x})^\top \psi_S(\mathbf{x}')$.
    (b) Suppose $k_1(\mathbf{x}, \mathbf{x}') = \psi_1(\mathbf{x})^\top \psi_1(\mathbf{x}')$ and $k_2(\mathbf{x}, \mathbf{x}') = \psi_2(\mathbf{x})^\top \psi_2(\mathbf{x}')$. Let $k_P$ be the product kernel $k_P(\mathbf{x}, \mathbf{x}') = k_1(\mathbf{x}, \mathbf{x}')\, k_2(\mathbf{x}, \mathbf{x}')$. Find a feature map $\psi_P$ such that $k_P(\mathbf{x}, \mathbf{x}') = \psi_P(\mathbf{x})^\top \psi_P(\mathbf{x}')$. (A numerical self-check is sketched after the hint below.)

Hint: For inspiration, consider the quadratic kernel from Lecture 20, Slide 11.
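If you want to sanity-check a candidate feature map before writing it up, comparing it against the target kernel on random inputs catches most algebra slips. In the sketch below, the component feature maps are arbitrary illustrations and my_psi_S is a hypothetical placeholder for your own candidate; a passing check is evidence, not a proof.

```python
import numpy as np

def check_feature_map(kernel, candidate_psi, dim, trials=100, seed=0):
    """Empirically check kernel(x, x') == candidate_psi(x) @ candidate_psi(x')
    on random inputs. Returns True if all trials match within tolerance."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x, xp = rng.standard_normal(dim), rng.standard_normal(dim)
        if not np.isclose(kernel(x, xp), candidate_psi(x) @ candidate_psi(xp)):
            return False
    return True

# Illustrative component kernels (any finite-dimensional feature maps work):
psi1 = lambda x: np.array([x[0], x[1]])          # k1(x, x') = x^T x' in 2-D
psi2 = lambda x: np.array([1.0, x[0] * x[1]])    # an arbitrary second feature map
k_sum = lambda x, xp: psi1(x) @ psi1(xp) + psi2(x) @ psi2(xp)

# Plug in *your* candidate psi_S (hypothetical name) and run the check:
# print(check_feature_map(k_sum, my_psi_S, dim=2))
```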


[1] https://metacademy.org/graphs/concepts/projection_onto_a_subspace
