COMP6211E Optimization for Machine Learning (Homework #1)


Read Chapter 2.5 (Convex Duality)

Theoretical Problems (10 points)

1. (1 point) Let f(x) = ‖x‖_1 + ‖x‖_2^4/4, where x ∈ R^d. Find its conjugate f∗(x).
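A hand-derived f∗ can be sanity-checked numerically: f∗(u) = sup_x {⟨u, x⟩ − f(x)} is a maximization a generic solver can approximate in low dimension. A minimal sketch (assuming f is as read above; scipy is used only as a convenient optimizer):

```python
# Numerically approximate the conjugate f*(u) = sup_x <u, x> - f(x) as a
# sanity check on a hand-derived formula. Assumes f(x) = ||x||_1 + ||x||_2^4 / 4.
import numpy as np
from scipy.optimize import minimize

def f(x):
    return np.abs(x).sum() + 0.25 * np.linalg.norm(x) ** 4

def conjugate_numeric(u, restarts=5, seed=0):
    rng = np.random.default_rng(seed)
    best = -np.inf
    for _ in range(restarts):
        # f is convex, so minimizing f(x) - <u, x> yields -f*(u); restarts
        # only guard against the nonsmooth |.| term slowing the optimizer.
        res = minimize(lambda x: f(x) - u @ x, rng.normal(size=u.size))
        best = max(best, u @ res.x - f(res.x))
    return best

u = np.array([2.0, -1.5, 0.3])
print(conjugate_numeric(u))  # compare against your closed-form f*(u)
```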

2. (2 points) Let x ∈ R and y ∈ R_+. Is f(x, y) = x^2/y a convex function? Prove your claim.
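A standard route is the Hessian test on the open set {(x, y) : y > 0}. The spot-check below (a numerical hint, not a proof) evaluates the closed-form Hessian of x^2/y at random points and prints its eigenvalues, which should be nonnegative wherever the function is convex:

```python
# Spot-check convexity of f(x, y) = x^2 / y on {y > 0} via Hessian eigenvalues.
# A hint only; the written proof should show PSD-ness for all y > 0.
import numpy as np

def hessian(x, y):
    # Second partial derivatives of x^2 / y, computed by hand.
    return np.array([[2.0 / y,         -2.0 * x / y**2],
                     [-2.0 * x / y**2,  2.0 * x**2 / y**3]])

rng = np.random.default_rng(0)
for _ in range(5):
    x, y = rng.normal(), rng.uniform(0.1, 5.0)
    print(np.linalg.eigvalsh(hessian(x, y)))  # nonnegative if f is convex there
```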

3. (2 points) Consider the convex set C = {x ∈ R^d : ‖x‖_∞ ≤ 1}. Given y ∈ R^d, compute the projection proj_C(y).
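Because both the squared distance and the constraint ‖x‖_∞ ≤ 1 separate across coordinates, the projection can be computed coordinate by coordinate. A one-line sketch for checking your answer:

```python
# Projection onto C = {x : ||x||_inf <= 1}: the problem decouples per
# coordinate, so each entry of y is clipped to [-1, 1].
import numpy as np

def proj_C(y):
    return np.clip(y, -1.0, 1.0)

print(proj_C(np.array([2.5, -0.4, -1.7, 0.9])))  # -> [ 1.  -0.4 -1.   0.9]
```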

4. (3 points) Compute ∂f(x) for the following functions of x ∈ R^d:

• f(x) = ‖x‖_2
• f(x) = 1(‖x‖_∞ ≤ 1)
• f(x) = ‖x‖_2 + ‖x‖_∞
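A candidate subgradient can be validated against the defining inequality f(z) ≥ f(x) + ⟨g, z − x⟩ for all z. A sketch for the first function, with g = x/‖x‖_2 at some x ≠ 0 (the same pattern applies to the others):

```python
# Check the subgradient inequality f(z) >= f(x) + <g, z - x> on random z
# for f(x) = ||x||_2 and the candidate g = x / ||x||_2 (valid when x != 0).
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=4)
g = x / np.linalg.norm(x)
for _ in range(1000):
    z = rng.normal(size=4)
    assert np.linalg.norm(z) >= np.linalg.norm(x) + g @ (z - x) - 1e-12
print("subgradient inequality held on all sampled z")
```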

5. (3 points) Consider the square-root Lasso method. Given X ∈ R^{n×d} and y ∈ R^n, we want to find w ∈ R^d to solve

[w∗, ξ∗] = argmin_{w,ξ}  ‖Xw − y‖_2 + λ Σ_{j=1}^{d} ξ_j,    (1)

subject to  ξ_j ≥ w_j,  ξ_j ≥ −w_j  (j = 1, …, d).    (2)

Lasso produces sparse solutions. Define the support of the solution as

S = {j : w∗_j ≠ 0}.

Write down the KKT conditions under the assumption that Xw∗ ≠ y. Simplify them in terms of S, X_S, X_S̄, y, w_S. Here X_S contains the columns of X in S, X_S̄ contains the columns of X not in S, and w_S contains the nonzero components of w∗.
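For testing your simplified conditions numerically: eliminating ξ from (1)-(2) recovers the unconstrained square-root Lasso ‖Xw − y‖_2 + λ‖w‖_1, which a generic convex solver handles directly. A sketch (assuming cvxpy is installed; the instance and tolerances are illustrative):

```python
# Solve the equivalent unconstrained square-root Lasso and inspect the
# residual correlations that the KKT conditions constrain. Sketch only;
# assumes cvxpy is available and uses an arbitrary synthetic instance.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 50, 10, 0.5
X = rng.normal(size=(n, d))
y = X[:, :3] @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

w = cp.Variable(d)
cp.Problem(cp.Minimize(cp.norm2(X @ w - y) + lam * cp.norm1(w))).solve()

w_star, r = w.value, X @ w.value - y
S = np.flatnonzero(np.abs(w_star) > 1e-6)
Sbar = np.setdiff1d(np.arange(d), S)
# Expected from stationarity (with r != 0): X_S^T r / ||r||_2 = -lam * sign(w_S),
# and |X_j^T r| / ||r||_2 <= lam for every j outside S.
print("X_S^T r/||r|| + lam*sign(w_S):",
      X[:, S].T @ r / np.linalg.norm(r) + lam * np.sign(w_star[S]))
print("max off-support |X_j^T r|/||r||:",
      np.abs(X[:, Sbar].T @ r).max() / np.linalg.norm(r), "<= lam =", lam)
```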

Programming Problem (4 points)

We consider a ridge regression problem with randomly generated data. The goal is to implement gradient descent and to experiment with different strong-convexity settings and different learning rates.

• Use the Python template “prog-template.py”, and implement the functions marked with ’# implement’.

• Submit your code and outputs. Compare with the theoretical convergence rates from class and discuss your experimental results (a standalone sketch of such an experiment follows below).
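Since prog-template.py itself is not reproduced here, the following standalone sketch only illustrates the kind of loop intended; all names and structure are illustrative, not the template's:

```python
# Gradient descent on ridge regression
#   f(w) = (1/2n)||Xw - y||_2^2 + (mu/2)||w||_2^2,
# swept over strong-convexity parameters mu and learning rates eta.
# Illustrative sketch; names do not come from prog-template.py.
import numpy as np

def ridge_objective(w, X, y, mu):
    return 0.5 * np.mean((X @ w - y) ** 2) + 0.5 * mu * (w @ w)

def ridge_grad(w, X, y, mu):
    return X.T @ (X @ w - y) / X.shape[0] + mu * w

def gradient_descent(X, y, mu, eta, T):
    w = np.zeros(X.shape[1])
    for _ in range(T):
        w = w - eta * ridge_grad(w, X, y, mu)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = X @ rng.normal(size=20) + 0.1 * rng.normal(size=100)
for mu in (0.0, 0.1, 1.0):      # larger mu => stronger convexity
    for eta in (0.01, 0.1):     # step size must stay below 2/L to converge
        w = gradient_descent(X, y, mu, eta, 200)
        print(f"mu={mu:<4} eta={eta:<5} objective={ridge_objective(w, X, y, mu):.5f}")
```

With mu > 0 the objective is mu-strongly convex, so the suboptimality should shrink linearly in T; comparing the observed decay across (mu, eta) pairs against the rates from class is the point of the discussion.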
