Description
Read Chapter 2.5 (Convex Duality)
Theoretical Problems (10 points)
1. (1 point) Let $f(x) = \|x\|_1 + \tfrac{1}{4}\|x\|_2^4$, where $x \in \mathbb{R}^d$. Find its conjugate $f^*(x)$.
2. (2 points) Let $x \in \mathbb{R}$ and $y \in \mathbb{R}_+$. Is $f(x, y) = x^2/y$ a convex function? Prove your claim.
3. (2 points) Consider the convex set $C = \{x \in \mathbb{R}^d : \|x\|_\infty \le 1\}$. Given $y \in \mathbb{R}^d$, compute the projection $\mathrm{proj}_C(y)$.
4. (3 points) Compute $\partial f(x)$ for the following functions of $x \in \mathbb{R}^d$:
• $f(x) = \|x\|_2$
• $f(x) = \mathbb{1}(\|x\|_\infty \le 1)$
• $f(x) = \|x\|_2 + \|x\|_\infty$
5. (3 points) Consider the square-root Lasso method. Given $X \in \mathbb{R}^{n \times d}$ and $y \in \mathbb{R}^n$, we want to find $w \in \mathbb{R}^d$ to solve
\[
[w^*, \xi^*] = \arg\min_{w,\xi}\; \|Xw - y\|_2 + \lambda \sum_{j=1}^{d} \xi_j, \tag{1}
\]
subject to
\[
\xi_j \ge w_j, \qquad \xi_j \ge -w_j \qquad (j = 1, \ldots, d). \tag{2}
\]
Lasso produces sparse solutions. Define the support of the solution as
\[
S = \{ j : w^*_j \ne 0 \}.
\]
Write down the KKT conditions under the assumption that $Xw^* \ne y$. Simplify them in terms of $S$, $X_S$, $X_{\bar{S}}$, $y$, and $w_S$. Here $X_S$ contains the columns of $X$ indexed by $S$, $X_{\bar{S}}$ contains the columns of $X$ not in $S$, and $w_S$ contains the nonzero components of $w^*$.
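As a sanity check on the formulation, note that for any fixed $w$ the inner minimization over $\xi$ in (1)–(2) is a small linear program whose solution is $\xi_j = |w_j|$, so the penalty term reduces to $\lambda\|w\|_1$. A minimal sketch of that check using scipy.optimize.linprog (assuming SciPy is available; the test vector w below is arbitrary, not data from the assignment):

```python
import numpy as np
from scipy.optimize import linprog

# For a fixed w, minimize sum_j xi_j subject to xi_j >= w_j and xi_j >= -w_j.
# The optimum should be xi_j = |w_j|, so the penalty equals lambda * ||w||_1.
w = np.array([1.5, -2.0, 0.0, 0.7])  # arbitrary test vector (an assumption)
d = w.size
I = np.eye(d)

# linprog solves: min c^T xi  s.t.  A_ub @ xi <= b_ub
#   xi_j >= w_j   <=>  -xi_j <= -w_j
#   xi_j >= -w_j  <=>  -xi_j <=  w_j
c = np.ones(d)
A_ub = np.vstack([-I, -I])
b_ub = np.concatenate([-w, w])

res = linprog(c, A_ub=A_ub, b_ub=b_ub)
print(res.x)  # expected: [1.5, 2.0, 0.0, 0.7], i.e. |w| componentwise
```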
Programming Problem (4 points)
We consider the ridge regression problem with randomly generated data. The goal is to implement gradient descent and experiment with different strong-convexity settings and different learning rates.
• Use the Python template "prog-template.py" and implement the functions marked with '# implement'.
• Submit your code and outputs. Compare with the theoretical convergence rates discussed in class, and discuss your experimental results.