COMP6211E Assignment 4 Solved


Theoretical Problems 

  1. (4 points) Write down the ADMM algorithms for the following problems
    • (2 points) Let Σ̂ be a d × d symmetric positive semidefinite matrix. We want to solve the following problem with d × d symmetric positive definite matrices X and Z:

$$\min_{X,\,Z}\; -\log\det(X) + \operatorname{trace}(\hat{\Sigma} X) + \lambda \|Z\|_1 \quad \text{subject to}\quad X - Z = 0.$$

Write down the ADMM algorithm, and derive the closed-form solutions for the sub-optimization problems. (Input: ρ, Σ̂, X_0, T; Output: X_T.) A code sketch of this iteration is given at the end of this problem.

    • (2 points) Let x ∈ ℝ^d, b ∈ ℝ^d, let Σ be a d × d symmetric positive definite matrix, and let A be an m × d matrix. Assume that we want to solve the problem

$$\min_{x \in \mathbb{R}^d}\; \tfrac{1}{2}\, x^\top \Sigma x + b^\top x \quad \text{subject to}\quad \|Ax\| \le 1$$

using ADMM by rewriting it as follows:

$$\min_{x,\,z}\; \tfrac{1}{2}\, x^\top \Sigma x + b^\top x + I_{\{\|z\| \le 1\}}(z) \quad \text{subject to}\quad Ax - z = 0,$$

where the indicator function I_{{‖z‖ ≤ 1}}(z) equals 0 if ‖z‖ ≤ 1 and +∞ otherwise.

Write down the ADMM algorithm for this decomposition with closed-form solutions for the subproblems. (Input: ρ, Σ, A, b, x_0, T, m; Output: x_T.)
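For the first sub-problem (the sparse precision-matrix problem above), a minimal NumPy sketch of the ADMM iteration in scaled dual form is given below, assuming the graphical-lasso objective reconstructed above. The X-subproblem is solved in closed form through an eigendecomposition of ρ(Z − U) − Σ̂, and the Z-subproblem reduces to entrywise soft-thresholding; the function and variable names (admm_graphical_lasso, rho, lam) are illustrative rather than part of the assignment.

```python
import numpy as np

def soft_threshold(M, tau):
    """Entrywise soft-thresholding: the proximal operator of tau * ||.||_1."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def admm_graphical_lasso(Sigma_hat, lam, rho, T, X0=None):
    """ADMM for  min_{X,Z} -logdet(X) + trace(Sigma_hat @ X) + lam*||Z||_1  s.t. X - Z = 0.
    Returns X_T after T iterations (scaled dual form)."""
    d = Sigma_hat.shape[0]
    X = np.eye(d) if X0 is None else X0.copy()
    Z = X.copy()
    U = np.zeros((d, d))  # scaled dual variable
    for _ in range(T):
        # X-update: solve -X^{-1} + Sigma_hat + rho*(X - Z + U) = 0 via an eigendecomposition
        eigval, Q = np.linalg.eigh(rho * (Z - U) - Sigma_hat)
        x = (eigval + np.sqrt(eigval ** 2 + 4.0 * rho)) / (2.0 * rho)
        X = (Q * x) @ Q.T
        # Z-update: entrywise soft-thresholding of X + U at level lam / rho
        Z = soft_threshold(X + U, lam / rho)
        # dual ascent step
        U = U + X - Z
    return X
```

Each iteration costs one d × d eigendecomposition, which is what makes the closed-form X-update practical.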

  2. (4 points) Write down the SGD algorithm for the following problem. Consider the k-class linear structured SVM problem, which has the following loss function:

,

with the constraints

 .

  • (2 points) Write down the SGD procedure with batch size 1 (single-step update rule); see the sketch following this problem.
  • (2 points) Assume that sup_y ‖ψ(x_i, y)‖_2 ≤ A and that you have a budget of T total gradient evaluations. How do you set the learning rate in your SGD procedure, and what convergence rate do you expect based on the lecture?
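A minimal sketch of the batch-size-1 update is given below. It assumes the standard multiclass hinge formulation with joint feature map ψ(x, y), i.e. the per-example loss max(0, max_{y≠y_i}(1 + w^⊤ψ(x_i, y)) − w^⊤ψ(x_i, y_i)); this assumed loss, and the names psi, num_classes, and eta_t, are illustrative rather than taken from the assignment.

```python
import numpy as np

def sgd_step_multiclass_svm(w, x_i, y_i, psi, num_classes, eta_t):
    """One SGD step (batch size 1), assuming the per-example loss
    max(0, max_{y != y_i} (1 + w.psi(x_i, y)) - w.psi(x_i, y_i))."""
    score_correct = w @ psi(x_i, y_i)
    # find the most violating alternative label
    others = [y for y in range(num_classes) if y != y_i]
    scores = np.array([1.0 + w @ psi(x_i, y) for y in others])
    y_hat = others[int(np.argmax(scores))]
    # subgradient: zero when the margin constraint is already satisfied
    if scores.max() - score_correct > 0:
        g = psi(x_i, y_hat) - psi(x_i, y_i)
    else:
        g = np.zeros_like(w)
    return w - eta_t * g
```

Since sup_y ‖ψ(x_i, y)‖_2 ≤ A, the subgradient above has norm at most 2A; that constant is what enters the usual step-size choice and the O(1/√T) guarantee for convex, non-smooth SGD.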


  3. (6 points) Consider the L1-L2 regularized loss minimization problem:

 ,

where x_i, w ∈ ℝ^d, ‖x_i‖_2 ≤ 1, y_i ∈ {±1}, λ < 1, and (u)_+ = max(0, u).

  • (2 points) Estimate a simple upper bound on the smoothness parameter and a simple lower bound on the strong convexity parameter. Estimate a simple bound on the SGD variance V.
  • (2 points) Write down the minibatch Accelerated Proximal SGD update rule with batch size m (see the first sketch following this problem). (Input: w_0, λ, {η_t}, T, m; Output: w_T; Assume λ > 0.) For T̃ = mT total gradient computations, what is the largest batch size you can choose? How do you want to set the constant learning rate and momentum parameters for this batch size, and what is the convergence rate? You may use O(·) notation to hide constants.
  • (2 points) Assume that λ = 0, and write down the minibatch stochastic RDA update rule for this problem with minibatch size m for T̃ = mT total gradients, by setting a constant θ and η_0, and calculate the corresponding setting for η_t. (Input: w_0, η_0 > 0, θ ∈ (0,1), T, m; Output: w_T.)

What is the largest batch size m, what are the corresponding θ and η_0, and what is the convergence rate in terms of T̃? You may use Õ(·), which hides a factor of ln T̃.

By comparing the solutions of the corresponding proximal operators, can you explain why dual averaging can achieve more sparsity when the weights are near zero?
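For the Accelerated Proximal SGD question, the sketch below shows the shape of one minibatch step: extrapolate with momentum, take a minibatch gradient of the smooth part, then apply the proximal operator of the L1 term. The smooth part is accessed through a user-supplied oracle grad_smooth, and the L1 coefficient mu is an illustrative placeholder; the sketch does not fix the exact regularization constants of the problem.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (entrywise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def accel_prox_sgd_step(w, w_prev, grad_smooth, batch, eta, beta, mu):
    """One minibatch Accelerated Proximal SGD step.
    grad_smooth(v, batch): minibatch stochastic gradient of the smooth part at v;
    eta: constant step size, beta: momentum parameter, mu: (assumed) L1 coefficient."""
    v = w + beta * (w - w_prev)                      # momentum / extrapolation point
    g = grad_smooth(v, batch)                        # minibatch gradient at the extrapolated point
    w_next = soft_threshold(v - eta * g, eta * mu)   # proximal (L1) step
    return w_next, w                                 # new iterate and previous iterate
```

As a rough guide, when the objective has smoothness L and strong convexity λ > 0, a standard choice from the deterministic accelerated analysis is a step size on the order of 1/L and momentum β = (√(L/λ) − 1)/(√(L/λ) + 1); the batch-size question then asks how large m can be before the stochastic variance term dominates that rate.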
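For the last question, it helps to write the two proximal solutions next to each other. The sketch below assumes an L1 regularizer μ‖w‖₁ (μ is an illustrative placeholder) and a quadratic prox term in the dual-averaging step: proximal SGD soft-thresholds the point w_t − η_t g_t at the level η_t μ, which vanishes as the step size shrinks, whereas RDA soft-thresholds the running average of the gradients at the level μ itself.

```python
import numpy as np

def soft_threshold(v, tau):
    """Entrywise soft-thresholding, the proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def prox_sgd_step(w, g, eta, mu):
    """Proximal SGD: the truncation level eta * mu shrinks with the step size."""
    return soft_threshold(w - eta * g, eta * mu)

def rda_step(g_avg, t, gamma, mu):
    """Dual averaging (RDA) with regularizer mu * ||w||_1 and prox term (gamma / (2*sqrt(t))) * ||w||^2:
        w_{t+1} = argmin_w  <g_avg, w> + mu * ||w||_1 + (gamma / (2 * sqrt(t))) * ||w||^2.
    The truncation level on the averaged gradient is mu, independent of the step size."""
    return -(np.sqrt(t) / gamma) * soft_threshold(g_avg, mu)
```

Near w = 0 the proximal-SGD threshold η_t μ becomes tiny, so small noisy coordinates are rarely mapped exactly to zero; RDA keeps comparing the full gradient average against μ, so any coordinate whose average gradient stays within [−μ, μ] is set exactly to zero, which is why it typically produces sparser iterates.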

Programming Problem (6 points)

  • Use the MNIST class 1 (positive) versus class 7 (negative) data.
  • Use the Python template “progtemplate.py”, and implement the functions marked with ’# implement’.
  • (4 pts) Implement RDA-ACCL and ADMM-ACCL-linear, and compare the RDA-ACCL algorithm with different values of θ. Submit your code and outputs.
  • (2 pts) Compare your plots to the theoretical convergence rates in class, and discuss your experimental results.

