Description
Visualizing Gradients
- On the left is a 3D plot of f(x, y) = … + (y − 3)². On the right is a plot of its gradient field. Note that the arrows show the relative magnitudes of the gradient vectors.
- From the visualization, what do you think is the minimal value of this function and where does it occur?
- Calculate the gradient ∇f.
- When ∇f = 0, what are the values of x and y?
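The gradient and its critical point can be checked symbolically. The function below is an assumption for illustration only: the first term of f is unreadable in the original, so we take a paraboloid of the form f(x, y) = (x − 2)² + (y − 3)² as a stand-in.

```python
import sympy as sp

# Assumed example function (the worksheet's exact f is partially illegible).
x, y = sp.symbols('x y')
f = (x - 2)**2 + (y - 3)**2

grad = [sp.diff(f, x), sp.diff(f, y)]   # the gradient, one partial per variable
critical = sp.solve(grad, [x, y])       # set every partial to 0 and solve
print(grad)                             # [2*x - 4, 2*y - 6]
print(critical, f.subs(critical))       # {x: 2, y: 3} 0
```

For this assumed f, the minimum value 0 occurs where both partials vanish, which matches reading the "flat spot" off the 3D plot.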
Gradient Descent Algorithm
- Given the following loss function, the data x = [x_1, …, x_n] and y = [y_1, …, y_n], and the current parameter θ_t, explicitly write out the update equation for θ_{t+1} in terms of x_i, y_i, θ_t, and α, where α is the constant learning rate.
L(θ, x_i, y_i) = −(… − log(y_i))
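Whatever the loss is, the update has the same shape: step against the gradient, scaled by α. A minimal sketch, using an illustrative average squared-error loss (an assumption, not the worksheet's loss):

```python
import numpy as np

# Generic gradient descent: theta_{t+1} = theta_t - alpha * grad L(theta_t).
def gradient_descent(grad_fn, theta0, alpha=0.1, steps=100):
    theta = theta0
    for _ in range(steps):
        theta = theta - alpha * grad_fn(theta)
    return theta

# Illustrative loss L(theta) = mean((y_i - theta * x_i)**2); its gradient
# with respect to theta is mean(-2 * x_i * (y_i - theta * x_i)).
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
grad = lambda theta: np.mean(-2 * x * (y - theta * x))

print(gradient_descent(grad, theta0=0.0))  # converges to ~2.0
```

The worksheet's answer should have exactly this form, with the gradient of the given L substituted for `grad_fn`.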
Convexity
- Convexity allows optimization problems to be solved more efficiently and guarantees that a global optimum can be found. Mainly, it gives us a nice way to minimize loss (i.e., gradient descent). There are three ways to informally define convexity.
- Walking in a straight line between two points on the function keeps you at or above the function. This definition applies to any function, even one that is not differentiable.
- The tangent line at any point lies at or below the function, globally. To use this definition, the function must be differentiable.
- The second derivative is non-negative everywhere (in other words, the function is “concave up” everywhere). To use this definition, the function must be twice differentiable.
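The first definition can be spot-checked numerically; a minimal sketch for f(x) = x², which is convex (the choice of f and the sampled points are illustrative, not from the worksheet):

```python
import numpy as np

# For many random pairs (a, b) and mixing weights c in [0, 1], verify that
# the chord c*f(a) + (1-c)*f(b) lies at or above the curve f(c*a + (1-c)*b).
f = lambda x: x**2
rng = np.random.default_rng(0)
a = rng.uniform(-10, 10, 1000)
b = rng.uniform(-10, 10, 1000)
c = rng.uniform(0, 1, 1000)   # position along each chord

chord = c * f(a) + (1 - c) * f(b)   # height of the straight line
curve = f(c * a + (1 - c) * b)      # height of the function
print(bool(np.all(curve <= chord + 1e-9)))  # True for a convex f
```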
Is the function described in Question 1 convex? Make an argument visually.
GPA Descent
- Consider the following non-linear model with two parameters:
f_θ(x) = θ_0 · 0.5 + θ_0 · θ_1 · x_1 + sin(θ_1) · x_2
For some nonsensical reason, we decide to use the residuals of our model as the loss function.
That is, the loss for a single observation is
L(θ_0, θ_1, x, y) = y − f_θ(x)
- We want to use gradient descent to determine the optimal model parameters, θ_0 and θ_1.
- Suppose we have just one observation in our training data, (x_1 = 1, x_2 = …). Assume that we set the learning rate α to 1. An incomplete version of the gradient descent update equations for θ_0 and θ_1 is shown below. θ_0^(t) and θ_1^(t) denote the guesses for θ_0 and θ_1 at timestep t, respectively.
θ_0^(t+1) = θ_0^(t) − A
θ_1^(t+1) = θ_1^(t) − B
Express both A and B in terms of θ_0^(t), θ_1^(t), and any necessary constants.
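The partial derivatives behind A and B can be checked symbolically. The sketch below keeps x_1, x_2, and y symbolic, since the observation's values are only partially legible; it assumes the single-observation loss L = y − f_θ(x) stated above.

```python
import sympy as sp

# Model f_theta(x) = theta0*0.5 + theta0*theta1*x1 + sin(theta1)*x2,
# loss L = y - f_theta(x); take the partial derivative in each parameter.
theta0, theta1, x1, x2, y = sp.symbols('theta0 theta1 x1 x2 y')
f = theta0 * sp.Rational(1, 2) + theta0 * theta1 * x1 + sp.sin(theta1) * x2
L = y - f

print(sp.diff(L, theta0))  # equals -(1/2 + theta1*x1)
print(sp.diff(L, theta1))  # equals -(theta0*x1 + cos(theta1)*x2)
```

With α = 1, A and B are exactly these partials evaluated at (θ_0^(t), θ_1^(t)) and the observed data.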
- Assume we initialize both θ_0^(0) and θ_1^(0) to 0. Determine θ_0^(1) and θ_1^(1) (i.e. the guesses for θ_0 and θ_1 after one iteration of gradient descent).
- What happens to θ_0 as t → ∞ (i.e. as we run more and more iterations of gradient descent)?
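One way to build intuition here is to simulate the updates numerically. The observation values below are an assumption (x_1 = 1, x_2 = 1, y = 0), since the worksheet's exact numbers are not fully legible; the qualitative behavior is the point.

```python
import math

# Run the alpha = 1 updates from (0, 0) on the residual loss L = y - f_theta(x).
x1, x2, y = 1.0, 1.0, 0.0   # assumed observation, for illustration
theta0, theta1 = 0.0, 0.0
for t in range(10):
    g0 = -(0.5 + theta1 * x1)                    # dL/d(theta0)
    g1 = -(theta0 * x1 + math.cos(theta1) * x2)  # dL/d(theta1)
    theta0, theta1 = theta0 - g0, theta1 - g1
    print(t, round(theta0, 3), round(theta1, 3))
# theta0 keeps growing without bound: a raw residual can be driven to -infinity,
# so this "loss" has no minimum and gradient descent never settles.
```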