## Description

# 1 Objective Question

Select the true statements from below:

- The computational cost of batch gradient descent is higher than that ofstochastic gradient descent as batch gradient descent requires summing over all examples.
- The computational cost of stochastic gradient descent is higher than thatof mini-batch gradient descent as the mini-batch gradient descent requires summing over a few examples.
- In cases where multiple local minima with respect to the objective function
*F*(*θ*) are present, stochastic gradient descent can sometimes avoid falling into these local minima because it uses various ∇(_{d}F*θ*) rather than ∇*F*(*θ*).

(Here ∇* _{d}F*(

*θ*) are the gradient defined for each individual training example

*d*.)

# 2 Subjective Question

Prove the convergence of the batch gradient descent algorithm for a fixed learning rate for a convex objective function. What is the effect of the learning rate on the convergence guarantee.

# 3 Programming Question

Implement Non-negative Matrix Factorization (NMF) using gradient descent to solve for the task of topic modeling. You have been provided with a starter code in ‘Matrix Factorization using Gradient Descent Excercise.ipynb’. The notebook briefly discusses on how to apply NMF for the task of topic modeling and provides an objective function that needs to be optimized using the gradient descent algorithm.

1