EN.601.682 Homework 2 Solved


1. The goal of this problem is to minimize a function for a given input using gradient descent, by breaking the overall function into smaller components via a computation graph. The function is defined as:

$$f(x_1, x_2, w_1, w_2) = \frac{1}{1 + e^{-(w_1 x_1 + w_2 x_2)}} + 0.5\,(w_1^2 + w_2^2)$$

(a) Please calculate $\frac{\partial f}{\partial w_1}$, $\frac{\partial f}{\partial w_2}$, $\frac{\partial f}{\partial x_1}$, $\frac{\partial f}{\partial x_2}$.

Solution:

Writing $s = w_1 x_1 + w_2 x_2$ for the sigmoid input, the derivative of the sigmoid term is $\frac{d}{ds}\frac{1}{1+e^{-s}} = \frac{e^{-s}}{(1+e^{-s})^2}$, and the chain rule gives:

$$\frac{\partial f}{\partial w_1} = \frac{x_1 \cdot e^{-(w_1 x_1 + w_2 x_2)}}{\left(1 + e^{-(w_1 x_1 + w_2 x_2)}\right)^2} + w_1$$

$$\frac{\partial f}{\partial w_2} = \frac{x_2 \cdot e^{-(w_1 x_1 + w_2 x_2)}}{\left(1 + e^{-(w_1 x_1 + w_2 x_2)}\right)^2} + w_2$$

$$\frac{\partial f}{\partial x_1} = \frac{w_1 \cdot e^{-(w_1 x_1 + w_2 x_2)}}{\left(1 + e^{-(w_1 x_1 + w_2 x_2)}\right)^2}$$

$$\frac{\partial f}{\partial x_2} = \frac{w_2 \cdot e^{-(w_1 x_1 + w_2 x_2)}}{\left(1 + e^{-(w_1 x_1 + w_2 x_2)}\right)^2}$$

(b) Start with the following initialization: $w_1 = 0.3$, $w_2 = -0.5$, $x_1 = 0.2$, $x_2 = 0.4$, and draw the computation graph. Please use backpropagation as we did in class.
You can draw the graph on paper and insert a photo into your report.
The goal is for you to practice working with computation graphs. As a consequence, you must include the intermediate values of the forward and backward passes.

Solution:

The computation graph is shown below. All numbers above the edges are values from the forward pass; all numbers below the edges are values from the backward pass.
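Since the photo of the graph is not reproducible in text, here is a worked numerical check of the forward and backward passes at this initialization (values computed by me from the formulas above; they should match the figure):

$$s = w_1 x_1 + w_2 x_2 = 0.3 \cdot 0.2 + (-0.5) \cdot 0.4 = -0.14$$
$$\sigma(s) = \frac{1}{1 + e^{0.14}} \approx \frac{1}{2.1503} \approx 0.4651, \qquad f \approx 0.4651 + 0.5\,(0.09 + 0.25) \approx 0.6351$$
$$\frac{e^{-s}}{(1+e^{-s})^2} = \sigma(s)\,(1 - \sigma(s)) \approx 0.2488$$
$$\frac{\partial f}{\partial w_1} \approx 0.2 \cdot 0.2488 + 0.3 \approx 0.3498, \qquad \frac{\partial f}{\partial w_2} \approx 0.4 \cdot 0.2488 - 0.5 \approx -0.4005$$
$$\frac{\partial f}{\partial x_1} \approx 0.3 \cdot 0.2488 \approx 0.0746, \qquad \frac{\partial f}{\partial x_2} \approx -0.5 \cdot 0.2488 \approx -0.1244$$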


(c) Implement the above computation graph in the accompanying Colab Notebook using numpy. Use the values from (b) to initialize the weights and fix the input. Use a constant step size of 0.01. Plot the weight values w1 and w2 over 30 iterations in a single figure in the report.

Solution:
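The notebook itself is not reproduced here; the following is a minimal numpy sketch of what the implementation could look like under the setup above (the variable names and plotting details are my own, not the notebook's):

```python
import numpy as np
import matplotlib.pyplot as plt

# Initialization from part (b); the input is fixed, only w1 and w2 are updated.
w = np.array([0.3, -0.5])
x = np.array([0.2, 0.4])
lr = 0.01  # constant step size

history = []
for _ in range(30):
    s = w @ x                         # forward: s = w1*x1 + w2*x2
    sig = 1.0 / (1.0 + np.exp(-s))    # sigmoid node; f = sig + 0.5*(w1^2 + w2^2)
    grad_w = sig * (1 - sig) * x + w  # backward: d(sig)/ds = sig*(1 - sig)
    w = w - lr * grad_w               # gradient descent step
    history.append(w.copy())

history = np.array(history)
plt.plot(history[:, 0], label="w1")
plt.plot(history[:, 1], label="w2")
plt.xlabel("iteration")
plt.ylabel("weight value")
plt.legend()
plt.show()
```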

2. The goal of this problem is to understand the classification ability of a neural network. Specifically, we consider the XOR problem. Go to the link in footnote 1 and answer the following questions. Hint: after you change the architecture, hit the reset-network button right next to the run button.

(a) Can a linear classifier, without any hidden layers, solve the XOR problem?
Solution: No. Since there are no hidden layers, the model can only separate the data with a single line, and the XOR data clearly cannot be divided by any single line.
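As a quick numerical illustration (my own sketch, not part of the playground exercise): a logistic-regression linear classifier trained by gradient descent on the four XOR points can never fit all of them.

```python
import numpy as np

# The four XOR points: no line w1*x1 + w2*x2 + b = 0 separates the two classes.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])

rng = np.random.default_rng(0)
w, b, lr = rng.normal(size=2), 0.0, 0.5
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # logistic regression predictions
    w -= lr * (p - y) @ X / len(y)          # gradient of cross-entropy loss
    b -= lr * (p - y).mean()

print(((p > 0.5) == y).mean())  # at most 0.75 -- never all four points correct
```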

1 https://playground.tensorflow.org/#activation=relu&batchSize=10&dataset=xor&regDataset=reg-plane&learningRate=0.01&regularizationRate=0&noise=0&networkShape=&seed=0.10699&showTestData=false&discretize=true&percTrainData=80&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false


(b) With one hidden layer and ReLU(x) = max(0, x), how many neurons in the hidden layer do you need to solve the XOR problem? Describe the training loss and estimated prediction accuracy when using 2, 3 and 4 neurons. Discuss the intuition of why a certain number of neurons is necessary to solve XOR.

Solution:

When using 2 neurons, the training loss is 0.268 and the estimated prediction accuracy is 78/100 = 0.78. The picture is shown below.

When using 3 neurons, the training loss is 0.260 and the estimated prediction accuracy is 73/100 = 0.73. The picture is shown below.

When using 4 neurons, the training loss is 0.002 and the estimated prediction accuracy is 100/100 = 1.00. The picture is shown below.


I think that there are 2 states for x1 and 2 states for x2. Since each hidden neuron can only carve out one linear region, we need 2 × 2 = 4 neurons to represent the 4 input combinations of x1 XOR x2. The four neurons in the hidden layer can then each detect one combination, letting the output layer make the right prediction.
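To make this intuition concrete, here is a hand-constructed (not learned) 4-neuron ReLU network for binary inputs, with one hidden neuron per input combination; the weights are my own illustration, not values from the playground:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

# One hidden ReLU neuron per input combination (x1, x2) in {0,1}^2:
# h00 fires only on (0,0), h01 on (0,1), h10 on (1,0), h11 on (1,1).
W = np.array([[-1.0, -1.0],   # h00: relu(1 - x1 - x2)
              [-1.0,  1.0],   # h01: relu(x2 - x1)
              [ 1.0, -1.0],   # h10: relu(x1 - x2)
              [ 1.0,  1.0]])  # h11: relu(x1 + x2 - 1)
b = np.array([1.0, 0.0, 0.0, -1.0])
v = np.array([0.0, 1.0, 1.0, 0.0])  # output sums the two "XOR-true" detectors

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = relu(W @ np.array(x, dtype=float) + b)
    print(x, "->", v @ h)  # prints 0, 1, 1, 0
```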

3. In this problem, we want to build a neural network from scratch using Numpy for a real-world problem. We consider the MNIST dataset (http://yann.lecun.com/exdb/mnist/), a hand-written digit classification dataset. Please follow the formulas in the accompanying Colab Notebook. Hint: make sure you pass the loss and gradient checks in the notebook.

(a) Implement the loss and gradient of a linear classifier (python function linear_classifier_forward_and_backward); see the sketch after this list.

(b) Implement the loss and gradient of a multilayer perceptron with one hidden layer and ReLU(x) = max(0, x) (python function mlp_single_hidden_forward_and_backward).

(c) Implement the loss and gradient of a multilayer perceptron with two hidden layers, a skip connection, and ReLU(x) = max(0, x) (python function mlp_two_hidden_forward_and_backward).

(d) Plot the development accuracy at each epoch for the three models in a single figure, using the following hyperparameters: batch size 50, learning rate 0.005, and 20 epochs.

Solution:
    Solution:
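For part (a), here is a minimal sketch of a softmax linear classifier's loss and gradient; the function name and signature are assumptions on my part, and the notebook's actual interface may differ:

```python
import numpy as np

def linear_classifier_forward_and_backward(W, b, X, y):
    """Softmax cross-entropy loss and gradients for a linear classifier.

    W: (D, C) weights, b: (C,) bias, X: (N, D) inputs, y: (N,) integer labels.
    Hypothetical signature -- adapt to the notebook's actual interface.
    """
    N = X.shape[0]
    scores = X @ W + b                           # forward: (N, C) logits
    scores -= scores.max(axis=1, keepdims=True)  # stabilize the softmax
    exp = np.exp(scores)
    probs = exp / exp.sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(N), y]).mean()

    dscores = probs.copy()                       # backward: dL/dscores
    dscores[np.arange(N), y] -= 1.0
    dscores /= N
    dW = X.T @ dscores                           # chain rule through X @ W + b
    db = dscores.sum(axis=0)
    return loss, dW, db
```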


(e) Try other hyperparameters and select the best set using development accuracy. Once you pick the best model and hyperparameters, include its development accuracy at each epoch in the above figure (make a new figure) and report the test accuracy of the selected model and hyperparameters.

Solution: The best hyperparameters I have found so far are BS = 100, LR = 0.01, NB_EPOCH = 20. The development accuracy is 97.30%, higher than the 97.29% development accuracy of the original MLP with 2 hidden layers.
The picture is shown below:

The test accuracy is 97.18%.
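One simple way to run this selection, sketched with hypothetical train/evaluate helpers standing in for the notebook's training loop (names and the search grid are illustrative, not from the notebook):

```python
import itertools

def select_hyperparameters(train_model, dev_accuracy):
    """Grid search over batch size and learning rate using dev accuracy.

    train_model(batch_size, lr, n_epochs) -> model and dev_accuracy(model)
    -> float are hypothetical callables wrapping the notebook's code.
    """
    best_setting, best_acc = None, -1.0
    for bs, lr in itertools.product([50, 100, 200], [0.001, 0.005, 0.01]):
        model = train_model(batch_size=bs, lr=lr, n_epochs=20)
        acc = dev_accuracy(model)  # never touch the test set during selection
        if acc > best_acc:
            best_setting, best_acc = (bs, lr), acc
    # Evaluate test accuracy once, only for the selected setting.
    return best_setting, best_acc
```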

