CSCI567 Homework 3: Neural Networks


Neural Networks

For this assignment, you are asked to implement a neural network. You will use this neural network to classify the MNIST database of handwritten digits (0-9). The architecture you will implement is based on the multi-layer perceptron (MLP, just another term for the fully connected feedforward networks we discussed in lecture), shown below. It is designed for a K-class classification problem.

Let $(x \in \mathbb{R}^D, y \in \{1, 2, \cdots, K\})$ be a labeled instance. Such an MLP performs the following computations:

$$
\begin{aligned}
\text{input features:}\quad & x \in \mathbb{R}^D \\
\text{linear}^{(1)}:\quad & u = W^{(1)} x + b^{(1)}, \quad W^{(1)} \in \mathbb{R}^{M \times D} \text{ and } b^{(1)} \in \mathbb{R}^{M} \\
\text{tanh:}\quad & h = \frac{2}{1 + e^{-2u}} - 1 \\
\text{relu:}\quad & h = \max\{0, u\} = \begin{bmatrix} \max\{0, u_1\} \\ \vdots \\ \max\{0, u_M\} \end{bmatrix} \\
\text{linear}^{(2)}:\quad & a = W^{(2)} h + b^{(2)}, \quad W^{(2)} \in \mathbb{R}^{K \times M} \text{ and } b^{(2)} \in \mathbb{R}^{K} \\
\text{softmax:}\quad & z = \begin{bmatrix} \dfrac{e^{a_1}}{\sum_k e^{a_k}} \\ \vdots \\ \dfrac{e^{a_K}}{\sum_k e^{a_k}} \end{bmatrix} \\
\text{predicted label:}\quad & \hat{y} = \arg\max_k z_k.
\end{aligned}
$$

For a $K$-class classification problem, one popular loss function for training (i.e., to learn $W^{(1)}$, $W^{(2)}$, $b^{(1)}$, $b^{(2)}$) is the cross-entropy loss. Specifically, we denote the cross-entropy loss with respect to the training example $(x, y)$ by $l$:

$$
l = -\log(z_y) = \log\Big(1 + \sum_{k \neq y} e^{a_k - a_y}\Big)
$$

Note that one should look at $l$ as a function of the parameters of the network, that is, $W^{(1)}$, $b^{(1)}$, $W^{(2)}$ and $b^{(2)}$. For ease of notation, let us define the one-hot (i.e., 1-of-$K$) encoding of a class $y$ as

$$
y \in \mathbb{R}^K \quad \text{and} \quad y_k = \begin{cases} 1, & \text{if } y = k, \\ 0, & \text{otherwise,} \end{cases}
$$

so that

$$
l = -\sum_k y_k \log z_k = -y^{T} \begin{bmatrix} \log z_1 \\ \vdots \\ \log z_K \end{bmatrix} = -y^{T} \log z.
$$

We can then perform error-backpropagation, a way to compute partial derivatives (or gradients) w.r.t. the parameters of a neural network, and use gradient-based optimization to learn the parameters.
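For concreteness, here is a minimal NumPy sketch of the forward computation and the cross-entropy loss above. The function and variable names are illustrative only (they are not the starter code's names), and labels are 0-indexed here:

```python
import numpy as np

def forward_and_loss(x, y, W1, b1, W2, b2):
    """Forward pass of the MLP above (relu variant) plus cross-entropy loss.

    x: (D,) input; y: integer label in {0, ..., K-1} (0-indexed here).
    W1: (M, D), b1: (M,), W2: (K, M), b2: (K,).
    """
    u = W1 @ x + b1                      # linear(1)
    h = np.maximum(0.0, u)               # relu
    a = W2 @ h + b2                      # linear(2)
    a = a - a.max()                      # shift logits for numerical stability
    z = np.exp(a) / np.exp(a).sum()      # softmax
    loss = -np.log(z[y])                 # cross-entropy: l = -log(z_y)
    y_hat = int(np.argmax(z))            # predicted label
    return loss, y_hat
```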

Submission: You need to submit both neural_networks.py and utils.py.

Q1. Mini-batch Stochastic Gradient Descent

First, you need to implement mini-batch stochastic gradient descent, a gradient-based optimization method used to learn the parameters of the neural network.
You need to implement two variants of SGD, one without momentum and one with momentum. We will pass a variable $\alpha$ to indicate which option to use. When $\alpha \leq 0$, the parameters are updated by the gradient alone. When $\alpha > 0$, the parameters are updated with momentum, and $\alpha$ also serves as the discount factor, as follows:

$$
\upsilon = \alpha \upsilon - \eta \delta_t, \qquad w_t = w_{t-1} + \upsilon
$$

You can use the formula above to update the weights.
Here, $\alpha$ is the discount factor, with $\alpha \in (0,1)$. It is given by us and you do not need to adjust it.
$\eta$ is the learning rate. It is also given by us.
$\upsilon$ is the velocity (a.k.a. momentum) update, and $\delta_t$ is the gradient.

  • TODO 1 You need to complete def miniBatchStochasticGradientDescent(model, momentum, _lambda, _alpha, _learning_rate) in neural_networks.py

Notice that for a complete mini-batch SGD, you would also need to tune the mini-batch size and the number of epochs. In this assignment, we omit this step: both the mini-batch size and the number of epochs have already been given, and you do not need to adjust them.
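As a reference, here is a minimal sketch of the update rule above. It assumes the parameters, gradients, and velocities are stored in dicts keyed by parameter name; the actual model/momentum data structures and the `_lambda` regularization term in the real signature come from the starter code and are omitted here:

```python
def sgd_momentum_update(params, grads, velocity, _alpha, _learning_rate):
    """Apply one mini-batch SGD update to every parameter.

    params, grads, velocity: dicts keyed by parameter name (e.g. 'W', 'b').
    The dict-based interface is an assumption for illustration only.
    """
    for key in params:
        if _alpha > 0:
            # with momentum: v = alpha * v - eta * gradient ; w = w + v
            velocity[key] = _alpha * velocity[key] - _learning_rate * grads[key]
            params[key] += velocity[key]
        else:
            # without momentum: w = w - eta * gradient
            params[key] -= _learning_rate * grads[key]
    return params, velocity
```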

Q2. Linear Layer

Second, you need to implement the linear layer of the MLP. In this part, you need to implement 3 Python functions in class linear_layer.

In the function def __init__(self, input_D, output_D), you need to initialize W with random values using np.random.normal such that the mean is 0 and the standard deviation is 0.1. You also need to initialize the gradients to zeros in the same function.

$$
\begin{aligned}
\text{forward pass:}\quad & u = \text{linear}^{(1)}.\text{forward}(x) = W^{(1)} x + b^{(1)}, \text{ where } W^{(1)} \text{ and } b^{(1)} \text{ are its parameters,} \\
\text{backward pass:}\quad & \left[\frac{\partial l}{\partial x}, \frac{\partial l}{\partial W^{(1)}}, \frac{\partial l}{\partial b^{(1)}}\right] = \text{linear}^{(1)}.\text{backward}\!\left(x, \frac{\partial l}{\partial u}\right).
\end{aligned}
$$

You can use the above formulas as a reference to implement the forward pass def forward(self, X) and the backward pass def backward(self, X, grad) in class linear_layer. In the backward pass, you only need to return backward_output, but you must also compute the gradients of W and b there (see the sketch after the TODO list below).

  • TODO 2 You need to complete def __init__(self, input_D, output_D) in class linear_layer of neural_networks.py
  • TODO 3 You need to complete def forward(self, X) in class linear_layer of neural_networks.py
  • TODO 4 You need to complete def backward(self, X, grad) in class linear_layer of neural_networks.py
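For intuition, here is a hedged NumPy sketch of such a layer. The attribute names (self.params, self.gradient), the row-major batch convention (X of shape (N, input_D)), and the initialization of b are assumptions made for illustration; follow the starter code's docstrings for the real interface:

```python
import numpy as np

class LinearLayerSketch:
    """Illustrative linear layer; not the starter code's exact interface."""

    def __init__(self, input_D, output_D):
        # W ~ N(0, 0.1^2); gradients start at zero.
        # Whether b is drawn from the same normal or set to zero is an
        # assumption -- check the starter code's docstring.
        self.params = {
            'W': np.random.normal(0, 0.1, (input_D, output_D)),
            'b': np.random.normal(0, 0.1, (1, output_D)),
        }
        self.gradient = {
            'W': np.zeros((input_D, output_D)),
            'b': np.zeros((1, output_D)),
        }

    def forward(self, X):
        # X: (N, input_D) -> u: (N, output_D)
        return X @ self.params['W'] + self.params['b']

    def backward(self, X, grad):
        # grad = dl/du with shape (N, output_D)
        self.gradient['W'] = X.T @ grad                       # dl/dW
        self.gradient['b'] = grad.sum(axis=0, keepdims=True)  # dl/db
        return grad @ self.params['W'].T                      # dl/dX (backward_output)
```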

Q3. Activation function – tanh

Now, you need to implement the tanh activation function. In this part, you need to implement 2 Python functions in class tanh. In def forward(self, X), implement the forward pass; then compute the derivative and implement def backward(self, X, grad), i.e. the backward pass, accordingly (a sketch follows the TODO list below).

$$
\text{tanh:}\quad h = \frac{2}{1 + e^{-2u}} - 1
$$

You can use the above formula for tanh as a reference.

  • TODO 5 You need to complete def forward(self, X) in class tanh of neural_networks.py
  • TODO 6 You need to complete def backward(self, X, grad) in class tanh of neural_networks.py
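A minimal sketch of the two methods, using the formula above (the class name here is illustrative):

```python
import numpy as np

class TanhSketch:
    """Illustrative tanh module mirroring the forward/backward interface."""

    def forward(self, X):
        # h = 2 / (1 + e^{-2u}) - 1, which is exactly np.tanh(X)
        return 2.0 / (1.0 + np.exp(-2.0 * X)) - 1.0

    def backward(self, X, grad):
        # d tanh(u)/du = 1 - tanh(u)^2, applied elementwise; chain rule with grad
        h = self.forward(X)
        return grad * (1.0 - h ** 2)
```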

Q4. Activation function – relu

You need to implement another activation function called relu. In this part, you need to implement 2 Python functions in class relu. In def forward(self, X), implement the forward pass; then compute the derivative and implement def backward(self, X, grad), i.e. the backward pass, accordingly (a sketch follows the TODO list below).

$$
\text{relu:}\quad h = \max\{0, u\} = \begin{bmatrix} \max\{0, u_1\} \\ \vdots \\ \max\{0, u_M\} \end{bmatrix}
$$

You can use the above formula for relu as a reference.

  • TODO 7 You need to complete def forward(self, X) in class relu of neural_networks.py
  • TODO 8 You need to complete def backward(self, X, grad) in class relu of neural_networks.py
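A minimal sketch, analogous to the tanh sketch above (class name illustrative; the derivative at exactly 0 is taken to be 0 here):

```python
import numpy as np

class ReluSketch:
    """Illustrative relu module mirroring the forward/backward interface."""

    def forward(self, X):
        # h = max{0, u}, elementwise
        return np.maximum(0.0, X)

    def backward(self, X, grad):
        # derivative is 1 where X > 0 and 0 elsewhere; chain rule with grad
        return grad * (X > 0).astype(X.dtype)
```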

Q5. Dropout (15 points)

To prevent overfitting, we usually add regularization. Dropout is another way of handling overfitting. In this part, you will first read and understand def forward(self, X, is_train), i.e. the forward pass of class dropout. You will then derive the partial derivatives accordingly to implement def backward(self, X, grad), i.e. the backward pass of class dropout.
Let $q \in \mathbb{R}^J$ be an intermediate variable, the output of one of the layers. We define the forward and backward passes of dropout as follows.
The forward pass obtains the output after dropout.

$$
\text{forward pass:}\quad s = \text{dropout.forward}(q \in \mathbb{R}^J) = \frac{1}{1-r} \times \begin{bmatrix} \mathbb{1}[p_1 \geq r] \times q_1 \\ \vdots \\ \mathbb{1}[p_J \geq r] \times q_J \end{bmatrix},
$$

where $p_j$ is generated randomly from $[0,1)$, $\forall j \in \{1, \cdots, J\}$, and $r \in [0,1)$ is a pre-defined scalar named the dropout rate, which is given to you.

The backward pass computes the partial derivative of the loss with respect to $q$ from the partial derivative with respect to the forward-pass result, i.e., from $\frac{\partial l}{\partial s}$.

$$
\text{backward pass:}\quad \frac{\partial l}{\partial q} = \text{dropout.backward}\!\left(q, \frac{\partial l}{\partial s}\right) = \frac{1}{1-r} \times \begin{bmatrix} \mathbb{1}[p_1 \geq r] \times \frac{\partial l}{\partial s_1} \\ \vdots \\ \mathbb{1}[p_J \geq r] \times \frac{\partial l}{\partial s_J} \end{bmatrix}.
$$

Note that $p_j, j \in \{1, \cdots, J\}$ and $r$ are not learned, so we do not need to compute derivatives w.r.t. them. You do not need to find the best $r$ since we have picked it for you. Moreover, $p_j, j \in \{1, \cdots, J\}$ are re-sampled in every forward pass and kept for the corresponding backward pass (a sketch follows the TODO below).

  • TODO 9 You need to complete def backward(self, X, grad) in class dropout of neural_networks.py
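A hedged sketch of both passes is given below. The key point for TODO 9 is that the backward pass must reuse the exact mask (including the 1/(1-r) scaling) saved by the forward pass; the attribute names here are assumptions about what the given forward pass stores:

```python
import numpy as np

class DropoutSketch:
    """Illustrative dropout; attribute names are assumptions, not the starter code's."""

    def __init__(self, r):
        self.r = r          # dropout rate, given
        self.mask = None    # scaled indicators 1[p_j >= r] / (1 - r), saved by forward

    def forward(self, X, is_train):
        if is_train:
            p = np.random.uniform(0.0, 1.0, X.shape)   # p_j drawn from [0, 1)
            self.mask = (p >= self.r).astype(X.dtype) / (1.0 - self.r)
            return X * self.mask
        return X            # no dropout (and no scaling) at test time

    def backward(self, X, grad):
        # dl/dq = mask * dl/ds, reusing the mask from the corresponding forward pass
        return grad * self.mask
```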

Q6. Connecting the dots

In this part, you will combine the modules written in questions Q1 to Q5 by implementing the TODO snippets in def main(main_params, optimization_type="minibatch_sgd"), i.e. the main function. Having implemented the forward and backward passes of the MLP layers in Q1 to Q5, you will now call the forward and backward methods of every layer in the model, in an order appropriate to the architecture (see the sketch after the TODO below).

  • TODO 10 You need to complete main(main_params, optimization_type="minibatch_sgd") in neural_networks.py
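The exact keys in model and the loss module's interface come from the starter code; the sketch below only illustrates the kind of ordering expected (forward in architecture order, backward in reverse), using hypothetical names:

```python
def one_training_step(model, x, y, is_train=True):
    # forward pass, in architecture order
    u = model['L1'].forward(x)                # linear(1)
    h = model['nonlinear1'].forward(u)        # tanh or relu
    s = model['drop1'].forward(h, is_train)   # dropout
    a = model['L2'].forward(s)                # linear(2)
    loss = model['loss'].forward(a, y)        # softmax + cross-entropy

    # backward pass, in reverse order
    grad_a = model['loss'].backward(a, y)
    grad_s = model['L2'].backward(s, grad_a)
    grad_h = model['drop1'].backward(h, grad_s)
    grad_u = model['nonlinear1'].backward(u, grad_h)
    _ = model['L1'].backward(x, grad_u)
    return loss
```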

Google Colab

Google Colab is a free online Jupyter notebook environment made available by Google for researchers and students, in which the notebook is backed by a GPU.

A Jupyter notebook is a Python notebook with executable cells accompanied by text cells, which makes it a good way to document code.

GPUs are now the standard way to compute weights and gradients when training neural networks; they are faster than CPUs because of their inherently parallel design.

We highly suggest trying it out for the computation of your forward networks and tinkering with num_epochs and the learning rate to see how the training loss varies. You can find it here:

https://colab.research.google.com/

Note

Do NOT change the hyperparameters when you submit on Vocareum, even if your changes give a better training loss. Changing the hyperparameters is only meant to help you understand how gradient descent optimizes the model.
