CMU 11-485 Homework 3 Part 1: The Rise of RNN

  • You are required to do this assignment in the Python (version 3) programming language. Do not use any auto-differentiation toolboxes (PyTorch, TensorFlow, Keras, etc.); you are only permitted, and recommended, to vectorize your computation using the Numpy library.
  • We recommend that you look through all of the problems before attempting the first problem. However, we do recommend you complete the problems in order, as the difficulty increases and questions often rely on the completion of previous questions.
  • If you haven't done so already, learn to use pdb to debug your code effectively.

Introduction

In this assignment, you will continue to develop your own version of PyTorch, which is of course called MyTorch (still a brilliant name; a master stroke. Well done!).

Homework Structure

Below is a list of files that are directly relevant to hw3.

IMPORTANT: First, copy the highlighted files/folders from the HW3P1 handout over to the corresponding folders that you used in hw1 and hw2.

NOTE: We recommend you make a backup of your hw3 files before copying everything over, just in case you break code or want to revert back to an earlier version.

Next, copy and paste the following code stubs from hw3/stubs.py into the correct files.
  1. Copy Slice(Function) into nn/functional.py.
  2. Copy Cat(Function) into nn/functional.py.
  3. Copy unsqueeze(), __getitem__() and __len__() into tensor.py. These are methods for class Tensor.
  4. Copy function cat into tensor.py. This is an independent function and not a class method.

NOTE: You may need to define __pow__, which implicitly calls the Pow(Function), if you haven't done so already. This should be able to handle int exponents. Check the HW3P1 FAQ for help.

0.1       Running/Submitting Code

This section covers how to test code locally and how to create the final submission.

0.1.1         Running Local Autograder

Run the command below to calculate scores and test your code locally.

./grade.sh 3

If this doesn’t work, converting line-endings may help:

sudo apt install dos2unix
dos2unix grade.sh

./grade.sh 3

If all else fails, you can run the autograder manually with this:

python3 ./autograder/hw3_autograder/runner.py

0.1.2         Running the Sandbox

We’ve provided sandbox.py: a script to test and easily debug basic operations and autograd.

Note: We will not provide new sandbox methods for this homework. You are required to write your own from now onwards.

python3 sandbox.py

0.1.3         Submitting to Autolab

Note: You can submit to Autolab even if you’re not finished yet. You should do this early and often, as it guarantees you a minimum grade and helps avoid last-minute problems with Autolab.

Run this script to gather the needed files into a handin.tar file:

./create_tarball.sh

If this crashes (with some message about a hw4 folder), use dos2unix on this file too.

You can now upload handin.tar to Autolab.

1           RNN

In mytorch/nn/rnn.py we will implement a full-fledged Recurrent Neural Network module with the ability to handle variable length inputs in the same batch.

1.1         RNN Unit

Follow the starter code available in mytorch/nn/rnn.py to get a better sense of the various attributes of the RNNUnit.

Figure 1: The computation flow for the RNN Unit forward.

h′t = tanh(Wihxt + bih + Whhht−1 + bhh)                                                            (1)

The equation you should follow is given in equation 1.

You are required to implement the forward method of the RNNUnit module. The description of the inputs and expected outputs is specified below:

Inputs

  • x (effective batch size, input size)
    • Input at the current time step.
    • NOTE: For now, interpret effective batch size the same as regular batch size. The difference will become apparent later in the homework. For the definition of effective batch size, check the Appendix and/or Figure 5
  • h (effective batch size, hidden size)
    • Hidden state generated by the previous time step, ht−1

Outputs

  • h prime: (effective batch size, hidden size)

– New hidden state generated at the current time step, h′t

NOTE: As a matter of convention, if you apply certain reshape/transpose operations while using self.weight_ih, then please do the same for self.weight_hh. This is important to note because self.weight_hh is square, and its shape alone is therefore open to multiple interpretations on how to use it.
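To make equation (1) concrete, here is a minimal NumPy-only sketch of the forward computation. It is not the MyTorch implementation itself; the weight shapes follow the PyTorch convention of weight_ih being (hidden size, input size) and weight_hh being (hidden size, hidden size), which is an assumption, so transpose accordingly if your stored shapes differ.

import numpy as np

def rnn_unit_forward_numpy(x, h, W_ih, b_ih, W_hh, b_hh):
    # Equation (1): h't = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh)
    # x : (effective_batch_size, input_size), h : (effective_batch_size, hidden_size)
    return np.tanh(x @ W_ih.T + b_ih + h @ W_hh.T + b_hh)

# Tiny usage example with random weights.
rng = np.random.default_rng(0)
batch, in_size, hid_size = 4, 3, 5
x = rng.standard_normal((batch, in_size))
h = rng.standard_normal((batch, hid_size))
W_ih = rng.standard_normal((hid_size, in_size))
W_hh = rng.standard_normal((hid_size, hid_size))
b_ih = rng.standard_normal(hid_size)
b_hh = rng.standard_normal(hid_size)
print(rnn_unit_forward_numpy(x, h, W_ih, b_ih, W_hh, b_hh).shape)  # (4, 5)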

1.2         Detour 1: Cat, Slice, Unsqueeze

In this section we implement some necessary functions which will prove to be extremely handy while completing the rest of the homework, namely Cat, Slice and Unsqueeze.

1.2.1         Cat (7 points)

Concatenates the given sequence of tensors in the given dimension. All tensors must either have the same shape (except in the concatenating dimension) or be empty. Please refer to the PyTorch documentation for a better understanding. If you are not into documentation, then please refer back to Recitation 0, where this was covered as well.

First implement the corresponding Cat class in mytorch/nn/functional.py. This will be a subclass of the Function class.

Next implement a helper function cat in mytorch/tensor.py to call the concatenation operation on a list of sequences. This should in turn correctly call the corresponding Function sub-class. Below is a description of the required inputs and outputs.

Inputs

  • seq: list of tensors
    • The list basically contains the sequences we want to concatenate
  • dim: (int, default=0)
    • The dimension along which we are supposed to concatenate the tensors in the list seq

Outputs

  • Tensor:
    • The concatenated tensor

NOTE: You don’t need to add anything to the Tensor class in mytorch/tensor.py with respect to the Cat operation.
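As a rough guide to what Cat has to do in both directions, here is a plain-NumPy sketch. The function names are illustrative only; your actual implementation lives in the Cat(Function) subclass and must operate on Tensors so that autograd works.

import numpy as np

def cat_forward(arrays, dim=0):
    # Forward: concatenate the underlying ndarrays along `dim`.
    return np.concatenate(arrays, axis=dim)

def cat_backward(grad_output, arrays, dim=0):
    # Backward: split the incoming gradient back into chunks whose
    # sizes along `dim` match the original inputs, one chunk per input.
    sizes = [a.shape[dim] for a in arrays]
    boundaries = np.cumsum(sizes)[:-1]
    return np.split(grad_output, boundaries, axis=dim)

a, b = np.ones((2, 3)), np.zeros((4, 3))
out = cat_forward([a, b], dim=0)                      # shape (6, 3)
grads = cat_backward(np.ones_like(out), [a, b], dim=0)
print([g.shape for g in grads])                       # [(2, 3), (4, 3)]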

1.2.2         Slice (5 points)

Despite being worth only 5 points, implementing this operation takes your Tensor class to a whole new level. With this operation up and running, you can index your Tensor class just like you index Numpy arrays/PyTorch tensors, while autograd takes charge of calculating the required gradients (assuming you implemented the backward correctly).

First implement the corresponding Slice class in mytorch/nn/functional.py. This will be a subclass of the Function class.

Next add the __getitem__ method to your Tensor class in mytorch/tensor.py, which in turn calls the Slice Function sub-class.

HINT

The implementation of a slicing operation from scratch may appear to be a daunting task, but we will employ a cool trick to get this done in just a few lines of code. Whenever we try to slice a segment of a given tensor using the [] notation, Python creates a key (depending on what you provide within []) and passes this to the __getitem__ method. This key is used by the class to provide the appropriate result to the user. The key can be an integer, a Python slice object, a list, etc., depending on how we slice our object. This is the principle implemented in Numpy for slicing ndarrays, and we will leverage the same fact to make our task easy. Once we create a __getitem__ method for our Tensor class, every time we slice our tensors the __getitem__ method will be invoked with the appropriate key. Given that the intention of slicing our tensor is to get the appropriate segment of the underlying ndarray (the data attribute) as a Tensor object, can you use this key to complete the task?
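The hint above boils down to a few lines. A NumPy-level sketch of the two directions might look like this (illustrative names only, not the required API):

import numpy as np

def slice_forward(data, key):
    # Forward: index the underlying ndarray with the key Python built for us.
    return data[key]

def slice_backward(grad_output, input_shape, key):
    # Backward: only the selected positions receive gradient;
    # everything that was not selected gets zero.
    grad = np.zeros(input_shape, dtype=grad_output.dtype)
    grad[key] += grad_output
    return grad

x = np.arange(12.0).reshape(3, 4)
y = slice_forward(x, (slice(0, 2), 1))                # same as x[0:2, 1]
gx = slice_backward(np.ones_like(y), x.shape, (slice(0, 2), 1))
print(y, gx.sum())                                    # [1. 5.] 2.0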

1.2.3         Unsqueeze (3 points)

Returns a new tensor with a dimension of size one inserted at the specified position. Please refer to the PyTorch documentation for a better understanding. If you are not into documentation, then please refer back to Recitation 0, where this was covered as well.

Add the unsqueeze method to your Tensor class in mytorch/tensor.py.

HINT: Why is this only worth 3 points? Well, if you look closely, you might be able to use a function you implemented previously without any extra effort. The name of the Numpy function you might need begins with ’R’.
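In NumPy terms, the hint amounts to something like the following sketch; your Tensor-level unsqueeze should of course go through your existing reshape operation so autograd keeps working.

import numpy as np

def unsqueeze_numpy(data, dim=0):
    # Insert a size-1 axis at position `dim` purely via a reshape.
    new_shape = data.shape[:dim] + (1,) + data.shape[dim:]
    return data.reshape(new_shape)

x = np.arange(6).reshape(2, 3)
print(unsqueeze_numpy(x, 0).shape)  # (1, 2, 3)
print(unsqueeze_numpy(x, 1).shape)  # (2, 1, 3)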

1.3         Detour 2: pack_sequence, unpack_sequence

What’s all the fuss about handling variable length samples in a batch?

This section is devoted to the construction of objects of class PackedSequence which help us create batches in the context of RNNs.

As you might have guessed by now, handling variable length sequences in a batch requires far more machinery than what exists in a simple RNN. In this section we will create all the utility functions that will help us handle variable length sequences in a single batch.

Before we proceed let’s understand an important class that packages our packed sequences in a form which makes it easier for us to deal with them. This will also help you better understand why handling variable length sequences in a batch can be a tedious task.

Refer to the Appendix for more details.

Please use the Tensor Slicing and Cat operations you implemented in the previous sections while implementing these functions.

1.3.1         pack_sequence (15 points)

In mytorch/nn/util.py implement the method pack_sequence.

This function, as the name suggests, packs a list of variable length Tensors into an object of class PackedSequence. A list of tensors is given (as shown in Figure 2) which is to be converted into a PackedSequence object (detailed class description below). This object holds the packed 2d tensor which is later fed into the RNN module for training and testing. Refer to Figures 2, 3 and 4 to better understand what pack_sequence does.

Figure 2: List of tensors we want to pack

Figure 3: First we sort the list in descending order based on the number of timesteps in each sample

Figure 4: Final Packed 2d Tensor

Class: PackedSequence

Attributes

  • data
    • The actual packed tensor. In the context of this homework this is a 2D tensor. Refer to 4
  • sorted_indices
    • 1d ndarray (numpy n-dimensional array) of integers containing the indices of the elements in the original list when they are sorted based on the number of time steps in each sample. Refer to Figure 2 and Figure 3
  • batch_sizes
    • 1d ndarray of integers where the ith element represents the number of samples in the original list having at least (i+1) timesteps. Refer to Figure 4 for more clarity.

Function: pack_sequence

Inputs

  • seq: list of tensors

– The list contains tensors representing individual instances, each having a variable length.

Outputs

  • PackedSequence:

– The PackedSequence object constructed from the input list (see the sketch below).
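The following NumPy-only sketch shows the packing logic on raw ndarrays, matching Figures 2 to 4. The attribute names mirror the description above; the real pack_sequence must do the equivalent with your Tensor Slice and Cat operations so that gradients flow.

import numpy as np

def pack_sequence_numpy(seqs):
    # seqs: list of (T_i, K) arrays. Returns (data, sorted_indices, batch_sizes).
    # Sort indices by number of timesteps, longest first (Figure 3).
    sorted_indices = np.argsort([-s.shape[0] for s in seqs], kind="stable")
    ordered = [seqs[i] for i in sorted_indices]

    max_T = ordered[0].shape[0]
    rows, batch_sizes = [], []
    for t in range(max_T):
        # Every sample with at least t+1 timesteps contributes its t-th row.
        step = [s[t] for s in ordered if s.shape[0] > t]
        batch_sizes.append(len(step))
        rows.extend(step)

    data = np.stack(rows)                      # packed 2d tensor (Figure 4)
    return data, sorted_indices, np.array(batch_sizes)

seqs = [np.random.randn(T, 2) for T in (3, 1, 2)]
data, idx, bs = pack_sequence_numpy(seqs)
print(data.shape, idx, bs)                     # (6, 2) [0 2 1] [3 2 1]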

1.3.2         unpack_sequence

In mytorch/nn/util.py implement the method unpack_sequence.

This function unpacks a given PackedSequence object into a list of tensors that were used to create it. This is also a part of the class PackedSequence.

Function: unpack_sequence

Inputs

  • ps: PackedSequence

Outputs

  • list of Tensors:
    • The original list of tensors that was used to create the PackedSequence, in their original order (see the sketch below).
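Continuing the NumPy sketch from pack_sequence above, unpacking walks the packed rows one timestep block at a time and then undoes the sort. Again, this is illustrative only; your version operates on Tensors and the PackedSequence attributes.

import numpy as np

def unpack_sequence_numpy(data, sorted_indices, batch_sizes):
    # Rebuild each (sorted) sequence by collecting its row from every
    # timestep block it participates in.
    num_seqs = int(batch_sizes[0])
    pieces = [[] for _ in range(num_seqs)]
    offset = 0
    for bs in batch_sizes:
        for j in range(bs):
            pieces[j].append(data[offset + j])
        offset += bs
    sorted_seqs = [np.stack(p) for p in pieces]

    # Undo the descending-length sort so the original order comes back.
    out = [None] * num_seqs
    for pos, orig_idx in enumerate(sorted_indices):
        out[orig_idx] = sorted_seqs[pos]
    return out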

1.4         TimeIterator

In mytorch/nn/rnn.py implement the forward method for the TimeIterator.

For a given input, this class iterates through time by processing the entire sequence of timesteps. It can be thought of as representing a single layer for a given basic unit (interpret this as RNNUnit for now) which is applied at each time step.

Working of forward method in TimeIterator

  • This module’s forward method is tasked with receiving a batch of samples packed in the PackedSequence form.
  • The method runs the section of the input corresponding to a single time-step across the batches through an RNNUnit.
  • The hidden state returned by the RNNUnit is collected, and then the section of the input corresponding to the next time-step across the batches is fed through the RNNUnit along with the previously collected hidden state (from the last time step) to generate a new hidden state.
  • This process is done iteratively till all time-steps are exhausted for each sample in the batch.
  • Follow Figures 5, 6, 7 and 8 to get a better understanding of how the TimeIterator processes a given input PackedSequence. A sketch of this loop is given after the inputs and outputs below.

The inputs and outputs of the forward method are given below.

Class: TimeIterator, forward method

Inputs

  • input: PackedSequence
  • hidden: Tensor (batch size,hidden size)

Outputs

  • PackedSequence: hidden states at each timestep for each sample, packaged as a packed sequence
  • Tensor (batch size, hidden size): This is the hidden state generated by the last time step for each sample, joined together. Samples are ordered in descending order based on number of timesteps. This is a slight deviation from PyTorch.

Finally, in mytorch/nn/rnn.py create a class RNN as a sub-class of TimeIterator which instantiates the base class with basic_unit = RNNUnit. Please look at the code for more details.
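As a rough picture of the loop the forward method has to run, here is a NumPy sketch that consumes the packed data and batch sizes produced by the pack_sequence sketch above. step_fn stands in for an RNNUnit forward; the zero initial hidden state is an assumption, so check the starter code for the actual behaviour when hidden is not provided.

import numpy as np

def time_iterator_forward_numpy(data, batch_sizes, step_fn, hidden0=None, hidden_size=None):
    # data        : packed 2D array (total_steps, input_size)
    # batch_sizes : effective batch size at each timestep (non-increasing)
    # step_fn     : callable (x_t, h_t) -> h_next, e.g. an RNNUnit forward
    if hidden0 is None:
        hidden0 = np.zeros((int(batch_sizes[0]), hidden_size))  # assumed default
    hiddens = []
    last_hidden = [None] * int(batch_sizes[0])
    h = hidden0
    offset = 0
    for bs in batch_sizes:
        x_t = data[offset:offset + bs]     # current effective batch (Figures 5-7)
        h = step_fn(x_t, h[:bs])           # only the samples that are still active
        hiddens.append(h)
        for j in range(bs):
            last_hidden[j] = h[j]          # newest hidden state seen for sample j
        offset += bs
    packed_hiddens = np.concatenate(hiddens, axis=0)
    final_hidden = np.stack(last_hidden)   # one row per sample, sorted by length
    return packed_hiddens, final_hidden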

Figure 5: Iteration 0 for the TimeIterator

Figure 6: Iteration 1 for the TimeIterator

Figure 7: Iteration 2 for the TimeIterator

Figure 8: The final output from the TimeIterator

2           GRU

In this section we will explore the world of GRU. We will heavily use the machinery built for RNN to make our lives easier and create a full-fledged working GRU.

2.1         GRU Unit

In mytorch/nn/gru.py implement the forward pass for a GRUUnit (though we follow a slightly different naming convention than the PyTorch documentation). The equations for a GRU cell are the following:

rt = σ(Wirxt + bir + Whrht−1 + bhr)                (2)
zt = σ(Wizxt + biz + Whzht−1 + bhz)                (3)
nt = tanh(Winxt + bin + rt ⊗ (Whnht−1 + bhn))      (4)
ht = (1 − zt) ⊗ nt + zt ⊗ ht−1                     (5)

Please refer to (and use) the GRUUnit class attributes defined in the __init__ method. You are not expected to define any extra attributes for a working implementation of this class.

The inputs to the GRUUnit forward method are x and h, represented as xt and ht−1 in the equations above. These are the inputs at time t.

The output of the forward method is ht in the equations above.

You are required to implement the forward method of this module. The description of the inputs and expected outputs is specified below:

Inputs

  • x (effective batch size, input size)
    • Input at the current time step.
    • For the definition of effective batch size, check the Appendix and/or Figure 5
  • h (effective batch size, hidden size)
    • Hidden state generated by the previous time step, ht−1

Outputs

  • h prime: (effective batch size, hidden size)

– New hidden state generated at the current time step, h′t
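For a concrete reading of equations (2) to (5), here is a NumPy sketch of a single GRU step. The weight/bias names and the (hidden size, input size) shapes are assumptions in the spirit of the earlier RNNUnit sketch, so map them onto the actual attributes defined in __init__; ⊗ is elementwise multiplication.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_unit_forward_numpy(x, h,
                           W_ir, b_ir, W_iz, b_iz, W_in, b_in,
                           W_hr, b_hr, W_hz, b_hz, W_hn, b_hn):
    r = sigmoid(x @ W_ir.T + b_ir + h @ W_hr.T + b_hr)        # eq. (2), reset gate
    z = sigmoid(x @ W_iz.T + b_iz + h @ W_hz.T + b_hz)        # eq. (3), update gate
    n = np.tanh(x @ W_in.T + b_in + r * (h @ W_hn.T + b_hn))  # eq. (4), candidate state
    return (1.0 - z) * n + z * h                               # eq. (5), new hidden state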

2.2         GRU

Finally, in mytorch/nn/gru.py create a class GRU as a sub-class of TimeIterator which instantiates the base class with basic_unit = GRUUnit. Please look at the code for more details.

Appendix

A       Glossary

  • effective batch: the ith effective batch refers to the set of ith timesteps from each sample that are simultaneously fed to the RNNUnit in the (i − 1)th iteration inside the TimeIterator
  • effective batch size: the number of samples in an effective batch. The effective batch size of the ith effective batch is equal to the number of samples containing at least (i+1) timesteps

B What’s all the fuss about handling variable length samples in one batch?

B.1        Importance of batching in Deep Learning

When training a neural network, we stack together a number of inputs (a.k.a. batching) and pass them together for a single training iteration. From a computational standpoint this helps us make the most of GPUs by parallelizing the computation (done for you by frameworks such as PyTorch). Moreover, ideas like batch normalization fundamentally depend on the existence of batches instead of single samples for training. Therefore there is no denying the importance of batching in the deep learning world.

B.2          Batching for MLPs and CNNs

Life in a world with only MLPs and CNNs was simpler from a batching perspective. Each input sample had exactly the same shape. These input samples in the form of tensors (each with the same shape) could then be stacked together along a new dimension to create a higher-dimensional tensor which then becomes our batch. For example consider a simple 1-D input where each sample has 20 features. We now want to create a batch of 256 such samples. Each sample has a shape (20,) which are then stacked along a new dimension (this now becomes dimension 0) to provide a batch of shape (256,20) where the first dimension represents number of samples in a batch while the second dimension represents the number of features.

B.3        Batching in RNNs

With RNNs, this becomes very tricky. Each sample in this context has a temporal dimension (at least in the simplest case). The number of features at each time step remains fixed across all samples; let this be K for the purpose of our discussion. Therefore each sample i has Ti time steps, where at each time step we have K features. The ith sample can then be represented by a tensor of shape (Ti, K). Given that the first dimension is of variable length, there is no simple way for us to create batches as we did for MLPs and CNNs. One might argue for fixed time-step inputs, but that severely limits the power of RNNs/GRUs/LSTMs and reduces them to a large fixed MLP. Therefore batching in RNNs requires a special setup, which we provide via PackedSequences. The shape problem is illustrated in the short sketch below.
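A tiny NumPy experiment makes the contrast with section B.2 concrete (the 256 samples of 20 features come from the example above; the sequence lengths and K = 8 are made up for illustration):

import numpy as np

# MLP/CNN case: fixed-shape samples stack cleanly into a (256, 20) batch.
samples = [np.random.randn(20) for _ in range(256)]
batch = np.stack(samples)
print(batch.shape)                      # (256, 20)

# RNN case: variable-length (T_i, K) sequences cannot simply be stacked.
seqs = [np.random.randn(T, 8) for T in (5, 3, 7)]
try:
    np.stack(seqs)
except ValueError as e:
    print("stacking failed:", e)        # shapes differ along the time axis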
