CSCI-GA3033-090 Homework 1 Solved







This homework follows up on the lecture about Deep Imitation Learning. For this assignment, you will need to know the imitation learning algorithms we discussed in class, in particular DAgger. If you have not already, we suggest you brush up on the lecture notes.


Code folder

Find the folder with the provided code in the linked Google Drive folder. Download all the files into the same directory, and then follow the instructions in the provided files.


In class, we learned about the DAgger (dataset aggregation) algorithm, which is used to clone an expert policy. This method is especially useful when querying the expert is expensive: we want to learn a policy that is almost as good as the expert without a high number of queries to it.
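To make the loop concrete, here is a minimal sketch of DAgger on a toy 1-D task (not the homework environment): roll out the current policy, have the expert label every state the policy actually visited, aggregate those labels into one dataset, and retrain. The toy expert, the nearest-neighbor "learner", and all constants are stand-ins for illustration only.

```python
import random

# Toy expert for a 1-D task: push the state toward zero.
def expert_action(state):
    return -1.0 if state > 0 else 1.0

def rollout(policy, steps=20):
    """Roll out `policy` and collect the states it visits."""
    state, visited = random.uniform(-1, 1), []
    for _ in range(steps):
        visited.append(state)
        state += 0.1 * policy(state)
    return visited

def nearest_neighbor_policy(dataset):
    """Stand-in learner: act as the expert did at the closest labeled state."""
    def policy(state):
        if not dataset:
            return random.choice([-1.0, 1.0])
        _, action = min(dataset, key=lambda sa: abs(sa[0] - state))
        return action
    return policy

# DAgger: the expert labels on-policy states, and the dataset only grows.
dataset = []
policy = nearest_neighbor_policy(dataset)
for iteration in range(5):
    for state in rollout(policy):
        dataset.append((state, expert_action(state)))
    policy = nearest_neighbor_policy(dataset)  # retrain on the aggregate
```

The key difference from plain behavior cloning is that the expert is queried on states the *learner's* rollouts reach, so the training distribution matches the states the policy will actually encounter.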


In this homework, we have provided you with an environment that is hard to learn in directly. Thankfully, we have access to an expert in this environment. Your task will be to use DAgger to learn a deep neural network policy that performs well on this task.



The environment we will use in this homework is built upon the Reacher environment from OpenAI Gym. We have provided our environment in our code directory. It follows the OpenAI Gym API, which you can learn more about in the Gym documentation. For this homework, an agent in this environment is considered successful if it achieves a mean reward of at least 15.0.
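The success criterion is an average over full episodes, so a helper like the following sketch is useful when checking the 15.0 threshold. The stub environment and its constant reward are invented purely so the example runs; only the reset/step evaluation pattern carries over to the real env.

```python
# Stub env following the classic Gym API: reset() -> obs,
# step(action) -> (obs, reward, done, info).
class StubEnv:
    def __init__(self):
        self._t = 0
    def reset(self):
        self._t = 0
        return 0.0
    def step(self, action):
        self._t += 1
        return 0.0, 2.0, self._t >= 10, {}  # constant reward, 10-step episodes

def mean_reward(env, policy, episodes=5):
    """Average total episode reward of `policy` over several rollouts."""
    totals = []
    for _ in range(episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done, _ = env.step(policy(obs))
            total += reward
        totals.append(total)
    return sum(totals) / len(totals)

score = mean_reward(StubEnv(), lambda obs: 0.0)
success = score >= 15.0
```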


In this homework, we will attempt to learn this agent from image observations. Unfortunately, learning this agent directly from images without any priors is incredibly difficult, since images come from a very high-dimensional space. Thankfully, we have access to an expert prediction for whatever state the environment is currently in, which can be retrieved by calling get_expert_action(). Note: get_expert_action() takes no arguments, so you must be careful to call it right after you have called .reset() or .step() on the environment to get the expert action associated with the resulting state.
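Because get_expert_action() takes no arguments and labels whatever state the env is currently in, the safe pattern is to query it immediately after every reset() or step(). The sketch below shows that call ordering with a mock env standing in for the provided one; the mock's dynamics and expert are made up, only the interface matters.

```python
class MockReacherEnv:
    """Minimal stand-in mimicking the interface described in the handout."""
    def __init__(self):
        self._state = 0.0

    def reset(self):
        self._state = 1.0
        return self._state  # the real env returns an image observation

    def step(self, action):
        self._state -= action
        done = abs(self._state) < 1e-6
        return self._state, -abs(self._state), done, {}

    def get_expert_action(self):
        # No arguments: it labels whatever state the env is currently in.
        return self._state

env = MockReacherEnv()
obs = env.reset()
label = env.get_expert_action()       # label for the state reset() produced
obs, reward, done, info = env.step(label)
next_label = env.get_expert_action()  # label for the state step() produced
```

If any other env method runs between step() and get_expert_action(), the label may correspond to the wrong state, so keep the two calls adjacent in your rollout loop.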

Question 1

Download the code folder, with every file associated, from the link above. Complete the provided code template, filling in every TODO section, to implement DAgger. Attach the completed file with your submission.

Question 2

Create a plot with the number of expert queries on the X-axis and the performance of the imitation model on the Y-axis. Elaborate on any clear trends you see. (Hint: in the env, the variable expert_calls counts the number of expert queries.)
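One way to produce this plot is to log a (queries, reward) pair after each DAgger iteration and plot the pairs at the end. In this sketch the logged values are stand-ins, and the evaluation call you would use in their place depends on your completed template.

```python
history = []  # (expert queries, mean reward) pairs for the Q2 plot

def log_point(expert_calls, mean_reward):
    history.append((expert_calls, mean_reward))

# Inside the DAgger loop you would call log_point(env.expert_calls, eval_reward)
# after each retraining step; here we log made-up values for illustration.
for calls, reward in [(100, 2.0), (200, 9.5), (300, 15.5)]:
    log_point(calls, reward)

xs, ys = zip(*history)
# With matplotlib installed, the plot itself is a few lines:
#   import matplotlib.pyplot as plt
#   plt.plot(xs, ys, marker="o")
#   plt.xlabel("expert queries"); plt.ylabel("mean reward")
#   plt.savefig("q2.png")
```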

Question 3

Could you potentially improve on the number of queries to the expert made by the DAgger algorithm? Think about when querying the expert may be redundant.


Bonus points: Try implementing your answer from question 3, and generate a query-vs-reward plot similar to question 2 for this implementation. Compare this plot with your answer from Q2. Is there a clear improvement?
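As one illustration of the idea behind Q3 (not the intended answer), a query becomes redundant when the learner is already confident near the visited state. The sketch below skips the expert whenever a visited state lies close to an already-labeled one; the 0.05 threshold and the toy expert are arbitrary choices for the demo.

```python
import random

random.seed(0)  # deterministic demo

# Toy expert, as before: push the 1-D state toward zero.
def expert_action(state):
    return -1.0 if state > 0 else 1.0

dataset, expert_calls = [], 0
for _ in range(200):
    state = random.uniform(-1, 1)
    # Skip the query if a labeled state is already nearby (query is redundant).
    already_covered = any(abs(s - state) < 0.05 for s, _ in dataset)
    if not already_covered:
        dataset.append((state, expert_action(state)))
        expert_calls += 1
```

Since at most 41 points in [-1, 1] can be pairwise more than 0.05 apart, this filter caps the query count far below the 200 states visited, which is exactly the kind of queries-vs-reward trade-off the bonus plot should expose.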

Python environment installation instructions

  1. Make sure you have conda installed in your system. Instructions link here.
  2. Then, get the conda_env.yml file, and from the same directory, run conda env create -f conda_env.yml. If you don't have a GPU, you can remove the line - nvidia::cudatoolkit=11.1.
  3. Activate the environment, conda activate hw1_dagger.
  4. Then, install pybullet gym using the following instructions:
    (New: alternatively, just install pybullet-gym from the linked repo: thanks Shubham!)
  5. If you installed it from the official repo, go to the pybullet-gym directory, find this file: pybullet-gym\pybulletgym\envs\roboschool\envs\ and change L29-L33 to the following:

    self._cam_dist = 0.75
    self._cam_yaw = 0
    self._cam_pitch = -90
    self._render_width = 320
    self._render_height = 240

  6. If you are still having trouble with training, increase the image resize from (60, 80) to something higher.
  7. Finally, run the code with python once you have completed all the TODO steps in the code itself.