CS312 Lab 7: Reinforcement Learning


Goal: Solve an MDP problem using policy iteration and value iteration.

Note: This assignment is to be done individually.

MDP (Markov Decision Process): The Grid-World Problem

The environment is a grid, and the player's location in the grid represents a state. There is a designated starting state and two absorbing (terminal) states with very different rewards (+10 and -200, as shown in the grid below), while every other state carries a reward of -1; moving into such a state incurs this negative reward. The black block is a wall that the agent cannot pass through. The transition probabilities for moving from one state to another are also given below. The task is to find the optimal movement direction for each state.

[Grid figure: a Start cell, a wall (black block), and two terminal cells labelled "End +10" and "End -200".]

Below are the transition probabilities (the probability diagram is not reproduced in this copy):
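Because the probability diagram itself is missing here, the following C++ sketch shows one way such a grid world could be encoded. The 3x4 layout, the start cell at (2,0), and the 0.8 intended-direction / 0.1 perpendicular-slip probabilities are illustrative assumptions, not the exact values from the lab handout; substitute the figures from your own grid.

```cpp
#include <cstdio>
#include <vector>

// Illustrative grid (NOT the exact handout layout): '.' ordinary cell,
// '#' wall, 'E' terminal +10, 'F' terminal -200; start assumed at (2,0).
const int ROWS = 3, COLS = 4;
const char GRID[ROWS][COLS + 1] = {
    "...E",
    ".#.F",
    "....",
};

// Actions 0..3 = up, right, down, left.
const int DR[4] = {-1, 0, 1, 0};
const int DC[4] = { 0, 1, 0, -1};

struct Next { int r, c; double p; };  // successor cell and its probability

// Moving into a wall or off the grid leaves the agent where it is.
bool blocked(int r, int c) {
    return r < 0 || r >= ROWS || c < 0 || c >= COLS || GRID[r][c] == '#';
}

// Assumed slip model: 0.8 intended direction, 0.1 to each perpendicular
// side (replace with the probabilities from the handout's figure).
std::vector<Next> transitions(int r, int c, int a) {
    const int dirs[3]     = {a, (a + 1) % 4, (a + 3) % 4};
    const double probs[3] = {0.8, 0.1, 0.1};
    std::vector<Next> out;
    for (int i = 0; i < 3; ++i) {
        int nr = r + DR[dirs[i]], nc = c + DC[dirs[i]];
        if (blocked(nr, nc)) { nr = r; nc = c; }  // bounce off walls/edges
        out.push_back({nr, nc, probs[i]});
    }
    return out;
}

// Reward for entering a cell, matching the description above.
double reward(int r, int c) {
    if (GRID[r][c] == 'E') return +10.0;
    if (GRID[r][c] == 'F') return -200.0;
    return -1.0;  // every ordinary move costs -1
}

int main() {
    // Example: the outcomes of trying to move right from the start cell.
    for (const Next& n : transitions(2, 0, 1))
        std::printf("-> (%d,%d) with p=%.1f, reward %+.0f\n",
                    n.r, n.c, n.p, reward(n.r, n.c));
}
```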

  1. Develop code for solving the MDP problem using both policy iteration and value iteration (a sketch of both algorithms follows this list).
  2. Write a report clearly describing the above MDP and your observations from running the policy and value iteration algorithms on the formulated MDP.
  3. Further, suggest ways to check whether the algorithm yields an optimal policy for the setting considered.
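As a starting point, here is a minimal, self-contained C++ sketch of both algorithms over a generic tabular MDP. A hypothetical two-state MDP stands in for the grid world, and the convergence threshold, tie-breaking tolerance, and gamma = 0.9 are arbitrary illustrative choices.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

struct Outcome { int next; double prob; double reward; };
// mdp[s][a] = the possible outcomes of taking action a in state s.
using MDP = std::vector<std::vector<std::vector<Outcome>>>;

// Expected return of taking action a in state s under value function V.
double q(const MDP& mdp, const std::vector<double>& V,
         int s, int a, double gamma) {
    double total = 0.0;
    for (const Outcome& o : mdp[s][a])
        total += o.prob * (o.reward + gamma * V[o.next]);
    return total;
}

// Value iteration: apply the Bellman optimality backup until the largest
// update falls below eps, then read off the greedy policy.
std::vector<int> valueIteration(const MDP& mdp, double gamma, double eps) {
    const int n = (int)mdp.size();
    std::vector<double> V(n, 0.0);
    double delta;
    do {
        delta = 0.0;
        for (int s = 0; s < n; ++s) {
            double best = q(mdp, V, s, 0, gamma);
            for (int a = 1; a < (int)mdp[s].size(); ++a)
                best = std::max(best, q(mdp, V, s, a, gamma));
            delta = std::max(delta, std::fabs(best - V[s]));
            V[s] = best;
        }
    } while (delta > eps);
    std::vector<int> pi(n, 0);
    for (int s = 0; s < n; ++s)
        for (int a = 1; a < (int)mdp[s].size(); ++a)
            if (q(mdp, V, s, a, gamma) > q(mdp, V, s, pi[s], gamma))
                pi[s] = a;
    return pi;
}

// Policy iteration: alternate iterative policy evaluation with greedy
// policy improvement until the policy stops changing.
std::vector<int> policyIteration(const MDP& mdp, double gamma, double eps) {
    const int n = (int)mdp.size();
    std::vector<int> pi(n, 0);
    std::vector<double> V(n, 0.0);
    for (bool stable = false; !stable; ) {
        double delta;            // --- policy evaluation ---
        do {
            delta = 0.0;
            for (int s = 0; s < n; ++s) {
                double v = q(mdp, V, s, pi[s], gamma);
                delta = std::max(delta, std::fabs(v - V[s]));
                V[s] = v;
            }
        } while (delta > eps);
        stable = true;           // --- policy improvement ---
        for (int s = 0; s < n; ++s)
            for (int a = 0; a < (int)mdp[s].size(); ++a)
                if (q(mdp, V, s, a, gamma) >
                    q(mdp, V, s, pi[s], gamma) + 1e-12) {
                    pi[s] = a;
                    stable = false;
                }
    }
    return pi;
}

int main() {
    // Hypothetical 2-state MDP: in state 0, action 0 stays put at cost -1;
    // action 1 reaches the absorbing goal state 1 (reward +1) with p = 0.9.
    MDP mdp = {
        { { {0, 1.0, -1.0} },
          { {1, 0.9, +1.0}, {0, 0.1, -1.0} } },
        { { {1, 1.0, 0.0} } },
    };
    std::printf("Value iteration policy at s0: %d\n",
                valueIteration(mdp, 0.9, 1e-6)[0]);
    std::printf("Policy iteration policy at s0: %d\n",
                policyIteration(mdp, 0.9, 1e-6)[0]);
}
```

Both algorithms should agree on the optimal policy; comparing their outputs and iteration counts is one natural basis for the comparison asked for in the report.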

Submission: This assignment is to be submitted individually.
Please submit a zip file <Roll_number>.zip with the following contents

  1. Program: <Roll_number>.<extension> (e.g., 1800100xx.c/cpp)
  2. Report: <Roll_number>.<extension> (e.g., 1800100xx.pdf). The report should be in PDF format.

  3. Readme file: readme.txt (Execution details)

Report Format:

  1. [1 mark] MDP Description: Clearly describe (S, A, P, R, N)
  2. [5 marks] State-transition Graph for the MDP
  3. [1 mark] Optimal Policy: Suggest ways to check whether the algorithm yields optimal policy for the setting considered.
  4. [5 marks] Experimental Results: Vary the gamma parameter and show the policy found in each case by both algorithms (a small illustration of a gamma sweep follows this list).
  5. [2 marks] Comparison of Policy Iteration and Value Iteration
  6. [1 mark] Conclusions
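For item 4, the effect of gamma can be seen even on a tiny hypothetical chain before running the full grid: an action yielding an immediate +2 versus one that reaches +10 one step later. The numbers below are illustrative only; the greedy choice flips around gamma = 0.2 (an exact tie may break either way in floating point).

```cpp
#include <cstdio>

int main() {
    // The start state offers two actions (illustrative values):
    //   A: immediate reward +2, then terminal.
    //   B: reward 0 now, +10 on the next step, then terminal.
    // Q(A) = 2 and Q(B) = gamma * 10, so the preference flips near gamma = 0.2.
    for (int g = 0; g <= 10; ++g) {
        double gamma = g / 10.0;
        double qA = 2.0, qB = gamma * 10.0;
        std::printf("gamma=%.1f -> optimal action: %s\n", gamma,
                    qA >= qB ? "A (immediate +2)" : "B (delayed +10)");
    }
}
```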