CS312 Lab 7: Reinforcement Learning


Goal: Solve an MDP problem using policy iteration and value iteration.

Note: This assignment is to be done individually.

MDP (Markov Decision Process): The Grid-World Problem

The environment is a grid, and the player's location in the grid represents a state. There is a designated starting state and two absorbing (terminal) states with very different rewards (+10 and -200, as shown in the grid below), while every other state carries a reward of -1; moving into such a state incurs this negative reward. The black block is a wall that the agent cannot pass through. The transition probabilities for moving from one state to another are also given below. The task is to find the optimal movement direction for each state.

[Grid figure: a Start cell, a wall (black block), and two terminal cells labelled "End +10" and "End -200".]

Below are the transition probabilities (the probability diagram is not reproduced in this copy):
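Because the probability diagram itself is missing here, the following C++ sketch shows one way such a grid world could be encoded. The 3x4 layout, the start cell at (2,0), and the 0.8 intended-direction / 0.1 perpendicular-slip probabilities are illustrative assumptions, not the exact values from the lab handout; substitute the figures from your own grid.

```cpp
#include <cstdio>
#include <vector>

// Illustrative grid (NOT the exact handout layout): '.' ordinary cell,
// '#' wall, 'E' terminal +10, 'F' terminal -200; start assumed at (2,0).
const int ROWS = 3, COLS = 4;
const char GRID[ROWS][COLS + 1] = {
    "...E",
    ".#.F",
    "....",
};

// Actions 0..3 = up, right, down, left.
const int DR[4] = {-1, 0, 1, 0};
const int DC[4] = { 0, 1, 0, -1};

struct Next { int r, c; double p; };  // successor cell and its probability

// Moving into a wall or off the grid leaves the agent where it is.
bool blocked(int r, int c) {
    return r < 0 || r >= ROWS || c < 0 || c >= COLS || GRID[r][c] == '#';
}

// Assumed slip model: 0.8 intended direction, 0.1 to each perpendicular
// side (replace with the probabilities from the handout's figure).
std::vector<Next> transitions(int r, int c, int a) {
    const int dirs[3]     = {a, (a + 1) % 4, (a + 3) % 4};
    const double probs[3] = {0.8, 0.1, 0.1};
    std::vector<Next> out;
    for (int i = 0; i < 3; ++i) {
        int nr = r + DR[dirs[i]], nc = c + DC[dirs[i]];
        if (blocked(nr, nc)) { nr = r; nc = c; }  // bounce off walls/edges
        out.push_back({nr, nc, probs[i]});
    }
    return out;
}

// Reward for entering a cell, matching the description above.
double reward(int r, int c) {
    if (GRID[r][c] == 'E') return +10.0;
    if (GRID[r][c] == 'F') return -200.0;
    return -1.0;  // every ordinary move costs -1
}

int main() {
    // Example: the outcomes of trying to move right from the start cell.
    for (const Next& n : transitions(2, 0, 1))
        std::printf("-> (%d,%d) with p=%.1f, reward %+.0f\n",
                    n.r, n.c, n.p, reward(n.r, n.c));
}
```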

  1. Develop code for solving the MDP problem using both policy iteration and value iteration (a sketch of both algorithms follows this list).
  2. Write a report clearly describing the above MDP and your observations from running the policy and value iteration algorithms on the formulated MDP.
  3. Further, suggest ways to check whether the algorithm yields an optimal policy for the setting considered.
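As a starting point, here is a minimal, self-contained C++ sketch of both algorithms over a generic tabular MDP. A hypothetical two-state MDP stands in for the grid world, and the convergence threshold, tie-breaking tolerance, and gamma = 0.9 are arbitrary illustrative choices.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

struct Outcome { int next; double prob; double reward; };
// mdp[s][a] = the possible outcomes of taking action a in state s.
using MDP = std::vector<std::vector<std::vector<Outcome>>>;

// Expected return of taking action a in state s under value function V.
double q(const MDP& mdp, const std::vector<double>& V,
         int s, int a, double gamma) {
    double total = 0.0;
    for (const Outcome& o : mdp[s][a])
        total += o.prob * (o.reward + gamma * V[o.next]);
    return total;
}

// Value iteration: apply the Bellman optimality backup until the largest
// update falls below eps, then read off the greedy policy.
std::vector<int> valueIteration(const MDP& mdp, double gamma, double eps) {
    const int n = (int)mdp.size();
    std::vector<double> V(n, 0.0);
    double delta;
    do {
        delta = 0.0;
        for (int s = 0; s < n; ++s) {
            double best = q(mdp, V, s, 0, gamma);
            for (int a = 1; a < (int)mdp[s].size(); ++a)
                best = std::max(best, q(mdp, V, s, a, gamma));
            delta = std::max(delta, std::fabs(best - V[s]));
            V[s] = best;
        }
    } while (delta > eps);
    std::vector<int> pi(n, 0);
    for (int s = 0; s < n; ++s)
        for (int a = 1; a < (int)mdp[s].size(); ++a)
            if (q(mdp, V, s, a, gamma) > q(mdp, V, s, pi[s], gamma))
                pi[s] = a;
    return pi;
}

// Policy iteration: alternate iterative policy evaluation with greedy
// policy improvement until the policy stops changing.
std::vector<int> policyIteration(const MDP& mdp, double gamma, double eps) {
    const int n = (int)mdp.size();
    std::vector<int> pi(n, 0);
    std::vector<double> V(n, 0.0);
    for (bool stable = false; !stable; ) {
        double delta;            // --- policy evaluation ---
        do {
            delta = 0.0;
            for (int s = 0; s < n; ++s) {
                double v = q(mdp, V, s, pi[s], gamma);
                delta = std::max(delta, std::fabs(v - V[s]));
                V[s] = v;
            }
        } while (delta > eps);
        stable = true;           // --- policy improvement ---
        for (int s = 0; s < n; ++s)
            for (int a = 0; a < (int)mdp[s].size(); ++a)
                if (q(mdp, V, s, a, gamma) >
                    q(mdp, V, s, pi[s], gamma) + 1e-12) {
                    pi[s] = a;
                    stable = false;
                }
    }
    return pi;
}

int main() {
    // Hypothetical 2-state MDP: in state 0, action 0 stays put at cost -1;
    // action 1 reaches the absorbing goal state 1 (reward +1) with p = 0.9.
    MDP mdp = {
        { { {0, 1.0, -1.0} },
          { {1, 0.9, +1.0}, {0, 0.1, -1.0} } },
        { { {1, 1.0, 0.0} } },
    };
    std::printf("Value iteration policy at s0: %d\n",
                valueIteration(mdp, 0.9, 1e-6)[0]);
    std::printf("Policy iteration policy at s0: %d\n",
                policyIteration(mdp, 0.9, 1e-6)[0]);
}
```

Both algorithms should agree on the optimal policy; comparing their outputs and iteration counts is one natural basis for the comparison asked for in the report.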

Submission: This assignment is to be submitted individually.
Please submit a zip file <Roll_number>.zip with the following contents

  1. Program: <Roll_number>.<extension> (e.g., 1800100xx.c/cpp)
  2. Report: <Roll_number>.<extension> (e.g., 1800100xx.pdf). The report should be in PDF format.

  3. Readme file: readme.txt (Execution details)

Report Format:

  1. [1 mark] MDP Description: Clearly describe (S, A, P, R, N)
  2. [5 marks] State-transition Graph for the MDP
  3. [1 mark] Optimal Policy: Suggest ways to check whether the algorithm yields optimal policy for the setting considered.
  4. [5 marks] Experimental Results: Vary the gamma parameter and show the policy found in each case by both algorithms (a small illustration of a gamma sweep follows this list).
  5. [2 marks] Comparison of Policy Iteration and Value Iteration
  6. [1 mark] Conclusions
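For item 4, the effect of gamma can be seen even on a tiny hypothetical chain before running the full grid: an action yielding an immediate +2 versus one that reaches +10 one step later. The numbers below are illustrative only; the greedy choice flips around gamma = 0.2 (an exact tie may break either way in floating point).

```cpp
#include <cstdio>

int main() {
    // The start state offers two actions (illustrative values):
    //   A: immediate reward +2, then terminal.
    //   B: reward 0 now, +10 on the next step, then terminal.
    // Q(A) = 2 and Q(B) = gamma * 10, so the preference flips near gamma = 0.2.
    for (int g = 0; g <= 10; ++g) {
        double gamma = g / 10.0;
        double qA = 2.0, qB = gamma * 10.0;
        std::printf("gamma=%.1f -> optimal action: %s\n", gamma,
                    qA >= qB ? "A (immediate +2)" : "B (delayed +10)");
    }
}
```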