TASK: The figure below shows a rectangular grid-world representation of a simple finite MDP. The cells of the grid, except for the two grey walls, correspond to the states of the environment. At each cell, five actions are possible: north, south, east, west, and south-east, each of which deterministically moves the agent one cell in the respective direction on the grid. Actions that would take the agent off the grid or into a wall leave its location unchanged and result in a reward of −1. All other actions result in a reward of 0, except those that move the agent out of the special states X and Y. From state X, all five actions yield a reward of +3 and take the agent to X′. From state Y, all five actions yield a reward of +5 and take the agent to Y′.

Perform the following tasks:
1. Solve the given MDP using Policy Iteration (Policy Evaluation + Greedy Policy Improvement).
2. Solve the given MDP using Value Iteration.

For both tasks, submit the optimal state-value function, the optimal action-value function, and the optimal policy, together with the number of iterations needed to compute them. In addition, provide the state-value function for the equiprobable random policy.
• Posted on : April 16th, 2018
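
The two solution methods named in the task can be sketched in Python as below. This is only an illustrative sketch, not a reference solution: the figure is not reproduced here, so the 5x5 grid size, the wall cells, the positions of X, Y, X′ and Y′, and the discount factor GAMMA = 0.9 are all assumptions, and the function names are hypothetical. Substitute the real layout and discount factor before relying on any numbers.

```python
# ASSUMED layout: the task's figure is unavailable, so the grid size, wall
# cells, special-state positions and GAMMA below are placeholders.
ROWS, COLS = 5, 5
WALLS = {(1, 1), (3, 3)}                      # two grey walls (assumed cells)
SPECIAL = {(0, 1): ((4, 1), 3.0),             # X -> X', reward +3 (assumed cells)
           (0, 3): ((2, 3), 5.0)}             # Y -> Y', reward +5 (assumed cells)
ACTIONS = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1), "SE": (1, 1)}
GAMMA = 0.9                                   # assumed; not stated in the task
STATES = [(r, c) for r in range(ROWS) for c in range(COLS) if (r, c) not in WALLS]


def step(s, a):
    """Deterministic model: return (next_state, reward) for action a in state s."""
    if s in SPECIAL:
        return SPECIAL[s]
    nxt = (s[0] + ACTIONS[a][0], s[1] + ACTIONS[a][1])
    off_grid = not (0 <= nxt[0] < ROWS and 0 <= nxt[1] < COLS)
    if off_grid or nxt in WALLS:
        return s, -1.0                        # blocked: stay put, reward -1
    return nxt, 0.0


def q_from_v(V, s):
    """One-step lookahead: action values of state s under value function V."""
    q = {}
    for a in ACTIONS:
        s2, r = step(s, a)
        q[a] = r + GAMMA * V[s2]
    return q


def evaluate(policy, tol=1e-10):
    """Iterative policy evaluation; policy maps state -> {action: probability}."""
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            q = q_from_v(V, s)
            v = sum(p * q[a] for a, p in policy[s].items())
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V


def policy_iteration():
    """Alternate evaluation and greedy improvement until the policy is stable."""
    policy = {s: "N" for s in STATES}         # arbitrary initial policy
    rounds = 0
    while True:
        rounds += 1
        V = evaluate({s: {policy[s]: 1.0} for s in STATES})
        stable = True
        for s in STATES:
            q = q_from_v(V, s)
            best = max(q, key=q.get)
            if q[best] > q[policy[s]] + 1e-6:  # switch only on a real improvement
                policy[s] = best
                stable = False
        if stable:
            return V, policy, rounds


def value_iteration(tol=1e-10):
    """Sweep the Bellman optimality backup until the largest change is below tol."""
    V = {s: 0.0 for s in STATES}
    sweeps = 0
    while True:
        sweeps += 1
        delta = 0.0
        for s in STATES:
            best = max(q_from_v(V, s).values())
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V, sweeps


V_pi, pi_star, pi_rounds = policy_iteration()
V_vi, vi_sweeps = value_iteration()
Q_star = {s: q_from_v(V_vi, s) for s in STATES}                   # action values
V_random = evaluate({s: {a: 1.0 / len(ACTIONS) for a in ACTIONS} for s in STATES})
print("policy-iteration rounds:", pi_rounds, "| value-iteration sweeps:", vi_sweeps)
```

The iteration counts reported by such a sketch depend on the stopping tolerance, the initial policy, and the in-place sweep order, so they are only comparable when those choices are stated alongside the results.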

