Figure 1: UAV reconnaissance problem for the programming assignment.

(a) (10 points) Consider a UAV performing reconnaissance in a 4×4 grid of sectors, as depicted in the figure above. The UAV can fly north, south, west, and east, and each action moves it by one sector. Each action succeeds in its intended direction with probability 0.85. The remaining probability of 0.15 is divided equally between the two directions perpendicular to the intended action (0.075 each). The UAV prefers the sectors with a green circle and would like to avoid the red sector. The patterned sector is out of bounds. Write a program in C, C++, or Java that models this problem as an MDP consisting of a tuple of states, actions, and transition and reward functions. Assign a reward of +1 to the sectors with a green circle and a cost of 1 (a reward of -1) to the red sector. All other sectors have a cost of 0.05 (a reward of -0.05).

(b) (20 points) In the program, implement policy iteration for MDPs, using the algorithm provided in Fig. 17.7 of the textbook, under the discounted infinite-horizon optimality criterion with a discount factor of γ = 0.99. Display the converged policy of the UAV as output. Use the policy to generate a trajectory from the start state (0,0), and determine whether it leads to any of the green sectors. Show this trajectory in the README file.

Question

Answer:
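The figure is not reproduced here, so the following is only a minimal Java sketch of parts (a) and (b) under stated assumptions, not a definitive solution. Everything in `LAYOUT` is hypothetical: the positions of the green (`G`), red (`R`), and out-of-bounds (`X`) sectors are guesses and should be replaced with those in Figure 1. The sketch also assumes that states are (row, column) with (0,0) in the top-left corner, that green and red sectors are absorbing, that a move off the grid or into the patterned sector leaves the UAV where it is, and that rewards attach to states as in R(s). Policy evaluation is done by repeated sweeps rather than an exact linear solve, and the trajectory follows only the intended outcome of each action.

```java
class UavMdp {
    static final int N = 4;                                   // 4x4 grid of sectors
    static final double GAMMA = 0.99, P_OK = 0.85, P_SIDE = 0.075;
    static final char[] NAMES = {'N', 'S', 'W', 'E'};         // action index -> label
    static final int[] DR = {-1, 1, 0, 0};                    // row delta per action
    static final int[] DC = {0, 0, -1, 1};                    // column delta per action
    static final int[][] PERP = {{2, 3}, {2, 3}, {0, 1}, {0, 1}}; // perpendicular actions

    // ASSUMED layout (hypothetical, the real one is in Figure 1):
    // G = green (+1), R = red (-1), X = out of bounds, . = ordinary sector (-0.05)
    static final String[] LAYOUT = {"....", ".X.R", "....", "G..G"};

    static boolean blocked(int r, int c) {
        return r < 0 || r >= N || c < 0 || c >= N || LAYOUT[r].charAt(c) == 'X';
    }

    static boolean terminal(int s) {                          // assumption: G and R absorb
        char ch = LAYOUT[s / N].charAt(s % N);
        return ch == 'G' || ch == 'R';
    }

    static double reward(int s) {
        switch (LAYOUT[s / N].charAt(s % N)) {
            case 'G': return 1.0;
            case 'R': return -1.0;
            default:  return -0.05;
        }
    }

    // Result of actually moving in direction d; a blocked move keeps the UAV in place (assumption).
    static int move(int s, int d) {
        int r = s / N + DR[d], c = s % N + DC[d];
        return blocked(r, c) ? s : r * N + c;
    }

    // Expected next-state utility of action a in state s: sum over s' of P(s'|s,a) * u[s'].
    static double q(int s, int a, double[] u) {
        return P_OK * u[move(s, a)]
             + P_SIDE * u[move(s, PERP[a][0])]
             + P_SIDE * u[move(s, PERP[a][1])];
    }

    // Policy iteration; fills u with the utilities of the returned policy.
    static int[] policyIteration(double[] u) {
        int[] pi = new int[N * N];                            // initial policy: always north
        boolean changed = true;
        while (changed) {
            double delta;
            do {                                              // approximate policy evaluation
                delta = 0;
                for (int s = 0; s < N * N; s++) {
                    if (blocked(s / N, s % N)) continue;
                    double v = terminal(s) ? reward(s) : reward(s) + GAMMA * q(s, pi[s], u);
                    delta = Math.max(delta, Math.abs(v - u[s]));
                    u[s] = v;
                }
            } while (delta > 1e-10);
            changed = false;                                  // policy improvement
            for (int s = 0; s < N * N; s++) {
                if (blocked(s / N, s % N) || terminal(s)) continue;
                int best = pi[s];
                for (int a = 0; a < 4; a++) {
                    // The 1e-6 margin keeps evaluation noise from flipping exact ties forever.
                    if (q(s, a, u) > q(s, best, u) + 1e-6) best = a;
                }
                if (best != pi[s]) { pi[s] = best; changed = true; }
            }
        }
        return pi;
    }

    // Follows the intended outcome of each action from start until a terminal state or maxSteps.
    static String rollout(int[] pi, int start, int maxSteps) {
        StringBuilder sb = new StringBuilder();
        int s = start;
        for (int t = 0; t < maxSteps && !terminal(s); t++) {
            sb.append("(" + s / N + "," + s % N + ") -" + NAMES[pi[s]] + "-> ");
            s = move(s, pi[s]);
        }
        return sb.append("(" + s / N + "," + s % N + ")").toString();
    }

    public static void main(String[] args) {
        double[] u = new double[N * N];
        int[] pi = policyIteration(u);
        for (int r = 0; r < N; r++) {                         // print the policy as a grid
            StringBuilder row = new StringBuilder();
            for (int c = 0; c < N; c++) {
                char ch = LAYOUT[r].charAt(c);
                row.append(ch == '.' ? NAMES[pi[r * N + c]] : ch).append(' ');
            }
            System.out.println(row);
        }
        System.out.println(rollout(pi, 0, 50));
    }
}
```

Running `main` prints the policy as a 4×4 grid of action letters, followed by the trajectory from (0,0). With the assumed layout the intended-move trajectory heads south along column 0 to the green sector at (3,0); the real answer depends on the actual positions in Figure 1. To show the effect of the 0.15 slip probability in the README, the rollout could instead sample each outcome at random.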