Q3. In micro-blackjack, you repeatedly draw a card (with replacement) that is equally likely to be a 1, 2, or 3. You can either Draw or Stop if the total score of the cards you have drawn is less than 5. If your total score is 5 or higher, the game ends, and you receive a utility of 0. When you Stop, your utility is equal to your total score (up to 4), and the game ends. When you Draw, you receive no utility. There is no discount (γ = 1). Let's formulate this problem as an MDP with the following states: 0, 1, 2, 3, 4 and a Done state, for when the game ends.

(a) What are the transition functions and the reward functions for this MDP? (10 points)

(b) Fill in the following table of value iteration values for the first 3 iterations. Show your work. (20 points)

States   0   1   2   3   4
V_0
V_1
V_2

(c) You should have noticed that value iteration converged above. What is the optimal policy for the MDP based on the above iteration result? Fill in the table below. (10 points)

States   0   1   2   3   4
π*
Answer:
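One way to check the table in part (b) is to run value iteration directly. The sketch below is not an official solution. It assumes the states are the totals 0 through 4, that any total of 5 or more leads to Done with utility 0, that Stop pays the current total, and that V_0 is zero everywhere. The names STATES, CARDS, BUST, q_stop, q_draw, value_iteration, and greedy_policy are illustrative and do not come from the question. Exact fractions are used so the values can be compared with hand calculations.

```python
from fractions import Fraction

STATES = [0, 1, 2, 3, 4]   # assumed non-terminal states (running totals)
CARDS = [1, 2, 3]          # each card is drawn with probability 1/3
BUST = 5                   # assumed: a total of 5 or more ends the game with utility 0


def q_stop(s):
    """Value of Stop in state s: receive the current total, then move to Done."""
    return Fraction(s)


def q_draw(s, V):
    """Value of Draw in state s under value estimates V (no discount, no reward)."""
    total = Fraction(0)
    for card in CARDS:
        nxt = s + card
        if nxt < BUST:
            total += Fraction(1, len(CARDS)) * V[nxt]
        # Otherwise the game moves to Done with reward 0, contributing nothing.
    return total


def value_iteration(num_iters):
    """Return [V_0, V_1, ..., V_num_iters], each a dict mapping state to value."""
    V = {s: Fraction(0) for s in STATES}
    history = [V]
    for _ in range(num_iters):
        V = {s: max(q_stop(s), q_draw(s, V)) for s in STATES}
        history.append(V)
    return history


def greedy_policy(V):
    """Pick Draw only when it is strictly better than Stop under V."""
    return {s: "Draw" if q_draw(s, V) > q_stop(s) else "Stop" for s in STATES}


if __name__ == "__main__":
    history = value_iteration(2)
    for k, V in enumerate(history):
        print(f"V_{k}:", [str(V[s]) for s in STATES])
    print("policy:", greedy_policy(history[-1]))
```

Calling value_iteration with a larger argument shows whether the values are still changing after the rows requested in the table, which is worth checking before reading a policy off V_2.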