You are designing a Reinforcement Learning agent for a racing game. Among the following reward schemes, which one leads to the best performance of the agent?
A. +5 for reaching the finish line, -1 for going off the road
B. +5 for reaching the finish line, -0.1 for every second that passes before the agent reaches the finish line
C. +5 for reaching the finish line, -0.1 for every second that passes before the agent reaches the finish line, +1 for the agent going off the road
D. -5 for reaching the finish line, +0.1 for every second that passes before the agent reaches the finish line
Explanation:
To compare the schemes, write the total reward (the return R) that the agent collects in one episode under each scheme.
Let x be the number of seconds that pass before the agent crosses the finish line, and let k be the number of times the agent goes off the road.
Equation A:
R = 5 - k
Equation B:
R = 5 - 0.1x
Equation C:
R = 5 - 0.1x + k
Equation D:
R = -5 + 0.1x
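The four returns can be evaluated directly to see how each one responds to finishing sooner and to leaving the road. The sketch below is a minimal illustration that assumes an undiscounted return, with x the seconds before the finish line is crossed and k a hypothetical count of off-road events; the function names are made up for this example.

```python
# A minimal sketch: undiscounted episode return R under each reward scheme.
# x: seconds that pass before the finish line is crossed
# k: number of times the agent goes off the road (hypothetical variable)


def return_a(x: float, k: int) -> float:
    """Scheme A: +5 for finishing, -1 per off-road event; time is ignored."""
    return 5.0 - 1.0 * k


def return_b(x: float, k: int) -> float:
    """Scheme B: +5 for finishing, -0.1 per second before finishing."""
    return 5.0 - 0.1 * x


def return_c(x: float, k: int) -> float:
    """Scheme C: scheme B plus +1 per off-road event."""
    return 5.0 - 0.1 * x + 1.0 * k


def return_d(x: float, k: int) -> float:
    """Scheme D: -5 for finishing, +0.1 per second before finishing."""
    return -5.0 + 0.1 * x


SCHEMES = {"A": return_a, "B": return_b, "C": return_c, "D": return_d}

if __name__ == "__main__":
    for name, ret in SCHEMES.items():
        # Change in R if the agent finishes 10 s sooner (40 s -> 30 s), staying on the road.
        faster = ret(30, 0) - ret(40, 0)
        # Change in R if the agent goes off the road once more, same finish time.
        off_road = ret(40, 1) - ret(40, 0)
        print(f"{name}: finish 10 s sooner {faster:+.1f}, one more off-road event {off_road:+.1f}")
```

Only B rewards finishing sooner without also rewarding leaving the road: A is indifferent to the finish time, D penalises finishing sooner, and C adds +1 for every off-road event.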
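Now compare what each return encourages, since the agent learns to maximise R:
A: R does not depend on x, so nothing pushes the agent to finish quickly (apart from any discounting of future rewards); it is only penalised for leaving the road.
B: R increases as x decreases, so the agent is pushed to reach the finish line as fast as possible. Going off the road is discouraged indirectly, assuming it slows the car down.
C: R increases with k, so the agent can raise its return by repeatedly leaving the road, which is the opposite of the intended behaviour.
D: finishing costs 5 and every extra second earns 0.1, so the agent learns to avoid the finish line and keep driving for as long as possible.
Therefore scheme B leads to the best performance.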
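To see which behaviour each scheme actually rewards, the sketch below rolls out four hand-picked behaviours second by second and sums the rewards. Everything concrete here is an illustrative assumption, not part of the question: the behaviour names, their finish times and off-road timings, the 120 s time limit for an episode that never finishes, and the use of an undiscounted return.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Tuple

TIME_LIMIT = 120  # seconds; hypothetical cut-off for an episode that never finishes

# (finish bonus, reward per elapsed second, reward per off-road event) for each scheme
SCHEMES: Dict[str, Tuple[float, float, float]] = {
    "A": (5.0, 0.0, -1.0),
    "B": (5.0, -0.1, 0.0),
    "C": (5.0, -0.1, 1.0),
    "D": (-5.0, 0.1, 0.0),
}


@dataclass(frozen=True)
class Behaviour:
    """A hypothetical fixed trajectory: when it finishes and when it leaves the road."""
    name: str
    finish_time: Optional[int]  # second at which the finish line is crossed; None = never
    off_road_at: FrozenSet[int] = frozenset()  # seconds at which the car leaves the road


BEHAVIOURS = [
    Behaviour("fast_clean", 30),
    Behaviour("slow_clean", 60),
    Behaviour("fast_off_road", 40, frozenset(range(5, 40, 5))),  # 7 off-road events
    Behaviour("never_finish", None),
]


def rollout_return(scheme: Tuple[float, float, float], b: Behaviour) -> float:
    """Sum the per-second rewards of one undiscounted episode."""
    finish_bonus, per_second, per_off_road = scheme
    end = b.finish_time if b.finish_time is not None else TIME_LIMIT
    total = 0.0
    for t in range(1, end + 1):
        total += per_second  # one more second has passed without finishing
        if t in b.off_road_at:
            total += per_off_road
        if t == b.finish_time:
            total += finish_bonus  # the episode ends at the finish line
    return total


def best_behaviour(scheme: Tuple[float, float, float]) -> str:
    """Name of the candidate with the highest return (the first one wins ties)."""
    return max(BEHAVIOURS, key=lambda b: rollout_return(scheme, b)).name


if __name__ == "__main__":
    for name, scheme in SCHEMES.items():
        returns = {b.name: round(rollout_return(scheme, b), 2) for b in BEHAVIOURS}
        print(name, returns, "-> best:", best_behaviour(scheme))
```

Under these assumptions B prefers the fast, clean lap, C prefers the lap that keeps leaving the road, D prefers never finishing, and A gives both clean laps the same return regardless of speed.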