Computer Science, asked by ghadgeakshay44, 1 year ago

You are designing a Reinforcement Learning agent for a racing game. Among the following reward schemes, which one leads to the best performance of the agent?
A. +5 for reaching the finish line, -1 for going off the road
B. +5 for reaching the finish line, -0.1 for every second that passes before the agent reaches the finish line
C. +5 for reaching the finish line, -0.1 for every second that passes before the agent reaches the finish line, +1 for the agent going off the road
D. -5 for reaching the finish line, +0.1 for every second that passes before the agent reaches the finish line

Answers

Answered by nidaeamann

Explanation:

To compare the schemes, let us write the total reward R of one race for each of them as an equation.

Let x be the number of seconds before the finish line is crossed, and let n be the number of times the agent goes off the road.

Equation A:

R = +5 - n

Equation B:

R = +5 - 0.1x

Equation C:

R = +5 - 0.1x + n

Equation D:

R = -5 + 0.1x

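For optimum performance, the reward should push the agent to reach the finish line as quickly as possible. Equation B does exactly this: the agent earns +5 only by finishing and loses 0.1 for every second it takes, so maximizing R means minimizing x. Equation A gives no reason to finish quickly. Equation C pays the agent +1 each time it goes off the road, so it can collect reward by leaving the road instead of racing. Equation D penalizes finishing and rewards stalling. So the second scheme (+5 for reaching the finish line, -0.1 for every second that passes) leads to the best performance.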
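As a rough check, the four schemes can be compared on a few made-up episodes. The sketch below is only an illustration: the function name `episode_return`, its parameters, and the episode numbers are all assumptions, and it computes a plain undiscounted sum of the rewards stated in the question.

```python
def episode_return(scheme, seconds, off_road, finished=True):
    """Undiscounted total reward of one episode under scheme 'A'..'D'.

    seconds  -- seconds elapsed before the finish line (or before the episode was cut off)
    off_road -- number of times the agent went off the road
    finished -- whether the agent actually crossed the finish line
    """
    if scheme == "A":
        return (5 if finished else 0) - 1 * off_road
    if scheme == "B":
        return (5 if finished else 0) - 0.1 * seconds
    if scheme == "C":
        return (5 if finished else 0) - 0.1 * seconds + 1 * off_road
    if scheme == "D":
        return (-5 if finished else 0) + 0.1 * seconds
    raise ValueError(f"unknown scheme: {scheme}")


# Hypothetical episodes, chosen only to show what each scheme prefers.
episodes = {
    "fast, clean": dict(seconds=20, off_road=0),
    "slow, clean": dict(seconds=40, off_road=0),
    "fast, sloppy": dict(seconds=20, off_road=5),
    "never finishes": dict(seconds=100, off_road=0, finished=False),
}

for scheme in "ABCD":
    for name, ep in episodes.items():
        print(scheme, name, round(episode_return(scheme, **ep), 2))
```

With these numbers, scheme A scores the fast and the slow clean run the same, scheme C scores the sloppy run highest, and scheme D scores the run that never finishes highest. Only scheme B ranks the fast run above the slow one without rewarding off-road driving or stalling.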
