Computer Science, asked by hicrk2412, 10 months ago

2) What is the decay rate of the weightage given to past rewards in the computation of the Q function in the stationary and non-stationary updates in the multi-armed bandit problem?
1. hyperbolic, linear
2. linear, hyperbolic
3. hyperbolic, exponential
4. exponential, linear

Answers

Answered by omkar252627

Answer:

The answer to this question is option 3 (hyperbolic, exponential).

Answered by dikshaagarwal4442

Answer:

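3. The decay rate of the weightage given to past rewards in the computation of the Q function in the multi-armed bandit problem is hyperbolic for the stationary update and exponential for the non-stationary update. In the stationary case the step size is 1/n (the sample average), so each of the n rewards seen so far carries weight 1/n, which shrinks hyperbolically as n grows. In the non-stationary case the step size is a constant α, so a reward received i steps ago carries weight α(1 − α)^i, which shrinks exponentially with i.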
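
To see where the two decay rates come from, here is a minimal sketch that replays the incremental update Q ← Q + step × (R − Q) and tracks how much weight each past reward ends up with. The function names and the example values (n = 5, alpha = 0.5) are my own choices for illustration, and the weight left on the initial estimate is not tracked.

```python
def sample_average_weights(n):
    """Weights on rewards R_1..R_n after n updates with step size 1/k (stationary case)."""
    weights = []
    for k in range(1, n + 1):
        step = 1.0 / k
        # Every earlier reward is scaled down by (1 - step) ...
        weights = [w * (1.0 - step) for w in weights]
        # ... and the newest reward enters with weight `step`.
        weights.append(step)
    return weights


def constant_step_weights(n, alpha):
    """Weights on rewards R_1..R_n after n updates with constant step size alpha (non-stationary case)."""
    weights = []
    for _ in range(n):
        weights = [w * (1.0 - alpha) for w in weights]
        weights.append(alpha)
    return weights


if __name__ == "__main__":
    # Stationary: every reward ends up with the same weight 1/n.
    print(sample_average_weights(5))
    # Non-stationary: weights shrink by a factor (1 - alpha) per step into the past.
    print(constant_step_weights(5, 0.5))
```

With the sample average, all n weights are equal to 1/n, so the weight on any one reward falls off like 1/n as more rewards arrive. With a constant step size, the list is ordered oldest to newest and each older reward has (1 − alpha) times the weight of the next one, which is exponential decay.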

Explanation:

Multi-armed Bandit Problem

The multi-armed bandit problem is one of the simplest reinforcement learning (RL) problems. Each action the agent takes returns a reward drawn from a fixed, underlying probability distribution associated with that action. The objective of the game, which spans numerous episodes (in this case, single actions), is to maximize the total reward.

One strategy is to try each bandit once, record the payouts, and from then on always choose the bandit that paid the most. This is feasible, but because each bandit's payout follows an underlying probability distribution, a single draw can be misleading, and you may need more samples to identify the best one.

However, every draw spent working out which bandit to play is a draw not spent on the bandit with the greatest expected payout. This trade-off between gathering information and using it is known as the explore-exploit dilemma.
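
One common way to balance the two is epsilon-greedy action selection, sketched below. This is only an illustration under assumptions that are not part of the question: the rewards are Gaussian with unit variance, and the function name, arm means, epsilon, and seed are all made up for the example. The estimates are updated with the sample-average (stationary) rule.

```python
import random


def run_epsilon_greedy(true_means, steps=2000, epsilon=0.1, seed=0):
    """Play a bandit with epsilon-greedy selection and sample-average Q updates."""
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k        # current reward estimate for each arm
    counts = [0] * k     # number of times each arm was pulled
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                    # explore: pick a random arm
        else:
            arm = max(range(k), key=lambda a: q[a])   # exploit: pick the best estimate
        # Assumed reward model: Gaussian around the arm's true mean.
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Sample-average update: step size 1/n for this arm.
        q[arm] += (reward - q[arm]) / counts[arm]
        total += reward
    return q, counts, total


if __name__ == "__main__":
    q, counts, total = run_epsilon_greedy([0.0, 0.5, 2.0])
    print("estimates:", q)
    print("pull counts:", counts)
    print("total reward:", total)
```

For a non-stationary bandit, the update line could instead use a constant step size, for example `q[arm] += alpha * (reward - q[arm])`, so that recent rewards count for more than old ones.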

To learn more about reinforcement learning, click on the link below:

https://brainly.in/question/54329159

To learn more about the multi-armed bandit problem, click on the link below:

https://brainly.in/question/36036940
