Consider the following policy-search algorithm for a multi-armed binary bandit:
$$\forall a,\qquad \pi_{t+1}(a) = \pi_t(a)\,(1-\alpha) + \alpha\left(\mathbb{1}_{a=a_t}\,R_t + \left(1-\mathbb{1}_{a=a_t}\right)(1-R_t)\right)$$
where $\mathbb{1}_{a=a_t}$ is 1 if $a = a_t$ and 0 otherwise. Which of the following is true for the above algorithm?
1. It is the $L_{R-I}$ algorithm
2. It is the $L_{R-\epsilon P}$ algorithm
3. It would work well if the best arm had a probability of 0.9 of resulting in +1 reward and the next-best arm had a probability of 0.5 of resulting in +1 reward
4. It would work well if the best arm had a probability of 0.3 of resulting in +1 reward and the worst arm had a probability of 0.25 of resulting in +1 reward
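For concreteness, the update above can be written as a short Python sketch for a two-armed binary bandit (the function name policy_update, the use of NumPy, and the two-arm setting are illustrative assumptions, not part of the question):

```python
import numpy as np

def policy_update(pi, a_t, r_t, alpha):
    """One step of the update above for a 2-armed binary bandit.

    A reward (r_t = 1) moves probability toward the chosen arm a_t;
    a penalty (r_t = 0) moves it toward the other arm, so the policy
    is adjusted on every step regardless of the outcome.
    """
    indicator = np.zeros_like(pi)
    indicator[a_t] = 1.0
    return (1 - alpha) * pi + alpha * (indicator * r_t + (1 - indicator) * (1 - r_t))
```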
Answers
Answered by
Answer:
Don't know.
Answered by
Answer:
3
Explanation:
The update adjusts the policy on both reward and penalty, so it is the $L_{R-P}$ (linear reward-penalty) algorithm rather than $L_{R-I}$ or $L_{R-\epsilon P}$. It would work well if the best arm had a probability of 0.9 of resulting in +1 reward and the next-best arm had a probability of 0.5: whenever the reward is 0 the update pushes probability toward the other arm, so with success probabilities of 0.3 and 0.25 the frequent failures keep the policy close to uniform, whereas with 0.9 and 0.5 it settles mostly on the best arm.
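A quick simulation sketch supports this (the step size α = 0.1, the 5000 steps, and the random seed are assumed values, not from the question): with reward probabilities 0.9 and 0.5 the policy ends up pulling the best arm most of the time, while with 0.3 and 0.25 it stays near a 50/50 split.

```python
import numpy as np

rng = np.random.default_rng(0)

def run(p_success, alpha=0.1, steps=5000):
    """Run the question's update on a 2-armed binary bandit.

    p_success[0] is the best arm's chance of a +1 reward.
    Returns the fraction of pulls that went to the best arm.
    """
    pi = np.array([0.5, 0.5])
    best_pulls = 0
    for _ in range(steps):
        a = rng.choice(2, p=pi)                  # sample an arm from the current policy
        best_pulls += (a == 0)
        r = 1.0 if rng.random() < p_success[a] else 0.0
        indicator = np.zeros(2)
        indicator[a] = 1.0
        pi = (1 - alpha) * pi + alpha * (indicator * r + (1 - indicator) * (1 - r))
        pi /= pi.sum()                           # guard against floating-point drift
    return best_pulls / steps

print(run([0.9, 0.5]))    # option 3's setting: best arm chosen most of the time
print(run([0.3, 0.25]))   # option 4's setting: barely better than random
```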