Which of the following best describes the PAC-optimality solution to bandit problems? Here ϵ is the difference between the reward of the chosen arm and the true optimal reward, δ is the probability that the chosen arm is not optimal, and N is the number of steps needed to reach PAC-optimality.
Options:
1. Given δ and ϵ, minimize the number of steps to reach PAC-optimality (i.e. N)
2. Given δ and N, minimize ϵ
3. Given ϵ and N, maximize the probability of choosing the optimal arm (i.e. minimize δ)
4. None of the above is true about PAC-optimality
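For intuition on how ϵ, δ, and N relate, here is a minimal sketch. It assumes rewards bounded in [0, 1] and a naive strategy that pulls every arm equally often and then picks the arm with the best empirical mean. The per-arm count comes from Hoeffding's inequality plus a union bound over the arms. The function names are hypothetical, and this is one simple bound, not the tightest known.

```python
import math


def naive_pac_pulls_per_arm(epsilon, delta, num_arms):
    """Pulls per arm so that, with probability at least 1 - delta, the arm
    with the best empirical mean is within epsilon of the best true mean.

    Assumes rewards lie in [0, 1]. Hoeffding's inequality bounds the chance
    that one arm's estimate is off by epsilon/2 or more by
    2 * exp(-n * epsilon**2 / 2); requiring this to be at most
    delta / num_arms and solving for n gives the formula below.
    """
    if not (0 < epsilon <= 1) or not (0 < delta < 1) or num_arms < 1:
        raise ValueError("need 0 < epsilon <= 1, 0 < delta < 1, num_arms >= 1")
    return math.ceil((2.0 / epsilon ** 2) * math.log(2.0 * num_arms / delta))


def naive_pac_total_steps(epsilon, delta, num_arms):
    """Total number of pulls N used by the naive strategy."""
    return num_arms * naive_pac_pulls_per_arm(epsilon, delta, num_arms)
```

Under these assumptions, tightening either ϵ or δ raises the number of steps N, which is the trade-off the options above are asking about.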
Answers
Answer:
Option 3 is correct.
Hope it helps.
Answer:
Option 3) Given ϵ and N, maximize the probability of choosing the optimal arm (i.e. minimize δ)
- To find the best arm, we only need to pull each arm once, because the rewards are deterministic.
- If the greedy method starts with initial reward estimates that are larger than every possible reward, it is guaranteed to select each arm at least once: after any arm is pulled, its updated estimate is necessarily lower than the untouched initial estimates of the remaining arms. Once every arm has been tried, the greedy algorithm keeps choosing the arm with the highest reward.
- Since we are told the rewards are deterministic, it is enough to pull each arm once and then choose the arm with the highest observed payout.
- Note that this works only because the rewards are deterministic (i.e., non-stochastic) and because we knew this beforehand.
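The reasoning in this answer can be sketched as follows. This is a minimal illustration that assumes deterministic rewards, an optimistic starting estimate above every possible reward, and ties broken toward the lowest arm index. The names `pull`, `payouts`, `pull_each_once`, and `optimistic_greedy`, and the payout values, are hypothetical.

```python
def pull_each_once(pull, num_arms):
    """Pull every arm once and return (best arm index, observed rewards).

    Only valid when rewards are deterministic, so one pull reveals
    an arm's true reward.
    """
    observed = [pull(arm) for arm in range(num_arms)]
    best = max(range(num_arms), key=lambda a: observed[a])
    return best, observed


def optimistic_greedy(pull, num_arms, optimistic_value, steps):
    """Greedy selection with estimates initialised to optimistic_value.

    optimistic_value is assumed to exceed every possible reward, so each
    untried arm looks better than any arm that has already been pulled.
    """
    estimates = [optimistic_value] * num_arms
    history = []
    for _ in range(steps):
        # max() returns the first maximal element, so ties go to the lowest index.
        arm = max(range(num_arms), key=lambda a: estimates[a])
        # Deterministic rewards: a single pull replaces the estimate with the truth.
        estimates[arm] = pull(arm)
        history.append(arm)
    return history, estimates


# Hypothetical deterministic payouts for three arms.
payouts = [0.2, 0.9, 0.5]


def pull(arm):
    return payouts[arm]


best_arm, observed = pull_each_once(pull, len(payouts))
history, estimates = optimistic_greedy(pull, len(payouts), optimistic_value=1.0, steps=5)
```

In this example the greedy run tries arms 0, 1, and 2 in turn and then stays on arm 1, the same arm that `pull_each_once` picks. With stochastic rewards a single pull per arm would not be enough, and neither function would be reliable.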