1. The multi-armed bandit problem is a generalized use case for
a. reinforcement learning
b. supervised learning
c. unsupervised learning
d. all of the above
Answers
Answered by
25
Unsupervised learning (C)
Answered by
0
The correct answer is option (a), reinforcement learning.
Explanation:
- The multi-armed bandit (MAB) problem is a classic reinforcement learning problem in which a player faces k slot machines (bandits), each with a different and unknown reward distribution, and tries to maximise cumulative reward over repeated trials.
- It is a special case of the reinforcement learning (RL) problem that has wide applications and is gaining popularity.
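- Multi-armed bandits simplify the full RL setting by ignoring state: actions do not change the environment, so the only challenge is balancing exploration (trying arms to learn their rewards) against exploitation (pulling the arm that currently looks best).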
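The exploration/exploitation trade-off described above can be illustrated with a minimal epsilon-greedy sketch. This is only one of several common bandit strategies and is not part of the original answer. The function name run_epsilon_greedy, the Bernoulli (win/lose) reward model, the arm probabilities, and the values of epsilon, steps, and seed are all assumptions chosen for illustration.

```python
import random


def run_epsilon_greedy(true_means, steps=10000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy player on k Bernoulli slot machines.

    true_means[i] is the (hidden) probability that arm i pays out 1.
    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k        # how many times each arm was pulled
    estimates = [0.0] * k   # running average reward observed per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(k)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(k), key=lambda a: estimates[a])

        # Pull the arm; there is no state, only an immediate reward.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Incremental update of the running average for this arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return counts, estimates, total_reward


counts, estimates, total_reward = run_epsilon_greedy([0.2, 0.5, 0.8])
print("pulls per arm:", counts)
print("estimated means:", [round(e, 2) for e in estimates])
```

Note that the loop keeps no state variable and models no transitions: each pull depends only on the chosen arm, which is what separates a bandit from a full RL problem. A larger epsilon explores more and learns the arms faster, but spends more pulls on arms already known to be worse.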