English, asked by sharmaditi032, 20 days ago

1. The multi-armed bandit problem is a generalized use case for
a. reinforcement learning
b. supervised learning
c. unsupervised learning
d. all of the above


Answered by DarkenedSky

Unsupervised learning (C)

Answered by sarahssynergy

The correct answer is option (a), reinforcement learning.


  • Multi-armed bandit (MAB) is a classic reinforcement learning problem in which a player faces k slot machines (the "bandits"), each with a different and unknown reward distribution, and tries to maximise their cumulative reward over a series of trials.
  • It is a special, simplified reinforcement learning (RL) problem that has wide practical applications (for example, A/B testing and online ad selection) and is gaining popularity.
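  • Multi-armed bandits simplify the full RL setting by ignoring state: each arm's reward does not depend on a changing environment state, so the player only has to balance exploration (trying arms to learn their rewards) against exploitation (pulling the arm that currently looks best).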
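To make the exploration/exploitation trade-off concrete, here is a minimal sketch of one common bandit strategy, epsilon-greedy, assuming Bernoulli arms (each pull pays 1 with some fixed probability, else 0). The function name `epsilon_greedy_bandit`, the arm probabilities, and the value of `epsilon` are hypothetical choices for illustration; epsilon-greedy is only one of several bandit algorithms (UCB and Thompson sampling are others). Note that there is no state variable anywhere: the player only tracks a running average reward per arm.

```python
import random


def epsilon_greedy_bandit(probs, n_steps, epsilon=0.1, seed=0):
    """Play a Bernoulli multi-armed bandit with an epsilon-greedy policy.

    probs: hypothetical success probability of each arm (unknown to the player).
    Returns (pull counts per arm, estimated mean reward per arm, total reward).
    """
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k          # how many times each arm has been pulled
    estimates = [0.0] * k     # running average reward observed for each arm
    total_reward = 0

    for step in range(n_steps):
        if step < k:
            arm = step                     # pull every arm once to initialise estimates
        elif rng.random() < epsilon:
            arm = rng.randrange(k)         # explore: pick an arm uniformly at random
        else:
            # exploit: pick the arm with the best estimate so far
            arm = max(range(k), key=lambda a: estimates[a])

        # Bernoulli reward: 1 with probability probs[arm], otherwise 0.
        reward = 1 if rng.random() < probs[arm] else 0
        total_reward += reward

        # Incremental update of the sample mean for the chosen arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return counts, estimates, total_reward


if __name__ == "__main__":
    counts, estimates, total = epsilon_greedy_bandit([0.2, 0.5, 0.8], n_steps=5000)
    print("pulls per arm:", counts)
    print("estimated means:", [round(e, 2) for e in estimates])
    print("total reward:", total)
```

With these assumed settings the player should end up pulling the highest-paying arm most of the time, while the small `epsilon` keeps occasionally sampling the other arms so their estimates do not go stale.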