Consider the following PacMan environment. For this environment, design a reinforcement learning agent (Pacman). The agent's objective is to figure out the best action it can take in any given state. The rules of the game are as follows:
Every move has a reward of [value missing].
Consuming a food pellet has a reward of +10.
If Pacman collides with a ghost, the reward is 500.
If Pacman eats all the food pellets without colliding with a ghost, the reward is +500.
Assume a discount factor of 0.8.
The action noise is 0.3 (the consequences are the same as in the grid world example).
The environment is static, i.e. no ghosts are moving.
The actions for Pacman are Up, Down, North and Right.
You can cross the walls.
Use Q-Learning to figure out the best action at every state. Show your working for every iteration of Q-Learning.
Answers
DQN and related methods such as AlphaGo and TRPO fall under reinforcement learning (RL), a subfield of machine learning. In reinforcement learning, an agent acts within an environment and tries to maximize its cumulative reward. It takes an action, which changes the environment, and the environment returns the reward associated with that change. The agent then observes its new state and chooses its next action, repeating the process indefinitely or until the environment terminates. This decision-making loop is formally modeled as a Markov decision process (MDP).
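To connect that loop to the question, here is a minimal sketch of a single tabular Q-learning step, Q(s, a) ← Q(s, a) + α · (r + γ · max Q(s', a') − Q(s, a)). Only the discount factor 0.8 and the rewards +10 and +500 come from the question. The learning rate of 0.5, the Up/Down/Left/Right action set, the state names and the helper names `q_update` and `choose_action` are assumptions made for illustration, since the question gives neither a learning rate nor the grid layout. The action noise of 0.3 is not modeled here: it only changes which next state the environment hands back, not the update rule itself.

```python
import random
from collections import defaultdict

GAMMA = 0.8   # discount factor given in the question
ALPHA = 0.5   # learning rate: an assumption, the question does not state one
ACTIONS = ("Up", "Down", "Left", "Right")  # assumed four-direction action set


def q_update(q, state, action, reward, next_state, done, alpha=ALPHA, gamma=GAMMA):
    """Apply one tabular Q-learning update and return the new Q(state, action)."""
    # A terminal transition has no future value to bootstrap from.
    best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


def choose_action(q, state, epsilon=0.1, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


q = defaultdict(float)  # every unseen (state, action) pair starts at 0.0

# Three hand-fed transitions; "s0", "s1" and "s2" are placeholder state names.
first = q_update(q, "s0", "Right", 10, "s1", done=False)   # eats a pellet (+10)
second = q_update(q, "s1", "Right", 500, "s2", done=True)  # clears the board (+500)
third = q_update(q, "s0", "Right", 10, "s1", done=False)   # same move, revisited

print(first, second, third)
```

Under these assumptions the first update moves Q(s0, Right) from 0 to 0.5 · 10 = 5, the terminal update sets Q(s1, Right) to 0.5 · 500 = 250, and revisiting s0 gives 5 + 0.5 · (10 + 0.8 · 250 − 5) = 107.5. "Showing your working for every iteration" amounts to writing out this arithmetic for each step the agent takes.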
Explanation:
Same answer as above; sorry for that.