English, asked by kshetrakarra, 5 months ago

What are the differences between Q-learning and deep Q-learning?

Answers

Answered by prakashkkaladindi

Answer:

Why 'Deep' Q-Learning? Q-learning is a simple yet quite powerful algorithm that builds a cheat sheet for our agent. This cheat sheet helps the agent figure out which action to perform in each state.

Answered by hkofficial654

Explanation:

The main motivation behind Deep Q-Learning was to handle environments with continuous or very large state spaces. The set of actions is still discrete, since the method assigns one Q-value to each action.

The basic Q-Learning algorithm is fine for small, discrete environments. It works by maintaining a Q-table in which each row corresponds to a state of the environment and each column to an action the agent can take. If the environment is continuous, you can still use Q-Learning by discretizing the states, but if several variables are needed to describe a state, the Q-table becomes far too large to be practical. The reason is simple: the more rows and columns the Q-table has, the longer it takes the agent to visit every state and update its values. As a result, even after long training, most cells in the Q-table are never updated and keep their initial value (typically zero). All in all, not a feasible solution!
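To make the Q-table idea concrete, here is a minimal sketch of the standard tabular update, plus a quick count of how many rows you get after discretizing. The function name, the learning rate `alpha`, the discount `gamma`, and the toy sizes (4 states, 2 actions, 10 bins per variable) are my own assumptions for illustration, not values from any particular environment.

```python
def q_learning_update(q_table, state, action, reward, next_state, done,
                      alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning update in place and return the new value."""
    # A terminal transition has no future value to bootstrap from.
    best_next = 0.0 if done else max(q_table[next_state])
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]

# Hypothetical toy problem: 4 states (rows) and 2 actions (columns), all zeros.
n_states, n_actions = 4, 2
q_table = [[0.0] * n_actions for _ in range(n_states)]

# One observed transition: in state 2, action 1 gave reward 1.0 and ended the episode.
q_learning_update(q_table, state=2, action=1, reward=1.0, next_state=3, done=True)

# Table size after discretizing: 4 state variables with 10 bins each (assumed numbers).
bins_per_variable, n_variables = 10, 4
n_rows = bins_per_variable ** n_variables
```

Only the single cell that was visited changes, and the number of rows grows exponentially with the number of state variables, which is the scaling problem described above.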

A widely used solution to this problem is the Deep Q-Network (DQN). It uses a deep neural network to approximate the Q-values instead of storing them in a table. Think of it this way…

You have a neural network that takes the state as its input. The output (prediction) of this neural network is one Q-value for each action. Now, for any given state, the most desirable action is simply the action with the largest Q-value!
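The "state in, one Q-value per action out" idea can be sketched with a tiny untrained network written in plain Python. The layer sizes (4 inputs, 16 hidden units, 2 actions), the random weights, and all function names are assumptions for illustration; a real DQN would use a library such as Keras and trained weights.

```python
import random

def make_layer(n_in, n_out, rng):
    """Create random weights and zero biases for one dense layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense(x, layer, relu=False):
    """Compute weights * x + biases, optionally followed by ReLU."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in out] if relu else out

def q_values(state, hidden_layer, output_layer):
    """Forward pass: the state goes in, one Q-value per action comes out."""
    return dense(dense(state, hidden_layer, relu=True), output_layer)

def greedy_action(q):
    """Index of the action with the largest Q-value."""
    return max(range(len(q)), key=lambda a: q[a])

rng = random.Random(0)                  # fixed seed so the sketch is repeatable
hidden_layer = make_layer(4, 16, rng)   # 4 state variables (assumed)
output_layer = make_layer(16, 2, rng)   # 2 actions (assumed)

state = [0.02, -0.01, 0.03, 0.04]       # made-up state vector
q = q_values(state, hidden_layer, output_layer)
action = greedy_action(q)
```

Note that the network never needs a row for every possible state: any state vector can be fed in, which is what lets it cope with continuous states.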

Because experiences are collected sequentially, consecutive samples are highly correlated. This problem is tackled by storing experiences in a buffer and sampling the training data randomly from it. This process is called Experience Replay, and it is highly advisable to use it when building Deep Q-Networks. In a nutshell, each training step nudges the network's output for a stored state and action toward the target value, so the network is more likely to output that value when it sees the same state again. Ultimately, the neural network should learn to generalize across similar states.
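Here is a minimal sketch of an experience replay buffer and of the target value the network is trained toward. The class name, the capacity, `gamma`, and the dummy transitions are assumptions for illustration only.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest entry when full.
        self.memory = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the correlation between consecutive steps.
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)

def td_target(reward, next_q_values, done, gamma=0.99):
    """Target for the taken action: reward plus discounted best next Q-value."""
    return reward if done else reward + gamma * max(next_q_values)

# Usage with dummy transitions: states are just integers 0..4 here.
buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.add(state=step, action=0, reward=1.0, next_state=step + 1, done=False)

batch = buffer.sample(2)
target = td_target(reward=1.0, next_q_values=[0.5, 2.0], done=False, gamma=0.9)
```

In a full training loop, the network's prediction for each sampled state and action would be regressed toward its `td_target` value.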

To demonstrate the power of Deep Q-Learning, I have implemented this algorithm on the CartPole environment in OpenAI Gym. I used the Keras library to build the neural network. The good thing about Keras is that you can save your model after training and load it later for testing.
