What is the difference between DDQN and DQN?
I think I have not understood the difference between DQN and DDQN in implementation.

I understand that we update the target network while DDQN is running, but I do not see how that is done in this code. In the DDQN implementation we put

    self.target_model.set_weights(self.model.get_weights())

and the same line is added to the DQN at https://github.com/keon/deep-q-learning in order to turn the DQN into a DDQN. But this call only happens when we leave the loop via break, so there would seem to be no difference between them!

What is wrong with my reasoning? (Maybe the difference only shows up at test time? Is this code only for training, with testing done by setting the exploration rate to 0 and running a single episode with the weights we found?)

So, what is the difference between the DQN (https://github.com/keon/deep-q-learning/blob/master/dqn.py) and the DDQN (https://github.com/keon/deep-q-learning/blob/master/ddqn.py) in this repository?
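For reference, the loop shape under discussion looks roughly like this (a simplified sketch, not the repository's exact code; the train name and the agent/env interface are my own assumptions):

    def train(agent, env, n_episodes, max_steps, batch_size):
        # Simplified sketch of the loop shape under discussion; the agent/env
        # methods (act, remember, replay, reset, step) are assumed interfaces.
        for episode in range(n_episodes):
            state = env.reset()
            for t in range(max_steps):
                action = agent.act(state)
                next_state, reward, done, info = env.step(action)
                agent.remember(state, action, reward, next_state, done)
                state = next_state
                if done:
                    # The target-network sync sits inside the episode loop:
                    # it runs once per episode, each time `done` triggers the
                    # break, not only once at the very end of training.
                    agent.target_model.set_weights(agent.model.get_weights())
                    break
            if len(agent.memory) > batch_size:
                agent.replay(batch_size)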
deep-learning reinforcement-learning dqn deep-network weight-initialization
asked Sep 22 '18 at 5:19, edited Sep 24 '18 at 16:16
user10296606
2 Answers
From what I understand, the difference between DQN and DDQN lies in how the target Q-values for the next states are calculated. In DQN we simply take the maximum over the Q-values of all possible actions. This tends to select over-estimated values, so DDQN instead estimates the value of the chosen action, where the chosen action is the one selected by our policy model.

I looked through the code and was confused too, because this part did not seem to be implemented; then I realized the relevant lines are commented out. Those commented lines would have selected the next-state action with the current model and then used the target model to get the Q-value of that selected action. They were changed in a commit some time ago, I have no idea why.

As for self.target_model.set_weights(self.model.get_weights()), it updates the target model. The target model serves the same function as the policy model, but the DQN algorithm deliberately keeps the two separate and only synchronizes them from time to time in order to stabilize training. This can be done every fixed number of steps, or, as in this code, once per episode.
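For concreteness, here is a minimal sketch of the two target computations (illustrative only, not the repository's exact code; model and target_model stand for Keras-style models that return one Q-value per action for a batched state):

    import numpy as np

    def dqn_target(reward, next_state, done, gamma, target_model):
        # Vanilla DQN target: max over the target network's Q-values.
        if done:
            return reward
        q_next = target_model.predict(next_state)[0]
        return reward + gamma * np.max(q_next)

    def double_dqn_target(reward, next_state, done, gamma, model, target_model):
        # Double DQN target: the online model *selects* the action,
        # the target model *evaluates* it.
        if done:
            return reward
        best_action = np.argmax(model.predict(next_state)[0])
        q_next = target_model.predict(next_state)[0]
        return reward + gamma * q_next[best_action]

    # In both variants the target network is synchronized the same way,
    # e.g. every N steps or, as in this code, once per episode:
    # target_model.set_weights(model.get_weights())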
answered Oct 15 '18 at 9:31
emilyfy
DQN is essentially just Q-learning that uses a neural network as the function approximator and relies on "hacks" such as experience replay, a target network, and reward clipping. In the original paper the authors use a convolutional network: the raw image pixels are fed through a stack of convolutional layers. However, there are a couple of statistical problems:

- DQN approximates a set of values that are strongly interrelated (DDQN addresses this with the decomposition below).
- DQN tends to be over-optimistic: it overestimates the value of being in a state when that estimate is inflated only by statistical error (Double DQN addresses this).

$$Q(s,a) = V(s) + A(s,a)$$

By decoupling the estimation, the DDQN can intuitively learn which states are (or are not) valuable without having to learn the effect of each action in each state, since it also estimates V(s).

Being able to compute V(s) is particularly useful for states whose actions do not affect the environment in any relevant way; in that case it is unnecessary to estimate the value of each action. For instance, moving right or left only matters when there is a risk of collision.
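For illustration, a minimal Keras sketch of that decomposition (my own example, not from the linked repository; the layer sizes are arbitrary, and the mean advantage is subtracted, as in the dueling-network paper, so the split between V and A is identifiable):

    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras import layers

    def build_dueling_q_network(state_dim, n_actions):
        inputs = layers.Input(shape=(state_dim,))
        x = layers.Dense(64, activation="relu")(inputs)
        x = layers.Dense(64, activation="relu")(x)

        value = layers.Dense(1)(x)               # V(s): one scalar per state
        advantage = layers.Dense(n_actions)(x)   # A(s, a): one value per action

        # Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a))
        q_values = layers.Lambda(
            lambda va: va[0] + va[1] - tf.reduce_mean(va[1], axis=1, keepdims=True)
        )([value, advantage])

        model = keras.Model(inputs=inputs, outputs=q_values)
        model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3), loss="mse")
        return model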
As @emilyfy said, self.target_model.set_weights(self.model.get_weights()) is the update of the target model.
answered Jan 17 at 3:47
Daniel Chepenko