What is the difference between DDQN and DQN?

I think I have not understood the difference between DQN and DDQN at the implementation level. I understand that the target network is updated while DDQN is running, but I do not see how that is done in this code.

In the DDQN implementation at https://github.com/keon/deep-q-learning, the line self.target_model.set_weights(self.model.get_weights()) is what is supposedly added to DQN in order to turn it into DDQN, and it is called when an episode finishes. But that call only happens at the point where we leave the loop with break, so it seems to me there is no difference between the two!

What is wrong in my reasoning? (Or does the difference only appear at test time? Is this code only for training, with testing done by setting the exploration rate to 0 and running one more episode with the learned weights? Is that right?)

So, what is the difference between the presented DQN (https://github.com/keon/deep-q-learning/blob/master/dqn.py) and DDQN (https://github.com/keon/deep-q-learning/blob/master/ddqn.py) in this repository?
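
To be concrete, the loop structure I am referring to looks roughly like this. It is a toy, runnable reconstruction; the agent and environment are trivial stand-ins, and the method names are only assumed to match the repository, so treat it as a sketch rather than the actual code:

```python
import random

class ToyAgent:
    """Trivial stand-in for the repository's agent, for illustration only."""
    def __init__(self):
        self.memory = []
        self.target_updates = 0

    def act(self, state):
        return random.choice([0, 1])                 # pretend epsilon-greedy choice

    def remember(self, *transition):
        self.memory.append(transition)               # store the transition

    def update_target_model(self):
        self.target_updates += 1                     # real code copies weights here

    def replay(self, batch_size):
        random.sample(self.memory, batch_size)       # real code trains on a minibatch

agent, batch_size = ToyAgent(), 4
for episode in range(3):
    state, done, steps = 0, False, 0
    while not done:
        action = agent.act(state)
        next_state, reward, steps = state + 1, 1.0, steps + 1
        done = steps >= 10                           # toy episode termination
        agent.remember(state, action, reward, next_state, done)
        state = next_state
        if done:
            agent.update_target_model()              # the line in question: it runs once
            break                                    # per episode, just before the break
    if len(agent.memory) > batch_size:
        agent.replay(batch_size)
```
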
Tags: deep-learning, reinforcement-learning, dqn, deep-network, weight-initialization

2 Answers

Answer by emilyfy (answered Oct 15 '18):

From what I understand, the difference between DQN and DDQN lies in how the target Q-values for the next states are calculated. In DQN we simply take the maximum of the Q-values over all possible actions, which tends to pick over-estimated values; Double DQN (DDQN) therefore proposes to estimate the value of the chosen action instead, where the chosen action is the one selected by our policy model.
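
To make the contrast concrete, here is a minimal sketch of the two target computations (my own illustration, not the repository's code; model and target_model are assumed to be Keras-style models exposing predict, and gamma is the discount factor):

```python
import numpy as np

def dqn_target(reward, next_state, done, target_model, gamma=0.95):
    # Vanilla DQN: the same network both selects and evaluates the next action
    # by taking the max, which is prone to over-estimation.
    if done:
        return reward
    q_next = target_model.predict(next_state)[0]           # Q(s', .) from the target net
    return reward + gamma * np.amax(q_next)

def ddqn_target(reward, next_state, done, model, target_model, gamma=0.95):
    # Double DQN: the online (policy) model selects the action, the target
    # model evaluates it, decoupling selection from evaluation.
    if done:
        return reward
    best_action = np.argmax(model.predict(next_state)[0])  # selection: online net
    q_next = target_model.predict(next_state)[0]           # evaluation: target net
    return reward + gamma * q_next[best_action]
```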



I looked through the code and got confused too, because this bit did not seem to be implemented; then I realized the relevant lines are commented out. Those commented lines would have selected the action for the next state using the current model and then used the target model to get the Q-value of that selected action. They were changed in a commit some time ago, I have no idea why.



As for the line self.target_model.set_weights(self.model.get_weights()), that is the update of the target model. The target model is supposed to compute the same function as the policy model, but the DQN algorithm deliberately keeps them separate and only copies the weights across once in a while to stabilize training. This can be done every fixed number of steps; in this case they seem to do it once per episode.
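
For completeness, here is a small runnable sketch of that hard update (the tiny network and the schedules mentioned in the comments are my own assumptions for illustration, not the repository's architecture):

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_q_net(state_size=4, action_size=2):
    # Tiny Q-network purely for illustration.
    net = keras.Sequential([
        keras.Input(shape=(state_size,)),
        layers.Dense(24, activation="relu"),
        layers.Dense(action_size, activation="linear"),
    ])
    net.compile(loss="mse", optimizer="adam")
    return net

model = build_q_net()          # online / policy network, trained every learning step
target_model = build_q_net()   # target network, only refreshed occasionally

def update_target_model():
    # Hard update: copy the online weights into the target network.
    target_model.set_weights(model.get_weights())

# Typical schedules: once per episode (what this repository appears to do),
# or every C environment steps, e.g.  if step % C == 0: update_target_model()
update_target_model()
```
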
Answer by Daniel Chepenko (answered Jan 17):

DQN is essentially just Q-learning that uses a neural network to approximate the Q-function, together with "hacks" such as experience replay, a target network, and reward clipping.
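
As a hedged illustration of one of those "hacks", here is a minimal experience-replay buffer (the capacity and names are arbitrary choices of mine, not taken from any particular implementation):

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=2000):
        self.buffer = deque(maxlen=capacity)     # oldest transitions are discarded

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the correlation between consecutive
        # transitions, which helps stabilize the supervised-style updates.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```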



In the original paper the authors use a convolutional network that takes the raw image pixels and feeds them through a stack of convolutional layers. However, plain DQN has a couple of statistical problems:




1. DQN approximates a set of values that are highly interrelated; the dueling architecture (the decomposition shown below) addresses this.

2. DQN tends to be over-optimistic: it will over-value being in a state even when the high estimate arose purely from statistical error. Double DQN addresses this.


$$Q(s,a) = V(s) + A(s,a)$$



By decoupling the two estimates, the dueling network can intuitively learn which states are (or are not) valuable without having to learn the effect of each action in each state, since it is also computing V(s).



Being able to compute V(s) on its own is particularly useful for states whose actions do not affect the environment in any relevant way; in that case it is unnecessary to estimate the value of each action. For instance, moving right or left only matters if there is a risk of collision.
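
A minimal sketch of such a dueling head in Keras, assuming a small fully connected network (layer sizes are arbitrary; the mean-advantage subtraction follows the dueling-network paper and is not shown in the equation above):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_dueling_q_net(state_size=4, action_size=2):
    inputs = layers.Input(shape=(state_size,))
    x = layers.Dense(24, activation="relu")(inputs)

    value = layers.Dense(1)(x)                # V(s): how good the state is
    advantage = layers.Dense(action_size)(x)  # A(s,a): how good each action is

    # Q(s,a) = V(s) + (A(s,a) - mean_a A(s,a)); the mean is subtracted so that
    # V and A remain identifiable.
    q_values = layers.Lambda(
        lambda va: va[0] + (va[1] - tf.reduce_mean(va[1], axis=1, keepdims=True))
    )([value, advantage])

    model = Model(inputs, q_values)
    model.compile(loss="mse", optimizer="adam")
    return model
```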



As @emilyfy said, self.target_model.set_weights(self.model.get_weights()) is the update of the target model.