Constant Learning Rate for Gradient Descent
Given a learning rate $\alpha_n$ for the $n^{\text{th}}$ step of the gradient descent process, what would be the impact of using a constant value for $\alpha_n$?

gradient-descent learning-rate
Do you mean a constant value of $\alpha$ for each step? – Wes, 4 hours ago
asked 5 hours ago by Umbrage
2 Answers
Intuitively, if $\alpha$ is too large you may "shoot over" your target and end up bouncing around the search space without converging. If $\alpha$ is too small, convergence will be slow and you could end up stuck on a plateau or in a local minimum.

That's why most learning-rate schedules start with a somewhat larger learning rate for quick gains and then reduce it gradually.

– oW_, answered 4 hours ago
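The overshoot/slow-convergence trade-off is easy to see on a toy problem. This is a minimal sketch (not from the answer): minimizing $f(x) = x^2$, whose gradient is $f'(x) = 2x$, with three constant learning rates.

```python
# Minimize f(x) = x^2 by gradient descent with a constant learning rate.
# Gradient: f'(x) = 2x, so each step multiplies x by (1 - 2 * alpha).
def gradient_descent(alpha, x0=5.0, steps=50):
    x = x0
    for _ in range(steps):
        x -= alpha * 2 * x  # x_{n+1} = x_n - alpha * f'(x_n)
    return x

print(gradient_descent(0.01))  # too small: still far from the minimum at 0
print(gradient_descent(0.4))   # reasonable: converges very close to 0
print(gradient_descent(1.1))   # too large: |x| grows each step (diverges)
```

With `alpha = 1.1` each step multiplies $x$ by $1 - 2.2 = -1.2$, so the iterate flips sign and grows in magnitude: exactly the "bouncing around without converging" behavior described above.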
Gradient descent uses the following update rule:

$\theta_{j} := \theta_{j} - \alpha \frac{\partial}{\partial \theta_{j}} J(\theta)$

Here $\theta_{j}$ is a parameter of your model and $J$ is the cost/loss function. As the parameters approach a minimum, the gradient $\frac{\partial}{\partial \theta_{j}} J(\theta)$ shrinks toward 0, so the step $\alpha \frac{\partial}{\partial \theta_{j}} J(\theta)$ shrinks even when $\alpha$ is held constant. $\alpha$ can be constant, and in many cases it is, but varying $\alpha$ may help the method converge faster.

– Wes, answered 4 hours ago
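The update rule above can be sketched in code. This is an illustrative example (not from the answer), using least-squares linear regression with $J(\theta) = \frac{1}{2m}\lVert X\theta - y\rVert^2$; the `decay` schedule is one hypothetical way to vary $\alpha$, and reduces to a constant rate when `decay == 0`.

```python
import numpy as np

# Gradient descent on J(theta) = (1/2m) * ||X @ theta - y||^2.
# With decay == 0 the learning rate is constant, matching the rule above.
def fit(X, y, alpha0=1.0, decay=0.0, steps=500):
    m, n = X.shape
    theta = np.zeros(n)
    for t in range(steps):
        alpha = alpha0 / (1 + decay * t)   # constant when decay == 0
        grad = X.T @ (X @ theta - y) / m   # dJ/dtheta for this quadratic J
        theta -= alpha * grad              # theta_j := theta_j - alpha * dJ/dtheta_j
    return theta

# Synthetic data generated exactly by theta = [1, 2] (intercept, slope).
X = np.c_[np.ones(100), np.linspace(0.0, 1.0, 100)]
y = X @ np.array([1.0, 2.0])
print(fit(X, y))  # close to [1.0, 2.0]
```

Note that even with constant `alpha`, the steps shrink as `grad` shrinks near the minimum, which is the point the answer makes.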