Constant Learning Rate for Gradient Descent

Given a learning rate $\alpha_n$ for the $n^{\text{th}}$ step of the gradient descent process, what would be the impact of using a constant value for $\alpha_n$ in gradient descent?

gradient-descent learning-rate
asked 5 hours ago by Umbrage
Do you mean a constant value of $\alpha$ for each step? – Wes, 4 hours ago
2 Answers
Intuitively, if $\alpha$ is too large you may "shoot over" your target and end up bouncing around the search space without converging. If $\alpha$ is too small, convergence will be slow and you could end up stuck on a plateau or in a local minimum.

That's why most learning-rate schemes start with a somewhat larger learning rate for quick gains and then reduce it gradually.

– oW_, answered 4 hours ago
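The overshoot-vs-slow-convergence trade-off is easy to see on a toy problem. The following is a hypothetical sketch (not from the answer) minimizing $f(x) = x^2$, whose gradient is $2x$, with three different constant learning rates:

```python
def gradient_descent(alpha, x0=1.0, steps=50):
    """Minimize f(x) = x^2 with a constant learning rate alpha."""
    x = x0
    for _ in range(steps):
        x = x - alpha * 2 * x  # update: x := x - alpha * f'(x)
    return x

x_small = gradient_descent(alpha=0.01)  # shrink factor 0.98 per step: slow progress
x_good = gradient_descent(alpha=0.4)    # shrink factor 0.2 per step: fast convergence
x_large = gradient_descent(alpha=1.1)   # factor -1.2 per step: overshoots and diverges

print(abs(x_small), abs(x_good), abs(x_large))
```

Here each step multiplies $x$ by $(1 - 2\alpha)$, so any constant $\alpha > 1$ makes the iterate jump past the minimum with growing magnitude, while a tiny $\alpha$ leaves it creeping toward 0.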
Gradient descent has the following update rule:

$\theta_{j} := \theta_{j} - \alpha \frac{\partial}{\partial \theta_{j}} J(\theta)$

Here $\theta_{j}$ is a parameter of your model and $J$ is the cost/loss function. At each step the product $\alpha \frac{\partial}{\partial \theta_{j}} J(\theta)$ gets smaller as the gradient $\frac{\partial}{\partial \theta_{j}} J(\theta)$ converges to 0. $\alpha$ can be constant, and in many cases it is, but varying $\alpha$ might help it converge faster.

– Wes, answered 4 hours ago
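The rule above can be sketched on a made-up one-parameter loss, $J(\theta) = (\theta - 3)^2$: the effective step $\alpha \cdot \frac{\partial J}{\partial \theta}$ shrinks on its own as the gradient approaches 0, even though $\alpha$ stays constant.

```python
def descend(alpha=0.1, theta=0.0, steps=100):
    """Apply theta := theta - alpha * dJ/dtheta with a constant alpha."""
    step_sizes = []
    for _ in range(steps):
        grad = 2 * (theta - 3)                # dJ/dtheta for J = (theta - 3)^2
        step_sizes.append(abs(alpha * grad))  # effective step taken this iteration
        theta -= alpha * grad
    return theta, step_sizes

theta, steps = descend()
print(theta)                # close to the minimizer, 3
print(steps[0], steps[-1])  # steps shrink even though alpha is constant
```

The first step moves $\theta$ by $0.6$; by the last iteration the step is near machine-zero, which is exactly why a constant $\alpha$ can still converge.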
Thanks for contributing an answer to Data Science Stack Exchange!