Constant Learning Rate for Gradient Descent

Given a learning rate $\alpha_n$ for the $n^{\text{th}}$ step of the gradient descent process, what would be the impact of using a constant value for $\alpha_n$ in gradient descent?

gradient-descent learning-rate
asked 5 hours ago by Umbrage
Do you mean a constant value of $\alpha$ for each step? – Wes, 4 hours ago
2 Answers
Intuitively, if $\alpha$ is too large you may "shoot over" your target and end up bouncing around the search space without converging. If $\alpha$ is too small, convergence will be slow and you could end up stuck on a plateau or in a local minimum.

That's why most learning-rate schemes start with a somewhat larger learning rate for quick gains and then reduce it gradually.

– oW_, answered 4 hours ago
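The overshoot-vs-slow-convergence trade-off is easy to see on a toy problem. The following is a hypothetical sketch (not from the answer) minimizing $f(x) = x^2$, whose gradient is $2x$, with three different constant learning rates:

```python
def gradient_descent(alpha, x0=1.0, steps=50):
    """Minimize f(x) = x^2 with a constant learning rate alpha."""
    x = x0
    for _ in range(steps):
        x = x - alpha * 2 * x  # update: x := x - alpha * f'(x)
    return x

x_small = gradient_descent(alpha=0.01)  # shrink factor 0.98 per step: slow progress
x_good = gradient_descent(alpha=0.4)    # shrink factor 0.2 per step: fast convergence
x_large = gradient_descent(alpha=1.1)   # factor -1.2 per step: overshoots and diverges

print(abs(x_small), abs(x_good), abs(x_large))
```

Here each step multiplies $x$ by $(1 - 2\alpha)$, so any constant $\alpha > 1$ makes the iterate jump past the minimum with growing magnitude, while a tiny $\alpha$ leaves it creeping toward 0.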
Gradient descent has the following update rule:

$\theta_{j} := \theta_{j} - \alpha \frac{\partial}{\partial \theta_{j}} J(\theta)$

Here $\theta_{j}$ is a parameter of your model and $J$ is the cost/loss function. At each step the product $\alpha \frac{\partial}{\partial \theta_{j}} J(\theta)$ gets smaller as the gradient $\frac{\partial}{\partial \theta_{j}} J(\theta)$ converges to 0. $\alpha$ can be constant, and in many cases it is, but varying $\alpha$ might help it converge faster.

– Wes, answered 4 hours ago
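The rule above can be sketched on a made-up one-parameter loss, $J(\theta) = (\theta - 3)^2$: the effective step $\alpha \cdot \frac{\partial J}{\partial \theta}$ shrinks on its own as the gradient approaches 0, even though $\alpha$ stays constant.

```python
def descend(alpha=0.1, theta=0.0, steps=100):
    """Apply theta := theta - alpha * dJ/dtheta with a constant alpha."""
    step_sizes = []
    for _ in range(steps):
        grad = 2 * (theta - 3)                # dJ/dtheta for J = (theta - 3)^2
        step_sizes.append(abs(alpha * grad))  # effective step taken this iteration
        theta -= alpha * grad
    return theta, step_sizes

theta, steps = descend()
print(theta)                # close to the minimizer, 3
print(steps[0], steps[-1])  # steps shrink even though alpha is constant
```

The first step moves $\theta$ by $0.6$; by the last iteration the step is near machine-zero, which is exactly why a constant $\alpha$ can still converge.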
Thanks for contributing an answer to Data Science Stack Exchange!