Constant Learning Rate for Gradient Descent

Suppose we have a learning rate $\alpha_n$ for the $n^{th}$ step of the gradient descent process. What would be the impact of using a constant value for $\alpha_n$?
gradient-descent learning-rate
asked 5 hours ago by Umbrage
• Do you mean a constant value of $\alpha$ for each step? – Wes, 4 hours ago
2 Answers
Intuitively, if $\alpha$ is too large you may "shoot over" your target and end up bouncing around the search space without converging. If $\alpha$ is too small, convergence will be slow and you could end up stuck on a plateau or in a local minimum.

That's why most learning-rate schedules start with a somewhat larger learning rate for quick gains and then reduce it gradually.

– answered 4 hours ago by oW_
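
To make this concrete, here is a minimal sketch (added for illustration, not part of the original answer; the function, starting point, and step counts are arbitrary choices): minimizing $f(x) = x^2$, whose gradient is $2x$, with three different constant learning rates.

```python
# Toy illustration: minimize f(x) = x^2, gradient f'(x) = 2x, minimum at x = 0.
def gradient_descent(alpha, x0=5.0, steps=20):
    x = x0
    for _ in range(steps):
        x -= alpha * 2 * x  # the same constant step size alpha at every iteration
    return x

print(gradient_descent(alpha=1.1))    # too large: iterates oscillate and blow up
print(gradient_descent(alpha=0.001))  # too small: barely moves after 20 steps
print(gradient_descent(alpha=0.1))    # moderate: ends up close to the minimum at 0
```

Each update multiplies $x$ by $1 - 2\alpha$, so with $\alpha = 1.1$ the factor is $-1.2$ and the iterates grow in magnitude, while with $\alpha = 0.001$ the factor is $0.998$ and progress is glacial.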
Gradient descent has the following update rule:

$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$

Here $\theta_j$ is a parameter of your model and $J$ is the cost/loss function. As the parameters approach a minimum, the gradient $\frac{\partial}{\partial \theta_j} J(\theta)$ shrinks toward 0, so the step $\alpha \frac{\partial}{\partial \theta_j} J(\theta)$ gets smaller even when $\alpha$ is held fixed. $\alpha$ can therefore be constant, and in many cases it is, but varying $\alpha$ may help you converge faster.

– answered 4 hours ago by Wes
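
As a concrete sketch of this rule (an illustration added here, not code from the answer; the cost function, toy data, and names such as `gradient_descent` are assumptions), here is batch gradient descent for linear regression with a constant $\alpha$:

```python
import numpy as np

# Cost: J(theta) = 1/(2m) * sum((X @ theta - y)^2); its gradient with respect
# to theta is X.T @ (X @ theta - y) / m.
def gradient_descent(X, y, alpha=0.1, steps=1000):
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(steps):
        grad = X.T @ (X @ theta - y) / m  # dJ/dtheta_j for all j at once
        theta = theta - alpha * grad      # theta_j := theta_j - alpha * grad_j
    return theta

# Noise-free toy data: y = 2 + 3x, so theta should approach [2, 3].
X = np.column_stack([np.ones(100), np.linspace(0, 1, 100)])
y = 2 + 3 * X[:, 1]
print(gradient_descent(X, y))
```

Even though $\alpha$ never changes, the effective step $\alpha \cdot \text{grad}$ shrinks as training progresses, because the gradient itself goes to 0 near the minimum.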