Weights not converging while cost function has converged in neural networks












2

















My cost/loss function drops drastically and approaches 0, which looks like a sign of convergence. But the weights are still changing visibly, much faster than the cost function is changing. Should I make sure the weights converge too?

Some details: I trained for just 1 epoch. My loss function is the mean squared error. I use no optimizer. I tried several experiments with different initial weights, and all of them ended with a converged loss function but ever-changing weights.
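For concreteness, this is roughly the shape of the loop and the quantities I am comparing (a minimal sketch with made-up data, not my actual network):

    import numpy as np

    # Minimal sketch: one linear layer trained with plain full-batch
    # gradient descent on a mean-squared-error loss, no separate optimizer.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    y = rng.normal(size=(100, 1))
    W = rng.normal(size=(5, 1))
    lr = 0.1

    for step in range(50):
        pred = X @ W
        loss = float(np.mean((pred - y) ** 2))
        grad = 2.0 * X.T @ (pred - y) / len(X)   # dL/dW for MSE
        W_next = W - lr * grad
        # Compare how fast the loss moves vs. how fast the weights move.
        print(step, round(loss, 6), float(np.linalg.norm(W_next - W)))
        W = W_next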

















neural-network training cost-function convergence







edited 13 mins ago by Alexander Gruber (1,033)

asked Feb 18 at 15:20 by feynman (578)






This question has an open bounty worth +50 reputation from feynman, ending in 4 days.

Looking for an answer drawing from credible and/or official sources.






















  • What's the order of magnitude of the change of the weights and of the loss, respectively? What's the value of the learning rate? What's the accuracy on the training set and the validation set? All these questions could help ...
    – Antonio Jurić, Feb 18 at 15:48










  • The cost function drops exponentially, while the weights change quite fast. If I run it forever, the weights keep changing while the cost function stays constant. I'm astonished to see the cost function change so little while the weights keep changing.
    – feynman, Feb 19 at 3:22












  • Your learning rate is probably too large.
    – Antonio Jurić, Feb 19 at 6:20










  • If the learning rate is too big, the cost will change quickly; how would the weights then change so little?
    – feynman, Feb 19 at 6:30










  • Can you add some more information, such as a loss-vs-epoch plot along with a weights-vs-epoch plot, to better look into the situation?
    – karthikeyan mg, yesterday


















































3 Answers


















    2





The weights in a model do not need to converge for training to stop.

One possible explanation is that the model's error surface has a big, wide valley. In that case, the loss function is low throughout the valley, but many different weight combinations all yield similar performance on the training dataset. Once the model has reached an acceptable loss value, there is no reason to continue training; just take any of those sets of weight values.
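As a toy illustration of such a valley (my own example, not the asker's setup): if the loss depends only on the product of two weights, every point on the curve w1*w2 = 2 is a global minimum, and different random initializations settle on very different weights with essentially the same near-zero loss:

    import numpy as np

    # Toy "wide valley": the loss depends only on w1*w2, so the whole
    # curve w1*w2 = 2 is a flat set of global minima.
    def loss(w):
        return (w[0] * w[1] - 2.0) ** 2

    def grad(w):
        r = 2.0 * (w[0] * w[1] - 2.0)
        return np.array([r * w[1], r * w[0]])

    rng = np.random.default_rng(0)
    for trial in range(3):
        w = rng.uniform(0.5, 3.0, size=2)   # a different start each trial
        for _ in range(500):
            w = w - 0.02 * grad(w)
        # Very different weights, essentially identical (near-zero) loss.
        print(f"trial {trial}: w = {w.round(3)}, loss = {loss(w):.2e}")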
















answered 2 days ago by Brian Spiering (3,893)












  • 'The weights in a model do not need to converge to stop training': why?
    – feynman, 2 days ago











    0





Can you provide us with more info? What optimizer do you use and with what parameters, how many epochs and experiments did you run, and what is your loss function?

"i just calculated for 1 epoch"

This doesn't make sense given the conclusion you wrote in this post.















answered yesterday by sob3kx (113), new contributor




  • I just calculated for 1 epoch. My loss function is the mean squared error. I use no optimizer. I tried several experiments and different initial weights, but all got a converged loss function with ever-changing weights.
    – feynman, 23 hours ago











        0





"My cost/loss function drops drastically and approaches 0"

Since you use no optimizer, as you said, it is technically not possible for the cost/loss function to drop drastically and approach zero. It is the optimizer that gives the model its objective of reducing the cost/error; in the gradient-descent hill analogy, the optimizer works out which way of descending the hill accounts for the largest reduction in error. Without one, the loss is just a number to your model, and your model simply stays at the top of the hill forever.

[figure: error surface; without an optimizer, the model's loss stays at point B forever]

"But the weights are still changing in a visible way, a lot faster than the cost function"

[figure: the weight update equations]

The above are the update equations. Because your model predicts essentially at random, in every batch some points happen to be assigned the correct class by chance. That accounts for a very small reduction in loss, and this change is applied to the weights through the equations above, so you may see random changes in the weights at each batch. The overall effect of these changes is negligible.
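(For reference, a minimal sketch of the kind of update rule such equations describe; the names are illustrative, plain gradient descent with learning rate lr, not necessarily the exact form in the figure:)

    # Plain gradient-descent update: each parameter takes a small step
    # against its loss gradient.
    def gradient_step(w, b, grad_w, grad_b, lr=0.01):
        w_new = w - lr * grad_w   # w := w - lr * dL/dw
        b_new = b - lr * grad_b   # b := b - lr * dL/db
        return w_new, b_new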



I've also run a real example on the MNIST data, computed without an optimizer, and the results are as follows:

[figure: training curves on MNIST without an optimizer]

Here you can clearly see the red line (the loss) stays at the top of the graph throughout. I used a batch size of 5 and ran it for 5 epochs.
















answered 21 hours ago by karthikeyan mg (150)





























