Weights not converging while cost function has converged in neural networks
My cost/loss function drops drastically and approaches 0, which looks like a sign of convergence. But the weights are still changing in a visible way, a lot faster than the cost function changes. Should I make sure the weights converge too?
Some details: I trained for just 1 epoch. My loss function is the mean squared error. I use no optimizer. I tried several experiments with different initial weights, but all of them gave a converged loss function with ever-changing weights.
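For concreteness, this is roughly the kind of logging I mean (a minimal numpy sketch with a hypothetical toy linear model, not my actual network): the per-step loss printed next to the per-step weight-change norm.

```python
# Toy illustration only: track the loss and the size of each weight update
# side by side, to compare how fast each quantity is still changing.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))          # toy inputs
y = X @ rng.normal(size=(5, 1))        # toy targets from a random linear map

w = rng.normal(size=(5, 1))            # initial weights
lr = 0.05                              # learning rate

for step in range(1, 501):
    err = X @ w - y
    loss = np.mean(err ** 2)           # mean squared error
    grad = 2 * X.T @ err / len(X)      # gradient of the MSE w.r.t. w
    w_new = w - lr * grad              # plain gradient-descent step
    if step % 100 == 0:
        print(f"step {step}: loss={loss:.2e}  ||dw||={np.linalg.norm(w_new - w):.2e}")
    w = w_new
```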
neural-network training cost-function convergence
asked Feb 18 at 15:20 by feynman; edited 13 mins ago by Alexander Gruber
This question has an open bounty worth +50 reputation from feynman, ending in 4 days: looking for an answer drawing from credible and/or official sources.
What's the order of magnitude of the change of the weights and of the loss, respectively? What's the value of the learning rate? What's the accuracy on the training set and the validation set? All of these could help ...
– Antonio Jurić, Feb 18 at 15:48
The cost function drops exponentially, while the weights change quite fast. If I run it forever, the weights keep changing while the cost function stays constant. I'm astonished that the cost function changes so little while the weights keep changing.
– feynman, Feb 19 at 3:22
Your learning rate is probably too large.
– Antonio Jurić, Feb 19 at 6:20
If the learning rate were too big, the cost would change quickly; how then would the weights change so little?
– feynman, Feb 19 at 6:30
Can you add some more information, such as a loss-vs-epoch plot along with a weights-vs-epoch plot, so we can look into the situation better?
– karthikeyan mg, yesterday
3 Answers
The weights in a model do not need to converge to stop training.
One possible explanation is that the model's error surface has a big, wide valley. In that case the loss function is low throughout the valley, but there are many weight combinations that all yield similar performance on the training dataset. Once the model has reached an acceptable loss value, there is no reason to continue training; any of those weight settings will do.
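One concrete way to see this (my own illustration, not part of the original answer): ReLU networks have a scaling symmetry, so scaling one layer up and the next layer down leaves the function, and therefore the loss, completely unchanged. A minimal numpy sketch with a hypothetical toy network:

```python
# Two very different weight settings of a small ReLU network that compute
# exactly the same function, hence sit at the same point of the loss surface.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))              # toy inputs
W1 = rng.normal(size=(3, 4))               # first-layer weights
W2 = rng.normal(size=(4, 1))               # second-layer weights

def forward(W1, W2, X):
    return np.maximum(X @ W1, 0.0) @ W2    # ReLU hidden layer, linear output

c = 7.3                                    # any positive rescaling factor
y_a = forward(W1, W2, X)
y_b = forward(W1 * c, W2 / c, X)           # very different weights...

print(np.max(np.abs(y_a - y_b)))           # ...identical outputs, up to float error
```

Many such symmetries and flat directions exist in real networks, which is part of why very different weight vectors can give nearly the same training loss.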
answered 2 days ago by Brian Spiering
'The weights in a model do not need to converge to stop training': why?
– feynman, 2 days ago
Can you provide us with more info? What optimizer do you use and with what parameters, how many epochs and experiments did you run, and what is your loss function?
i just calculated for 1 epoch
That is not enough to support the conclusion you drew in this post.
answered yesterday by sob3kx (new contributor)
I just calculated for 1 epoch. My loss function is the mean squared error. I use no optimizer. I tried several experiments with different initial weights, but all of them gave a converged loss function with ever-changing weights.
– feynman, 23 hours ago
My cost/loss function drops drastically and approaches 0
Since, as you said, you use no optimizer, it is technically not possible for the cost/loss function to drop drastically and approach zero. It is the optimizer that drives the model toward reducing the cost/error; in the gradient-descent hill analogy, the optimizer works out which way to descend the hill to reduce the error the most. Without one, your model just stays at the top of the hill forever, and the loss is just a number to it.
Since there is no optimizer in your code, it is technically not possible that the "cost/loss function drops drastically and approaches 0": your model's loss stays where it started (the top of the hill) forever.
But the weights are still changing in a visible way, a lot faster than the cost function
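For reference (the answer's original image of the equations is not reproduced here), the standard gradient-descent weight update it appears to refer to is, with learning rate $\eta$ and loss $L$:

$$w \leftarrow w - \eta\,\frac{\partial L}{\partial w}, \qquad b \leftarrow b - \eta\,\frac{\partial L}{\partial b}$$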
The above are the weight-update equations. Because your model's predictions are essentially random, at every batch some points happen to be predicted as the correct class by chance, which accounts for a very small reduction in loss. That change in loss feeds into the weights through the update equation above, so you may see random changes in the weights at every batch. The overall effect of these changes is negligible.
I also ran a real example on the MNIST data without an optimizer. In the resulting plot the red line (the loss) stays at the top of the graph for the whole run; I used a batch size of 5 and ran for 5 epochs.
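A minimal sketch of that kind of run (my own stand-in using PyTorch and torchvision, not the author's actual code): the loss is computed on every batch, but no optimizer step is ever taken, so it stays essentially flat across epochs.

```python
# Forward pass and loss on MNIST with gradients computed but never applied:
# without an optimizer step the weights never change and the loss stays flat.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

train_ds = datasets.MNIST(root="data", train=True, download=True,
                          transform=transforms.ToTensor())
loader = DataLoader(train_ds, batch_size=5, shuffle=True)

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
loss_fn = nn.MSELoss()                       # mean squared error, as in the question

for epoch in range(5):
    running = 0.0
    for x, y in loader:
        target = F.one_hot(y, num_classes=10).float()
        loss = loss_fn(model(x), target)
        loss.backward()                      # gradients exist...
        model.zero_grad()                    # ...but are discarded, no update step
        running += loss.item()
    print(f"epoch {epoch}: mean loss {running / len(loader):.4f}")   # roughly constant
```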
answered 21 hours ago by karthikeyan mg