Weights not converging while cost function has converged in neural networks
My cost/loss function drops drastically and approaches 0, which looks like a sign of convergence. But the weights are still changing in a visible way, a lot faster than the cost function changes. Should I make sure the weights converge too?
Some details: I trained for just 1 epoch. My loss function is the mean squared error. I use no optimizer. I tried several experiments with different initial weights, but all of them gave a converged loss function with ever-changing weights.
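For concreteness, this is roughly the kind of logging I mean (a minimal numpy sketch with a hypothetical toy linear model, not my actual network): the per-step loss printed next to the per-step weight-change norm.

```python
# Toy illustration only: track the loss and the size of each weight update
# side by side, to compare how fast each quantity is still changing.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))          # toy inputs
y = X @ rng.normal(size=(5, 1))        # toy targets from a random linear map

w = rng.normal(size=(5, 1))            # initial weights
lr = 0.05                              # learning rate

for step in range(1, 501):
    err = X @ w - y
    loss = np.mean(err ** 2)           # mean squared error
    grad = 2 * X.T @ err / len(X)      # gradient of the MSE w.r.t. w
    w_new = w - lr * grad              # plain gradient-descent step
    if step % 100 == 0:
        print(f"step {step}: loss={loss:.2e}  ||dw||={np.linalg.norm(w_new - w):.2e}")
    w = w_new
```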
neural-network training cost-function convergence
asked Feb 18 at 15:20 by feynman; edited 13 mins ago by Alexander Gruber
This question has an open bounty worth +50 reputation from feynman, ending in 4 days: looking for an answer drawing from credible and/or official sources.
What's the order of magnitude of the change of the weights and of the loss, respectively? What's the value of the learning rate? What's the accuracy on the training set and the validation set? All of these could help ...
– Antonio Jurić, Feb 18 at 15:48
The cost function drops exponentially, while the weights change quite fast. If I run it forever, the weights keep changing while the cost function stays constant. I'm astonished that the cost function changes so little while the weights keep changing.
– feynman, Feb 19 at 3:22
Your learning rate is probably too large.
– Antonio Jurić, Feb 19 at 6:20
If the learning rate were too big, the cost would change quickly; how then would the weights change so little?
– feynman, Feb 19 at 6:30
Can you add some more information, such as a loss-vs-epoch plot along with a weights-vs-epoch plot, so we can look into the situation better?
– karthikeyan mg, yesterday
3 Answers
The weights in a model do not need to converge to stop training.
One possible explanation is that the model's error surface has a big, wide valley. In that case the loss function is low throughout the valley, but there are many weight combinations that all yield similar performance on the training dataset. Once the model has reached an acceptable loss value, there is no reason to continue training; any of those weight settings will do.
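One concrete way to see this (my own illustration, not part of the original answer): ReLU networks have a scaling symmetry, so scaling one layer up and the next layer down leaves the function, and therefore the loss, completely unchanged. A minimal numpy sketch with a hypothetical toy network:

```python
# Two very different weight settings of a small ReLU network that compute
# exactly the same function, hence sit at the same point of the loss surface.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))              # toy inputs
W1 = rng.normal(size=(3, 4))               # first-layer weights
W2 = rng.normal(size=(4, 1))               # second-layer weights

def forward(W1, W2, X):
    return np.maximum(X @ W1, 0.0) @ W2    # ReLU hidden layer, linear output

c = 7.3                                    # any positive rescaling factor
y_a = forward(W1, W2, X)
y_b = forward(W1 * c, W2 / c, X)           # very different weights...

print(np.max(np.abs(y_a - y_b)))           # ...identical outputs, up to float error
```

Many such symmetries and flat directions exist in real networks, which is part of why very different weight vectors can give nearly the same training loss.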
answered 2 days ago by Brian Spiering
'The weights in a model do not need to converge to stop training': why?
– feynman, 2 days ago
Can you provide us with more info? What optimizer do you use and with what parameters, how many epochs and experiments did you run, and what is your loss function?
i just calculated for 1 epoch
That is not enough to support the conclusion you drew in this post.
answered yesterday by sob3kx (new contributor)
I just calculated for 1 epoch. My loss function is the mean squared error. I use no optimizer. I tried several experiments with different initial weights, but all of them gave a converged loss function with ever-changing weights.
– feynman, 23 hours ago
My cost/loss function drops drastically and approaches 0
Since, as you said, you use no optimizer, it is technically not possible for the cost/loss function to drop drastically and approach zero. It is the optimizer that drives the model toward reducing the cost/error; in the gradient-descent hill analogy, the optimizer works out which way to descend the hill to reduce the error the most. Without one, your model just stays at the top of the hill forever, and the loss is just a number to it.
Since there is no optimizer in your code, it is technically not possible that the "cost/loss function drops drastically and approaches 0": your model's loss stays where it started (the top of the hill) forever.
But the weights are still changing in a visible way, a lot faster than the cost function
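For reference (the answer's original image of the equations is not reproduced here), the standard gradient-descent weight update it appears to refer to is, with learning rate $\eta$ and loss $L$:

$$w \leftarrow w - \eta\,\frac{\partial L}{\partial w}, \qquad b \leftarrow b - \eta\,\frac{\partial L}{\partial b}$$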
The above are the weight-update equations. Because your model's predictions are essentially random, at every batch some points happen to be predicted as the correct class by chance, which accounts for a very small reduction in loss. That change in loss feeds into the weights through the update equation above, so you may see random changes in the weights at every batch. The overall effect of these changes is negligible.
I also ran a real example on the MNIST data without an optimizer. In the resulting plot the red line (the loss) stays at the top of the graph for the whole run; I used a batch size of 5 and ran for 5 epochs.
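A minimal sketch of that kind of run (my own stand-in using PyTorch and torchvision, not the author's actual code): the loss is computed on every batch, but no optimizer step is ever taken, so it stays essentially flat across epochs.

```python
# Forward pass and loss on MNIST with gradients computed but never applied:
# without an optimizer step the weights never change and the loss stays flat.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

train_ds = datasets.MNIST(root="data", train=True, download=True,
                          transform=transforms.ToTensor())
loader = DataLoader(train_ds, batch_size=5, shuffle=True)

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
loss_fn = nn.MSELoss()                       # mean squared error, as in the question

for epoch in range(5):
    running = 0.0
    for x, y in loader:
        target = F.one_hot(y, num_classes=10).float()
        loss = loss_fn(model(x), target)
        loss.backward()                      # gradients exist...
        model.zero_grad()                    # ...but are discarded, no update step
        running += loss.item()
    print(f"epoch {epoch}: mean loss {running / len(loader):.4f}")   # roughly constant
```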
answered 21 hours ago by karthikeyan mg