How does sigmoid activation work in multi-class classification problems?
I know that for a problem with multiple classes we usually use softmax, but can we also use sigmoid? I have tried implementing digit classification with a sigmoid at the output layer, and it works. What I don't understand is how it works.
Tags: machine-learning, neural-network, deep-learning, multiclass-classification, activation-function
asked Oct 6 '18 at 8:41 by bharath chandra, edited Oct 6 '18 at 19:56 by Media
3 Answers
softmax() gives you a probability distribution: all outputs are non-negative and sum to 1. sigmoid(), by contrast, independently squashes each neuron's output to a value between 0 and 1.
In digit classification with sigmoid(), you get 10 output neurons, each with a value between 0 and 1. You can then take the largest of them and classify the input as that digit.
answered Oct 6 '18 at 19:01 by Preet
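To make this concrete, here is a minimal NumPy sketch comparing the two activations on the same 10-neuron output (the logit values are invented for illustration, not taken from the poster's network):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical raw scores (logits) of the 10 output neurons for one digit image.
logits = np.array([1.2, -0.5, 0.3, 2.1, -1.0, 0.0, 0.8, -0.2, 1.5, 0.1])

p_soft = softmax(logits)
p_sig = sigmoid(logits)

print(p_soft.sum())   # 1.0: softmax outputs form a probability distribution
print(p_sig.sum())    # ~5.9 here: sigmoid outputs are not constrained to sum to 1

# Both activations are monotonic, so argmax picks the same predicted digit.
print(np.argmax(p_soft), np.argmax(p_sig))   # 3 3
```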
So what you are saying is that both work the same way? Softmax computes each neuron's probability relative to all the others and the neuron with the maximum probability is returned, whereas sigmoid generates an output for each neuron independently and the neuron with the maximum output is returned. Please correct me if I am wrong.
– bharath chandra, Oct 7 '18 at 3:00
If the labels in your classification task are mutually exclusive, so that each input has exactly one label, you have to use softmax. If an input can carry multiple labels, the classes are not mutually exclusive and you can use a sigmoid for each output. In the former case, choose the output entry with the maximum value as the prediction. In the latter case, each class has an activation value produced by the final sigmoid layer; whenever an activation exceeds 0.5, you can say that class is present in the input.
answered Oct 6 '18 at 19:55 by Media
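A minimal sketch of the two decision rules this answer describes (the four-class scores are hypothetical):

```python
import numpy as np

logits = np.array([0.2, 3.1, -1.3, 0.9])   # hypothetical scores for 4 classes

# Mutually exclusive labels: softmax + argmax picks exactly one class.
probs = np.exp(logits - logits.max())
probs /= probs.sum()
predicted_class = int(np.argmax(probs))     # -> 1

# Multi-label: independent sigmoids + a 0.5 threshold can pick several classes.
activations = 1.0 / (1.0 + np.exp(-logits))
predicted_labels = np.where(activations > 0.5)[0]   # -> [0 1 3]

print(predicted_class, predicted_labels)
```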
Yes sir, but my intention is to know how they work within the network. For example, suppose for a training example using softmax I got the predicted value 3 when the actual output is 4; these can be compared and the weights adjusted. But with sigmoid I always get outputs between 0 and 1, so how can I compare them with the actual output, which can be anything between 0 and 9? I get 98% accuracy with sigmoid and 99% with softmax, but I don't understand how sigmoid is working.
– bharath chandra, Oct 6 '18 at 23:22
I didn't understand.
– Media, Oct 7 '18 at 6:56
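Regarding the comparison asked about in the first comment above: it works because the integer label is one-hot encoded before the loss is computed, so each sigmoid output is compared against a 0 or a 1 rather than against the digit itself. A minimal sketch, assuming a binary cross-entropy loss (the output values are invented):

```python
import numpy as np

label = 4                      # true digit for this training example
target = np.zeros(10)
target[label] = 1.0            # one-hot: [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]

# Hypothetical sigmoid outputs of the 10 output neurons.
outputs = np.array([0.10, 0.05, 0.20, 0.30, 0.90, 0.10, 0.02, 0.20, 0.10, 0.05])

# Per-neuron binary cross-entropy: every neuron is pushed toward its own
# 0/1 target, so the network never needs to emit the digit "4" directly.
eps = 1e-12
bce = -np.mean(target * np.log(outputs + eps)
               + (1 - target) * np.log(1 - outputs + eps))
print(bce)   # small when the neuron for class 4 is near 1 and the rest near 0
```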
@bharath chandra A softmax function will never give 3 as output; it always outputs real values between 0 and 1. A sigmoid function also gives outputs between 0 and 1. The difference is that with softmax the sum of all the outputs equals 1 (because the classes are mutually exclusive), while with sigmoid the sum of all the outputs need not equal 1 (because each output is independent).
answered by PS Nayak
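For reference, the standard definitions make both behaviours explicit. For $K$ output neurons with raw scores $z_1, \dots, z_K$:

$$\text{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \in (0, 1), \qquad \sum_{i=1}^{K} \text{softmax}(z)_i = 1,$$

$$\sigma(z_i) = \frac{1}{1 + e^{-z_i}} \in (0, 1), \qquad \sum_{i=1}^{K} \sigma(z_i) \in (0, K).$$

Both functions are monotonic in $z_i$, which is why taking the arg max of either gives the same predicted digit; only the probabilistic interpretation of the values differs.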