How does Sigmoid activation work in multi-class classification problems?

I know that for a problem with multiple classes we usually use softmax, but can we also use sigmoid? I have tried to implement digit classification with sigmoid at the output layer, and it works. What I don't understand is how it works.

machine-learning neural-network deep-learning multiclass-classification activation-function

asked Oct 6 '18 at 8:41 by bharath chandra · edited Oct 6 '18 at 19:56 by Media

3 Answers

Answer (score 1) – answered Oct 6 '18 at 19:01 by Preet

softmax() gives you a probability distribution, meaning all the outputs sum to 1. sigmoid(), by contrast, only makes sure that each neuron's output is between 0 and 1, independently of the other neurons.

In digit classification with sigmoid(), you will therefore have 10 output neurons, each with a value between 0 and 1. You can then take the biggest of them and classify the input as that digit.
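
To make this concrete, here is a minimal NumPy sketch (my own illustration, not part of the original answer) that applies both functions to the same vector of 10 made-up logits:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())          # subtract the max for numerical stability
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical raw scores for the 10 digit classes.
logits = np.array([1.2, -0.3, 0.8, 2.5, -1.0, 0.1, 0.4, -0.7, 1.9, 0.0])

p_soft = softmax(logits)
p_sig = sigmoid(logits)

print(p_soft.sum())                  # 1.0: a proper distribution over the digits
print(p_sig.sum())                   # generally != 1.0: each output is independent
print(np.argmax(p_soft), np.argmax(p_sig))  # both print 3
```

Because both functions are monotonically increasing, the neuron with the largest raw score also has the largest activation under either function, so argmax picks the same class either way; that is essentially why the sigmoid network in the question still classifies digits correctly.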






• So what you are saying is that both work the same way? Softmax calculates the probability of one neuron with respect to all the others and returns the neuron with the maximum probability, whereas sigmoid generates an output for each neuron independently and the neuron with the maximum output is returned. Please correct me if I am wrong. – bharath chandra, Oct 7 '18 at 3:00

Answer (score 1) – answered Oct 6 '18 at 19:55 by Media

If your task is a classification in which the labels are mutually exclusive, so that each input has exactly one label, you have to use softmax. If the inputs of your task can have multiple labels each, your classes are not mutually exclusive and you can use a sigmoid for each output. In the former case, you choose the output entry with the maximum value as the prediction. In the latter case, each class has an activation value coming from the final sigmoid; whenever an activation is greater than 0.5, you can say that the corresponding class is present in the input.
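
For illustration, here is a minimal Keras sketch of the two setups described above; the hidden-layer size, optimizer, and input shape are arbitrary assumptions for the example, not something from the original answer:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Multi-class, mutually exclusive labels: softmax output + categorical loss.
multiclass = tf.keras.Sequential([
    layers.Dense(128, activation='relu', input_shape=(784,)),
    layers.Dense(10, activation='softmax'),
])
multiclass.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

# Multi-label, independent labels: one sigmoid per class + binary cross-entropy.
multilabel = tf.keras.Sequential([
    layers.Dense(128, activation='relu', input_shape=(784,)),
    layers.Dense(10, activation='sigmoid'),
])
multilabel.compile(optimizer='adam', loss='binary_crossentropy')

# Decision rule at prediction time:
#   mutually exclusive labels: pred = probs.argmax(axis=1)
#   independent labels:        pred = probs > 0.5
```

Pairing sigmoid outputs with binary cross-entropy treats each class as its own independent yes/no decision, which is exactly the "not mutually exclusive" reading of the problem.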






• Yes sir, but my intention is to know how they work within the network. For example, with softmax, suppose for a training example I get the expected value 3 when the actual output is 4; these can be compared and the weights adjusted. But with sigmoid I always get outputs between 0 and 1, so how can I compare them with the actual output, which can be anything between 0 and 9? I am getting an accuracy of 98% with sigmoid and 99% with softmax, but I don't understand how sigmoid is working. – bharath chandra, Oct 6 '18 at 23:22

• I didn't understand. – Media, Oct 7 '18 at 6:56

Answer (score 0) – answered 51 secs ago by PS Nayak

@bharath chandra A softmax function will never give 3 as output; it always outputs real values between 0 and 1. A sigmoid function also gives outputs between 0 and 1. The difference is that with softmax the sum of all the outputs equals 1 (because the classes are treated as mutually exclusive), while with sigmoid the sum of all the outputs need not equal 1 (because each output is independent).
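
A short NumPy sketch (my own, with made-up logits) showing the difference in the sums, and how training compares the outputs with a label such as 4 via one-hot encoding rather than with the raw digit:

```python
import numpy as np

logits = np.array([0.5, -1.2, 0.3, 2.0, 1.1, -0.4, 0.0, 0.9, -2.1, 0.6])

softmax_out = np.exp(logits) / np.exp(logits).sum()
sigmoid_out = 1.0 / (1.0 + np.exp(-logits))

print(softmax_out.sum())    # exactly 1.0
print(sigmoid_out.sum())    # some value other than 1.0 (here roughly 5.4)

# The loss never compares the outputs with the raw label "4": the target is
# one-hot encoded, and two length-10 vectors are compared elementwise.
target = np.eye(10)[4]      # digit 4 -> [0 0 0 0 1 0 0 0 0 0]
bce = -(target * np.log(sigmoid_out)
        + (1 - target) * np.log(1 - sigmoid_out)).mean()
print(bce)                  # binary cross-entropy, applied per class to sigmoids
```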




