the relationship between the number of filters/kernels and the number of feature maps












3












$begingroup$


Hierarchical Models Of Perception and Reasoning by Yann LeCun



The above image is from "Deep Learning Tutorial" by Yann LeCun and Marc'Aurelio Ranzato (see pages 73 and 81).



I don't understand why 64 kernels (from the input to layer 1) produce 64 feature maps while 4096 kernels (layer 2 to layer 3) give 256 feature maps.










share|improve this question









$endgroup$

















    3












    $begingroup$


    Hierarchical Models Of Perception and Reasoning by Yann LeCun



    The above image is from "Deep Learning Tutorial" by Yann LeCun and Marc'Aurelio Ranzato (see pages 73 and 81).



    I don't understand why 64 kernels (from the input to layer 1) produce 64 feature maps while 4096 kernels (layer 2 to layer 3) give 256 feature maps.










    share|improve this question









    $endgroup$















      3












      3








      3


      0



      $begingroup$


      Hierarchical Models Of Perception and Reasoning by Yann LeCun



      The above image is from "Deep Learning Tutorial" by Yann LeCun and Marc'Aurelio Ranzato (see pages 73 and 81).



      I don't understand why 64 kernels (from the input to layer 1) produce 64 feature maps while 4096 kernels (layer 2 to layer 3) give 256 feature maps.










      share|improve this question









      $endgroup$




      Hierarchical Models Of Perception and Reasoning by Yann LeCun



      The above image is from "Deep Learning Tutorial" by Yann LeCun and Marc'Aurelio Ranzato (see pages 73 and 81).



      I don't understand why 64 kernels (from the input to layer 1) produce 64 feature maps while 4096 kernels (layer 2 to layer 3) give 256 feature maps.







      neural-network deep-learning cnn






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Nov 30 '17 at 0:04









      chaohuangchaohuang

      12612




      12612






















          2 Answers
          2






          active

          oldest

          votes


















          1












          $begingroup$

          In the linked tutorial, each kernel is 2-dimensional, and applied to a single channel/feature map from its input. To construct a feature map, the output from $N_{input_channels}$ convolutions are added together - this is important as it allows to build feature maps that have interactions between feature maps in the previous layer.



          From the diagram, the first input layer has 1 channel (a greyscale image), so each kernel in layer 1 will generate a feature map. However, once you have 64 channels in layer 2, then to produce each feature map in layer 3 will require 64 kernels added together. If you want 256 feature maps in layer 3, and you expect all 64 inputs to affect each one, then you usually need 64 * 256 = 16384 kernels. The value 4096 is coming from some other aspect of the architecture not shown in the diagram, such as dividing the feature map into groups so that each output layer only processes a fraction of the input layers.



          There are a few notation and presentation differences between presentations and tutorials on convolutional networks, depending on the source. This is one of them. Other sources may arrange the kernels into a 4D structure: $N_{input_channels} times N_{output_channels} times K_{width} times K_{height}$ to make the relationship between kernels and input/output more explicit.



          Another way to show the same thing is to treat the kernels as 3D, and ensure that the kernel depth $K_{depth}$ is the same as the number of input channels $N_{input_channels}$ - this is same thing mathematically as summing up multiple separate convolutions, and dimensions might be $N_{output_channels} times K_{width} times K_{height} times K_{depth}$






          share|improve this answer











          $endgroup$





















            1












            $begingroup$

            The output of a convolution layer is computed as the following:



            the depth (No of feature maps) is equal to the number of filters applied in this layer (because each added channel is a result of one filter's feature map)



            the width ( the same for height) is computed according to the following equation



            $W=frac{W−F+2P}{S} + 1$ where $F$ is the receptive field (filter width), $P$ is the padding and $S$ is the stride.



            For more details see the examples in the following link



            CS231n Convolutional Neural Networks for Visual Recognition






            share|improve this answer











            $endgroup$













              Your Answer





              StackExchange.ifUsing("editor", function () {
              return StackExchange.using("mathjaxEditing", function () {
              StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
              StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
              });
              });
              }, "mathjax-editing");

              StackExchange.ready(function() {
              var channelOptions = {
              tags: "".split(" "),
              id: "557"
              };
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function() {
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled) {
              StackExchange.using("snippets", function() {
              createEditor();
              });
              }
              else {
              createEditor();
              }
              });

              function createEditor() {
              StackExchange.prepareEditor({
              heartbeatType: 'answer',
              autoActivateHeartbeat: false,
              convertImagesToLinks: false,
              noModals: true,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: null,
              bindNavPrevention: true,
              postfix: "",
              imageUploader: {
              brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
              contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
              allowUrls: true
              },
              onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              });


              }
              });














              draft saved

              draft discarded


















              StackExchange.ready(
              function () {
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f25231%2fthe-relationship-between-the-number-of-filters-kernels-and-the-number-of-feature%23new-answer', 'question_page');
              }
              );

              Post as a guest















              Required, but never shown

























              2 Answers
              2






              active

              oldest

              votes








              2 Answers
              2






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes









              1












              $begingroup$

              In the linked tutorial, each kernel is 2-dimensional, and applied to a single channel/feature map from its input. To construct a feature map, the output from $N_{input_channels}$ convolutions are added together - this is important as it allows to build feature maps that have interactions between feature maps in the previous layer.



              From the diagram, the first input layer has 1 channel (a greyscale image), so each kernel in layer 1 will generate a feature map. However, once you have 64 channels in layer 2, then to produce each feature map in layer 3 will require 64 kernels added together. If you want 256 feature maps in layer 3, and you expect all 64 inputs to affect each one, then you usually need 64 * 256 = 16384 kernels. The value 4096 is coming from some other aspect of the architecture not shown in the diagram, such as dividing the feature map into groups so that each output layer only processes a fraction of the input layers.



              There are a few notation and presentation differences between presentations and tutorials on convolutional networks, depending on the source. This is one of them. Other sources may arrange the kernels into a 4D structure: $N_{input_channels} times N_{output_channels} times K_{width} times K_{height}$ to make the relationship between kernels and input/output more explicit.



              Another way to show the same thing is to treat the kernels as 3D, and ensure that the kernel depth $K_{depth}$ is the same as the number of input channels $N_{input_channels}$ - this is same thing mathematically as summing up multiple separate convolutions, and dimensions might be $N_{output_channels} times K_{width} times K_{height} times K_{depth}$






              share|improve this answer











              $endgroup$


















                1












                $begingroup$

                In the linked tutorial, each kernel is 2-dimensional, and applied to a single channel/feature map from its input. To construct a feature map, the output from $N_{input_channels}$ convolutions are added together - this is important as it allows to build feature maps that have interactions between feature maps in the previous layer.



                From the diagram, the first input layer has 1 channel (a greyscale image), so each kernel in layer 1 will generate a feature map. However, once you have 64 channels in layer 2, then to produce each feature map in layer 3 will require 64 kernels added together. If you want 256 feature maps in layer 3, and you expect all 64 inputs to affect each one, then you usually need 64 * 256 = 16384 kernels. The value 4096 is coming from some other aspect of the architecture not shown in the diagram, such as dividing the feature map into groups so that each output layer only processes a fraction of the input layers.



                There are a few notation and presentation differences between presentations and tutorials on convolutional networks, depending on the source. This is one of them. Other sources may arrange the kernels into a 4D structure: $N_{input_channels} times N_{output_channels} times K_{width} times K_{height}$ to make the relationship between kernels and input/output more explicit.



                Another way to show the same thing is to treat the kernels as 3D, and ensure that the kernel depth $K_{depth}$ is the same as the number of input channels $N_{input_channels}$ - this is same thing mathematically as summing up multiple separate convolutions, and dimensions might be $N_{output_channels} times K_{width} times K_{height} times K_{depth}$






                share|improve this answer











                $endgroup$
















                  1












                  1








                  1





                  $begingroup$

                  In the linked tutorial, each kernel is 2-dimensional, and applied to a single channel/feature map from its input. To construct a feature map, the output from $N_{input_channels}$ convolutions are added together - this is important as it allows to build feature maps that have interactions between feature maps in the previous layer.



                  From the diagram, the first input layer has 1 channel (a greyscale image), so each kernel in layer 1 will generate a feature map. However, once you have 64 channels in layer 2, then to produce each feature map in layer 3 will require 64 kernels added together. If you want 256 feature maps in layer 3, and you expect all 64 inputs to affect each one, then you usually need 64 * 256 = 16384 kernels. The value 4096 is coming from some other aspect of the architecture not shown in the diagram, such as dividing the feature map into groups so that each output layer only processes a fraction of the input layers.



                  There are a few notation and presentation differences between presentations and tutorials on convolutional networks, depending on the source. This is one of them. Other sources may arrange the kernels into a 4D structure: $N_{input_channels} times N_{output_channels} times K_{width} times K_{height}$ to make the relationship between kernels and input/output more explicit.



                  Another way to show the same thing is to treat the kernels as 3D, and ensure that the kernel depth $K_{depth}$ is the same as the number of input channels $N_{input_channels}$ - this is same thing mathematically as summing up multiple separate convolutions, and dimensions might be $N_{output_channels} times K_{width} times K_{height} times K_{depth}$






                  share|improve this answer











                  $endgroup$



                  In the linked tutorial, each kernel is 2-dimensional, and applied to a single channel/feature map from its input. To construct a feature map, the output from $N_{input_channels}$ convolutions are added together - this is important as it allows to build feature maps that have interactions between feature maps in the previous layer.



                  From the diagram, the first input layer has 1 channel (a greyscale image), so each kernel in layer 1 will generate a feature map. However, once you have 64 channels in layer 2, then to produce each feature map in layer 3 will require 64 kernels added together. If you want 256 feature maps in layer 3, and you expect all 64 inputs to affect each one, then you usually need 64 * 256 = 16384 kernels. The value 4096 is coming from some other aspect of the architecture not shown in the diagram, such as dividing the feature map into groups so that each output layer only processes a fraction of the input layers.



                  There are a few notation and presentation differences between presentations and tutorials on convolutional networks, depending on the source. This is one of them. Other sources may arrange the kernels into a 4D structure: $N_{input_channels} times N_{output_channels} times K_{width} times K_{height}$ to make the relationship between kernels and input/output more explicit.



                  Another way to show the same thing is to treat the kernels as 3D, and ensure that the kernel depth $K_{depth}$ is the same as the number of input channels $N_{input_channels}$ - this is same thing mathematically as summing up multiple separate convolutions, and dimensions might be $N_{output_channels} times K_{width} times K_{height} times K_{depth}$







                  share|improve this answer














                  share|improve this answer



                  share|improve this answer








                  edited Nov 30 '17 at 10:04

























                  answered Nov 30 '17 at 8:14









                  Neil SlaterNeil Slater

                  16.7k22861




                  16.7k22861























                      1












                      $begingroup$

                      The output of a convolution layer is computed as the following:



                      the depth (No of feature maps) is equal to the number of filters applied in this layer (because each added channel is a result of one filter's feature map)



                      the width ( the same for height) is computed according to the following equation



                      $W=frac{W−F+2P}{S} + 1$ where $F$ is the receptive field (filter width), $P$ is the padding and $S$ is the stride.



                      For more details see the examples in the following link



                      CS231n Convolutional Neural Networks for Visual Recognition






                      share|improve this answer











                      $endgroup$


















                        1












                        $begingroup$

                        The output of a convolution layer is computed as the following:



                        the depth (No of feature maps) is equal to the number of filters applied in this layer (because each added channel is a result of one filter's feature map)



                        the width ( the same for height) is computed according to the following equation



                        $W=frac{W−F+2P}{S} + 1$ where $F$ is the receptive field (filter width), $P$ is the padding and $S$ is the stride.



                        For more details see the examples in the following link



                        CS231n Convolutional Neural Networks for Visual Recognition






                        share|improve this answer











                        $endgroup$
















                          1












                          1








                          1





                          $begingroup$

                          The output of a convolution layer is computed as the following:



                          the depth (No of feature maps) is equal to the number of filters applied in this layer (because each added channel is a result of one filter's feature map)



                          the width ( the same for height) is computed according to the following equation



                          $W=frac{W−F+2P}{S} + 1$ where $F$ is the receptive field (filter width), $P$ is the padding and $S$ is the stride.



                          For more details see the examples in the following link



                          CS231n Convolutional Neural Networks for Visual Recognition






                          share|improve this answer











                          $endgroup$



                          The output of a convolution layer is computed as the following:



                          the depth (No of feature maps) is equal to the number of filters applied in this layer (because each added channel is a result of one filter's feature map)



                          the width ( the same for height) is computed according to the following equation



                          $W=frac{W−F+2P}{S} + 1$ where $F$ is the receptive field (filter width), $P$ is the padding and $S$ is the stride.



                          For more details see the examples in the following link



                          CS231n Convolutional Neural Networks for Visual Recognition







                          share|improve this answer














                          share|improve this answer



                          share|improve this answer








                          edited 14 mins ago









                          Jessica Lee

                          31




                          31










                          answered Feb 28 '18 at 18:02









                          AdityaAditya

                          1,4101525




                          1,4101525






























                              draft saved

                              draft discarded




















































                              Thanks for contributing an answer to Data Science Stack Exchange!


                              • Please be sure to answer the question. Provide details and share your research!

                              But avoid



                              • Asking for help, clarification, or responding to other answers.

                              • Making statements based on opinion; back them up with references or personal experience.


                              Use MathJax to format equations. MathJax reference.


                              To learn more, see our tips on writing great answers.




                              draft saved


                              draft discarded














                              StackExchange.ready(
                              function () {
                              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f25231%2fthe-relationship-between-the-number-of-filters-kernels-and-the-number-of-feature%23new-answer', 'question_page');
                              }
                              );

                              Post as a guest















                              Required, but never shown





















































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown

































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown







                              Popular posts from this blog

                              Aikido

                              Tivadar Csontváry Kosztka

                              Metroo de Marsejlo