the relationship between the number of filters/kernels and the number of feature maps
$begingroup$

The above image is from "Deep Learning Tutorial" by Yann LeCun and Marc'Aurelio Ranzato (see pages 73 and 81).
I don't understand why 64 kernels (from the input to layer 1) produce 64 feature maps while 4096 kernels (layer 2 to layer 3) give 256 feature maps.
neural-network deep-learning cnn
$endgroup$
add a comment |
$begingroup$

The above image is from "Deep Learning Tutorial" by Yann LeCun and Marc'Aurelio Ranzato (see pages 73 and 81).
I don't understand why 64 kernels (from the input to layer 1) produce 64 feature maps while 4096 kernels (layer 2 to layer 3) give 256 feature maps.
neural-network deep-learning cnn
$endgroup$
add a comment |
$begingroup$

The above image is from "Deep Learning Tutorial" by Yann LeCun and Marc'Aurelio Ranzato (see pages 73 and 81).
I don't understand why 64 kernels (from the input to layer 1) produce 64 feature maps while 4096 kernels (layer 2 to layer 3) give 256 feature maps.
neural-network deep-learning cnn
$endgroup$

The above image is from "Deep Learning Tutorial" by Yann LeCun and Marc'Aurelio Ranzato (see pages 73 and 81).
I don't understand why 64 kernels (from the input to layer 1) produce 64 feature maps while 4096 kernels (layer 2 to layer 3) give 256 feature maps.
neural-network deep-learning cnn
neural-network deep-learning cnn
asked Nov 30 '17 at 0:04
chaohuangchaohuang
12612
12612
add a comment |
add a comment |
2 Answers
2
active
oldest
votes
$begingroup$
In the linked tutorial, each kernel is 2-dimensional, and applied to a single channel/feature map from its input. To construct a feature map, the output from $N_{input_channels}$ convolutions are added together - this is important as it allows to build feature maps that have interactions between feature maps in the previous layer.
From the diagram, the first input layer has 1 channel (a greyscale image), so each kernel in layer 1 will generate a feature map. However, once you have 64 channels in layer 2, then to produce each feature map in layer 3 will require 64 kernels added together. If you want 256 feature maps in layer 3, and you expect all 64 inputs to affect each one, then you usually need 64 * 256 = 16384 kernels. The value 4096 is coming from some other aspect of the architecture not shown in the diagram, such as dividing the feature map into groups so that each output layer only processes a fraction of the input layers.
There are a few notation and presentation differences between presentations and tutorials on convolutional networks, depending on the source. This is one of them. Other sources may arrange the kernels into a 4D structure: $N_{input_channels} times N_{output_channels} times K_{width} times K_{height}$ to make the relationship between kernels and input/output more explicit.
Another way to show the same thing is to treat the kernels as 3D, and ensure that the kernel depth $K_{depth}$ is the same as the number of input channels $N_{input_channels}$ - this is same thing mathematically as summing up multiple separate convolutions, and dimensions might be $N_{output_channels} times K_{width} times K_{height} times K_{depth}$
$endgroup$
add a comment |
$begingroup$
The output of a convolution layer is computed as the following:
the depth (No of feature maps) is equal to the number of filters applied in this layer (because each added channel is a result of one filter's feature map)
the width ( the same for height) is computed according to the following equation
$W=frac{W−F+2P}{S} + 1$ where $F$ is the receptive field (filter width), $P$ is the padding and $S$ is the stride.
For more details see the examples in the following link
CS231n Convolutional Neural Networks for Visual Recognition
$endgroup$
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
});
});
}, "mathjax-editing");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "557"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f25231%2fthe-relationship-between-the-number-of-filters-kernels-and-the-number-of-feature%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
$begingroup$
In the linked tutorial, each kernel is 2-dimensional, and applied to a single channel/feature map from its input. To construct a feature map, the output from $N_{input_channels}$ convolutions are added together - this is important as it allows to build feature maps that have interactions between feature maps in the previous layer.
From the diagram, the first input layer has 1 channel (a greyscale image), so each kernel in layer 1 will generate a feature map. However, once you have 64 channels in layer 2, then to produce each feature map in layer 3 will require 64 kernels added together. If you want 256 feature maps in layer 3, and you expect all 64 inputs to affect each one, then you usually need 64 * 256 = 16384 kernels. The value 4096 is coming from some other aspect of the architecture not shown in the diagram, such as dividing the feature map into groups so that each output layer only processes a fraction of the input layers.
There are a few notation and presentation differences between presentations and tutorials on convolutional networks, depending on the source. This is one of them. Other sources may arrange the kernels into a 4D structure: $N_{input_channels} times N_{output_channels} times K_{width} times K_{height}$ to make the relationship between kernels and input/output more explicit.
Another way to show the same thing is to treat the kernels as 3D, and ensure that the kernel depth $K_{depth}$ is the same as the number of input channels $N_{input_channels}$ - this is same thing mathematically as summing up multiple separate convolutions, and dimensions might be $N_{output_channels} times K_{width} times K_{height} times K_{depth}$
$endgroup$
add a comment |
$begingroup$
In the linked tutorial, each kernel is 2-dimensional, and applied to a single channel/feature map from its input. To construct a feature map, the output from $N_{input_channels}$ convolutions are added together - this is important as it allows to build feature maps that have interactions between feature maps in the previous layer.
From the diagram, the first input layer has 1 channel (a greyscale image), so each kernel in layer 1 will generate a feature map. However, once you have 64 channels in layer 2, then to produce each feature map in layer 3 will require 64 kernels added together. If you want 256 feature maps in layer 3, and you expect all 64 inputs to affect each one, then you usually need 64 * 256 = 16384 kernels. The value 4096 is coming from some other aspect of the architecture not shown in the diagram, such as dividing the feature map into groups so that each output layer only processes a fraction of the input layers.
There are a few notation and presentation differences between presentations and tutorials on convolutional networks, depending on the source. This is one of them. Other sources may arrange the kernels into a 4D structure: $N_{input_channels} times N_{output_channels} times K_{width} times K_{height}$ to make the relationship between kernels and input/output more explicit.
Another way to show the same thing is to treat the kernels as 3D, and ensure that the kernel depth $K_{depth}$ is the same as the number of input channels $N_{input_channels}$ - this is same thing mathematically as summing up multiple separate convolutions, and dimensions might be $N_{output_channels} times K_{width} times K_{height} times K_{depth}$
$endgroup$
add a comment |
$begingroup$
In the linked tutorial, each kernel is 2-dimensional, and applied to a single channel/feature map from its input. To construct a feature map, the output from $N_{input_channels}$ convolutions are added together - this is important as it allows to build feature maps that have interactions between feature maps in the previous layer.
From the diagram, the first input layer has 1 channel (a greyscale image), so each kernel in layer 1 will generate a feature map. However, once you have 64 channels in layer 2, then to produce each feature map in layer 3 will require 64 kernels added together. If you want 256 feature maps in layer 3, and you expect all 64 inputs to affect each one, then you usually need 64 * 256 = 16384 kernels. The value 4096 is coming from some other aspect of the architecture not shown in the diagram, such as dividing the feature map into groups so that each output layer only processes a fraction of the input layers.
There are a few notation and presentation differences between presentations and tutorials on convolutional networks, depending on the source. This is one of them. Other sources may arrange the kernels into a 4D structure: $N_{input_channels} times N_{output_channels} times K_{width} times K_{height}$ to make the relationship between kernels and input/output more explicit.
Another way to show the same thing is to treat the kernels as 3D, and ensure that the kernel depth $K_{depth}$ is the same as the number of input channels $N_{input_channels}$ - this is same thing mathematically as summing up multiple separate convolutions, and dimensions might be $N_{output_channels} times K_{width} times K_{height} times K_{depth}$
$endgroup$
In the linked tutorial, each kernel is 2-dimensional, and applied to a single channel/feature map from its input. To construct a feature map, the output from $N_{input_channels}$ convolutions are added together - this is important as it allows to build feature maps that have interactions between feature maps in the previous layer.
From the diagram, the first input layer has 1 channel (a greyscale image), so each kernel in layer 1 will generate a feature map. However, once you have 64 channels in layer 2, then to produce each feature map in layer 3 will require 64 kernels added together. If you want 256 feature maps in layer 3, and you expect all 64 inputs to affect each one, then you usually need 64 * 256 = 16384 kernels. The value 4096 is coming from some other aspect of the architecture not shown in the diagram, such as dividing the feature map into groups so that each output layer only processes a fraction of the input layers.
There are a few notation and presentation differences between presentations and tutorials on convolutional networks, depending on the source. This is one of them. Other sources may arrange the kernels into a 4D structure: $N_{input_channels} times N_{output_channels} times K_{width} times K_{height}$ to make the relationship between kernels and input/output more explicit.
Another way to show the same thing is to treat the kernels as 3D, and ensure that the kernel depth $K_{depth}$ is the same as the number of input channels $N_{input_channels}$ - this is same thing mathematically as summing up multiple separate convolutions, and dimensions might be $N_{output_channels} times K_{width} times K_{height} times K_{depth}$
edited Nov 30 '17 at 10:04
answered Nov 30 '17 at 8:14
Neil SlaterNeil Slater
16.7k22861
16.7k22861
add a comment |
add a comment |
$begingroup$
The output of a convolution layer is computed as the following:
the depth (No of feature maps) is equal to the number of filters applied in this layer (because each added channel is a result of one filter's feature map)
the width ( the same for height) is computed according to the following equation
$W=frac{W−F+2P}{S} + 1$ where $F$ is the receptive field (filter width), $P$ is the padding and $S$ is the stride.
For more details see the examples in the following link
CS231n Convolutional Neural Networks for Visual Recognition
$endgroup$
add a comment |
$begingroup$
The output of a convolution layer is computed as the following:
the depth (No of feature maps) is equal to the number of filters applied in this layer (because each added channel is a result of one filter's feature map)
the width ( the same for height) is computed according to the following equation
$W=frac{W−F+2P}{S} + 1$ where $F$ is the receptive field (filter width), $P$ is the padding and $S$ is the stride.
For more details see the examples in the following link
CS231n Convolutional Neural Networks for Visual Recognition
$endgroup$
add a comment |
$begingroup$
The output of a convolution layer is computed as the following:
the depth (No of feature maps) is equal to the number of filters applied in this layer (because each added channel is a result of one filter's feature map)
the width ( the same for height) is computed according to the following equation
$W=frac{W−F+2P}{S} + 1$ where $F$ is the receptive field (filter width), $P$ is the padding and $S$ is the stride.
For more details see the examples in the following link
CS231n Convolutional Neural Networks for Visual Recognition
$endgroup$
The output of a convolution layer is computed as the following:
the depth (No of feature maps) is equal to the number of filters applied in this layer (because each added channel is a result of one filter's feature map)
the width ( the same for height) is computed according to the following equation
$W=frac{W−F+2P}{S} + 1$ where $F$ is the receptive field (filter width), $P$ is the padding and $S$ is the stride.
For more details see the examples in the following link
CS231n Convolutional Neural Networks for Visual Recognition
edited 14 mins ago
Jessica Lee
31
31
answered Feb 28 '18 at 18:02
AdityaAditya
1,4101525
1,4101525
add a comment |
add a comment |
Thanks for contributing an answer to Data Science Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f25231%2fthe-relationship-between-the-number-of-filters-kernels-and-the-number-of-feature%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown