the relationship between the number of filters/kernels and the number of feature maps

Hierarchical Models Of Perception and Reasoning by Yann LeCun

The above image is from "Deep Learning Tutorial" by Yann LeCun and Marc'Aurelio Ranzato (see pages 73 and 81).

I don't understand why 64 kernels (from the input to layer 1) produce 64 feature maps while 4096 kernels (layer 2 to layer 3) give 256 feature maps.

asked Nov 30 '17 at 0:04

chaohuang

12612

add a comment |

Hierarchical Models Of Perception and Reasoning by Yann LeCun

The above image is from "Deep Learning Tutorial" by Yann LeCun and Marc'Aurelio Ranzato (see pages 73 and 81).

I don't understand why 64 kernels (from the input to layer 1) produce 64 feature maps while 4096 kernels (layer 2 to layer 3) give 256 feature maps.

asked Nov 30 '17 at 0:04

chaohuang

12612

add a comment |

Hierarchical Models Of Perception and Reasoning by Yann LeCun

The above image is from "Deep Learning Tutorial" by Yann LeCun and Marc'Aurelio Ranzato (see pages 73 and 81).

I don't understand why 64 kernels (from the input to layer 1) produce 64 feature maps while 4096 kernels (layer 2 to layer 3) give 256 feature maps.

asked Nov 30 '17 at 0:04

chaohuang

12612

Hierarchical Models Of Perception and Reasoning by Yann LeCun

The above image is from "Deep Learning Tutorial" by Yann LeCun and Marc'Aurelio Ranzato (see pages 73 and 81).

I don't understand why 64 kernels (from the input to layer 1) produce 64 feature maps while 4096 kernels (layer 2 to layer 3) give 256 feature maps.

neural-network deep-learning cnn

asked Nov 30 '17 at 0:04

chaohuang

12612

asked Nov 30 '17 at 0:04

chaohuang

12612

asked Nov 30 '17 at 0:04

chaohuang

12612

asked Nov 30 '17 at 0:04

chaohuang

12612

asked Nov 30 '17 at 0:04

chaohuang

12612

add a comment |

2 Answers
2

active

oldest

votes

In the linked tutorial, each kernel is 2-dimensional, and applied to a single channel/feature map from its input. To construct a feature map, the output from $N_{input_channels}$ convolutions are added together - this is important as it allows to build feature maps that have interactions between feature maps in the previous layer.

From the diagram, the first input layer has 1 channel (a greyscale image), so each kernel in layer 1 will generate a feature map. However, once you have 64 channels in layer 2, then to produce each feature map in layer 3 will require 64 kernels added together. If you want 256 feature maps in layer 3, and you expect all 64 inputs to affect each one, then you usually need 64 * 256 = 16384 kernels. The value 4096 is coming from some other aspect of the architecture not shown in the diagram, such as dividing the feature map into groups so that each output layer only processes a fraction of the input layers.

There are a few notation and presentation differences between presentations and tutorials on convolutional networks, depending on the source. This is one of them. Other sources may arrange the kernels into a 4D structure: $N_{input_channels} times N_{output_channels} times K_{width} times K_{height}$ to make the relationship between kernels and input/output more explicit.

Another way to show the same thing is to treat the kernels as 3D, and ensure that the kernel depth $K_{depth}$ is the same as the number of input channels $N_{input_channels}$ - this is same thing mathematically as summing up multiple separate convolutions, and dimensions might be $N_{output_channels} times K_{width} times K_{height} times K_{depth}$

edited Nov 30 '17 at 10:04

answered Nov 30 '17 at 8:14

Neil Slater

16.7k22861

add a comment |

The output of a convolution layer is computed as the following:

the depth (No of feature maps) is equal to the number of filters applied in this layer (because each added channel is a result of one filter's feature map)

the width ( the same for height) is computed according to the following equation

$W=frac{W−F+2P}{S} + 1$ where $F$ is the receptive field (filter width), $P$ is the padding and $S$ is the stride.

For more details see the examples in the following link

CS231n Convolutional Neural Networks for Visual Recognition

edited 14 mins ago

Jessica Lee

answered Feb 28 '18 at 18:02

Aditya

1,4101525

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\$","\$"]]);
});
});
}, "mathjax-editing");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "557"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f25231%2fthe-relationship-between-the-number-of-filters-kernels-and-the-number-of-feature%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

edited Nov 30 '17 at 10:04

answered Nov 30 '17 at 8:14

Neil Slater

16.7k22861

add a comment |

edited Nov 30 '17 at 10:04

answered Nov 30 '17 at 8:14

Neil Slater

16.7k22861

add a comment |

edited Nov 30 '17 at 10:04

answered Nov 30 '17 at 8:14

Neil Slater

16.7k22861

edited Nov 30 '17 at 10:04

answered Nov 30 '17 at 8:14

Neil Slater

16.7k22861

edited Nov 30 '17 at 10:04

answered Nov 30 '17 at 8:14

Neil Slater

16.7k22861

answered Nov 30 '17 at 8:14

Neil Slater

16.7k22861

answered Nov 30 '17 at 8:14

Neil Slater

16.7k22861

add a comment |

The output of a convolution layer is computed as the following:

the depth (No of feature maps) is equal to the number of filters applied in this layer (because each added channel is a result of one filter's feature map)

the width ( the same for height) is computed according to the following equation

$W=frac{W−F+2P}{S} + 1$ where $F$ is the receptive field (filter width), $P$ is the padding and $S$ is the stride.

For more details see the examples in the following link

CS231n Convolutional Neural Networks for Visual Recognition

edited 14 mins ago

Jessica Lee

answered Feb 28 '18 at 18:02

Aditya

1,4101525

add a comment |

The output of a convolution layer is computed as the following:

the depth (No of feature maps) is equal to the number of filters applied in this layer (because each added channel is a result of one filter's feature map)

the width ( the same for height) is computed according to the following equation

$W=frac{W−F+2P}{S} + 1$ where $F$ is the receptive field (filter width), $P$ is the padding and $S$ is the stride.

For more details see the examples in the following link

CS231n Convolutional Neural Networks for Visual Recognition

edited 14 mins ago

Jessica Lee

answered Feb 28 '18 at 18:02

Aditya

1,4101525

add a comment |

The output of a convolution layer is computed as the following:

the depth (No of feature maps) is equal to the number of filters applied in this layer (because each added channel is a result of one filter's feature map)

the width ( the same for height) is computed according to the following equation

$W=frac{W−F+2P}{S} + 1$ where $F$ is the receptive field (filter width), $P$ is the padding and $S$ is the stride.

For more details see the examples in the following link

CS231n Convolutional Neural Networks for Visual Recognition

edited 14 mins ago

Jessica Lee

answered Feb 28 '18 at 18:02

Aditya

1,4101525

The output of a convolution layer is computed as the following:

the depth (No of feature maps) is equal to the number of filters applied in this layer (because each added channel is a result of one filter's feature map)

the width ( the same for height) is computed according to the following equation

$W=frac{W−F+2P}{S} + 1$ where $F$ is the receptive field (filter width), $P$ is the padding and $S$ is the stride.

For more details see the examples in the following link

CS231n Convolutional Neural Networks for Visual Recognition

edited 14 mins ago

Jessica Lee

answered Feb 28 '18 at 18:02

Aditya

1,4101525

edited 14 mins ago

Jessica Lee

edited 14 mins ago

Jessica Lee

edited 14 mins ago

Jessica Lee

answered Feb 28 '18 at 18:02

Aditya

1,4101525

answered Feb 28 '18 at 18:02

Aditya

1,4101525

answered Feb 28 '18 at 18:02

Aditya

1,4101525

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Data Science Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Gfyuki