Training of Region Proposal Network (RPN)
There is some interesting literature about RPNs (Region Proposal Networks). The most concise and helpful documentation I have found so far is this post: https://www.quora.com/How-does-the-region-proposal-network-RPN-in-Faster-R-CNN-work?share=1.
But something remains unclear to me after all my reading. RPNs are designed to propose several candidate regions, from which a selection is then made to decide which candidates fit our needs.
However, RPNs, and neural networks in general, are deterministic: once trained, they always produce the same output for a given input, so there is no way to query new candidates for the same input image. As far as I understand, RPNs are trained to produce a fixed number of proposals for each new image. But how does the training work then? If the RPN has to produce 300 candidates, what labeled data should we use for training, given that a training image probably has no more than 5 ground-truth bounding boxes?
And since the bounding-box sizes are not consistent among candidates, how does the downstream CNN operate on inputs of different sizes?
machine-learning neural-network deep-learning
asked Jun 20 '18 at 21:20
Emile D.
For reference, here is another interesting post that I found: datascience.stackexchange.com/questions/27277/…
– Emile D.
Jun 20 '18 at 21:22
2 Answers
The first answer in the link you commented addresses one point about how region proposals are selected: the Intersection over Union metric (more formally, the Jaccard index), i.e. how much your anchor overlaps the label. A lower limit is usually set for this metric to filter out useless proposals, and the remaining matches can be sorted to choose the best ones.
I recommend reading through this excellently explained variant of a proposal network: Mask R-CNN (Mask Region-based CNN).
If you prefer looking at code, there is a full repo implemented in Keras/TensorFlow (a PyTorch implementation is also linked somewhere).
There is even an explanatory Jupyter notebook, which might help make things click for you.
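As a minimal sketch of the Jaccard index mentioned above (a hypothetical helper, not code from the Mask R-CNN repo), the overlap of two axis-aligned boxes can be computed like this:

```python
def iou(box_a, box_b):
    # Boxes are (y1, x1, y2, x2) tuples.
    y1 = max(box_a[0], box_b[0])
    x1 = max(box_a[1], box_b[1])
    y2 = min(box_a[2], box_b[2])
    x2 = min(box_a[3], box_b[3])
    # Clamp to 0 so disjoint boxes get no negative "overlap".
    inter = max(0, y2 - y1) * max(0, x2 - x1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # ~0.1429: a weak match, below typical thresholds
```

A proposal whose best IoU against any label falls below the chosen lower limit would be discarded.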
answered Jun 20 '18 at 21:40
n1k31t4
Indeed. But my question was not so much about the selection of bounding boxes as about the training. What I mean is: whenever we do backpropagation, we need to compute the loss function and compare every output to its ground-truth value. But since we could have a 300x4 output while the ground truth is, say, 5x4, how do we handle backpropagation and training?
– Emile D.
Jun 21 '18 at 14:33
To understand how the RPN is trained, we can dive into the code written by Matterport: their Mask R-CNN repo, a TensorFlow/Keras implementation with over 10,000 stars.
You can check the build_rpn_targets function in mrcnn/model.py.
It uses the generated anchors (which depend on your anchor scales, ratios, image size, ...) to calculate the IoU between anchors and ground-truth boxes:
# Compute overlaps [num_anchors, num_gt_boxes]
overlaps = utils.compute_overlaps(anchors, gt_boxes)
Once we know the overlaps between anchors and ground truth, we choose positive and negative anchors based on their IoU with the ground truth. According to the Mask R-CNN paper, anchors with IoU > 0.7 are positive and those with IoU < 0.3 are negative; the rest are neutral anchors and are not used during training:
# 1. Set negative anchors first. They get overwritten below if a GT box is
# matched to them.
anchor_iou_argmax = np.argmax(overlaps, axis=1)
anchor_iou_max = overlaps[np.arange(overlaps.shape[0]), anchor_iou_argmax]
rpn_match[anchor_iou_max < 0.3] = -1
# 2. Set an anchor for each GT box (regardless of IoU value).
# If multiple anchors have the same IoU match all of them
gt_iou_argmax = np.argwhere(overlaps == np.max(overlaps, axis=0))[:,0]
rpn_match[gt_iou_argmax] = 1
# 3. Set anchors with high overlap as positive.
rpn_match[anchor_iou_max >= 0.7] = 1
To train the RPN effectively, you need to set RPN_TRAIN_ANCHORS_PER_IMAGE carefully to balance training when there are few objects in an image. Note that multiple anchors can match one ground-truth box, since each anchor regresses its own bounding-box offsets to fit the ground truth.
Hope the answer is clear for you!
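To see the matching logic end to end, here is a self-contained toy version. The compute_overlaps helper below is a simplified stand-in for utils.compute_overlaps in the repo (not the repo's actual code), and the data is made up for illustration:

```python
import numpy as np

def compute_overlaps(anchors, gt_boxes):
    # IoU matrix of shape [num_anchors, num_gt_boxes]; boxes are (y1, x1, y2, x2).
    ious = np.zeros((anchors.shape[0], gt_boxes.shape[0]))
    for j, gt in enumerate(gt_boxes):
        y1 = np.maximum(anchors[:, 0], gt[0])
        x1 = np.maximum(anchors[:, 1], gt[1])
        y2 = np.minimum(anchors[:, 2], gt[2])
        x2 = np.minimum(anchors[:, 3], gt[3])
        inter = np.maximum(0, y2 - y1) * np.maximum(0, x2 - x1)
        anchor_area = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
        gt_area = (gt[2] - gt[0]) * (gt[3] - gt[1])
        ious[:, j] = inter / (anchor_area + gt_area - inter)
    return ious

# Toy data: 3 anchors, 1 ground-truth box.
anchors = np.array([[0, 0, 10, 10], [0, 0, 9, 9], [50, 50, 60, 60]], dtype=float)
gt_boxes = np.array([[0, 0, 10, 10]], dtype=float)

overlaps = compute_overlaps(anchors, gt_boxes)
rpn_match = np.zeros(len(anchors), dtype=int)  # 0 = neutral, not used in the loss
anchor_iou_max = overlaps.max(axis=1)
rpn_match[anchor_iou_max < 0.3] = -1           # 1. negatives first
rpn_match[np.argmax(overlaps, axis=0)] = 1     # 2. best anchor per GT box
rpn_match[anchor_iou_max >= 0.7] = 1           # 3. high-overlap positives
print(rpn_match)  # -> [ 1  1 -1]
```

This is how the label mismatch in the question is resolved: the training target is not 5 ground-truth boxes versus 300 proposals, but one class label (and, for positives, one box offset) per anchor, so the loss is well defined regardless of how many objects the image contains.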
answered 51 mins ago
jimmy15923