Why does my Keras model learn to recognize the background?

I'm trying to train this Keras implementation of Deeplabv3+ on Pascal VOC2012, using the pretrained model (which was also trained on that dataset).



I got weird results, with the accuracy quickly converging to 1.0:



Epoch 1/3
5/5 [==============================] - 182s 36s/step - loss: 26864.4418 - acc: 0.7669 - val_loss: 19385.8555 - val_acc: 0.4818
Epoch 2/3
5/5 [==============================] - 77s 15s/step - loss: 42117.3555 - acc: 0.9815 - val_loss: 69088.5469 - val_acc: 0.9948
Epoch 3/3
5/5 [==============================] - 78s 16s/step - loss: 45300.6992 - acc: 1.0000 - val_loss: 44569.9414 - val_acc: 1.0000


Testing the model also gives 100% accuracy.



I decided to plot predictions on the same set of random images before and after training, and found that the model ends up predicting background for everything (background is the first class in Pascal VOC2012).



[Predictions before training]

[Predictions after training]
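(For context, the plots come from roughly the following; model and images are placeholders for my Deeplab model and a batch of preprocessed 512x512 inputs:)

import numpy as np
import matplotlib.pyplot as plt

# model and images stand in for my Deeplab model and a batch of inputs.
preds = model.predict(images)           # (n, 512, 512, 21) per-class scores
class_maps = np.argmax(preds, axis=-1)  # (n, 512, 512) predicted class ids

plt.imshow(class_maps[0])               # all zeros => everything is "background"
plt.show()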



I'm quite new to deep learning and need help figuring out where this could come from.



I thought that perhaps it could be my loss function, which I defined as:



import tensorflow as tf

def image_categorical_cross_entropy(y_true, y_pred):
    """
    :param y_true: tensor of shape (batch_size, height, width) representing the ground truth.
    :param y_pred: tensor of shape (batch_size, height, width) representing the prediction.
    :return: the mean cross-entropy on softmaxed tensors.
    """
    return tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=y_pred, labels=y_true))
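
Since my annotations are integer class maps, I also wonder whether the sparse variant of this loss is what I actually need, as it takes integer labels of shape (batch_size, height, width) directly against logits with a trailing class dimension. A minimal, untested sketch of what I mean:

def sparse_image_cross_entropy(y_true, y_pred):
    # y_true: (batch_size, height, width) integer class ids per pixel.
    # y_pred: (batch_size, height, width, 21) unscaled logits.
    labels = tf.cast(y_true, tf.int32)
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=y_pred))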


I am a bit uncertain about whether my tensors have the right shapes. I'm using TF's dataset API to load .tfrecord files, and my annotation tensor has shape (batch_size, height, width). Would (batch_size, height, width, 21) be what's needed? Other errors arise from inside the model when I try to separate the annotation image into a tensor containing 21 images (one for each class); a simplified sketch of that attempt follows the error below:



tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [12,512,512,21] vs. [12,512,512]
[[Node: metrics/acc/Equal = Equal[T=DT_INT64, _device="/job:localhost/replica:0/task:0/device:GPU:0"](metrics/acc/ArgMax, metrics/acc/ArgMax_1)]]
[[Node: training/Adam/gradients/bilinear_upsampling_2_1/concat_grad/Slice_1/_13277 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:1", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_62151_training/Adam/gradients/bilinear_upsampling_2_1/concat_grad/Slice_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:1"]()]]
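
My attempt at that separation is essentially a per-pixel one-hot encoding inside the dataset pipeline; simplified (the .tfrecord decoding code around it is omitted):

NUM_CLASSES = 21

def annotation_to_one_hot(annotation):
    # annotation: (height, width) integer label image decoded from a .tfrecord.
    # Returns a (height, width, 21) tensor with one channel per class.
    return tf.one_hot(tf.cast(annotation, tf.int32), depth=NUM_CLASSES)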


Thank you for your help!










python deep-learning keras tensorflow

asked Oct 3 '18 at 15:06 (edited Oct 5 '18 at 17:31)
– Matt

  • Quite a few items to consider here; I almost don't know where to start. (1) Are you really using a sample size of 5 for training? (2) What pre-processing, if any, are you doing to your images? I have a feeling the answer lies there. (3) You'd have to provide a lot more info on your model: how many labeled samples do you have? How many possible categories? Do you have a balanced training set? (4) Your accuracy of 1.0 basically means nothing, because your loss is very high and increasing; loss should decrease as accuracy improves.
    – I_Play_With_Data, Oct 3 '18 at 15:54

  • (1) I'm using batches of size 12, but I think that's irrelevant here; I only showed 3 small epochs of 5 steps each because that's how quickly it converges. (2) My preprocessing consists of some augmentation and rescaling (possibly cropping) to 512x512 for every image and its associated annotation. (3) There are about 11,500 labeled images in Pascal VOC 2012. Given that most papers reach 85%+ mIoU on this dataset, I would assume it's balanced. There are 20 categories plus one for background or "ambiguous", for a total of 21.
    – Matt, Oct 3 '18 at 16:19

  • I'm curious: did you find the reason for your model's results?
    – Mark.F, Dec 13 '18 at 10:49

  • If you shared your code, it would be possible to find the mistake.
    – Dmytro Prylipko, Jan 11 at 15:03

  • The fact that a pre-trained model finds a way to get 100% accuracy within 3 epochs, using the same data it was originally trained on, makes me think the bug is that your training labels are wrong, perhaps all set to the label that corresponds to background. In any case, have a look at this issue thread, where people discuss their problems and solutions for fine-tuning the model. The model isn't necessarily broken, and the batch-norm bug in TensorFlow can be addressed.
    – n1k31t4, Mar 2 at 13:28

1 Answer

Your model is overfitting. Each epoch runs only 5 steps, so the model is "memorizing" the answer for the handful of images it sees.



In order to minimize the chance of overfitting, increase the number of images the model actually sees during training. There should be several thousand example images for each category of object.
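
For example, a minimal sketch of training over the whole VOC training split instead of 5 steps per epoch (assuming a tf.keras model fed from a tf.data pipeline; model, train_dataset, and val_dataset are placeholder names, and the image count is taken from the comments above):

import math

BATCH_SIZE = 12           # from the comments above
NUM_TRAIN_IMAGES = 11500  # roughly the size of the labeled VOC 2012 set

# Cover the whole training set each epoch instead of stopping after 5 steps.
steps_per_epoch = math.ceil(NUM_TRAIN_IMAGES / BATCH_SIZE)

model.fit(train_dataset,                  # placeholder tf.data pipeline
          epochs=30,
          steps_per_epoch=steps_per_epoch,
          validation_data=val_dataset,    # placeholder validation pipeline
          validation_steps=50)            # placeholder count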






answered 4 hours ago
– Brian Spiering