Dealing with unbalanced error rate in confusion matrix
Here is the confusion matrix I got when playing with the Forest Cover Type Kaggle dataset: Link. In the matrix, lighter colors and higher numbers represent higher error rates, so as you can see, a lot of the misclassification happens between class 1 and class 0.
What methods can I use to reduce these two error rates? Some improvement has already come from combining two classifiers, Random Forest and Extra Trees. Would stacking help in this case?
The data can be found at https://www.kaggle.com/c/forest-cover-type-prediction/data
machine-learning classification confusion-matrix
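For reference, a row-normalized confusion matrix of the kind described above can be computed as follows; this is a minimal sketch on toy labels, not the actual competition data:

```python
# Sketch of a row-normalized confusion matrix: each row sums to 1, so
# higher off-diagonal numbers mean higher error rates between that pair
# of classes.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2])   # toy labels
y_pred = np.array([0, 1, 1, 1, 0, 1, 2, 2])   # toy predictions

cm = confusion_matrix(y_true, y_pred)
cm_norm = cm / cm.sum(axis=1, keepdims=True)  # normalize each row
print(np.round(cm_norm, 2))
```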
The dataset is not skewed and every class has the same amount of training instances. – Chenxiong Yi, Dec 15 '17 at 7:13
asked Dec 13 '17 at 6:19 by Chenxiong Yi on Data Science Stack Exchange (edited Dec 13 '17 at 7:59)
1 Answer
Welcome to the site!
Ensembling can be tricky: when one of the models performs poorly, the accuracy of the ensemble can drop as well. For instance, suppose you use Random Forest (RF) and rpart for classification, where RF reaches 90% accuracy and rpart only 60%; a naive ensemble of the two can score lower than RF alone.
Coming to your scenario, you need to be careful when stacking: select models that individually perform reasonably well, and then stack them to improve accuracy.
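As an illustration, this kind of stacking can be sketched with scikit-learn's StackingClassifier. The base learners below match the two the question mentions, but the parameters and the synthetic data are assumptions, not the asker's actual setup:

```python
# Minimal stacking sketch: Random Forest and Extra Trees as base learners,
# logistic regression as the meta-learner, trained on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1500, n_classes=3, n_informative=6,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("et", ExtraTreesClassifier(n_estimators=100, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,  # the meta-learner trains on out-of-fold base predictions
)
stack.fit(X_tr, y_tr)
print(confusion_matrix(y_te, stack.predict(X_te)))  # inspect per-class errors
```

Checking the stacked model's confusion matrix against each base learner's is what tells you whether stacking actually reduced the class 0/1 confusion.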
Also look at the distribution of the classes: if they are imbalanced, rebalancing the training data can improve the model. Resampling techniques such as SMOTE or ROSE (both available as R packages) are commonly used for this.
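SMOTE and ROSE are R packages; as a dependency-light illustration of the same rebalancing idea, here is plain random oversampling of the minority class with scikit-learn's resample (toy data, for illustration only):

```python
# Rebalancing by random oversampling: duplicate minority-class rows
# (sampling with replacement) until both classes have equal counts.
import numpy as np
from sklearn.utils import resample

rng = np.random.RandomState(0)
X = rng.randn(100, 3)
y = np.array([0] * 90 + [1] * 10)          # 90/10 imbalance

X_min, X_maj = X[y == 1], X[y == 0]
X_up = resample(X_min, n_samples=len(X_maj), replace=True, random_state=0)

X_bal = np.vstack([X_maj, X_up])
y_bal = np.array([0] * len(X_maj) + [1] * len(X_up))
print(np.bincount(y_bal))                   # classes now balanced
```

SMOTE goes one step further than this by interpolating between minority-class neighbors instead of duplicating rows.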
Feature engineering, such as adding external factors or deriving new features, might also help improve your model's accuracy.
Do let me know if you have any additional questions.
Thank you for your answer! All classes in this dataset actually have the same size, so 0 and 1 are not minorities in this case. I feel that since classes 0 and 1 are pretty much the same in almost all features, it is hard to classify them correctly. Do you know any way to handle this scenario? – Chenxiong Yi, Dec 13 '17 at 7:27
So the data is normally distributed; what features do you have? – Toros91, Dec 13 '17 at 7:28
kaggle.com/c/forest-cover-type-prediction/data: you can see all the features here. By the way, since all classes have the same size, shouldn't the distribution be uniform? – Chenxiong Yi, Dec 13 '17 at 7:30
Can you explain the above statement with an example? – Toros91, Dec 13 '17 at 7:35
I just mean that no class has more training data than the others. Sorry for the confusion. – Chenxiong Yi, Dec 13 '17 at 7:37
answered Dec 13 '17 at 6:37 by Toros91