What are some possible reasons that your multiclass classifier is classifying all the samples into a single class?
I have unbalanced classes:
Group1 N = 140
Group2 N = 35
Group3 N = 30
When I ran the code on this data, every sample was classified as Group1. Since Group1 is the majority class, this was not a surprise.
Then I ran the same code with SMOTE, so all groups had 140 samples, but every sample was still classified as Group1. Next I balanced the class weights (without SMOTE), and again got the same result. This is confusing. What am I doing wrong, and what can I do to improve the model?
I tried five different classifiers (KNN, AdaBoost, SVC, RF, DT) and got the same result with four of the five!
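Before debugging the models, it can help to confirm the class distribution and the majority-class baseline accuracy — any classifier that predicts only Group1 will score exactly this baseline. A minimal sketch, assuming `y` is an array of the group labels with the counts given above:

```python
import numpy as np

# Hypothetical labels matching the counts in the question
y = np.array(["Group1"] * 140 + ["Group2"] * 35 + ["Group3"] * 30)

labels, counts = np.unique(y, return_counts=True)
print(dict(zip(labels, counts)))   # class distribution

# Accuracy of always predicting the majority class
majority_baseline = counts.max() / counts.sum()
print(round(majority_baseline, 3))  # → 0.683
```

If the reported test accuracy hovers around this baseline, the model has likely collapsed to the majority class.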
Here's the code:
# Imports (scikit-learn, imbalanced-learn, plotting)
import matplotlib.pyplot as plt
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Splitting data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)
# Feature scaling: fit StandardScaler on the training set only
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform(X_test)
# SMOTE oversampling (caveat: applied before cross-validation, so synthetic
# samples leak into validation folds and can inflate CV scores)
sm = SMOTE(random_state=42)
X_balanced, y_balanced = sm.fit_resample(X_train_std, y_train)  # fit_sample was renamed fit_resample
#PCA
pca = PCA(random_state=42)
#Classifier regularization (SVC).
svc = SVC(random_state=42, class_weight= 'balanced')
pipe_svc = Pipeline(steps=[('pca', pca), ('svc', svc)])
# Parameters of pipelines can be set using '__' separated parameter names.
# Note: use a single dict so the PCA and SVC parameters are searched jointly
# (as a list of two dicts, each dict is searched with the other step left at
# its defaults), and only one 'svc__gamma' key (the original had two, and the
# second silently overwrote the first). The grid is trimmed for tractability;
# n_components must not exceed min(n_samples, n_features).
parameters_svc = {'pca__n_components': [2, 5, 20, 30, 40, 50],
                  'svc__C': [1, 10, 100],
                  'svc__kernel': ['rbf', 'linear', 'poly'],
                  'svc__gamma': ['scale', 'auto', 0.01],
                  'svc__degree': [2, 3]}
# 'iid' was deprecated and later removed from GridSearchCV
clfsvc = GridSearchCV(pipe_svc, param_grid=parameters_svc, cv=10,
                      return_train_score=False, n_jobs=-1)
clfsvc.fit(X_balanced, y_balanced)
# Plot the PCA spectrum (SVC)
pca.fit(X_balanced)
fig1, (ax0, ax1) = plt.subplots(nrows=2, sharex=True, figsize=(6, 6)) #(I added 1 to fig)
ax0.plot(pca.explained_variance_ratio_, linewidth=2)
ax0.set_ylabel('PCA explained variance')
ax0.axvline(clfsvc.best_estimator_.named_steps['pca'].n_components,
linestyle=':', label='n_components chosen')
ax0.legend(prop=dict(size=12))
# For each number of components, find the best classifier results
results_svc = pd.DataFrame(clfsvc.cv_results_) #(Added _svc to all variable def)
components_col_svc = 'param_pca__n_components'
best_clfs_svc = results_svc.groupby(components_col_svc).apply(
lambda g: g.nlargest(1, 'mean_test_score'))
best_clfs_svc.plot(x=components_col_svc, y='mean_test_score', yerr='std_test_score',
legend=False, ax=ax1)
ax1.set_ylabel('Classification accuracy (val)')
ax1.set_xlabel('n_components')
plt.tight_layout()
plt.show()
# Predicting the test-set results (SVC): use the scaled features,
# since the model was trained on standardized data
y_pred1 = clfsvc.predict(X_test_std)
# Model accuracy: how often is the classifier correct?
# (the original computed and printed accuracy twice; once is enough)
accuracy_svc = accuracy_score(y_test, y_pred1)
print('Accuracy for SVC on the test set: %.2f%%' % (accuracy_svc * 100.0))
# Confusion matrix to describe the per-class performance
from sklearn.metrics import confusion_matrix
cm1 = confusion_matrix(y_test, y_pred1)
# Sanity checks: shapes, predictions, and the confusion matrix
print(X_test.shape)
print(y_pred1)
print(cm1)
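Overall accuracy hides the all-one-class behaviour; per-class metrics make it visible immediately. A hedged sketch using toy stand-ins for `y_test` and `y_pred1` from the code above (a classifier that only ever predicts Group1):

```python
from collections import Counter
from sklearn.metrics import classification_report

# Toy stand-ins: true labels roughly in the question's proportions,
# predictions all collapsed to the majority class.
y_true = ["Group1"] * 14 + ["Group2"] * 4 + ["Group3"] * 3
y_pred = ["Group1"] * 21

print(Counter(y_pred))  # → Counter({'Group1': 21}): the collapse is obvious
# zero_division=0 silences warnings for classes that were never predicted
print(classification_report(y_true, y_pred, zero_division=0))
```

Recall for Group2 and Group3 will be 0.0 here even though accuracy is 14/21 ≈ 0.67, which is exactly the symptom described in the question.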
Tags: machine-learning, classification, machine-learning-model, multilabel-classification, unbalanced-classes
Comment – Shamit Verma (1 min ago): What loss function are you using? Is it sensitive to class imbalance?
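Related to this comment: `GridSearchCV` optimises plain accuracy by default, which rewards majority-class predictors on imbalanced data. Passing an imbalance-aware scorer such as macro-F1 is one option. A minimal sketch on synthetic data (the tiny grid and `make_classification` settings are illustrative, not a tuned setup):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic 3-class imbalanced dataset standing in for the real one
X, y = make_classification(n_samples=200, n_classes=3, n_informative=5,
                           weights=[0.7, 0.17, 0.13], random_state=42)

grid = GridSearchCV(SVC(class_weight="balanced"),
                    param_grid={"C": [1, 10], "gamma": ["scale"]},
                    scoring="f1_macro",  # macro-F1 weights every class equally
                    cv=3)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

With `scoring="f1_macro"`, a model that ignores the minority classes scores poorly during the search, so the grid is steered away from the collapsed solution.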
asked 12 mins ago by tsumaranaina