What are some possible reasons that your multiclass classifier is classifying all the classes in a single class?


























I have unbalanced classes:

Group1: N = 140
Group2: N = 35
Group3: N = 30

I ran the code on this data and every sample was classified as Group1. Since Group1 is the majority class, that was no surprise. I then ran the same code with SMOTE, so that each group had 140 samples, and still got the same result: everything was classified as Group1. I also balanced the class weights (without SMOTE), but again got the same result. This confuses me. What am I doing wrong, and what can I do to improve the model?

I tried 5 different classifiers (KNN, AdaBoost, SVC, RF, DT) and 4 of the 5 gave the same result!
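
One sanity check worth running first (a sketch, on the assumption that y is a NumPy array or pandas Series): with test_size=0.1 on 205 samples, the test set holds only about 21 rows, so without stratification it may contain almost no Group2 or Group3 examples at all.

import numpy as np
from sklearn.model_selection import train_test_split

# stratify=y keeps the 140/35/30 class ratio in both halves of the split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, stratify=y, random_state=42)

# Count how many samples of each class actually landed in the test set
print(dict(zip(*np.unique(y_test, return_counts=True))))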



Here's the code:



# Imports (gathered here so the snippet is self-contained)
import matplotlib.pyplot as plt
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Splitting data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

# Apply StandardScaler for feature scaling
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform(X_test)

# SMOTE on the training set only (fit_sample was renamed fit_resample
# in current imbalanced-learn releases)
sm = SMOTE(random_state=42)
X_balanced, y_balanced = sm.fit_resample(X_train_std, y_train)

# PCA
pca = PCA(random_state=42)

# Classifier (SVC) with balanced class weights
svc = SVC(random_state=42, class_weight='balanced')
pipe_svc = Pipeline(steps=[('pca', pca), ('svc', svc)])

# Parameters of pipelines can be set using '__'-separated parameter names.
# Note: a list of dicts makes GridSearchCV treat each dict as a separate
# grid, so n_components and the SVC parameters are tuned independently
# here, never jointly. The second dict also listed 'svc__gamma' twice,
# which silently discarded the numeric values; the two lists are merged
# below.
parameters_svc = [{'pca__n_components': [2, 5, 20, 30, 40, 50, 60, 70, 80, 90, 100, 140, 150]},
                  {'svc__C': [1, 10, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190,
                              200, 250, 300, 400, 500],
                   'svc__kernel': ['rbf', 'linear', 'poly'],
                   'svc__gamma': ['auto', 'scale', 0.05, 0.06, 0.07, 0.08, 0.09,
                                  0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007,
                                  0.008, 0.009, 0.0001, 0.0002, 0.0003, 0.0004, 0.0005],
                   'svc__degree': [1, 2, 3, 4, 5, 6]}]

# iid= has been removed from current scikit-learn, so it is dropped here
clfsvc = GridSearchCV(pipe_svc, param_grid=parameters_svc, cv=10,
                      return_train_score=False)
clfsvc.fit(X_balanced, y_balanced)
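
# Optional variation (an assumption, not part of the original run):
# GridSearchCV optimizes plain accuracy by default. On imbalanced folds
# (the class_weight-only experiments, without SMOTE) accuracy is already
# maximized by always predicting Group1, so model selection itself can
# lock in the degenerate model. Selecting on a class-balanced metric is
# one way to test for that:
clfsvc_macro = GridSearchCV(pipe_svc, param_grid=parameters_svc,
                            scoring='f1_macro', cv=10)
clfsvc_macro.fit(X_balanced, y_balanced)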


# Plot the PCA spectrum (SVC)
pca.fit(X_balanced)

fig1, (ax0, ax1) = plt.subplots(nrows=2, sharex=True, figsize=(6, 6))
ax0.plot(pca.explained_variance_ratio_, linewidth=2)
ax0.set_ylabel('PCA explained variance')

ax0.axvline(clfsvc.best_estimator_.named_steps['pca'].n_components,
            linestyle=':', label='n_components chosen')
ax0.legend(prop=dict(size=12))

# For each number of components, find the best classifier results
results_svc = pd.DataFrame(clfsvc.cv_results_)
components_col_svc = 'param_pca__n_components'
best_clfs_svc = results_svc.groupby(components_col_svc).apply(
    lambda g: g.nlargest(1, 'mean_test_score'))

best_clfs_svc.plot(x=components_col_svc, y='mean_test_score', yerr='std_test_score',
                   legend=False, ax=ax1)
ax1.set_ylabel('Classification accuracy (val)')
ax1.set_xlabel('n_components')

plt.tight_layout()
plt.show()

# Predicting the test set results (SVC). The pipeline was fitted on
# scaled features, so the scaled test matrix must be used here; the
# original passed the raw X_test, which silently shifts every feature
# the model sees.
y_pred1 = clfsvc.predict(X_test_std)

# Model accuracy: how often is the classifier correct overall?
accuracy1 = accuracy_score(y_test, y_pred1)
print('Accuracy for SVC on the held-out test set: %.2f%%' % (accuracy1 * 100.0))

# Confusion matrix to describe per-class performance
cm1 = confusion_matrix(y_test, y_pred1)
print(cm1)

# Check shapes after prediction
print(X_test.shape)
print(y_pred1.shape)
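
Accuracy alone hides this failure mode, since Group1 makes up about two-thirds of the data. A short addition (standard scikit-learn metrics, not in the original code) that makes "everything predicted as Group1" explicit as zero recall on the minority classes:

from sklearn.metrics import balanced_accuracy_score, classification_report

# Per-class precision/recall/F1; a majority-class-only model shows
# recall 0.00 for Group2 and Group3
print(classification_report(y_test, y_pred1))

# Balanced accuracy averages per-class recall; a majority-only model
# scores ~0.33 on three classes regardless of class sizes
print('Balanced accuracy:', balanced_accuracy_score(y_test, y_pred1))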









machine-learning classification machine-learning-model multilabel-classification unbalanced-classes

asked 12 mins ago
tsumaranaina
7510
  • What is the loss function that you are using? Is the loss function sensitive to class imbalance?
    – Shamit Verma
    1 min ago















