What are some possible reasons that your multiclass classifier is classifying alll the classes in a single...
$begingroup$
I have unbalanced classes.
Group1 N = 140
Group2 N = 35
Group3 N = 30
I ran the code on this data and all the Groups got classified as Group1.
I thought that since group1 is the majority group this is not a surprise.
Then i ran the same code but this time with SMOTE, now all groups are 140, and i still got the same results, where all the groups were classified in Group1. Then i balanced the class weights (W/O SMOTE), but still got the same results. This was confusing to me. What am i doing wrong? Can someone help me understand this please? or what can i do to improve the model?
I tried 5 different classifiers (KNN, AdaBoost, SVC, RF, DT) and in 4 out of 6 i got the same result!
Here's the code:
#Splitting data to training and testing
X_train, X_test, y_train, y_test = train_test_split (X, y, test_size = 0.1, random_state=42)
#Apply StandardScaler for feature scaling
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform (X_test)
#SMOTE
sm = SMOTE(random_state=42)
X_balanced, y_balanced = sm.fit_sample(X_train_std, y_train)
#PCA
pca = PCA(random_state=42)
#Classifier regularization (SVC).
svc = SVC(random_state=42, class_weight= 'balanced')
pipe_svc = Pipeline(steps=[('pca', pca), ('svc', svc)])
# Parameters of pipelines can be set using ‘__’ separated parameter names:
parameters_svc = [{'pca__n_components': [2, 5, 20, 30, 40, 50, 60, 70, 80, 90, 100, 140, 150]},
{'svc__C':[1, 10, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 250, 300, 400, 500],
'svc__kernel':['rbf', 'linear','poly'],
'svc__gamma': [0.05, 0.06, 0.07, 0.08, 0.09, 0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007,
0.008,0.009, 0.0001, 0.0002, 0.0003, 0.0004, 0.0005],
'svc__degree': [1, 2, 3, 4, 5, 6],
'svc__gamma': ['auto', 'scale']}]
clfsvc = GridSearchCV(pipe_svc, param_grid =parameters_svc, iid=False, cv=10,
return_train_score=False)
clfsvc.fit(X_balanced, y_balanced)
# Plot the PCA spectrum (SVC)
pca.fit(X_balanced)
fig1, (ax0, ax1) = plt.subplots(nrows=2, sharex=True, figsize=(6, 6)) #(I added 1 to fig)
ax0.plot(pca.explained_variance_ratio_, linewidth=2)
ax0.set_ylabel('PCA explained variance')
ax0.axvline(clfsvc.best_estimator_.named_steps['pca'].n_components,
linestyle=':', label='n_components chosen')
ax0.legend(prop=dict(size=12))
# For each number of components, find the best classifier results
results_svc = pd.DataFrame(clfsvc.cv_results_) #(Added _svc to all variable def)
components_col_svc = 'param_pca__n_components'
best_clfs_svc = results_svc.groupby(components_col_svc).apply(
lambda g: g.nlargest(1, 'mean_test_score'))
best_clfs_svc.plot(x=components_col_svc, y='mean_test_score', yerr='std_test_score',
legend=False, ax=ax1)
ax1.set_ylabel('Classification accuracy (val)')
ax1.set_xlabel('n_components')
plt.tight_layout()
plt.show()
#Predicting the test set results (SVC)
y_pred1 = clfsvc.predict(X_test)
# Model Accuracy, how often is the classifier correct?
Accuracyscore_svc = accuracy_score(y_test, y_pred1)
print("Accuracy for SVC on CV data: ", Accuracyscore_svc)
# Making the confusion matrix to describe the performance of a classifier
from sklearn.metrics import confusion_matrix
cm1 = confusion_matrix (y_test, y_pred1)
#accuracy
# Get accuracy score
accuracy1 = accuracy_score(y_test, y_pred1)
print('Accuracy1: %.2f%%' % (accuracy1 * 100.0))
#Checking shape after confusion matrix
print (X_test)
print (y_pred1)
print (cm1)
machine-learning classification machine-learning-model multilabel-classification unbalanced-classes
$endgroup$
add a comment |
$begingroup$
I have unbalanced classes.
Group1 N = 140
Group2 N = 35
Group3 N = 30
I ran the code on this data and all the Groups got classified as Group1.
I thought that since group1 is the majority group this is not a surprise.
Then i ran the same code but this time with SMOTE, now all groups are 140, and i still got the same results, where all the groups were classified in Group1. Then i balanced the class weights (W/O SMOTE), but still got the same results. This was confusing to me. What am i doing wrong? Can someone help me understand this please? or what can i do to improve the model?
I tried 5 different classifiers (KNN, AdaBoost, SVC, RF, DT) and in 4 out of 6 i got the same result!
Here's the code:
#Splitting data to training and testing
X_train, X_test, y_train, y_test = train_test_split (X, y, test_size = 0.1, random_state=42)
#Apply StandardScaler for feature scaling
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform (X_test)
#SMOTE
sm = SMOTE(random_state=42)
X_balanced, y_balanced = sm.fit_sample(X_train_std, y_train)
#PCA
pca = PCA(random_state=42)
#Classifier regularization (SVC).
svc = SVC(random_state=42, class_weight= 'balanced')
pipe_svc = Pipeline(steps=[('pca', pca), ('svc', svc)])
# Parameters of pipelines can be set using ‘__’ separated parameter names:
parameters_svc = [{'pca__n_components': [2, 5, 20, 30, 40, 50, 60, 70, 80, 90, 100, 140, 150]},
{'svc__C':[1, 10, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 250, 300, 400, 500],
'svc__kernel':['rbf', 'linear','poly'],
'svc__gamma': [0.05, 0.06, 0.07, 0.08, 0.09, 0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007,
0.008,0.009, 0.0001, 0.0002, 0.0003, 0.0004, 0.0005],
'svc__degree': [1, 2, 3, 4, 5, 6],
'svc__gamma': ['auto', 'scale']}]
clfsvc = GridSearchCV(pipe_svc, param_grid =parameters_svc, iid=False, cv=10,
return_train_score=False)
clfsvc.fit(X_balanced, y_balanced)
# Plot the PCA spectrum (SVC)
pca.fit(X_balanced)
fig1, (ax0, ax1) = plt.subplots(nrows=2, sharex=True, figsize=(6, 6)) #(I added 1 to fig)
ax0.plot(pca.explained_variance_ratio_, linewidth=2)
ax0.set_ylabel('PCA explained variance')
ax0.axvline(clfsvc.best_estimator_.named_steps['pca'].n_components,
linestyle=':', label='n_components chosen')
ax0.legend(prop=dict(size=12))
# For each number of components, find the best classifier results
results_svc = pd.DataFrame(clfsvc.cv_results_) #(Added _svc to all variable def)
components_col_svc = 'param_pca__n_components'
best_clfs_svc = results_svc.groupby(components_col_svc).apply(
lambda g: g.nlargest(1, 'mean_test_score'))
best_clfs_svc.plot(x=components_col_svc, y='mean_test_score', yerr='std_test_score',
legend=False, ax=ax1)
ax1.set_ylabel('Classification accuracy (val)')
ax1.set_xlabel('n_components')
plt.tight_layout()
plt.show()
#Predicting the test set results (SVC)
y_pred1 = clfsvc.predict(X_test)
# Model Accuracy, how often is the classifier correct?
Accuracyscore_svc = accuracy_score(y_test, y_pred1)
print("Accuracy for SVC on CV data: ", Accuracyscore_svc)
# Making the confusion matrix to describe the performance of a classifier
from sklearn.metrics import confusion_matrix
cm1 = confusion_matrix (y_test, y_pred1)
#accuracy
# Get accuracy score
accuracy1 = accuracy_score(y_test, y_pred1)
print('Accuracy1: %.2f%%' % (accuracy1 * 100.0))
#Checking shape after confusion matrix
print (X_test)
print (y_pred1)
print (cm1)
machine-learning classification machine-learning-model multilabel-classification unbalanced-classes
$endgroup$
$begingroup$
What is the loss function that you are using ? Is loss function sensitive to class-imbalance ?
$endgroup$
– Shamit Verma
1 min ago
add a comment |
$begingroup$
I have unbalanced classes.
Group1 N = 140
Group2 N = 35
Group3 N = 30
I ran the code on this data and all the Groups got classified as Group1.
I thought that since group1 is the majority group this is not a surprise.
Then i ran the same code but this time with SMOTE, now all groups are 140, and i still got the same results, where all the groups were classified in Group1. Then i balanced the class weights (W/O SMOTE), but still got the same results. This was confusing to me. What am i doing wrong? Can someone help me understand this please? or what can i do to improve the model?
I tried 5 different classifiers (KNN, AdaBoost, SVC, RF, DT) and in 4 out of 6 i got the same result!
Here's the code:
#Splitting data to training and testing
X_train, X_test, y_train, y_test = train_test_split (X, y, test_size = 0.1, random_state=42)
#Apply StandardScaler for feature scaling
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform (X_test)
#SMOTE
sm = SMOTE(random_state=42)
X_balanced, y_balanced = sm.fit_sample(X_train_std, y_train)
#PCA
pca = PCA(random_state=42)
#Classifier regularization (SVC).
svc = SVC(random_state=42, class_weight= 'balanced')
pipe_svc = Pipeline(steps=[('pca', pca), ('svc', svc)])
# Parameters of pipelines can be set using ‘__’ separated parameter names:
parameters_svc = [{'pca__n_components': [2, 5, 20, 30, 40, 50, 60, 70, 80, 90, 100, 140, 150]},
{'svc__C':[1, 10, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 250, 300, 400, 500],
'svc__kernel':['rbf', 'linear','poly'],
'svc__gamma': [0.05, 0.06, 0.07, 0.08, 0.09, 0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007,
0.008,0.009, 0.0001, 0.0002, 0.0003, 0.0004, 0.0005],
'svc__degree': [1, 2, 3, 4, 5, 6],
'svc__gamma': ['auto', 'scale']}]
clfsvc = GridSearchCV(pipe_svc, param_grid =parameters_svc, iid=False, cv=10,
return_train_score=False)
clfsvc.fit(X_balanced, y_balanced)
# Plot the PCA spectrum (SVC)
pca.fit(X_balanced)
fig1, (ax0, ax1) = plt.subplots(nrows=2, sharex=True, figsize=(6, 6)) #(I added 1 to fig)
ax0.plot(pca.explained_variance_ratio_, linewidth=2)
ax0.set_ylabel('PCA explained variance')
ax0.axvline(clfsvc.best_estimator_.named_steps['pca'].n_components,
linestyle=':', label='n_components chosen')
ax0.legend(prop=dict(size=12))
# For each number of components, find the best classifier results
results_svc = pd.DataFrame(clfsvc.cv_results_) #(Added _svc to all variable def)
components_col_svc = 'param_pca__n_components'
best_clfs_svc = results_svc.groupby(components_col_svc).apply(
lambda g: g.nlargest(1, 'mean_test_score'))
best_clfs_svc.plot(x=components_col_svc, y='mean_test_score', yerr='std_test_score',
legend=False, ax=ax1)
ax1.set_ylabel('Classification accuracy (val)')
ax1.set_xlabel('n_components')
plt.tight_layout()
plt.show()
#Predicting the test set results (SVC)
y_pred1 = clfsvc.predict(X_test)
# Model Accuracy, how often is the classifier correct?
Accuracyscore_svc = accuracy_score(y_test, y_pred1)
print("Accuracy for SVC on CV data: ", Accuracyscore_svc)
# Making the confusion matrix to describe the performance of a classifier
from sklearn.metrics import confusion_matrix
cm1 = confusion_matrix (y_test, y_pred1)
#accuracy
# Get accuracy score
accuracy1 = accuracy_score(y_test, y_pred1)
print('Accuracy1: %.2f%%' % (accuracy1 * 100.0))
#Checking shape after confusion matrix
print (X_test)
print (y_pred1)
print (cm1)
machine-learning classification machine-learning-model multilabel-classification unbalanced-classes
$endgroup$
I have unbalanced classes.
Group1 N = 140
Group2 N = 35
Group3 N = 30
I ran the code on this data and all the Groups got classified as Group1.
I thought that since group1 is the majority group this is not a surprise.
Then i ran the same code but this time with SMOTE, now all groups are 140, and i still got the same results, where all the groups were classified in Group1. Then i balanced the class weights (W/O SMOTE), but still got the same results. This was confusing to me. What am i doing wrong? Can someone help me understand this please? or what can i do to improve the model?
I tried 5 different classifiers (KNN, AdaBoost, SVC, RF, DT) and in 4 out of 6 i got the same result!
Here's the code:
#Splitting data to training and testing
X_train, X_test, y_train, y_test = train_test_split (X, y, test_size = 0.1, random_state=42)
#Apply StandardScaler for feature scaling
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform (X_test)
#SMOTE
sm = SMOTE(random_state=42)
X_balanced, y_balanced = sm.fit_sample(X_train_std, y_train)
#PCA
pca = PCA(random_state=42)
#Classifier regularization (SVC).
svc = SVC(random_state=42, class_weight= 'balanced')
pipe_svc = Pipeline(steps=[('pca', pca), ('svc', svc)])
# Parameters of pipelines can be set using ‘__’ separated parameter names:
parameters_svc = [{'pca__n_components': [2, 5, 20, 30, 40, 50, 60, 70, 80, 90, 100, 140, 150]},
{'svc__C':[1, 10, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 250, 300, 400, 500],
'svc__kernel':['rbf', 'linear','poly'],
'svc__gamma': [0.05, 0.06, 0.07, 0.08, 0.09, 0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007,
0.008,0.009, 0.0001, 0.0002, 0.0003, 0.0004, 0.0005],
'svc__degree': [1, 2, 3, 4, 5, 6],
'svc__gamma': ['auto', 'scale']}]
clfsvc = GridSearchCV(pipe_svc, param_grid =parameters_svc, iid=False, cv=10,
return_train_score=False)
clfsvc.fit(X_balanced, y_balanced)
# Plot the PCA spectrum (SVC)
pca.fit(X_balanced)
fig1, (ax0, ax1) = plt.subplots(nrows=2, sharex=True, figsize=(6, 6)) #(I added 1 to fig)
ax0.plot(pca.explained_variance_ratio_, linewidth=2)
ax0.set_ylabel('PCA explained variance')
ax0.axvline(clfsvc.best_estimator_.named_steps['pca'].n_components,
linestyle=':', label='n_components chosen')
ax0.legend(prop=dict(size=12))
# For each number of components, find the best classifier results
results_svc = pd.DataFrame(clfsvc.cv_results_) #(Added _svc to all variable def)
components_col_svc = 'param_pca__n_components'
best_clfs_svc = results_svc.groupby(components_col_svc).apply(
lambda g: g.nlargest(1, 'mean_test_score'))
best_clfs_svc.plot(x=components_col_svc, y='mean_test_score', yerr='std_test_score',
legend=False, ax=ax1)
ax1.set_ylabel('Classification accuracy (val)')
ax1.set_xlabel('n_components')
plt.tight_layout()
plt.show()
#Predicting the test set results (SVC)
y_pred1 = clfsvc.predict(X_test)
# Model Accuracy, how often is the classifier correct?
Accuracyscore_svc = accuracy_score(y_test, y_pred1)
print("Accuracy for SVC on CV data: ", Accuracyscore_svc)
# Making the confusion matrix to describe the performance of a classifier
from sklearn.metrics import confusion_matrix
cm1 = confusion_matrix (y_test, y_pred1)
#accuracy
# Get accuracy score
accuracy1 = accuracy_score(y_test, y_pred1)
print('Accuracy1: %.2f%%' % (accuracy1 * 100.0))
#Checking shape after confusion matrix
print (X_test)
print (y_pred1)
print (cm1)
machine-learning classification machine-learning-model multilabel-classification unbalanced-classes
machine-learning classification machine-learning-model multilabel-classification unbalanced-classes
asked 12 mins ago
tsumaranainatsumaranaina
7510
7510
$begingroup$
What is the loss function that you are using ? Is loss function sensitive to class-imbalance ?
$endgroup$
– Shamit Verma
1 min ago
add a comment |
$begingroup$
What is the loss function that you are using ? Is loss function sensitive to class-imbalance ?
$endgroup$
– Shamit Verma
1 min ago
$begingroup$
What is the loss function that you are using ? Is loss function sensitive to class-imbalance ?
$endgroup$
– Shamit Verma
1 min ago
$begingroup$
What is the loss function that you are using ? Is loss function sensitive to class-imbalance ?
$endgroup$
– Shamit Verma
1 min ago
add a comment |
0
active
oldest
votes
Your Answer
StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
});
});
}, "mathjax-editing");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "557"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f47022%2fwhat-are-some-possible-reasons-that-your-multiclass-classifier-is-classifying-al%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
0
active
oldest
votes
0
active
oldest
votes
active
oldest
votes
active
oldest
votes
Thanks for contributing an answer to Data Science Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f47022%2fwhat-are-some-possible-reasons-that-your-multiclass-classifier-is-classifying-al%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
$begingroup$
What is the loss function that you are using ? Is loss function sensitive to class-imbalance ?
$endgroup$
– Shamit Verma
1 min ago