Merging sparse and dense data in machine learning to improve performance
I have sparse features that are predictive, and I also have some dense features that are predictive. I need to combine these features to improve the overall performance of the classifier.

The problem is that when I combine them, the dense features tend to dominate the sparse ones, giving only a 1% improvement in AUC over the model with dense features alone.

Has anybody come across similar problems? I'd really appreciate any input; I'm kind of stuck. I have already tried many different classifiers, combinations of classifiers, feature transformations, and processing with different algorithms.

Thanks in advance for the help.

Edit:

I have already tried the suggestions given in the comments. What I have observed is that for almost 45% of the data the sparse features perform really well (AUC around 0.9 with sparse features alone), while for the remaining data the dense features perform well (AUC around 0.75). I tried separating out these subsets, but then I get an AUC of 0.6, so I can't simply train a model and decide which features to use.

Regarding a code snippet: I have tried so many things that I'm not sure what exactly to share :(

Tags: machine-learning, classification, predictive-modeling, scikit-learn, supervised-learning
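For reference, a minimal sketch of the kind of merge being described, with the dense block standardized so it does not dominate on scale alone. All names, shapes, and densities here are stand-ins, not the asker's actual setup:

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_dense = rng.random((100, 10))                                # stand-in dense block
X_sparse = sparse.random(100, 500, density=0.05, format='csr')  # stand-in sparse block

# Standardize the dense features, then stack both blocks side by side
# into one sparse matrix most sklearn estimators can consume.
X_dense_scaled = StandardScaler().fit_transform(X_dense)
X_combined = sparse.hstack([sparse.csr_matrix(X_dense_scaled), X_sparse]).tocsr()
```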
How sparse are your features? Are they 1% filled or even less?
– João Almeida, Apr 6 '16 at 12:35

Also note that if your features are sparse, they should only help classify a small part of your dataset, which means the overall accuracy shouldn't change significantly. This is a guess, as I don't know the characteristics of your dataset.
– João Almeida, Apr 6 '16 at 12:40
@JoãoAlmeida They are not that sparse; they are around 5% filled. The problem is that when I look at the predictions where the two models differ, the model with sparse features tends to perform better, which is why I expected to see a boost in AUC when I combined them with the dense features. I am getting a boost, but it seems very low.
– Sagar Waghmode, Apr 7 '16 at 10:46

Hmm... I don't have any idea for you then.
– João Almeida, Apr 7 '16 at 10:51
asked Apr 6 '16 at 5:14 by Sagar Waghmode (edited Apr 18 '16 at 4:42)
6 Answers
This seems like a job for Principal Component Analysis. PCA is well implemented in scikit-learn, and it has helped me many times.

PCA, in a certain way, combines your features. By limiting the number of components, you feed your model less noisy data (in the best case), and your model is only as good as your data.

Consider the simple example below:

    from sklearn.decomposition import PCA
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.pipeline import Pipeline

    pipe_rf = Pipeline([('pca', PCA(n_components=80)),
                        ('clf', RandomForestClassifier(n_estimators=100))])
    pipe_rf.fit(X_train_s, y_train_s)
    pred = pipe_rf.predict(X_test)

Why did I pick 80? When I plotted the cumulative explained variance, the curve (figure not included here) told me that with ~80 components I capture almost all of the variance.

So I would say give it a try and use it in your models. It should help.

– HonzaB, answered Apr 13 '16 at 12:54 (edited Dec 29 '17 at 12:56)
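The cumulative-variance curve this answer refers to can be computed directly; a sketch below uses a random stand-in matrix in place of the real training data:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_train_s = rng.standard_normal((200, 100))     # stand-in for the training matrix

# Fit with all components, then accumulate the explained-variance ratios.
pca = PCA().fit(X_train_s)
cumvar = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components that reaches 95% of the total variance.
n_components = int(np.searchsorted(cumvar, 0.95) + 1)
```

Plotting `cumvar` against the component index reproduces the kind of curve the answer describes; the elbow (or a chosen variance threshold) picks `n_components`.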
The best way to combine features is through ensemble methods. Basically there are three different approaches: bagging, boosting, and stacking.

You can either use AdaBoost augmented with feature selection (considering both sparse and dense features), or a stacking-based approach (random feature / random subspace).

I prefer the second option: train a set of base learners (e.g., decision trees) using random subsets and random features (keep training base learners until you cover the whole set of features). The next step is to run the training set through them to generate the meta-data, and use this meta-data to train a meta-classifier. The meta-classifier will figure out which features are more important and what kind of relationship should be utilized.

– Bashar Haddad, answered Apr 12 '16 at 4:44
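A minimal sketch in the spirit of this answer, using scikit-learn's `StackingClassifier`: one base learner per feature group, with a meta-classifier trained on their out-of-fold predictions. The column split, estimators, and toy data are all illustrative assumptions:

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 20))              # toy data standing in for real features
y = (X[:, 0] + X[:, 10] > 0).astype(int)        # toy target

sparse_cols = list(range(10))                   # pretend these columns are the sparse group
dense_cols = list(range(10, 20))                # and these the dense group

base_learners = [
    ('sparse_model', make_pipeline(
        ColumnTransformer([('sel', 'passthrough', sparse_cols)]),
        LogisticRegression(max_iter=1000))),
    ('dense_model', make_pipeline(
        ColumnTransformer([('sel', 'passthrough', dense_cols)]),
        RandomForestClassifier(n_estimators=50, random_state=0))),
]

# The meta-classifier is trained on cross-validated predictions of the
# base learners, so it learns how much to trust each feature group.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(), cv=3)
stack.fit(X, y)
```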
Can you please share the relevant documentation? I didn't exactly get what you meant.
– Sagar Waghmode, Apr 13 '16 at 6:04

You can read an article about stacking ("Issues in Stacked Generalization", 1999) and read about StackingC. It is very important to know that I am talking about the whole vector (e.g., 1x36 in the case of HOG) as one feature, not the dimensions within it. You need to track which feature is used with which base learner. Be careful about the overfitting problem.
– Bashar Haddad, Apr 13 '16 at 16:15

If you give more details about the dataset (number of classes, number of samples, code, what you have tried, what you noticed, whether you have a class-imbalance problem, noisy samples, etc.), it would help in selecting the best method. All these details are important; give me more details if that's okay and I may be able to help in a better way.
– Bashar Haddad, Apr 13 '16 at 16:19
The variable groups may be multicollinear, or the conversion between sparse and dense representations might go wrong. Have you thought about using a voting classifier / ensemble classification? http://scikit-learn.org/stable/modules/ensemble.html That way you could deal with both of the above problems.
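What a soft-voting ensemble from that module might look like; the estimator choices and toy data below are assumptions, not the asker's actual models:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 15))              # stand-in feature matrix
y = (X[:, 0] > 0).astype(int)                   # toy target

# voting='soft' averages predicted probabilities instead of hard labels,
# which tends to work better when the models output usable scores.
vote = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=1000)),
                ('gb', GradientBoostingClassifier(n_estimators=50))],
    voting='soft')
vote.fit(X, y)
```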
I have already tried ensemble techniques as well as voting classifiers. Still no luck.
– Sagar Waghmode, Apr 12 '16 at 8:15

So do you see a lot of overlap between the predictions from the two datasets? Maybe there really is no new information, i.e., the data tell the same story.
– Diego, Apr 12 '16 at 9:20

Yes, I have done exactly that. Though the predictions are not entirely different, the number of samples where the predictions differ is quite high (around 15-20% of the data). For these samples the model with sparse features performs better than the model with dense features. My point is: if the sparse features perform better, why don't they come up as important features in any of the models I have tried so far?
– Sagar Waghmode, Apr 12 '16 at 9:31

What predictor algorithm do you use?
– Diego, Apr 12 '16 at 12:21

I have tried quite a few algorithms and settled on a gradient boosted model; I also use random forests quite a lot for my problem.
– Sagar Waghmode, Apr 12 '16 at 17:27
In addition to some of the suggestions above, I would recommend using a two-step modeling approach:

- Use the sparse features first and develop the best model.
- Calculate the predicted probability from that model.
- Feed that probability estimate into the second model as an input feature alongside the dense features. In other words, use all dense features plus the probability estimate to build the second model.
- The final classification is then based on the second model.
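The steps above can be sketched as follows. The feature matrices and estimators are stand-ins, and out-of-fold probabilities are used so the first model's training fit does not leak into the second model:

```python
import numpy as np
from scipy import sparse
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
X_sparse = sparse.random(400, 300, density=0.05, format='csr', random_state=0)
X_dense = rng.standard_normal((400, 8))
y = rng.integers(0, 2, 400)

# Steps 1-2: model on the sparse features, out-of-fold probability estimates.
p_sparse = cross_val_predict(LogisticRegression(max_iter=1000), X_sparse, y,
                             cv=5, method='predict_proba')[:, 1]

# Steps 3-4: dense features plus the probability feature feed the second model.
X_second = np.column_stack([X_dense, p_sparse])
final_model = GradientBoostingClassifier().fit(X_second, y)
```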
Try PCA only on the sparse features, and combine the PCA output with the dense features. That way you get a dense set of (original) features plus a dense set of features that were originally sparse.

+1 for the question. Please update us with the results.
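One way to try this suggestion. Note that scikit-learn's `PCA` rejects scipy sparse input, so `TruncatedSVD` is the usual substitute for densifying a sparse block; all names and sizes below are stand-ins:

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)
X_sparse = sparse.random(200, 2300, density=0.05, format='csr', random_state=0)
X_dense = rng.standard_normal((200, 10))

# Reduce the sparse block to a dense low-rank representation.
svd = TruncatedSVD(n_components=100, random_state=0)
X_sparse_reduced = svd.fit_transform(X_sparse)   # dense array

# Everything is dense now, so a plain horizontal stack works.
X_all_dense = np.hstack([X_dense, X_sparse_reduced])
```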
Wow, this has actually brought down the AUC :( I'm not sure what that means; I need to check the feature importances. But my thinking is: out of around 2.3k sparse features, I used the 1k components that explained a 0.97 variance ratio, and this loss of information may have brought down the AUC.
– Sagar Waghmode, Apr 18 '16 at 10:17

Interesting, thanks for sharing. We have a dataset very similar to yours (1k-2k sparse features). Just out of curiosity, how many principal components did you generate? If that number is too low, it may explain why the AUC went down.
– Tagar, Apr 18 '16 at 15:22

As I said already, I generated 1k principal components, which explained 0.97 of the variance.
– Sagar Waghmode, Apr 18 '16 at 17:55
I met the same problem; maybe simply putting dense and sparse features in a single model is not a good choice. Maybe you can try a wide-and-deep model: wide for the sparse features and deep for the dense features. If you try this method, please tell me the result.
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
});
});
}, "mathjax-editing");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "557"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f11060%2fmerging-sparse-and-dense-data-in-machine-learning-to-improve-the-performance%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
6 Answers
6
active
oldest
votes
6 Answers
6
active
oldest
votes
active
oldest
votes
active
oldest
votes
$begingroup$
This seems like a job for Principal Component Analysis. In Scikit is PCA implemented well and it helped me many times.
PCA, in a certain way, combines your features. By limiting the number of components, you fetch your model with noise-less data (in the best case). Because your model is as good as your data are.
Consider below a simple example.
from sklearn.pipeline import Pipeline
pipe_rf = Pipeline([('pca', PCA(n_components=80)),
('clf',RandomForestClassifier(n_estimators=100))])
pipe_rf.fit(X_train_s,y_train_s)
pred = pipe_rf.predict(X_test)
Why I picked 80? When I plot cumulative variance, I got this below, which tells me that with ~80 components, I reach almost all the variance.
So I would say give it a try, use it in your models. It should help.
$endgroup$
add a comment |
$begingroup$
This seems like a job for Principal Component Analysis. In Scikit is PCA implemented well and it helped me many times.
PCA, in a certain way, combines your features. By limiting the number of components, you fetch your model with noise-less data (in the best case). Because your model is as good as your data are.
Consider below a simple example.
from sklearn.pipeline import Pipeline
pipe_rf = Pipeline([('pca', PCA(n_components=80)),
('clf',RandomForestClassifier(n_estimators=100))])
pipe_rf.fit(X_train_s,y_train_s)
pred = pipe_rf.predict(X_test)
Why I picked 80? When I plot cumulative variance, I got this below, which tells me that with ~80 components, I reach almost all the variance.
So I would say give it a try, use it in your models. It should help.
$endgroup$
add a comment |
$begingroup$
This seems like a job for Principal Component Analysis. In Scikit is PCA implemented well and it helped me many times.
PCA, in a certain way, combines your features. By limiting the number of components, you fetch your model with noise-less data (in the best case). Because your model is as good as your data are.
Consider below a simple example.
from sklearn.pipeline import Pipeline
pipe_rf = Pipeline([('pca', PCA(n_components=80)),
('clf',RandomForestClassifier(n_estimators=100))])
pipe_rf.fit(X_train_s,y_train_s)
pred = pipe_rf.predict(X_test)
Why I picked 80? When I plot cumulative variance, I got this below, which tells me that with ~80 components, I reach almost all the variance.
So I would say give it a try, use it in your models. It should help.
$endgroup$
This seems like a job for Principal Component Analysis. In Scikit is PCA implemented well and it helped me many times.
PCA, in a certain way, combines your features. By limiting the number of components, you fetch your model with noise-less data (in the best case). Because your model is as good as your data are.
Consider below a simple example.
from sklearn.pipeline import Pipeline
pipe_rf = Pipeline([('pca', PCA(n_components=80)),
('clf',RandomForestClassifier(n_estimators=100))])
pipe_rf.fit(X_train_s,y_train_s)
pred = pipe_rf.predict(X_test)
Why I picked 80? When I plot cumulative variance, I got this below, which tells me that with ~80 components, I reach almost all the variance.
So I would say give it a try, use it in your models. It should help.
edited Dec 29 '17 at 12:56
answered Apr 13 '16 at 12:54
HonzaBHonzaB
1,196514
1,196514
add a comment |
add a comment |
$begingroup$
The best way to combine features is through ensemble methods.
Basically there are three different methods: bagging, boosting and stacking.
You can either use Adabbost augmented with feature selection (in this consider both sparse and dense features) or stacking based (random feature - random subspace)
I prefer the second option you can train a set of base learners ( decisions. Trees) by using random subsets and random feature ( keep training base learners until you cover the whole set of features)
The next step is to test the Training set to generate the meta data. Use this meta data to train a meta classifier.
The meta classifier will figure out which feature is more important and what kind of relationship should be utilized
$endgroup$
$begingroup$
Can you please share the relevant documentation? Didn't exactly get you what you meant?
$endgroup$
– Sagar Waghmode
Apr 13 '16 at 6:04
$begingroup$
You can read an article about staking " issues in stacking techniques, 1999" read about stackingC . It is very important to know that I am talking about the whole vector (e.g. 1x36 in case of Hog) as a one feature, but not the dimensions within it. You need to track which feature used with which base learner. Be careful about the overfitting problem
$endgroup$
– Bashar Haddad
Apr 13 '16 at 16:15
$begingroup$
If you give more details about the database , number of classes, number of samples , code , what things you have tried , what things you noticed, do you have data imbalance problem, noisy samples ,... Etc . All these details are important and can help in selecting the best method. Give me more details if this ok and I may help in a better way
$endgroup$
– Bashar Haddad
Apr 13 '16 at 16:19
add a comment |
$begingroup$
The best way to combine features is through ensemble methods.
Basically there are three different methods: bagging, boosting and stacking.
You can either use Adabbost augmented with feature selection (in this consider both sparse and dense features) or stacking based (random feature - random subspace)
I prefer the second option you can train a set of base learners ( decisions. Trees) by using random subsets and random feature ( keep training base learners until you cover the whole set of features)
The next step is to test the Training set to generate the meta data. Use this meta data to train a meta classifier.
The meta classifier will figure out which feature is more important and what kind of relationship should be utilized
$endgroup$
$begingroup$
Can you please share the relevant documentation? Didn't exactly get you what you meant?
$endgroup$
– Sagar Waghmode
Apr 13 '16 at 6:04
$begingroup$
You can read an article about staking " issues in stacking techniques, 1999" read about stackingC . It is very important to know that I am talking about the whole vector (e.g. 1x36 in case of Hog) as a one feature, but not the dimensions within it. You need to track which feature used with which base learner. Be careful about the overfitting problem
$endgroup$
– Bashar Haddad
Apr 13 '16 at 16:15
$begingroup$
If you give more details about the database , number of classes, number of samples , code , what things you have tried , what things you noticed, do you have data imbalance problem, noisy samples ,... Etc . All these details are important and can help in selecting the best method. Give me more details if this ok and I may help in a better way
$endgroup$
– Bashar Haddad
Apr 13 '16 at 16:19
add a comment |
$begingroup$
The best way to combine features is through ensemble methods.
Basically there are three different methods: bagging, boosting and stacking.
You can either use Adabbost augmented with feature selection (in this consider both sparse and dense features) or stacking based (random feature - random subspace)
I prefer the second option you can train a set of base learners ( decisions. Trees) by using random subsets and random feature ( keep training base learners until you cover the whole set of features)
The next step is to test the Training set to generate the meta data. Use this meta data to train a meta classifier.
The meta classifier will figure out which feature is more important and what kind of relationship should be utilized
$endgroup$
The best way to combine features is through ensemble methods.
Basically there are three different methods: bagging, boosting and stacking.
You can either use Adabbost augmented with feature selection (in this consider both sparse and dense features) or stacking based (random feature - random subspace)
I prefer the second option you can train a set of base learners ( decisions. Trees) by using random subsets and random feature ( keep training base learners until you cover the whole set of features)
The next step is to test the Training set to generate the meta data. Use this meta data to train a meta classifier.
The meta classifier will figure out which feature is more important and what kind of relationship should be utilized
answered Apr 12 '16 at 4:44
Bashar HaddadBashar Haddad
1,2621413
1,2621413
$begingroup$
Can you please share the relevant documentation? Didn't exactly get you what you meant?
$endgroup$
– Sagar Waghmode
Apr 13 '16 at 6:04
$begingroup$
You can read an article about staking " issues in stacking techniques, 1999" read about stackingC . It is very important to know that I am talking about the whole vector (e.g. 1x36 in case of Hog) as a one feature, but not the dimensions within it. You need to track which feature used with which base learner. Be careful about the overfitting problem
$endgroup$
– Bashar Haddad
Apr 13 '16 at 16:15
$begingroup$
If you give more details about the database , number of classes, number of samples , code , what things you have tried , what things you noticed, do you have data imbalance problem, noisy samples ,... Etc . All these details are important and can help in selecting the best method. Give me more details if this ok and I may help in a better way
$endgroup$
– Bashar Haddad
Apr 13 '16 at 16:19
add a comment |
$begingroup$
Can you please share the relevant documentation? Didn't exactly get you what you meant?
$endgroup$
– Sagar Waghmode
Apr 13 '16 at 6:04
$begingroup$
You can read an article about staking " issues in stacking techniques, 1999" read about stackingC . It is very important to know that I am talking about the whole vector (e.g. 1x36 in case of Hog) as a one feature, but not the dimensions within it. You need to track which feature used with which base learner. Be careful about the overfitting problem
$endgroup$
– Bashar Haddad
Apr 13 '16 at 16:15
$begingroup$
If you give more details about the database , number of classes, number of samples , code , what things you have tried , what things you noticed, do you have data imbalance problem, noisy samples ,... Etc . All these details are important and can help in selecting the best method. Give me more details if this ok and I may help in a better way
$endgroup$
– Bashar Haddad
Apr 13 '16 at 16:19
$begingroup$
Can you please share the relevant documentation? Didn't exactly get you what you meant?
$endgroup$
– Sagar Waghmode
Apr 13 '16 at 6:04
$begingroup$
Can you please share the relevant documentation? Didn't exactly get you what you meant?
$endgroup$
– Sagar Waghmode
Apr 13 '16 at 6:04
$begingroup$
You can read an article about staking " issues in stacking techniques, 1999" read about stackingC . It is very important to know that I am talking about the whole vector (e.g. 1x36 in case of Hog) as a one feature, but not the dimensions within it. You need to track which feature used with which base learner. Be careful about the overfitting problem
$endgroup$
– Bashar Haddad
Apr 13 '16 at 16:15
$begingroup$
You can read an article about staking " issues in stacking techniques, 1999" read about stackingC . It is very important to know that I am talking about the whole vector (e.g. 1x36 in case of Hog) as a one feature, but not the dimensions within it. You need to track which feature used with which base learner. Be careful about the overfitting problem
$endgroup$
– Bashar Haddad
Apr 13 '16 at 16:15
$begingroup$
If you give more details about the database , number of classes, number of samples , code , what things you have tried , what things you noticed, do you have data imbalance problem, noisy samples ,... Etc . All these details are important and can help in selecting the best method. Give me more details if this ok and I may help in a better way
$endgroup$
– Bashar Haddad
Apr 13 '16 at 16:19
$begingroup$
If you give more details about the database , number of classes, number of samples , code , what things you have tried , what things you noticed, do you have data imbalance problem, noisy samples ,... Etc . All these details are important and can help in selecting the best method. Give me more details if this ok and I may help in a better way
$endgroup$
– Bashar Haddad
Apr 13 '16 at 16:19
add a comment |
$begingroup$
The variable groups may be multicollinear or the conversion between sparse and dense might go wrong. Have you thought about using a voting classifier/ ensemble classification? http://scikit-learn.org/stable/modules/ensemble.html
That way you could deal with both above problems.
$endgroup$
$begingroup$
I have already tried out the ensemble techniques as well as voting classifiers. Still no luck.
$endgroup$
– Sagar Waghmode
Apr 12 '16 at 8:15
$begingroup$
So do you see a lot of overlap then between the predictions from the two datasets? May be there indeed is no new information? I.e. the data tells the same story.
$endgroup$
– Diego
Apr 12 '16 at 9:20
$begingroup$
yes, I have done exactly that. Though the predictions are not entirely different, the number of samples where predictions differ are quite high (around 15-20%) of the data. For these samples model with sparse features performs better than that of model with dense features. My point is if sparse features perform better, why don't they come as important features in any of the models which I have tried so far.
$endgroup$
– Sagar Waghmode
Apr 12 '16 at 9:31
$begingroup$
What predictor algorithm do you use?
$endgroup$
– Diego
Apr 12 '16 at 12:21
$begingroup$
I have tried quite a few algorithms and settled on a gradient boosted model; I also use random forests quite a lot for my problem.
$endgroup$
– Sagar Waghmode
Apr 12 '16 at 17:27
answered Apr 12 '16 at 4:30 by Diego
$begingroup$
In addition to some of the suggestions above, I would recommend using a two-step modeling approach.
- Use the sparse features first and develop the best model.
- Calculate the predicted probability from that model.
- Feed that probability estimate into the second model (as an input feature), which would incorporate the dense features. In other words, use all dense features and the probability estimate for building the second model.
- The final classification will then be based on the second model.
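A minimal sketch of these steps with scikit-learn. The feature split, the model choices, and the use of out-of-fold predictions via `cross_val_predict` (to keep the first model's probabilities from leaking training labels into the second model) are all assumptions layered on the suggestion above.

```python
# Two-step (stacked) model: sparse-feature probabilities feed a dense-feature model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=500, n_features=30, random_state=0)
X_sparse, X_dense = X[:, :20], X[:, 20:]  # assumed split into the two groups

# Step 1: a model on the sparse features alone.
stage1 = LogisticRegression(max_iter=1000)

# Step 2: out-of-fold probabilities, so the second model never sees
# predictions made on rows the first model was trained on.
p_sparse = cross_val_predict(stage1, X_sparse, y, cv=5, method="predict_proba")[:, 1]

# Step 3: dense features plus the stage-1 probability as one extra column.
X_stage2 = np.column_stack([X_dense, p_sparse])
stage2 = GradientBoostingClassifier().fit(X_stage2, y)

# Step 4: the final classification comes from the second model.
final_probs = stage2.predict_proba(X_stage2)[:, 1]
```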
$endgroup$
answered Apr 13 '16 at 17:24 by Vishal
$begingroup$
Try PCA only on the sparse features, and combine the PCA output with the dense features.
That way you'll get a dense set of (original) features plus a dense set of features that were originally sparse.
+1 for the question. Please update us with the results.
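A sketch of this idea. Note that on a scipy sparse matrix, scikit-learn's `TruncatedSVD` is the usual stand-in for PCA, since plain `PCA` would require densifying the matrix first; the shapes, density, and component count below are illustrative assumptions.

```python
# Reduce the sparse block to dense components, then stack with dense features.
import numpy as np
from scipy import sparse
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)
# Assumed shapes: 1000 samples, 2300 sparse features (~5% filled), 20 dense ones.
X_sparse = sparse.random(1000, 2300, density=0.05, format="csr", random_state=0)
X_dense = rng.normal(size=(1000, 20))

# TruncatedSVD accepts scipy sparse input directly.
svd = TruncatedSVD(n_components=50, random_state=0)
X_reduced = svd.fit_transform(X_sparse)      # dense (1000, 50) array

# Combine the compressed sparse block with the original dense features.
X_combined = np.hstack([X_reduced, X_dense])  # (1000, 70)
```

Checking `svd.explained_variance_ratio_.sum()` shows how much of the sparse block's variance the chosen number of components retains.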
$endgroup$
$begingroup$
Wow, this has actually brought down the AUC :( Not sure what that means; I need to check the feature importances. But my thinking is: out of around 2.3k sparse features, I used 1k components, which explained 0.97 of the variance ratio, and that loss of information may have brought down the AUC.
$endgroup$
– Sagar Waghmode
Apr 18 '16 at 10:17
$begingroup$
Interesting. Thanks for sharing. We have a dataset very similar to yours (1k-2k sparse features). Just out of curiosity, how many principal components did you generate? If that number is too low, it may explain why the AUC went down.
$endgroup$
– Tagar
Apr 18 '16 at 15:22
$begingroup$
As I said already, I generated 1k principal components, which explained 0.97 of the variance.
$endgroup$
– Sagar Waghmode
Apr 18 '16 at 17:55
answered Apr 18 '16 at 6:11 by Tagar
$begingroup$
I met the same problem. Maybe simply putting dense and sparse features into a single model is not a good choice. Maybe you can try a wide-and-deep model: wide for the sparse features and deep for the dense features. If you try this method, please tell me the result.
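A wide-and-deep model is normally a single neural network trained jointly: a linear "wide" path over the sparse inputs plus an MLP "deep" path over the dense ones, sharing an output layer. Purely as a rough scikit-learn approximation, and assuming a known split of the feature matrix, one can instead average a linear model on the sparse group with a small MLP on the dense group:

```python
# Loose "wide and deep" approximation: linear model on sparse features,
# MLP on dense features, probabilities averaged (a real wide-and-deep
# network would train both paths jointly with a shared output layer).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=400, n_features=30, random_state=0)
X_sparse, X_dense = X[:, :20], X[:, 20:]  # assumed feature split

# "Wide" part: a linear model handles the high-dimensional sparse block.
wide = LogisticRegression(max_iter=1000).fit(X_sparse, y)

# "Deep" part: a small MLP learns interactions among the dense features.
deep = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=500,
                     random_state=0).fit(X_dense, y)

# Combine the two paths by averaging their predicted probabilities.
p = 0.5 * wide.predict_proba(X_sparse)[:, 1] + 0.5 * deep.predict_proba(X_dense)[:, 1]
preds = (p >= 0.5).astype(int)
```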
$endgroup$
answered 11 mins ago by Jianye Ji
$begingroup$
How sparse are your features? Are they 1% filled or even less?
$endgroup$
– João Almeida
Apr 6 '16 at 12:35
$begingroup$
Also, you should note that if your features are sparse, they should only help classify a small part of your dataset, which means the overall accuracy shouldn't change significantly. This is something of a guess, as I don't know what the characteristics of your dataset are.
$endgroup$
– João Almeida
Apr 6 '16 at 12:40
$begingroup$
@JoãoAlmeida They are not that sparse; they are around 5% filled. The problem is that when I look at the samples where the predictions from the two models differ, the model with sparse features tends to perform better. That's why I expected to see a boost in AUC when I combined them with the dense features. I am getting a boost, but it seems very small.
$endgroup$
– Sagar Waghmode
Apr 7 '16 at 10:46
$begingroup$
Hmm... I don't have any other ideas for you then.
$endgroup$
– João Almeida
Apr 7 '16 at 10:51