Is feature selection necessary?
I would like to run machine learning models such as random forest, gradient boosting, or SVM on my dataset. It has more than 200 predictor variables, and the target is a binary variable. Do I need to run feature selection before fitting the model? Does it affect model performance significantly, or is there little difference if I fit the model directly on all predictor variables?
machine-learning predictive-modeling feature-selection random-forest
asked Jan 4 '17 at 8:46 by LUSAQX; edited Jan 4 '17 at 9:08 by Archie
– Archie (Jan 4 '17 at 9:05): How big is your dataset? If you have thousands of samples and 200 predictor variables, chances are high that a model like random forest will already achieve quite good performance. Feature selection may then improve it further.
– LUSAQX (Jan 4 '17 at 9:07): @Archie Yes, my dataset size is similar to what you mentioned. By 'further feature selection', do you mean running feature selection before model fitting, and that it can improve model performance?
– Archie (Jan 4 '17 at 9:29): I mean I would first have a go with all features; random forest would be a great classifier to start with. If you then want to push the performance higher, I would look at, for example, the feature importances to select the most significant features.
– LUSAQX (Jan 4 '17 at 9:50): OK, that is what I have done so far. I will try some feature selection methods before model fitting to see whether there is any improvement.
– LUSAQX (Sep 11 '17 at 22:39): A short answer from my recent practice: feature selection also matters for model comparison. Some algorithms work better on one set of features, while other algorithms work better on another.
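To illustrate the feature-importance approach suggested in the comments above, here is a minimal sketch with scikit-learn; the synthetic data, the `SelectFromModel` threshold, and all parameter values are placeholders, not taken from the question:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a ~200-feature binary-classification dataset.
X, y = make_classification(n_samples=5000, n_features=200,
                           n_informative=20, random_state=0)

# Baseline: fit on all features and look at cross-validated accuracy.
rf = RandomForestClassifier(n_estimators=200, random_state=0, n_jobs=-1)
print("all features:", cross_val_score(rf, X, y, cv=5).mean())

# Keep only the features whose importance exceeds the mean importance.
# (For an honest estimate, the selection step should be refit inside each
# cross-validation fold, e.g. via a Pipeline, to avoid selection leakage.)
selector = SelectFromModel(rf, threshold="mean").fit(X, y)
X_reduced = selector.transform(X)
print("kept", X_reduced.shape[1], "of", X.shape[1], "features")
print("reduced set :", cross_val_score(rf, X_reduced, y, cv=5).mean())
```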
3 Answers
Feature selection might be considered a stage to avoid: you have to spend computation time to remove features, you actually discard information, and the methods available for feature selection are not optimal, since the problem is NP-complete. Using it doesn't sound like an offer you cannot refuse.
So, what are the benefits of using it?
- Many features and a low samples-to-features ratio will introduce noise into your dataset. In such a case your classification algorithm is likely to overfit and give you a false sense of good performance.
- Reducing the number of features reduces the running time in later stages. That in turn lets you use algorithms of higher complexity, search over more hyperparameters, or run more evaluations.
- A smaller set of features is more comprehensible to humans. That lets you focus on the main sources of predictability and do more precise feature engineering. If you have to explain your model to a client, you are better off presenting a model with 5 features than one with 200.
Now for your specific case: I recommend that you begin by computing the correlations between the features and the target. Computing the correlations among the features themselves is also informative. Note that there are many useful types of correlation (e.g., Pearson correlation, mutual information) and many data properties that can affect them (e.g., sparseness, class imbalance). Examining them, instead of blindly going with a feature selection algorithm, might save you plenty of time later.
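A minimal sketch of how those quantities could be computed with pandas and scikit-learn; the file name, the `target` column, and the choice of scores are illustrative assumptions, not part of the answer:

```python
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

df = pd.read_csv("data.csv")      # hypothetical file with a binary "target" column
X = df.drop(columns="target")
y = df["target"]

# Pearson correlation of each feature with the target
# (point-biserial correlation for a 0/1 target).
pearson_with_target = X.corrwith(y).sort_values(key=abs, ascending=False)

# Mutual information between each feature and the target
# (captures non-linear dependence as well).
mi = pd.Series(mutual_info_classif(X, y, random_state=0),
               index=X.columns).sort_values(ascending=False)

# Correlations among the features themselves, e.g. to spot redundant pairs.
feature_corr = X.corr()

print(pearson_with_target.head(10))
print(mi.head(10))
```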
I don't think you will have serious running-time problems with your dataset. However, your samples-to-features ratio isn't very high, so you might benefit from feature selection.
Choose a classifier of low complexity (e.g., linear regression, a small decision tree) and use it as a benchmark. Try it on the full dataset and on datasets with a subset of the features. Such a benchmark will guide you in the use of feature selection. You will need that guidance because there are many options (e.g., the number of features to select, the feature selection algorithm), and since the goal is usually prediction rather than feature selection itself, the feedback is at least one step away.
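A sketch of such a benchmark, assuming a small decision tree as the low-complexity model and a hypothetical filter (`SelectKBest` with mutual information) for picking the subsets; the data and the values of k are placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real dataset.
X, y = make_classification(n_samples=5000, n_features=200,
                           n_informative=20, random_state=0)

# Low-complexity benchmark model: a shallow decision tree.
benchmark = DecisionTreeClassifier(max_depth=3, random_state=0)

# Benchmark on the full feature set.
print("all features  :", cross_val_score(benchmark, X, y, cv=5).mean())

# Benchmark on selected subsets; the selector sits inside the pipeline, so it is
# refit on every training fold and does not leak into the test folds.
for k in (10, 30, 60):
    pipe = make_pipeline(SelectKBest(mutual_info_classif, k=k), benchmark)
    print(f"top {k:>3} features:", cross_val_score(pipe, X, y, cv=5).mean())
```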
answered Jan 4 '17 at 12:03 by DaL
– LUSAQX (Jan 4 '17 at 20:19): Thanks a lot. But for a non-linear classifier like random forest, does it also require the predictor variables to be independent of each other? I guess not, but I could be wrong. How does the correlation guide the feature selection?
– DaL (Jan 5 '17 at 7:57): A random forest is a collection of trees that copes well with dependent variables, since at each node in a tree the dataset is conditioned on all the variables above it. The problem is that tree growing is heuristic, so the choice of those upstream variables might not have been optimal.
– DaL (Jan 5 '17 at 8:03): Correlation only compares pairs of variables and therefore cannot give you the full picture; on the other hand, you get the result in O(n^2) rather than O(2^n)... The guidance is usually specific to the dataset, so I find it hard to explain in general. Some examples: remove variables that are redundant (very correlated with other variables); examining the strength of the correlations might indicate whether you can use a small model or will need many weak learners; identifying subsets that are not too correlated with each other might indicate that co-training will be useful.
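As an illustration of the "remove redundant variables" suggestion above, a small sketch that drops one column from every highly correlated pair; the 0.95 cutoff and the toy data are arbitrary choices for the example:

```python
import numpy as np
import pandas as pd

def drop_redundant(X: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    """Drop one column from each pair whose absolute Pearson correlation exceeds threshold."""
    corr = X.corr().abs()
    # Keep only the upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return X.drop(columns=to_drop)

# Toy example: random features plus one deliberately redundant copy.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(1000, 5)), columns=list("abcde"))
X["a_copy"] = X["a"] + 0.01 * rng.normal(size=1000)   # nearly identical to "a"
print(drop_redundant(X).columns.tolist())             # "a_copy" is dropped
```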
I posted a very similar question on Cross Validated a few months ago and got a very large number of responses. Read the answers and the comments.
https://stats.stackexchange.com/questions/215154/variable-selection-for-predictive-modeling-really-needed-in-2016
answered Jan 4 '17 at 21:34 by horaceT
– Aaron (Oct 12 '17 at 18:29): Great question!
Yes, feature selection is one of the most crucial tasks in machine learning problems, after data wrangling and cleaning.
You can find functions implementing feature selection with XGBoost feature importance here:
https://github.com/abhisheksharma4194/Machine-learning
answered by Abhishek Sharma (new contributor)
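The linked repository is the author's own; independently of it, a minimal sketch of selecting features by XGBoost importance with scikit-learn's SelectFromModel (synthetic placeholder data, arbitrary parameter values):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from xgboost import XGBClassifier

# Synthetic stand-in for the real 200-feature binary-classification problem.
X, y = make_classification(n_samples=5000, n_features=200,
                           n_informative=20, random_state=0)

# Fit a gradient-boosted model and keep the features whose importance
# is above the median importance.
model = XGBClassifier(n_estimators=200, max_depth=4, random_state=0)
selector = SelectFromModel(model, threshold="median").fit(X, y)
X_reduced = selector.transform(X)
print("kept", X_reduced.shape[1], "of", X.shape[1], "features")
```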