Is feature selection necessary?

I would like to run a machine learning model such as random forest, gradient boosting, or SVM on my dataset. There are more than 200 predictor variables and the target is a binary variable. Do I need to run feature selection before fitting the model? Does it affect model performance significantly, or is there little difference if I fit the model directly using all predictor variables?










machine-learning predictive-modeling feature-selection random-forest

asked Jan 4 '17 at 8:46 by LUSAQX (edited Jan 4 '17 at 9:08 by Archie)

  • How big is your dataset? If you have thousands of samples and 200 predictor variables, chances are quite high that with a model like random forests you will already achieve quite high performance. Further feature selection can then improve your performance further. – Archie, Jan 4 '17 at 9:05

  • @Archie Yes, my dataset size is similar to what you mentioned. By 'further feature selection', do you mean conducting feature selection before model fitting, and that it can improve model performance? – LUSAQX, Jan 4 '17 at 9:07

  • I mean I would first have a go with all features; random forests would be a great classifier to start with. If you then want to push the performance higher, I would look at, for example, the feature importances to select the most significant features (see the sketch after these comments). – Archie, Jan 4 '17 at 9:29

  • OK. That is what I have done so far. I will try some feature selection methods before the model fitting to see if there is any improvement. – LUSAQX, Jan 4 '17 at 9:50

  • A short answer from my recent practice: feature selection is necessary for model comparison. Some algorithms work better on one set of features, while other algorithms work better on another. – LUSAQX, Sep 11 '17 at 22:39
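
To make the suggestion in the comments concrete (fit on all features first, then use the feature importances to pick a subset), here is a minimal sketch using scikit-learn. The synthetic dataset, the number of trees, and the cut-off of 20 features are illustrative assumptions, not part of the original discussion.

    # Rank features by random-forest importance, then compare against a reduced subset.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    # Synthetic stand-in for "thousands of samples, 200 predictors, binary target".
    X, y = make_classification(n_samples=5000, n_features=200, n_informative=20,
                               random_state=0)

    rf = RandomForestClassifier(n_estimators=300, random_state=0, n_jobs=-1)
    rf.fit(X, y)

    # Indices of the 20 most important features, in descending order of importance.
    top_idx = np.argsort(rf.feature_importances_)[::-1][:20]

    # (For a rigorous comparison, the selection itself should happen inside each CV fold.)
    print("all features   :", cross_val_score(rf, X, y, cv=5).mean())
    print("top 20 features:", cross_val_score(rf, X[:, top_idx], y, cv=5).mean())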
















3 Answers


















Feature selection might be considered a stage to avoid: you have to spend computation time to remove features, you actually lose data, and the methods available for feature selection are not optimal, since the problem is NP-complete. Using it doesn't sound like an offer you cannot refuse.



So, what are the benefits of using it?




  1. Many features and a low samples/features ratio will introduce noise into your dataset. In such a case your classification algorithm is likely to overfit and give you a false feeling of good performance.

  2. Reducing the number of features will reduce the running time of the later stages. That in turn will enable you to use algorithms of higher complexity, search over more hyperparameters, or do more evaluations.

  3. A smaller set of features is more comprehensible to humans. That will enable you to focus on the main sources of predictability and do more precise feature engineering. If you have to explain your model to a client, you are better off presenting a model with 5 features than a model with 200 features.


Now for your specific case:
I recommend that you begin by computing the correlations between the features and the concept (the target). Computing the correlations among the features themselves is also informative.
Note that there are many types of useful correlation measures (e.g., Pearson correlation, mutual information) and many data attributes that might affect them (e.g., sparseness, concept imbalance). Examining them, instead of blindly going with a feature selection algorithm, might save you plenty of time in the future.
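
A minimal sketch of this screening step, assuming the data sits in a pandas DataFrame df with a binary column named target (both names are illustrative, not from the original post):

    # Correlation of each feature with the target, plus a non-linear alternative.
    import pandas as pd
    from sklearn.feature_selection import mutual_info_classif

    X = df.drop(columns="target")
    y = df["target"]

    # Pearson correlation with a 0/1 target (point-biserial), sorted by strength.
    corr_with_target = X.corrwith(y).abs().sort_values(ascending=False)

    # Mutual information also captures non-linear dependence.
    mi = pd.Series(mutual_info_classif(X, y, random_state=0), index=X.columns)

    # Feature-feature correlations, useful for spotting redundant predictors.
    feature_corr = X.corr()

    print(corr_with_target.head(10))
    print(mi.sort_values(ascending=False).head(10))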



I don't think that you will have serious running-time problems with your dataset. However, your samples/features ratio isn't that high, so you might benefit from feature selection.



Choose a classifier of low complexity (e.g., linear regression, a small decision tree) and use it as a benchmark. Try it on the full dataset and on datasets restricted to subsets of the features. Such a benchmark will guide your use of feature selection. You will need that guidance because there are many options (e.g., the number of features to select, the feature selection algorithm), and since the goal is usually prediction rather than the feature selection itself, the feedback is at least one step removed.
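
As a hedged illustration of this benchmarking idea, the sketch below uses logistic regression as the low-complexity classifier (a stand-in for the linear model mentioned above) and a simple univariate selector; X, y and the values of k are assumptions for illustration:

    # Compare a simple benchmark classifier on all features vs. reduced feature sets.
    from sklearn.feature_selection import SelectKBest, mutual_info_classif
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    baseline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    print("all features:", cross_val_score(baseline, X, y, cv=5).mean())

    # Selection happens inside the pipeline, so it is re-fit within each CV fold.
    for k in (10, 30, 100):
        reduced = make_pipeline(StandardScaler(),
                                SelectKBest(mutual_info_classif, k=k),
                                LogisticRegression(max_iter=1000))
        print(f"top {k:3d}     :", cross_val_score(reduced, X, y, cv=5).mean())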






– DaL, answered Jan 4 '17 at 12:03

  • Thanks a lot. But for a non-linear classifier like random forest, does it also require the predictor variables to be independent of each other? I guess not, but I could be wrong. How does correlation guide the feature selection? – LUSAQX, Jan 4 '17 at 20:19

  • Random forest is a collection of trees that copes well with dependent variables, since at each node in a tree the dataset has already been conditioned on all the variables above it. The problem is that tree growing is heuristic, so the choice of those higher variables might not have been optimal. – DaL, Jan 5 '17 at 7:57

  • Correlation only compares pairs of variables and therefore cannot give you the full picture. On the other hand, you get the result in O(n^2) rather than O(2^n)... The guidance is usually specific to the dataset, so I find it hard to explain in general. Some examples: removing variables that are redundant (highly correlated with other variables); examining the strength of the correlations to see whether you can use a small model or will need many weak learners; identifying subsets that are not too correlated with each other, which might indicate that co-training will be useful. – DaL, Jan 5 '17 at 8:03
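
One concrete version of the redundancy removal mentioned in the last comment, assuming the features are in a pandas DataFrame X and using an illustrative 0.95 threshold:

    # Drop features that are nearly duplicates of another feature (|correlation| > 0.95).
    import numpy as np

    corr = X.corr().abs()
    # Keep only the upper triangle so each pair is considered once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

    to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
    X_reduced = X.drop(columns=to_drop)
    print(f"dropped {len(to_drop)} redundant features")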




















I posted a very similar question on Cross Validated a few months ago and got a very large number of responses. Read the answers and the comments:



https://stats.stackexchange.com/questions/215154/variable-selection-for-predictive-modeling-really-needed-in-2016






– horaceT, answered Jan 4 '17 at 21:34 (edited Apr 13 '17 at 12:44)

  • Great question! – Aaron, Oct 12 '17 at 18:29




















Yes, feature selection is one of the most crucial tasks in a machine learning problem, after data wrangling and cleaning. You can find functions implementing feature selection via XGBoost feature importance here:



https://github.com/abhisheksharma4194/Machine-learning
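
The exact functions in the linked repository are not reproduced here; as a minimal, generic sketch of selecting features by XGBoost importance (X, y and the "median" threshold are illustrative assumptions):

    # Fit an XGBoost classifier, then keep features whose importance exceeds the median.
    from sklearn.feature_selection import SelectFromModel
    from xgboost import XGBClassifier

    model = XGBClassifier(n_estimators=200, max_depth=4, random_state=0)
    model.fit(X, y)

    selector = SelectFromModel(model, threshold="median", prefit=True)
    X_selected = selector.transform(X)
    print(X.shape[1], "->", X_selected.shape[1], "features kept")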





– Abhishek Sharma, answered 7 mins ago