Including an identifier as a feature in a machine learning model vs. a separate model for every identifier












$begingroup$


I am new to machine learning and I am building a model to predict the number of customers at a given branch for a specific hour, season, and other features.

I know it is generally a bad idea to put an id (branch_id in my case) into a model, but the customer count depends heavily on which branch it is, so I cannot exclude it.

I can think of two solutions, but I am not sure which one is right or what the best practice is.

  1. Create dummy variables for all branch ids (one-hot encoding, to avoid weighting one id more than another). Since I have 600 unique branch ids, my feature count would go up to 600 + rest_of_features.

  2. Learn a separate model for each branch (600 models). I am not sure this is the right approach; I am not very familiar with it, and it would be very time-consuming.

Looking for suggestions.

An example of the data is below:



    +-----------+------+-----------+-----------+-------------------+
    | branch_id | hour | feature_2 | feature_3 | Count of customer |
    +-----------+------+-----------+-----------+-------------------+
    | 1         | 12   | ..        | ..        | 19                |
    | 1         | 01   | ..        | ..        | 25                |
    | 2         | 23   | ..        | ..        | 14                |
    | 2         | 01   | ..        | ..        | 5                 |
    +-----------+------+-----------+-----------+-------------------+









$endgroup$

















      machine-learning feature-selection






asked 17 mins ago by mashraf




          1 Answer
          $begingroup$

In my opinion, including the raw id as a feature does not make sense: the model will treat it as a numeric value, which will likely hurt performance, because there should be no relationship between how large an id is and how many customers that branch has. One-hot encoding (option 1 in the question) avoids this implied ordering, at the cost of roughly 600 extra, mostly zero columns.
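To make the ordering problem concrete, here is a minimal pure-Python sketch. The three ids and the helper names (`one_hot`, `squared_distance`) are made up for illustration; a real pipeline would more likely use a library encoder that produces a sparse matrix.

```python
def one_hot(branch_id, index):
    """Return a one-hot list for branch_id, given a {branch_id: column} mapping."""
    vec = [0] * len(index)
    vec[index[branch_id]] = 1
    return vec


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


# Hypothetical subset of the 600 branch ids.
branch_ids = [1, 2, 600]
index = {b: i for i, b in enumerate(branch_ids)}

# Raw numeric ids: branch 1 looks far closer to branch 2 than to branch 600,
# although the id values carry no such meaning.
raw_near = squared_distance([1], [2])
raw_far = squared_distance([1], [600])

# One-hot vectors: every pair of distinct branches is equally far apart.
oh_near = squared_distance(one_hot(1, index), one_hot(2, index))
oh_far = squared_distance(one_hot(1, index), one_hot(600, index))

print(raw_near, raw_far)  # 1 358801
print(oh_near, oh_far)    # 2 2
```

With all 600 ids, each one-hot vector has 600 entries and a single 1, which is why sparse storage is usually used for it.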



Option 2 (a separate model per branch) can make sense if you have enough data for every branch.
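One way "enough data for every branch" could look in code: a rough sketch that fits a model per branch only when that branch has at least `min_rows` rows, and otherwise falls back to a global model. The per-branch "model" here is just the mean count, standing in for whatever regressor you would actually use. `min_rows`, `fit_per_branch`, `predict`, and the branch 3 row are assumptions for illustration; the first four rows come from the sample table in the question.

```python
from collections import defaultdict
from statistics import mean

# (branch_id, hour, customer_count)
rows = [
    (1, 12, 19),
    (1, 1, 25),
    (2, 23, 14),
    (2, 1, 5),
    (3, 12, 10),  # hypothetical branch with too little data
]


def fit_per_branch(rows, min_rows=2):
    """Fit a mean-count 'model' per branch plus a global fallback."""
    counts_by_branch = defaultdict(list)
    for branch_id, _hour, count in rows:
        counts_by_branch[branch_id].append(count)

    global_model = mean(count for _, _, count in rows)
    branch_models = {
        branch_id: mean(counts)
        for branch_id, counts in counts_by_branch.items()
        if len(counts) >= min_rows  # skip branches with too few rows
    }
    return branch_models, global_model


def predict(branch_id, branch_models, global_model):
    """Use the branch's own model if it exists, else the global one."""
    return branch_models.get(branch_id, global_model)


branch_models, global_model = fit_per_branch(rows)
print(predict(1, branch_models, global_model))  # 22
print(predict(3, branch_models, global_model))  # 14.6
```

The fallback matters in practice: with 600 branches, some will probably have too few rows to support their own model.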



My suggestion is to look closely at your features and try to find ones that can replace the branch id, for example the number of support desks in a branch, or the branch location as a categorical value. If you find enough features that describe the specifics of each branch, there is no need to include ids or to train separate models.
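A small sketch of swapping the id for descriptive branch attributes. The attribute names and values (`num_desks`, `location`) are invented for illustration; the idea is only that each row is joined to a per-branch lookup table and `branch_id` itself is dropped before training.

```python
# Hypothetical per-branch metadata; names and values are made up.
branch_info = {
    1: {"num_desks": 8, "location": "city_center"},
    2: {"num_desks": 3, "location": "suburb"},
}

# Rows shaped like the question's sample table.
rows = [
    {"branch_id": 1, "hour": 12, "count": 19},
    {"branch_id": 2, "hour": 23, "count": 14},
]


def to_features(row, branch_info):
    """Replace branch_id with that branch's descriptive attributes."""
    features = {k: v for k, v in row.items() if k not in ("branch_id", "count")}
    features.update(branch_info[row["branch_id"]])
    return features


X = [to_features(row, branch_info) for row in rows]  # model inputs
y = [row["count"] for row in rows]                   # prediction target

print(X[0])  # {'hour': 12, 'num_desks': 8, 'location': 'city_center'}
```

A side benefit is that such a model can also predict for a branch it has never seen, as long as that branch's attributes are known. Categorical attributes like the location would still need encoding, but with far fewer levels than 600 ids.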





          $endgroup$













answered 2 mins ago by Karen Danielyan



