Fix the first two levels of a decision tree?












I am trying to build a regression tree with 70 attributes, where the business team wants to fix the first two levels, namely country and product type. To achieve this, I have two proposals:

1. Build a separate tree for each combination of country and product type, subset the data accordingly, and pass each observation to the respective tree for prediction. I saw this suggested in the comments here. I have 88 levels in country and 3 levels in product type, so this will generate 264 trees.

2. Build a basic tree with only two variables, country and product type, with an appropriate cp value so that all 264 combinations appear as leaf nodes. Build a second tree with all the remaining variables, and stack tree one on top of tree two as a single decision tree.

I don't think the first one is the right way to do it. I am also stuck on how to stack the trees in the second approach; even if it is not the right way, I would love to know how to achieve this.

Please guide me on how to approach the problem. Thanks.
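As a rough sanity check on the first proposal, it can help to count how many rows each country/product type combination actually has before committing to one tree per combination. The sketch below is illustrative only: the column names `country` and `product_type`, the toy rows, and the `min_rows` threshold are assumptions, not part of the original data.

```python
from collections import Counter

# Hypothetical toy rows; the real data set has 88 countries and 3 product types.
rows = [
    {"country": "ZA", "product_type": "A", "y": 10.0},
    {"country": "ZA", "product_type": "A", "y": 12.0},
    {"country": "ZA", "product_type": "B", "y": 7.0},
    {"country": "KR", "product_type": "B", "y": 20.0},
    {"country": "KR", "product_type": "B", "y": 22.0},
    {"country": "KR", "product_type": "B", "y": 21.0},
]

def segment_sizes(rows):
    """Count rows for each (country, product_type) combination present in the data."""
    return Counter((r["country"], r["product_type"]) for r in rows)

def small_segments(sizes, min_rows):
    """Return combinations with fewer than min_rows rows (min_rows is an assumed threshold)."""
    return sorted(key for key, n in sizes.items() if n < min_rows)

sizes = segment_sizes(rows)
print(len(sizes), "combinations observed")
print("too small:", small_segments(sizes, min_rows=2))
```

Combinations that never occur in the data need no tree at all, and very small ones may not support a separate tree, so the real number of trees can be lower than the nominal 264.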







machine-learning r predictive-modeling decision-trees














edited May 23 '17 at 12:38
Community

asked Nov 1 '16 at 12:03
Aravind















  • 4




    $begingroup$
    Why do you not like the first method?
    $endgroup$
    – Hobbes
    Nov 1 '16 at 14:58










  • $begingroup$
    @Hobbes It will be hard to monitor and tune the performance of each tree.
    $endgroup$
    – Aravind
    Nov 2 '16 at 0:46






  • 1




    $begingroup$
    What is the business problem? I had a similar case. We wanted the best set of prospects to target for each country/product group. The business felt that prospects in say South Africa for product A are very different from prospects in South Korea for product B. I could argue the merits of different marketing campaigns/messages/etc but that is the business's decision. I did not look at it as fixing the first 2 levels of the tree or any unnatural adjustments to an algorithm. I looked at it as how to find the best set of prospects for each country/product combination. Where I did not have enough d
    $endgroup$
    – Craig
    Mar 3 '17 at 10:09










  • $begingroup$
    @Aravind If you are worried about tuning each tree in Approach 1, then I would caution you that you might not be on the right track. Your decision to essentially hard-code the first two levels should be based on some business rules. If your intent is to keep the algorithm fixed, then are you really writing an algorithm? Are you not introducing a form of bias into your overall model? I would only be comfortable proceeding if these choices were hard-coded and would rarely change. Otherwise, you need to push back on the business and make them aware of the potential bias.
    $endgroup$
    – I_Play_With_Data
    Oct 25 '18 at 18:02




























2 Answers
    Depending on which tree algorithm you want to use, you could manually construct the first two levels of the tree. You can follow the pseudocode explained, for example, here for the C4.5 tree. Once you have done this, you can remove the two features from the data set and create trees for the remaining part of the tree. If you want to create an rpart object, you would need to reuse some parts of the rpart source, which may be a bit more demanding. Depending on which tree algorithm you use, you will have just a binary split at both levels, so you will only need to build 4 separate trees and not 264. Note that you may not get the optimal decision tree, since after stepping through the first two levels, country and product type may still be variables that cause a split. But without seeing the data, it is impossible to tell.

    Side note: it may be valuable to explain to the business that country and product type are not the most sensible variables to have at the top of the decision tree. Sometimes it is better to educate the end users than to force machine learning to do something inaccurate. In my experience, end users prefer a correct solution over a solution that works only because people have a gut feeling that it should be a certain way.
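The idea of constructing the first two levels by hand and then growing a tree on the remaining features can be sketched as follows. This is a minimal illustration under stated assumptions, not an rpart or C4.5 implementation: the function names, the column names, and the toy data are hypothetical, the fixed levels use multiway splits (one branch per observed value), and a one-split regression stump stands in for whatever tree learner would really be used.

```python
from collections import defaultdict
from statistics import mean

def fit_stump(rows, features, target="y"):
    """Fit a one-split regression stump: choose the (feature, threshold) that
    minimises the summed squared error around the two child means."""
    def sse(values):
        m = mean(values)
        return sum((v - m) ** 2 for v in values)

    best = None  # (error, feature, threshold, left_mean, right_mean)
    for f in features:
        # Skip the largest value: the right child would be empty.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [r[target] for r in rows if r[f] <= t]
            right = [r[target] for r in rows if r[f] > t]
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, f, t, mean(left), mean(right))
    if best is None:  # no usable split: predict a constant
        const = mean(r[target] for r in rows)
        return lambda row: const
    _, f, t, left_mean, right_mean = best
    return lambda row: left_mean if row[f] <= t else right_mean

def fit_fixed_top(rows, level1, level2, features, target="y"):
    """Force level1 and level2 as the first two splits, then fit one
    sub-model per leaf using only the remaining features."""
    groups = defaultdict(list)
    for r in rows:
        groups[(r[level1], r[level2])].append(r)
    # Nested dict mirrors the tree: level1 value -> level2 value -> sub-model.
    tree = defaultdict(dict)
    for (a, b), subset in groups.items():
        tree[a][b] = fit_stump(subset, features, target)
    return dict(tree)

def predict(tree, row, level1, level2):
    """Route a row through the two fixed levels, then delegate to the sub-model.
    Raises KeyError for a combination that was not seen during fitting."""
    return tree[row[level1]][row[level2]](row)

# Hypothetical toy data with one remaining numeric feature "x".
rows = [
    {"country": "ZA", "product_type": "A", "x": 1.0, "y": 10.0},
    {"country": "ZA", "product_type": "A", "x": 2.0, "y": 10.0},
    {"country": "ZA", "product_type": "A", "x": 8.0, "y": 30.0},
    {"country": "ZA", "product_type": "A", "x": 9.0, "y": 30.0},
    {"country": "KR", "product_type": "B", "x": 1.0, "y": 5.0},
    {"country": "KR", "product_type": "B", "x": 2.0, "y": 7.0},
]
tree = fit_fixed_top(rows, "country", "product_type", features=["x"])
print(predict(tree, {"country": "ZA", "product_type": "A", "x": 8.5},
              "country", "product_type"))
```

Seen this way, "stacking" the two trees amounts to a lookup on the two fixed variables followed by an ordinary tree on the rest, which is structurally the same as keeping one tree per combination.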

















    answered Nov 2 '16 at 11:47
    Stereo












    • $begingroup$
      I have 88 levels in country and 3 levels in product type, so it will be 264 trees. If there are only 4 separate trees, then I will take the easy option, namely the first choice. I feel it will be easier to convince the end user when I have results for both what they want and the correct way of solving the problem. Can you help me find reference material for stacking two trees after they are built completely?
      $endgroup$
      – Aravind
      Nov 3 '16 at 1:05












    • $begingroup$
      What you could do is calculate the entropy or Gini for country, product type, and the first element that gets selected by CHAID and C4.5. Educate users on these metrics. If that fails, you can always go back. Additionally, when you run a binary decision tree, the first splits will lump countries and/or products together, so there will be at minimum 4 subtrees.
      $endgroup$
      – Stereo
      Nov 3 '16 at 10:12
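The comparison suggested in the comment above can be sketched in a few lines. This is an illustrative sketch, not CHAID or C4.5: it computes the impurity decrease of a multiway split on one categorical feature, with Gini for class labels and, as a swapped-in analogue for a regression target, the population variance. The feature name `segment` and the toy rows are hypothetical.

```python
from collections import Counter, defaultdict

def gini(values):
    """Gini impurity of a list of class labels."""
    n = len(values)
    return 1.0 - sum((c / n) ** 2 for c in Counter(values).values())

def variance(values):
    """Population variance, a common impurity for a regression target."""
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / n

def split_gain(rows, feature, target, impurity):
    """Impurity decrease from a multiway split on a categorical feature:
    parent impurity minus the size-weighted impurity of the children."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[feature]].append(r[target])
    parent = impurity([r[target] for r in rows])
    children = sum(len(g) / len(rows) * impurity(g) for g in groups.values())
    return parent - children

# Hypothetical toy rows: the target depends on "segment", not on "country".
rows = [
    {"country": "ZA", "segment": "retail", "y": 1.0},
    {"country": "ZA", "segment": "corporate", "y": 3.0},
    {"country": "KR", "segment": "retail", "y": 1.0},
    {"country": "KR", "segment": "corporate", "y": 3.0},
]
for feature in ("country", "segment"):
    print(feature, split_gain(rows, feature, "y", variance))
```

One caveat when showing such numbers to end users: a raw impurity decrease tends to favour features with many levels, such as an 88-level country variable, which is why C4.5 normalises information gain into a gain ratio.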


















        I think you could do this fairly automatically if you're open to using Python. A library called auto_ml* has a feature called categorical ensembling, where you can explicitly say "I want a model built for each level of this feature". If you made a feature that was country-product type and used that as your category, the rest should be pretty easy.



        *Disclosure: I've made minor contributions to auto_ml. It is FOSS under the MIT license.
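Independently of auto_ml, whose API is not reproduced here, the underlying idea of a combined country-product type feature with one model per level can be sketched in plain Python. Everything named below is hypothetical: the column names, the `segment` key, the mean predictor standing in for a real learner, and the global fallback for unseen levels, which is an added assumption rather than something the answer describes.

```python
from collections import defaultdict
from statistics import mean

def add_segment_key(rows, left="country", right="product_type", key="segment"):
    """Return copies of the rows with a combined 'country-product_type' feature."""
    return [{**r, key: f"{r[left]}-{r[right]}"} for r in rows]

def fit_mean(rows, target="y"):
    """Stand-in learner: predicts the mean target of its training rows."""
    m = mean(r[target] for r in rows)
    return lambda row: m

def fit_per_level(rows, key, fit):
    """Fit one model per level of `key`, plus a global model used as a
    fallback for levels that were not seen during training (an assumption)."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r)
    models = {level: fit(subset) for level, subset in groups.items()}
    fallback = fit(rows)
    return lambda row: models.get(row[key], fallback)(row)

# Hypothetical toy training data.
train = add_segment_key([
    {"country": "ZA", "product_type": "A", "y": 10.0},
    {"country": "ZA", "product_type": "A", "y": 14.0},
    {"country": "KR", "product_type": "B", "y": 20.0},
])
model = fit_per_level(train, "segment", fit_mean)
print(model({"segment": "ZA-A"}))
print(model({"segment": "KR-A"}))  # unseen combination: uses the global fallback
```

Passing the learner in as the `fit` argument keeps the grouping logic separate from the choice of model, so a regression tree could replace the mean predictor without changing the routing.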

















        answered Jun 1 '17 at 12:26
        CalZ





























