Predicting contract churn/cancellation: Great model results do not work in the real world












I'm busy with a supervised machine learning problem where I am predicting contract cancellation. Although this is a lengthy question, I do hope someone will take the time, as I'm convinced it will help others out there (I've just been unable to find any solutions that have helped me).



I have the following two datasets:



1) "Modelling Dataset"



Contains about 400k contracts (rows) with 300 features and a single label (0 = "Not Cancelled", 1 = "Cancelled").



Each row represents a single contract, and each contract is only represented once in the data. There are 350k "Not Cancelled" and 50k "Cancelled" cases.



Features are all extracted as at a specific date for each contract. This date is referred to as the "Effective Date". For "Cancelled" contracts, the "Effective Date" is the date of cancellation. For "Not Cancelled" contracts, the "Effective Date" is a date, say, 6 months ago. This will be explained in a moment.



2) "Live Dataset"



Contains 300k contracts (rows) with the same list of 300 features. All these contracts are "Not Cancelled", of course, as we want to predict which of them will cancel. These contracts were followed for a period of 2 months, and I then added a label to this data to indicate whether each contract actually ended up cancelling in those two months: 0 = "Not Cancelled", 1 = "Cancelled".



The problem:



I get amazing results on the "Modelling Dataset" (random train/test split), e.g. precision 95%, AUC 0.98, but as soon as that model is applied to the "Live Dataset", it performs poorly and cannot predict well which contracts end up cancelling, e.g. precision 50%, AUC 0.7.



On the Modelling Dataset the results are great, almost irrespective of model or data preparation. I've tested a number of models (e.g. sklearn random forest, Keras neural network, Microsoft LightGBM, sklearn recursive feature elimination). Even with default settings, the models generally perform well. I've standardized features. I've binned features to try to improve how well the model generalizes. Nothing has helped it generalize to the "Live Dataset".
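For concreteness, here is a minimal sketch of the experiment described above, assuming the two datasets are available as pandas DataFrames with the 300 feature columns plus a "label" column. The file names, column names and hyperparameters are placeholders rather than my actual setup:

    # Minimal sketch of the evaluation gap described above (placeholder names).
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import precision_score, roc_auc_score
    from sklearn.model_selection import train_test_split

    modelling_df = pd.read_csv("modelling_dataset.csv")  # ~400k contracts, 300 features + label
    live_df = pd.read_csv("live_dataset.csv")            # ~300k contracts, labelled after 2 months

    feature_cols = [c for c in modelling_df.columns if c != "label"]

    # Random train/test split within the modelling dataset.
    X_train, X_test, y_train, y_test = train_test_split(
        modelling_df[feature_cols], modelling_df["label"],
        test_size=0.25, stratify=modelling_df["label"], random_state=0)

    clf = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=0)
    clf.fit(X_train, y_train)

    # Looks excellent on the random hold-out split...
    print("test precision:", precision_score(y_test, clf.predict(X_test)))
    print("test AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))

    # ...but much weaker on the live dataset observed 2 months later.
    print("live precision:", precision_score(live_df["label"], clf.predict(live_df[feature_cols])))
    print("live AUC:", roc_auc_score(live_df["label"], clf.predict_proba(live_df[feature_cols])[:, 1]))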



My suspicion:



In my mind, this is not an over-training issue because I've got a test set within the "Modelling Dataset" and those results are great on the test set. It is not a modelling or even a hyper-parameter optimization issue, as the results are already great.



I've also investigated whether there are significant differences in the profile of the features between the two datasets by looking at histograms, feature-by-feature. Nothing is worryingly different.
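Beyond eyeballing histograms, the same feature-by-feature comparison can be quantified, for example with a two-sample Kolmogorov–Smirnov statistic per feature. This is a rough sketch that reuses the placeholder modelling_df, live_df and feature_cols from the sketch above and assumes numeric features:

    # Rank features by how much their distributions differ between the
    # modelling and live datasets (two-sample KS statistic per feature).
    from scipy.stats import ks_2samp

    drift = {}
    for col in feature_cols:
        stat, _p = ks_2samp(modelling_df[col].dropna(), live_df[col].dropna())
        drift[col] = stat

    # The features with the largest KS statistics are the most shifted ones.
    for col, stat in sorted(drift.items(), key=lambda kv: kv[1], reverse=True)[:10]:
        print(f"{col}: KS = {stat:.3f}")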



I suspect the issue lies in the fact that the contracts marked as "Not Cancelled" in the "Modelling Dataset", which the model of course trains to recognize as "Not Cancelled", are basically the exact same contracts as in the "Live Dataset", except that 6 months have now passed.



I suspect that the features for the "Not Cancelled" cases have not changed enough for the model to now recognize some of them as about to be "Cancelled". In other words, the contracts have not moved enough in the feature space.
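One way to test this suspicion directly would be to join each live contract back to its modelling-dataset row and measure how far its standardized feature vector has actually moved. The sketch below assumes a shared "contract_id" key (a placeholder, not a real column name) and reuses the placeholder DataFrames from the sketches above:

    # How far has each contract moved in standardized feature space between
    # its modelling snapshot and its live snapshot? (placeholder column names)
    import numpy as np

    feature_cols = [c for c in modelling_df.columns if c not in ("label", "contract_id")]
    merged = modelling_df.merge(live_df, on="contract_id", suffixes=("_old", "_new"))

    # Standardize with the modelling dataset's statistics so distances are comparable.
    mu = modelling_df[feature_cols].mean().values
    sigma = modelling_df[feature_cols].std().replace(0, 1).values

    old = (merged[[c + "_old" for c in feature_cols]].values - mu) / sigma
    new = (merged[[c + "_new" for c in feature_cols]].values - mu) / sigma
    merged["feature_shift"] = np.linalg.norm(new - old, axis=1)

    # If contracts that cancelled in the live window have not shifted noticeably
    # more than those that did not, the "not moved enough" suspicion is supported.
    print(merged.groupby("label_new")["feature_shift"].describe())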



My questions:



Firstly, does my suspicion sound correct?



Secondly, if I've stated the problem to be solved incorrectly, how would I then set up the problem statement if the purpose is to predict cancellation of something like contracts (when the data on which you train will almost certainly contain the data on which you want to predict)?



For the record, the problem statement I've used here is similar to the way others have done this, and they reported great results. But I'm not sure those models were ever tested in real life. In other cases, the problem to be solved was slightly different, e.g. hotel booking cancellations, which is different because there is a stream of new incoming bookings and booking duration is relatively short, so there are no bookings in common between the modelling and live datasets. Contracts, on the other hand, have a long duration and can cancel at any time, and sometimes never.










machine-learning python classification scikit-learn churn






asked Jun 14 '17 at 13:46









Ernst Dinkelmann

  • Another theory suggests that it is not correct to allow only a single sample for each contract, and that one should create a sample for each past time period (e.g. month) of each contract. E.g., if a contract was not cancelled and is 12 months old, we should have 12 samples, each representing the features as at each of those 12 months respectively and each with label = 0. On the other hand, if a contract was cancelled now and is 12 months old, we should have 11 samples where the contract was not cancelled (label = 0) and 1 (the most recent) where it was cancelled (label = 1).
    – Ernst Dinkelmann
    Jun 15 '17 at 9:58










  • I am having exactly the same problem and I have found no solution yet. One suspicion I have is that we are dealing with some type of unbalanced-class problem. In my case, I am training the model where churners are 60% of the total sample because it is built from 6 years' worth of data, but the "live dataset" only has 6% churners because it collects data from only 2 months into the future. So the model learns from a population which is not realistic, and when applied to the true population, it fails. You might be in the same situation.
    – VinceP
    Nov 24 '17 at 15:48










  • I'm building a similar model to that. Do you have any updates on the matter? Could we maybe share some insights?
    – Luciano Almeida Filho
    Sep 3 '18 at 19:04










  • @LucianoAlmeidaFilho Do you still need insights? I missed the question, sorry.
    – Ernst Dinkelmann
    16 mins ago
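If the mismatch in base rates that VinceP describes above is part of the problem, one textbook remedy, shown here only as a rough sketch rather than a guaranteed fix, is to correct the model's calibrated probabilities for the shift in class prior between the training data and production:

    # Prior-shift correction: rescale a calibrated P(cancel) from the training
    # base rate to the (different) live base rate.
    def adjust_for_prior(p, train_prior, live_prior):
        ratio_pos = live_prior / train_prior
        ratio_neg = (1.0 - live_prior) / (1.0 - train_prior)
        return (p * ratio_pos) / (p * ratio_pos + (1.0 - p) * ratio_neg)

    # Example: trained with 12.5% cancellations (50k of 400k rows), but assume a
    # hypothetical 3% of live contracts actually cancel within the 2-month window.
    print(adjust_for_prior(0.60, train_prior=0.125, live_prior=0.03))

Note that this rescaling only changes calibration and threshold-dependent metrics such as precision; it leaves the ranking, and therefore the AUC, untouched, so it cannot by itself explain an AUC drop from 0.98 to 0.7.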


















3 Answers
If your model makes a prediction 6 months into the future, then it doesn't make sense to judge its performance before 6 months. If only 2 months have passed, then possibly 2/3 of the true positives have yet to reveal their true nature and you are arriving at a premature conclusion.



To test this theory, I would train a new model to predict 2 months out and use that to get an approximation of live accuracy while you wait 4 more months for the first model. Of course, there could be other problems, but this is what I would try first.






– Ryan Zotti, answered Jun 14 '17 at 21:05













  • The model learns to identify contracts that are about to lapse, because that's the date at which we pass the features. So I doubt there is a long "run-off" of cancellations. Having said that, it's relatively simple to pull the datasets again and test the theory. I will feed back the results.
    – Ernst Dinkelmann
    Jun 15 '17 at 9:51










  • I've checked and this did not solve the problem. Modelling data was pulled as at 8 months ago, live data as at 4 months ago and hence observed for 4 months. No improvement: precision 0.53, AUC 0.61. So there must be other problems. I'm working on the suggestions of others and will provide feedback on those too.
    – Ernst Dinkelmann
    Jun 20 '17 at 10:21




















It's hard to answer without a good look at the data, but if I had to guess, your point seems valid (assuming there's no problem with the cross-validation method or any leakage).



If you are "measuring" the contract features at different points in time, there might be a high bias that the features of the cancelled contracts, measured at the point in time in which they were cancelled, might be very different from the "initial" features of those same contracts.



Hence, your model would be learning to predict that a contract is being cancelled on the date it is cancelled, not prior to it; that's why it isn't working properly on your "real world" data.



If you can, try using the data from the moment the contract was set up (initialized) to build your model.






– epattaro, answered Jun 14 '17 at 15:55













  • Most features that are important (based on feature importance measures) are the ones that change over time. E.g. the client's broker displays certain behaviour before contracts are cancelled. So using features as at initialization would miss out on the behavioural aspects completely, as well as anything else that happens close to cancellation. I really doubt using features as at initialization would solve this problem. However, I do value the insight that the model currently only learns to identify contracts about to cancel. This is certainly a weakness.
    – Ernst Dinkelmann
    Jun 15 '17 at 9:39












  • Perhaps the way to solve the issue of only learning to identify contracts about to lapse is, for each lapsed contract, to create a number of samples (e.g. representing how the features looked right before cancellation, 5 days before, 10 days before, 15 days before, etc.). This is a form of oversampling (and it's OK, as it's a class-imbalance problem anyway), which is better than, for example, SMOTE-based oversampling and certainly better than random oversampling. I'll see if my data sources allow me to do it this way and I will report back.
    – Ernst Dinkelmann
    Jun 15 '17 at 9:46




















This is some time after the question was asked, but I thought it worthwhile to include.
The solution that gave similar results between the datasets was to include different data in the "modelling data" (of which the training data is a subset).



Instead of only including each contract once in the data, I had to include every contract multiple times, e.g. at effective dates running from 2016/01/01 up to the cancellation date (if cancelled) or today (if not cancelled). So each contract is included at many effective dates.



In each case, the label is now whether a cancellation occurred within a fixed period of interest (e.g. 1 month) after the effective date of that record: "1" for those that did cancel within 1 month and "0" for those that did not.



Now the model learns to recognise whether a contract will likely cancel within a month.
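A rough sketch of how such an expanded dataset could be assembled, assuming a "contracts" DataFrame with one row per contract (id, start date, cancellation date or NaT) and a hypothetical features_as_at(contract_id, date) helper that returns the 300 features as a dict for a given effective date; none of these names come from the actual pipeline:

    # Build one row per contract per monthly effective date, labelled 1 if the
    # contract cancels within 1 month of that date (all names are placeholders).
    import pandas as pd

    HORIZON = pd.DateOffset(months=1)
    effective_dates = pd.date_range("2016-01-01", "2017-06-01", freq="MS")

    rows = []
    for c in contracts.itertuples():  # columns assumed: contract_id, start_date, cancel_date
        for as_at in effective_dates:
            if as_at < c.start_date:
                continue  # contract did not exist yet at this effective date
            if pd.notna(c.cancel_date) and as_at > c.cancel_date:
                break     # no snapshots after the cancellation date
            label = int(pd.notna(c.cancel_date) and as_at < c.cancel_date <= as_at + HORIZON)
            rows.append({"contract_id": c.contract_id, "effective_date": as_at,
                         "label": label, **features_as_at(c.contract_id, as_at)})

    panel = pd.DataFrame(rows)

With a dataset built this way, it is probably also worth splitting train and validation by effective date (train on earlier snapshots, validate on later ones) rather than randomly, so that the validation metrics mimic the live setting more closely.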



The results were not amazing, but at least consistent between the modelling and live sets. That was actually expected, as cancellation of long-term contracts over the short term is simply difficult to predict in many cases.






– Ernst Dinkelmann, answered 17 mins ago












