Generate predictions that are orthogonal to another variable












$begingroup$


I have a feature matrix X, a target variable y, and another variable ORTHO_VAR. I need to predict y from X; however, the predictions from that model need to be orthogonal to (that is, uncorrelated with) ORTHO_VAR while being as correlated with y as possible.



I would prefer to generate the predictions with a non-parametric method such as xgboost.XGBRegressor, but I could use a linear method if absolutely necessary.



This code:



import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from xgboost import XGBRegressor

ORTHO_VAR = 'ortho_var'
IND_VARNM = 'indep_var'
TARGET = 'target'
CORRECTED_VARNM = 'indep_var_fixed'

# Create regression dataset with two correlated targets
X, y = make_regression(n_features=20, random_state=245, n_targets=2)
indep_vars = ['var{}'.format(i) for i in range(X.shape[1])]

# Pull into dataframe
df = pd.DataFrame(X, columns=indep_vars)
df[TARGET] = y[:, 0]
df[ORTHO_VAR] = y[:, 1]

# Fit a model to predict TARGET
xgb = XGBRegressor(n_estimators=10)
xgb.fit(df[indep_vars], df[TARGET])
df['yhat'] = xgb.predict(df[indep_vars])

# Correlation should be low or preferably zero
pred_corr_w_ortho = df.corr().abs()['yhat']['ortho_var']
assert pred_corr_w_ortho < 0.05, pred_corr_w_ortho


Returns this:



---------------------------------------------------------------------------
AssertionError
1 pred_corr_w_ortho = df.corr().abs()['yhat']['ortho_var']
----> 2 assert pred_corr_w_ortho < 0.05, pred_corr_w_ortho

AssertionError: 0.5895885756753665


I would like an approach that retains as much predictive accuracy as possible while keeping the predictions orthogonal to ORTHO_VAR.
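
To make the orthogonality requirement concrete, here is a minimal sketch of a linear post-hoc baseline: regress the predictions on ORTHO_VAR (with an intercept) and keep the residual. This is an illustration under assumptions, not the non-parametric solution asked for. The helper name `orthogonalize` and the synthetic data are hypothetical, the adjustment removes only linear dependence, it is fitted in-sample, and it assumes ORTHO_VAR is known for the rows being predicted.

```python
import numpy as np


def orthogonalize(pred, ortho):
    """Remove from pred its least-squares projection onto [1, ortho].

    Hypothetical helper: the result has zero sample correlation with
    ortho on the data it was fitted on (linear dependence only).
    """
    pred = np.asarray(pred, dtype=float)
    ortho = np.asarray(ortho, dtype=float)
    # Include an intercept so the residual has zero sample covariance
    # with ortho, not merely a zero dot product.
    design = np.column_stack([np.ones_like(ortho), ortho])
    coef, *_ = np.linalg.lstsq(design, pred, rcond=None)
    # Adding back the mean is a constant shift and does not change correlation.
    return pred - design @ coef + pred.mean()


# Synthetic stand-ins (assumptions, not the data from the question).
rng = np.random.default_rng(0)
ortho = rng.normal(size=500)
signal = rng.normal(size=500)
y = signal + 0.8 * ortho
yhat = y + 0.1 * rng.normal(size=500)  # stand-in for model predictions

yhat_fixed = orthogonalize(yhat, ortho)

corr_before = abs(np.corrcoef(yhat, ortho)[0, 1])
corr_after = abs(np.corrcoef(yhat_fixed, ortho)[0, 1])
corr_with_y = np.corrcoef(yhat_fixed, y)[0, 1]
print(corr_before, corr_after, corr_with_y)
```

With the dataframe from the question, the same helper could presumably be applied as `orthogonalize(df['yhat'], df[ORTHO_VAR])`. Because the coefficients are estimated on the same rows, the zero correlation is only guaranteed in-sample; on new data it would hold approximately at best.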










$endgroup$

















      correlation






      asked 11 mins ago









      Chris





















