Exclude observations with measurements below limit of detection?
























$begingroup$


I am analysing a dataset for the relationship between an exposure variable x and a response y (in my case, these are the urinary concentration of a specific compound and a measure of cognitive function). x is measured using an analytical method that has a lower detection limit, and approximately 12% of the population have concentrations below it.



In a first analysis, I compared y between participants with x above and below the detection limit, and found a significant difference, which is not surprising.



My question is: when I conduct a regression analysis for y ~ x, should I exclude all observations with x below the detection limit, or not? It does affect the results and actually reverses the association: if I include all observations, the association is positive; if I exclude them, it is negative.





















$endgroup$
























      regression censoring chemometrics














      edited 1 hour ago









      cbeleites











      asked 2 hours ago









      Gux























          1 Answer












          $begingroup$

          Don't exclude cases solely because they are below the LLOQ (lower limit of quantitation)!




          • The LLOQ is not a magic hard threshold below which nothing can be said. It is rather a convention that marks the concentration above which the relative error of the analysis stays below 10 %.

          • Note that LLOQ is often computed assuming homoscedasticity, i.e. the absolute error being independent of the concentration. That is, you don't even assume a different absolute error for cases below or above LLOQ. From that point of view, LLOQ is essentially just a way to express the absolute uncertainty of the analytical method in a concentration unit (like fuel economy in l/100 km vs. miles/gallon).

          • Even if the analytical error is concentration-dependent, two cases whose true concentrations are almost the same, one slightly below and one slightly above LLOQ, have almost the same uncertainty.

          • (Left) censoring the data, i.e. keeping only the information that a case is below LLOQ (dropping such cases altogether is truncation), leads to all kinds of complications in subsequent data analyses, and you'd need statistical methods designed for such data.

          • Thank your clinical lab for providing the full data. I've met many people who have the opposite difficulty: they get a report that just says "below LLOQ", with no possibility of recovering any further information.
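The points about homoscedastic error can be made concrete with a little arithmetic. The following is only a sketch with made-up numbers: the absolute error `sigma = 0.1` and the helper `relative_error` are assumptions for illustration, not values from the question. If the absolute error is constant, the relative error is simply `sigma / concentration`, so the concentration where it reaches 10 % is `10 * sigma`, and nothing special happens when a value crosses that line.

```python
def relative_error(concentration, sigma):
    """Relative analytical error when the absolute error sigma is constant."""
    return sigma / concentration


# Hypothetical absolute (homoscedastic) error, in concentration units.
sigma = 0.1
# Concentration at which the relative error equals 10 %, i.e. 10 * sigma.
lloq = sigma / 0.10

for factor in (0.5, 0.9, 1.0, 1.1, 2.0):
    c = factor * lloq
    print(f"c = {c:.2f} ({factor:.1f} x LLOQ): "
          f"relative error = {relative_error(c, sigma):.1%}")
```

Under these assumptions a case at 0.9 x LLOQ has a relative error of about 11 % and a case at 1.1 x LLOQ about 9 %: the two are nearly equally trustworthy, although only one of them would survive an exclusion rule.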


          Bottom line: never censor your data unless you have really, really good reasons for doing so.
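To get a feeling for what exclusion costs, here is a small simulation sketch. Everything in it is an assumption made for illustration: the log-normal exposure distribution, the LLOQ of 1.0, the true slope of 0.5, the noise levels and the helper `ols_slope`. It does not try to reproduce the sign reversal described in the question. It only shows that dropping the cases measured below LLOQ narrows the range of x and leaves fewer cases, so the slope is estimated less precisely.

```python
import math
import random


def ols_slope(x, y):
    """Return (slope, standard error of the slope) of a simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se


random.seed(1)
n = 20000
lloq = 1.0          # hypothetical LLOQ, in concentration units
sigma = lloq / 10   # assumed homoscedastic analytical error, so LLOQ = 10 * sigma
true_slope = 0.5    # assumed effect of the true exposure on the response

# True concentrations (roughly 12 % below LLOQ), measured values, and response.
x_true = [random.lognormvariate(math.log(1.8), 0.5) for _ in range(n)]
x_meas = [c + random.gauss(0, sigma) for c in x_true]
y = [true_slope * c + random.gauss(0, 1.0) for c in x_true]

# Exclusion rule under discussion: keep only cases measured at or above LLOQ.
kept = [(xm, yi) for xm, yi in zip(x_meas, y) if xm >= lloq]
frac_below = 1 - len(kept) / n

slope_all, se_all = ols_slope(x_meas, y)
slope_kept, se_kept = ols_slope([k[0] for k in kept], [k[1] for k in kept])

print(f"fraction below LLOQ: {frac_below:.1%}")
print(f"all cases:      slope = {slope_all:.3f} +/- {se_all:.3f}")
print(f"x >= LLOQ only: slope = {slope_kept:.3f} +/- {se_kept:.3f}")
```

In this idealised linear setting both fits aim at roughly the same slope and exclusion mainly inflates the standard error. A slope that actually changes sign, as in the question, would point to something this sketch does not model, for example a non-linear relationship at low concentrations.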















          $endgroup$













          • $begingroup$
            If the analytical study is only for those with a large concentration of the compound, that is, to look at severity of effect in extreme cases, that would be a different study that would exclude all low concentrations. That might be useful, but does not appear to be the goal here. Is this correct?
            $endgroup$
            – James Phillips
            1 hour ago










          • $begingroup$
            @JamesPhillips: IMHO that would indeed be a totally different question. And it would require that the analyte concentration can be measured with sufficient precision that the inclusion/exclusion decision is not hampered by analytical error.
            $endgroup$
            – cbeleites
            1 hour ago










          • $begingroup$
            @JamesPhillips: plus, from my chemist's world-view, that makes sense only if we actually have distinct subpopulations, i.e. clusters, as opposed to a continuum where a rather arbitrary threshold cuts off a tail; in the latter case a regression is more sensible. Note that if you cut between clusters of cases, you have less of a thresholding/censoring problem, whereas cutting "through the middle" of a single population leads to complications that are somewhat related to those of censoring.
            $endgroup$
            – cbeleites
            1 hour ago
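The remark in the comments that an inclusion/exclusion decision can be hampered by analytical error can be quantified with a short sketch. The assumptions are all mine: Gaussian analytical error, a cut-off of 1.0 and an error SD of one tenth of the cut-off; `prob_measured_above` is a hypothetical helper name. It computes how often a case with a given true concentration would be measured at or above the cut-off.

```python
import math


def prob_measured_above(true_conc, threshold, sigma):
    """P(measured value >= threshold) for Gaussian analytical error with SD sigma."""
    z = (threshold - true_conc) / sigma
    # Upper tail of the standard normal distribution, written with erfc.
    return 0.5 * math.erfc(z / math.sqrt(2))


threshold = 1.0         # hypothetical cut-off, in concentration units
sigma = threshold / 10  # assumed homoscedastic analytical error

for true_conc in (0.8, 0.95, 1.0, 1.05, 1.2):
    p = prob_measured_above(true_conc, threshold, sigma)
    print(f"true concentration {true_conc:.2f}: P(measured >= cut-off) = {p:.2f}")
```

Under these assumptions a case whose true concentration lies 5 % below the cut-off is still measured above it about 31 % of the time, so cases close to the threshold are assigned to either side more or less by chance.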














































          answered 1 hour ago









          cbeleites





























