What is metrics.roc_curve and metrics.auc measuring when I'm comparing binary data with probability estimates?


















I was working on a challenge, and I was excited because the metrics.auc score for my predicted values compared to my test values was very high. This was for a binary selection process.

However, when I looked more closely, the predicted values output by logistic regression were actually probabilities, not binary values.

So I rounded them, as the challenge requires binary predictions. When I rounded them, the AUC score dropped drastically.

My understanding of the AUC score and the ROC curve is that they compare false positives, false negatives, and so on, and I don't even know how it came up with an actual value for these probabilistic predictions.

What was it computing before, and why was it so high?

























      logistic python auc














      asked 3 hours ago









      Brian Rushton






















          1 Answer












When you round the predicted probabilities up or down, you are essentially using 0.5 as the threshold for your classification. A ROC curve does this not for one threshold but for every possible threshold. The resulting false positive rates and true positive rates are then plotted as the ROC curve, and the area under that curve (its integral) is the AUC. After rounding, the scores take only two values, so the curve collapses to a single operating point joined to (0, 0) and (1, 1) by straight lines, which typically gives a smaller area.

If the challenge requires binary predictions, it is unlikely to use AUC as its performance measure.
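To make the threshold sweep concrete, here is a minimal sketch in plain Python. It is not the scikit-learn implementation; the helper names `roc_points` and `auc` and the ten data points are made up for illustration. It builds the curve by trying every distinct score as a threshold, then compares the area for the raw probabilities with the area for the same probabilities cut at 0.5.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points obtained by sweeping a threshold over the scores."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    # Start above the highest score so that nothing is predicted positive.
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for t in thresholds:
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the piecewise-linear curve through the points (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels and predicted probabilities, for illustration only.
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
y_prob = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.60, 0.80, 0.90]

# AUC on the continuous probabilities: one curve point per distinct score.
auc_prob = auc(roc_points(y_true, y_prob))

# AUC after cutting at 0.5: only the scores 0 and 1 remain,
# so the curve has a single interior point.
y_round = [1 if p >= 0.5 else 0 for p in y_prob]
auc_round = auc(roc_points(y_true, y_round))

print(f"AUC on probabilities: {auc_prob:.2f}")
print(f"AUC after rounding:   {auc_round:.2f}")
```

On this toy data the area is about 0.96 for the probabilities and 0.80 after rounding. The two positives scored 0.35 and 0.45 are still ranked above most negatives, which the full curve rewards, but the 0.5 cut simply counts them as misses.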
















          edited 3 hours ago

























          answered 3 hours ago









          lnathan







• Congrats on hitting 1000!
  – Matthew Drury, 3 hours ago

• The $c$-index (concordance probability; area under ROC curve) is a decent pure measure of predictive discrimination when computed on the continuous probabilities and the binary outcomes. But proper accuracy scoring rules in this case are the Brier score and the pseudo $R^2$, which are more sensitive because they give more credit to extreme probabilities that are "right".
  – Frank Harrell, 2 hours ago
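The distinction drawn in the comment above can be illustrated with a small plain-Python sketch. The helper names `concordance` and `brier_score` and the two sets of predictions are hypothetical, chosen only to show the effect; the pseudo $R^2$ is not covered here. Both prediction sets rank every positive above every negative, so they share the same concordance probability, but the Brier score separates them.

```python
def concordance(y_true, probs):
    """Fraction of (positive, negative) pairs ranked correctly; ties count one half."""
    pos = [p for y, p in zip(y_true, probs) if y == 1]
    neg = [p for y, p in zip(y_true, probs) if y == 0]
    total = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(pos) * len(neg))


def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


# Hypothetical outcomes and two sets of predictions with identical ranking.
y_true = [0, 0, 1, 1]
confident = [0.10, 0.20, 0.80, 0.90]  # extreme probabilities that are "right"
timid = [0.40, 0.45, 0.55, 0.60]      # same ordering, but close to 0.5

for name, probs in [("confident", confident), ("timid", timid)]:
    c = concordance(y_true, probs)
    b = brier_score(y_true, probs)
    print(f"{name}: concordance={c:.2f}, Brier={b:.5f}")
```

Concordance depends only on the ordering of the predictions, so it cannot tell these two models apart. The Brier score is lower (better) for the confident predictions because it also measures how close each probability is to the observed outcome.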











