17  ROC Analysis

We begin by importing data.

Code
df_data <- 
    readxl::read_xlsx("C:\\Dataset\\SBPDATA.xlsx") %>% 
    janitor::clean_names() %>% 
    select(
        disease_class, a1_gender, sbp_0, sbp_6, sbp_12, sbp_18) %>% 
    mutate(
        a1_gender = factor(
            a1_gender, 
            levels = c(0,1), 
            labels = c("Female","Male")),
        hpt = case_when(
            str_detect(disease_class, "HPT") ~ "Yes",
            TRUE ~ "No"),
        dm = case_when(
            str_detect(disease_class, "DM") ~ "Yes",
            TRUE ~ "No")) %>% 
    select(hpt, dm, a1_gender, sbp_0) %>% 
    drop_na()


df_data %>% head()
# A tibble: 6 × 4
  hpt   dm    a1_gender sbp_0
  <chr> <chr> <fct>     <dbl>
1 Yes   No    Female      139
2 Yes   Yes   Female      155
3 Yes   No    Female      109
4 Yes   No    Female      130
5 Yes   No    Female      124
6 Yes   Yes   Female      140

Next, we look at the distribution of systolic blood pressure for those with and without hypertension pressure.

Code
cut_one <- 
    df_data %>% 
    cutpointr::cutpointr(x = sbp_0, class = hpt)
Assuming the positive class is Yes
Assuming the positive class has higher x values
Code
cut_one %>% summary() 
Method: maximize_metric 
Predictor: sbp_0 
Outcome: hpt 
Direction: >= 

    AUC    n n_pos n_neg
 0.7383 3285  2864   421

 optimal_cutpoint sum_sens_spec    acc sensitivity specificity   tp   fn fp  tn
              137        1.3715 0.6128      0.5876      0.7838 1683 1181 91 330

Predictor summary: 
    Data Min.  5% 1st Qu. Median     Mean 3rd Qu. 95% Max.       SD NAs
 Overall   70 109     125    139 141.2003     155 180  230 22.21493   0
      No   81 100     113    124 125.8385     135 160  208 18.26231   0
     Yes   70 111     128    141 143.4584     157 181  230 21.84820   0

We then visualise the ROC curve.

Code
cut_one %>% 
    cutpointr::plot_roc() +
    theme_bw() 

We can visualise the sensitivity _ specificity at all the cut-offs below.

Code
cut_one %>% 
    cutpointr::plot_metric(add_unsmoothed = T) +
    theme_bw()

The analysis below can show multiple cut-off points, whereas we have only one here.

Code
cutoff_2 <- 
    cutpointr::cutpointr(
        data = df_data,
        x = sbp_0,
        class = dm,
        method = cutpointr::maximize_metric,
        metric = cutpointr::sum_sens_spec,
        break_ties = c) 
Assuming the positive class is No
Assuming the positive class has higher x values
Code
summary(cutoff_2)
Method: maximize_metric 
Predictor: sbp_0 
Outcome: dm 
Direction: >= 

    AUC    n n_pos n_neg
 0.5432 3285  1861  1424

 optimal_cutpoint sum_sens_spec    acc sensitivity specificity   tp  fn  fp  tn
              128        1.0789 0.5668      0.7453      0.3336 1387 474 949 475

Predictor summary: 
    Data Min.  5% 1st Qu. Median     Mean 3rd Qu. 95% Max.       SD NAs
 Overall   70 109     125    139 141.2003     155 180  230 22.21493   0
      No   70 111     127    140 142.4562     156 180  230 21.29190   0
     Yes   81 106     123    136 139.5590     155 181  228 23.27191   0
Code
cutpointr::plot_roc(cutoff_2)

The confidence interval of the cutoff can be determined by bootstrapping as below:

Code
set.seed(999)
cutoff_3 <- 
    cutpointr::cutpointr(
        data = df_data,
        x = sbp_0,
        class = dm,
        boot_runs = 500) 
Assuming the positive class is No
Assuming the positive class has higher x values
Running bootstrap...
Code
summary(cutoff_3)
Method: maximize_metric 
Predictor: sbp_0 
Outcome: dm 
Direction: >= 
Nr. of bootstraps: 500 

    AUC    n n_pos n_neg
 0.5432 3285  1861  1424

 optimal_cutpoint sum_sens_spec    acc sensitivity specificity   tp  fn  fp  tn
              128        1.0789 0.5668      0.7453      0.3336 1387 474 949 475

Predictor summary: 
    Data Min.  5% 1st Qu. Median     Mean 3rd Qu. 95% Max.       SD NAs
 Overall   70 109     125    139 141.2003     155 180  230 22.21493   0
      No   70 111     127    140 142.4562     156 180  230 21.29190   0
     Yes   81 106     123    136 139.5590     155 181  228 23.27191   0

Bootstrap summary: 
          Variable   Min.     5% 1st Qu. Median   Mean 3rd Qu.    95%   Max.
  optimal_cutpoint 112.00 125.00  128.00 132.00 131.36  135.62 139.00 147.00
             AUC_b   0.51   0.53    0.54   0.54   0.54    0.55   0.56   0.58
           AUC_oob   0.49   0.52    0.53   0.54   0.54    0.55   0.57   0.58
   sum_sens_spec_b   1.05   1.07    1.08   1.09   1.09    1.10   1.11   1.14
 sum_sens_spec_oob   1.00   1.03    1.05   1.06   1.06    1.08   1.10   1.13
             acc_b   0.52   0.54    0.55   0.56   0.56    0.57   0.58   0.60
           acc_oob   0.48   0.52    0.54   0.55   0.55    0.56   0.58   0.60
     sensitivity_b   0.41   0.55    0.61   0.68   0.68    0.76   0.81   0.96
   sensitivity_oob   0.37   0.53    0.59   0.67   0.67    0.75   0.80   0.94
     specificity_b   0.10   0.27    0.33   0.41   0.40    0.49   0.54   0.66
   specificity_oob   0.10   0.25    0.32   0.39   0.39    0.47   0.53   0.64
    cohens_kappa_b   0.05   0.07    0.08   0.09   0.09    0.10   0.11   0.14
  cohens_kappa_oob   0.00   0.03    0.05   0.07   0.06    0.08   0.10   0.14
   SD NAs
 5.30   0
 0.01   0
 0.01   0
 0.01   0
 0.02   0
 0.01   0
 0.02   0
 0.10   0
 0.10   0
 0.10   0
 0.10   0
 0.01   0
 0.02   0
Code
cutpointr::plot_metric(cutoff_3)

Code
cutpointr::plot_cut_boot(cutoff_3)

Two different cutoffs could mean clustering. We, therefore, run cutoffs by sex to see

Code
set.seed(999)
cutoff_4 <- 
    cutpointr::cutpointr(
        data = df_data,
        x = sbp_0,
        class = dm,
        boot_runs = 500,
        subgroup = a1_gender) 
Assuming the positive class is No
Assuming the positive class has higher x values
Running bootstrap...

We then summarise it.

Code
cutoff_4 %>% summary()
Method: maximize_metric 
Predictor: sbp_0 
Outcome: dm 
Direction: >= 
Subgroups: Female, Male 
Nr. of bootstraps: 500 

Subgroup: Female 
-------------------------------------------------------------------------------- 
    AUC    n n_pos n_neg
 0.5421 2515  1431  1084

 optimal_cutpoint sum_sens_spec    acc sensitivity specificity   tp  fn  fp  tn
              130        1.0794 0.5622       0.703      0.3764 1006 425 676 408

Predictor summary: 
    Data Min.    5% 1st Qu. Median     Mean 3rd Qu. 95% Max.       SD NAs
 Overall   70 108.0     125    139 140.7869     154 180  230 22.30675   0
      No   70 110.5     127    140 141.9958     155 180  230 21.40570   0
     Yes   89 105.0     123    136 139.1910     154 180  228 23.35758   0

Bootstrap summary: 
          Variable   Min.     5% 1st Qu. Median   Mean 3rd Qu.    95%   Max.
  optimal_cutpoint 112.00 117.00  126.00 130.00 129.33  133.00 140.00 146.00
             AUC_b   0.51   0.52    0.53   0.54   0.54    0.55   0.56   0.58
           AUC_oob   0.50   0.52    0.53   0.54   0.54    0.55   0.57   0.59
   sum_sens_spec_b   1.05   1.06    1.08   1.09   1.09    1.10   1.11   1.15
 sum_sens_spec_oob   0.99   1.02    1.04   1.06   1.06    1.08   1.10   1.13
             acc_b   0.52   0.54    0.56   0.57   0.57    0.58   0.59   0.61
           acc_oob   0.48   0.51    0.54   0.56   0.55    0.57   0.58   0.60
     sensitivity_b   0.42   0.53    0.65   0.72   0.71    0.78   0.90   0.95
   sensitivity_oob   0.38   0.51    0.62   0.71   0.70    0.76   0.89   0.96
     specificity_b   0.11   0.18    0.31   0.37   0.38    0.45   0.56   0.66
   specificity_oob   0.09   0.16    0.30   0.36   0.36    0.43   0.54   0.65
    cohens_kappa_b   0.05   0.06    0.08   0.09   0.09    0.10   0.12   0.15
  cohens_kappa_oob  -0.01   0.02    0.04   0.06   0.06    0.08   0.10   0.14
   SD NAs
 6.30   0
 0.01   0
 0.02   0
 0.02   0
 0.02   0
 0.02   0
 0.02   0
 0.11   0
 0.11   0
 0.11   0
 0.11   0
 0.02   0
 0.02   0

Subgroup: Male 
-------------------------------------------------------------------------------- 
    AUC   n n_pos n_neg
 0.5486 770   430   340

 optimal_cutpoint sum_sens_spec   acc sensitivity specificity  tp  fn  fp  tn
              136        1.1056 0.561      0.6233      0.4824 268 162 176 164

Predictor summary: 
    Data Min.  5% 1st Qu. Median     Mean 3rd Qu.    95% Max.       SD NAs
 Overall   81 110     127    139 142.5506     157 182.00  220 21.87227   0
      No   88 113     129    141 143.9884     159 181.10  210 20.86011   0
     Yes   81 108     125    136 140.7324     157 182.05  220 22.99138   0

Bootstrap summary: 
          Variable   Min.     5% 1st Qu. Median   Mean 3rd Qu.    95%   Max.
  optimal_cutpoint 113.00 124.00  134.00 136.00 134.62  138.00 139.00 163.00
             AUC_b   0.49   0.52    0.53   0.55   0.55    0.56   0.58   0.62
           AUC_oob   0.46   0.50    0.53   0.55   0.55    0.57   0.60   0.63
   sum_sens_spec_b   1.04   1.07    1.09   1.12   1.12    1.14   1.18   1.22
 sum_sens_spec_oob   0.93   1.00    1.04   1.08   1.08    1.11   1.15   1.20
             acc_b   0.48   0.54    0.56   0.57   0.57    0.58   0.60   0.63
           acc_oob   0.45   0.51    0.53   0.55   0.55    0.57   0.59   0.63
     sensitivity_b   0.24   0.55    0.59   0.63   0.65    0.69   0.85   0.96
   sensitivity_oob   0.15   0.52    0.57   0.62   0.63    0.67   0.84   0.96
     specificity_b   0.11   0.24    0.43   0.49   0.46    0.53   0.57   0.87
   specificity_oob   0.08   0.22    0.40   0.47   0.44    0.51   0.56   0.79
    cohens_kappa_b   0.05   0.07    0.09   0.12   0.12    0.14   0.18   0.22
  cohens_kappa_oob  -0.08   0.00    0.05   0.08   0.08    0.11   0.15   0.20
   SD NAs
 4.89   0
 0.02   0
 0.03   0
 0.03   0
 0.04   0
 0.02   0
 0.02   0
 0.09   0
 0.10   0
 0.10   0
 0.10   0
 0.03   0
 0.05   0

And then plot it

Code
cutpointr::plot_roc(cutoff_4)

And the we determine the cut-offs

Code
cutpointr::plot_metric(cutoff_4)

Multiple plot is a single command

Code
cutoff_4 %>% 
    cutpointr::plot_sensitivity_specificity()

And even more

Code
cutoff_4 %>% 
    cutpointr::plot_precision_recall()

Code
cutoff_4 %>% plot()