id treat age sex bp1
Length:50 Control:22 Min. :45.00 F:26 Min. : 87.50
Class :character Newdrug:28 1st Qu.:57.25 M:24 1st Qu.: 95.62
Mode :character Median :63.00 Median : 97.70
Mean :61.48 Mean : 98.30
3rd Qu.:65.00 3rd Qu.: 99.40
Max. :75.00 Max. :111.70
bp2 bpdiff
Min. :78.00 Min. : 0.500
1st Qu.:85.22 1st Qu.: 4.800
Median :88.15 Median : 8.250
Mean :88.60 Mean : 9.704
3rd Qu.:92.10 3rd Qu.:13.700
Max. :99.70 Max. :26.300
8 Descriptive Statistics: Continuous
The initial analysis of numeric data is usually a description of the data at hand without making inference to the population from which the data was drawn. This gives the data analyst a general overview of the data at hand, how best to describe it and what analysis best suits it. In descriptive analysis of numeric data the most basic is to determine the:
- Measure of Central Tendency: This is a description of the center of the data. These measures include mean, median and mode.
- Measure of Dispersion: A measure of how widespread the data is. These include standard deviation, variance, interquartile range and range.
For this section, we will use the NewDrug_clean.dta dataset
8.1 Single continuous variable
8.1.1 Measures of Central Tendency & Dispersion
Below we determine the mean, median, standard deviation, range (minimum, maximum) and interquartile range of out initial blood pressure
Code
newdrug %>%
summarise(
Mean = mean(bp1),
Median = median(bp1),
Standard_Dev = sd(bp1),
Minimum = min(bp1),
Maximum = max(bp1),
IQR = IQR(bp1)) # A tibble: 1 × 6
Mean Median Standard_Dev Minimum Maximum IQR
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 98.3 97.7 5.17 87.5 112. 3.78
Alternatively, the psych package gives these measures in further details. The output includes a measure of the Kurtosis and Skewness, both describing the shape of the data.
Code
newdrug %$%
psych::describe(bp1) vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 50 98.3 5.17 97.7 97.89 2.97 87.5 111.7 24.2 0.7 0.62 0.73
And to show the interquartile range we do the following.
Code
newdrug %$%
psych::describe(bp1, IQR = TRUE,quant = c(.25, .75)) vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 50 98.3 5.17 97.7 97.89 2.97 87.5 111.7 24.2 0.7 0.62 0.73
IQR Q0.25 Q0.75
X1 3.78 95.62 99.4
8.1.2 Graphs - Histogram
Code
newdrug %>%
ggplot(aes(x = bp1)) +
geom_histogram(bins = 7, col="black", alpha = .5, fill = "red") +
labs(
title = "Histogram of Blood Pressure before intervention",
x= "BP1")+
theme_light()
8.1.3 Graphs - Boxplot and violin plot
Code
newdrug %>%
ggplot(aes(y = bp1)) +
geom_boxplot(col="black",
alpha = .2,
fill = "blue",
outlier.fill = "black",
outlier.shape = 22) +
labs(
title = "Boxplot of Blood Pressure before intervention",
y = "BP1")+
theme_light()
8.1.3.1 Graphs - Density plot
Code
newdrug %>%
ggplot(aes(y = bp1)) +
geom_density(col="black", fill = "yellow", alpha=.6) +
labs(
title = "Density Plot of Blood Pressure before intervention",
y = "Blood Pressure before intervention")+
coord_flip() +
theme_light()
8.1.3.2 Graphs - Cumulative Frequency plot
Code
newdrug %>%
group_by(bp1) %>%
summarize(n = n()) %>%
ungroup() %>%
mutate(cum = cumsum(n)/sum(n)*100) %>%
ggplot(aes(y = cum, x = bp1)) +
geom_line(col=3, linewidth=1.2)+
labs(
title = "Cumulative Frequency Plot of Blood Pressure before intervention",
x = "BP1",
y = "Cumulative Frequency")+
theme_light() 
8.1.4 Multiple Continuous variables
8.1.4.1 Measures of Central tendency & Dispersion
Code
newdrug %>%
select(where(is.numeric)) %>%
psych::describe() vars n mean sd median trimmed mad min max range skew kurtosis
age 1 50 61.48 6.51 63.00 61.98 4.45 45.0 75.0 30.0 -0.60 0.16
bp1 2 50 98.30 5.17 97.70 97.89 2.97 87.5 111.7 24.2 0.70 0.62
bp2 3 50 88.60 4.56 88.15 88.46 4.52 78.0 99.7 21.7 0.25 -0.24
bpdiff 4 50 9.70 6.20 8.25 8.95 5.49 0.5 26.3 25.8 0.93 0.24
se
age 0.92
bp1 0.73
bp2 0.65
bpdiff 0.88
To illustrate graphing multiple continuous variables we use the 2 bp variables
Code
bps <-
newdrug %>%
select(bp1, bp2) %>%
pivot_longer(
cols = c(bp1, bp2),
names_to = "measure",
values_to = "bp") %>%
mutate(
measure = fct_recode(
measure,
"Pre-Treatment" = "bp1",
"Post-Treatment" = "bp2"))Next, we create multiple density plots
Code
bps %>%
ggplot(aes(y = measure, x = bp, fill = measure)) +
ggridges::geom_density_ridges2( col="black", alpha = .5, scale=1,
show.legend = F) +
labs(
x = "Blood pressure (mmHg)",
y = "Density",
fill = "Blood Pressure") +
theme_bw()Picking joint bandwidth of 1.52

Code
bps %>%
ggplot(aes(y = measure, x = bp, fill = measure))+
geom_boxplot(show.legend = FALSE) +
labs(y = NULL,
x = "Blood Pressure",
fill = "Blood Pressure") +
coord_flip()+
theme_light() 
Code
bps %>%
ggplot(aes(y = measure, x = bp, fill = measure))+
geom_violin(show.legend = FALSE) +
coord_flip()+
theme_light() 
8.2 Continuous by a single categorical variable
8.2.1 Summary
We do this with one variable.
Code
newdrug %>%
group_by(treat) %>%
summarize(
mean.bp1 = mean(bp1),
sd.bp1 = sd(bp1),
var.bp1 = var(bp1),
se.mean.bp1 = sd(bp1)/sqrt(n()),
median.bp1 = median(bp1),
min.bp1 = min(bp1),
max.bp1 = max(bp1)) %>%
ungroup()# A tibble: 2 × 8
treat mean.bp1 sd.bp1 var.bp1 se.mean.bp1 median.bp1 min.bp1 max.bp1
<fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Control 97.1 3.56 12.7 0.760 97.4 89.8 103.
2 Newdrug 99.2 6.05 36.6 1.14 98.2 87.5 112.
8.2.2 Graph - Histogram, Boxplot, Density plot and cumulative frequency
The graphs are similar to the above so we skip them.
8.3 Continuous by multiple categorical variables
8.3.1 Summary
This can be done as below.
Code
newdrug %>%
group_by(treat, sex) %>%
summarize(
mean.bp1 = mean(bp1),
sd.bp1 = sd(bp1),
var.bp1 = var(bp1),
se.mean.bp1 = sd(bp1)/sqrt(n()),
median.bp1 = median(bp1),
min.bp1 = min(bp1),
max.bp1 = max(bp1),
.groups = "drop") # A tibble: 4 × 9
treat sex mean.bp1 sd.bp1 var.bp1 se.mean.bp1 median.bp1 min.bp1 max.bp1
<fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Control F 97.2 3.82 14.6 1.15 97.4 90.1 103.
2 Control M 97.0 3.47 12.1 1.05 97.5 89.8 102.
3 Newdrug F 98.6 6.01 36.1 1.55 98.4 87.5 112.
4 Newdrug M 100. 6.25 39.1 1.73 98.1 91.7 112.
And this can be presented in a boxplot below
Code
newdrug %>%
ggplot(aes(y = bp1, x = sex, fill = treat)) +
geom_boxplot()+
labs(
y = "Blood Pressure (mmHg)",
x = "Sex",
fill = 'Treatment') +
theme_bw()