5 Data Wrangling

In this chapter, we delve into the manipulation of data in the form of a data frame or tibble. In so doing, we will introduce the tidyverse package and the various verbs (functions) it provides.

The tidyverse package is not a single package but a collection of packages. These include, among others, the dplyr package. Most of the functions we will be employing in this chapter come from dplyr.

We begin by reading in the blood_donors_1.xls file and showing the first 10 records.

Code

library(tidyverse)  # attaches dplyr, tidyr, readr and the %>% pipe

df_blood <- 
    readxl::read_xls("C:/Dataset/blood_donors_1.xls")

df_blood %>% 
    head(10)

id	hb	hct	sex	bldgrp	pdonor
1	10.5	31.8	Male	O	3
2	11.9	37.2	Male	AB	0
3	1	26	Male	A	1
4	8.9	26.8	Male	A	3
5	7.8	24.2	Male	A	2
6	10	30.9	Male	B	1
7	10.4	33.9	Male	B	0
8	11.3	35	Male	O	1
9	16.4		Male	AB	1
10	14.4	43.6	Male	AB	1

The output shows we have a 25-row and 6-column tibble.
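As a quick check of the dimensions quoted above, the number of rows and columns of a tibble can be queried directly. The sketch below uses a small hypothetical tibble, toy_blood, built from the first three records shown above rather than from the full data file.

```r
library(tibble)

# Hypothetical stand-in for df_blood: the first three records shown above
toy_blood <- tibble(
    id  = 1:3,
    hb  = c(10.5, 11.9, 1),
    hct = c(31.8, 37.2, 26))

dim(toy_blood)    # number of rows, then number of columns
nrow(toy_blood)   # rows only
ncol(toy_blood)   # columns only
```

Applied to df_blood itself, the same calls would report the size of the full tibble.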

5.1 Renaming variables

Below we rename the variables hb to hemog and id to studyid using the rename function, and then show the first 5 records with the head function. Note that rename takes its arguments in the form new_name = old_name.

Code

df_blood %>% 
    rename(hemog = hb, studyid = id) %>% 
    head(5)

studyid	hemog	hct	sex	bldgrp	pdonor
1	10.5	31.8	Male	O	3
2	11.9	37.2	Male	AB	0
3	1	26	Male	A	1
4	8.9	26.8	Male	A	3
5	7.8	24.2	Male	A	2

5.2 Sorting data

Below we use the arrange function to sort the data by bldgrp in ascending order and, within each blood group, by hb in descending order.

Code

df_blood %>% 
    arrange(bldgrp, desc(hb)) %>% 
    head(10)

id	hb	hct	sex	bldgrp	pdonor
17	9.8	30.5	Female	A	4
21	9.1	28		A	3
4	8.9	26.8	Male	A	3
5	7.8	24.2	Male	A	2
3	1	26	Male	A	1
9	16.4		Male	AB	1
10	14.4	43.6	Male	AB	1
16	12.7	99	Female	AB	0
24	12.3	38.2		AB	2
14	12.2	36.8	Female	AB	1
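One detail worth knowing when sorting is how missing values are handled: arrange places them at the end, even when the column is wrapped in desc. The following is a minimal sketch on a hypothetical tibble, toy_sort, whose values are made up for illustration.

```r
library(dplyr)
library(tibble)

# Hypothetical data: one missing hb value in blood group A
toy_sort <- tibble(
    id     = 1:5,
    bldgrp = c("O", "A", "A", "B", "A"),
    hb     = c(10.5, 8.9, NA, 10.0, 9.8))

sorted <- 
    toy_sort %>% 
    arrange(bldgrp, desc(hb))   # the NA in hb sorts last within group A

sorted
```

Within blood group A, the record with the missing hb appears after the records with observed values.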

5.3 Subsetting data

In this subsection, we demonstrate the use of the filter and select functions to choose specific records and variables in a tibble. The filter function keeps rows that satisfy a condition, while select keeps the named columns. Below we filter to keep all records with hb > 12 g/dl and retain only the id, hb and sex columns.

Code

df_blood %>% 
    filter(hb > 12) %>% 
    select(id, hb, sex)

id	hb	sex
9	16.4	Male
10	14.4	Male
14	12.2	Female
14	16.4	Female
16	12.7	Female
24	12.3
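Conditions can be combined in filter by separating them with commas, and rows where a condition evaluates to NA are dropped. The sketch below illustrates both points on a hypothetical tibble, toy_filter, with made-up values.

```r
library(dplyr)
library(tibble)

# Hypothetical data with missing values in both hb and sex
toy_filter <- tibble(
    id  = 1:5,
    hb  = c(10.5, 16.4, NA, 12.7, 12.3),
    sex = c("Male", "Male", "Female", "Female", NA))

# Single condition: the row with missing hb is dropped
high_hb <- 
    toy_filter %>% 
    filter(hb > 12) %>% 
    select(id, hb, sex)

# Two conditions (both must hold): the row with missing sex is also dropped
high_hb_female <- 
    toy_filter %>% 
    filter(hb > 12, sex == "Female") %>% 
    select(id, hb)

high_hb
high_hb_female
```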

5.4 Generating new variables

To generate new variables we use the mutate function. Based on our knowledge that the haematocrit is approximately three times the haemoglobin, we generate a new variable, hb_from_hct, by dividing hct by 3.

Code

df_blood %>% 
    mutate(hb_from_hct = hct/3) %>% 
    head(10)

id	hb	hct	sex	bldgrp	pdonor	hb_from_hct
1	10.5	31.8	Male	O	3	10.6
2	11.9	37.2	Male	AB	0	12.4
3	1	26	Male	A	1	8.67
4	8.9	26.8	Male	A	3	8.93
5	7.8	24.2	Male	A	2	8.07
6	10	30.9	Male	B	1	10.3
7	10.4	33.9	Male	B	0	11.3
8	11.3	35	Male	O	1	11.7
9	16.4		Male	AB	1
10	14.4	43.6	Male	AB	1	14.5
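A variable created inside mutate can be used immediately in the same call, and a missing input yields a missing result. The sketch below uses records 8 to 10 from the output above; the variable hb_diff is a hypothetical addition for illustration.

```r
library(dplyr)
library(tibble)

# Records 8 to 10 from the output above (hct is missing for id 9)
toy_mutate <- tibble(
    id  = 8:10,
    hb  = c(11.3, 16.4, 14.4),
    hct = c(35, NA, 43.6))

res_mutate <- 
    toy_mutate %>% 
    mutate(
        hb_from_hct = hct / 3,            # estimate of hb from hct
        hb_diff     = hb - hb_from_hct)   # uses the variable just created

res_mutate
```

The row with the missing hct has NA for both new variables.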

5.5 Aggregating data

Data can be aggregated in R using the summarize function. Below we determine the mean and standard deviation of the haemoglobin for the patients in the data.

Code

df_blood %>% 
    summarize(mean_hb = mean(hb), sd_hb = sd(hb))

mean_hb	sd_hb
11	2.89

Grouping the data by bldgrp before the aggregation yields the means and standard deviations for the various blood groups.

Code

df_blood %>% 
    group_by(bldgrp) %>% 
    summarize(mean_hb = mean(hb), sd_hb = sd(hb))

bldgrp	mean_hb	sd_hb
A	7.32	3.61
AB	13.1	1.69
B	10.2	0.283
O	11	0.427
P	16.4
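The empty standard deviation for blood group P arises because a standard deviation cannot be computed from a single observation. Adding a count with n() makes this visible. The sketch below uses a hypothetical subset, toy_agg, with hb values taken from the outputs above.

```r
library(dplyr)
library(tibble)

# Hypothetical subset: two records each for A and B, one for P
toy_agg <- tibble(
    bldgrp = c("A", "A", "B", "B", "P"),
    hb     = c(8.9, 7.8, 10.0, 10.4, 16.4))

res_agg <- 
    toy_agg %>% 
    group_by(bldgrp) %>% 
    summarize(
        n       = n(),                     # records per group
        mean_hb = mean(hb, na.rm = TRUE),
        sd_hb   = sd(hb, na.rm = TRUE))    # NA when n is 1

res_agg
```

The na.rm = TRUE argument is included as a precaution; it only matters when hb contains missing values.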

5.6 Reshaping data

In longitudinal studies, data is captured from the same individual repeatedly. Such data is recorded in either long or wide format. A typical example of a data frame in the long format is bp_long below.

Code

bp_long <- read_csv(
    file = "C:/Dataset/bp_long.txt",
    col_names = TRUE, 
    col_types = "cci")  # compact spec: character, character, integer

bp_long

id	measure	sbp
B01	sbp1	141
B01	sbp2	137
B02	sbp1	155
B02	sbp2	153
B03	sbp1	153

In this format, each visit or round of data collection is captured as a new row, identified by the study ID and the period of record (the variable measure above). Measurement of systolic blood pressure on day 1 is indicated by sbp1 in the measure variable. Day 2 measurements are indicated as sbp2.

The wide format of the same data can be obtained as below.

Code

bp_wide <- 
    bp_long %>% 
    pivot_wider(
        id_cols = id, 
        names_from = measure, 
        values_from = sbp)

bp_wide

id	sbp1	sbp2
B01	141	137
B02	155	153
B03	153

Here, each study participant’s record for the whole study is on one row of the data and the different measurements of systolic blood pressure are captured as different variables. Participant B03 has no day 2 measurement, so sbp2 is missing for that row. Next, we convert the wide format back to the long format.

Code

bp_wide %>% 
    pivot_longer(
        cols = c(sbp1, sbp2),
        names_to = "time",
        values_to = "syst_bp")

id	time	syst_bp
B01	sbp1	141
B01	sbp2	137
B02	sbp1	155
B02	sbp2	153
B03	sbp1	153
B03	sbp2
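Note that the round trip above introduces an explicit missing row for B03 at sbp2. If the original long data is wanted back, the values_drop_na argument of pivot_longer can be used to drop such rows. The sketch below rebuilds bp_long by hand from the values shown above, so it does not depend on the data file.

```r
library(dplyr)
library(tidyr)
library(tibble)

# bp_long re-created by hand from the output shown above
bp_long <- tribble(
    ~id,   ~measure, ~sbp,
    "B01", "sbp1",   141L,
    "B01", "sbp2",   137L,
    "B02", "sbp1",   155L,
    "B02", "sbp2",   153L,
    "B03", "sbp1",   153L)

bp_wide <- 
    bp_long %>% 
    pivot_wider(id_cols = id, names_from = measure, values_from = sbp)

# Back to long, dropping the row created by the missing sbp2 for B03
bp_back <- 
    bp_wide %>% 
    pivot_longer(
        cols = c(sbp1, sbp2),
        names_to = "measure",
        values_to = "sbp",
        values_drop_na = TRUE)

bp_back
```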

5.7 Combining data

In a study to determine the change in weight of athletes running a marathon, the investigators collected data at two points. Since the marathon starts in town A and ends in town B, the investigators weighed the athletes in town A just before the race and recorded each athlete’s ID (sid), sex, age and weight at the start (wgtst). The records of five of these athletes are in the data marathonA. At the end point of the marathon, another member of the investigation team recorded their IDs (eid), weight upon completion (wgtend) and the time it took the athletes to complete the marathon (dura). These records are in the data marathonB.

Code

dataA <- 
    read_delim(
        file = "C:/Dataset/marathonA.txt",
        col_names = TRUE,
        delim = "\t",
        col_types = "ccid")  # sid, sex: character; age: integer; wgtst: double

dataB <- 
    read_delim(
        file = "C:/Dataset/marathonB.txt",
        col_names = TRUE,
        delim = "\t",
        col_types = "cdi")  # eid: character; wgtend: double; dura: integer

dataA

sid	sex	age	wgtst
C001	M	23	57.1
C002	F	27	62.3
C003	M	19	54.5
C004	M	21	59.4
C005	F	32	53.4

Code

dataB

eid	wgtend	dura
C003	53.9	189
C005	53	197
C002	62.2	201
C001	56.8	209

We can determine the weight change only by matching the before and after weights of each individual. This is where merging is very useful. Below, we merge the two data sets into one with full_join, matching sid in dataA to eid in dataB.

Code

dataA %>% 
    full_join(dataB, by = join_by(sid == eid))

sid	sex	age	wgtst	wgtend	dura
C001	M	23	57.1	56.8	209
C002	F	27	62.3	62.2	201
C003	M	19	54.5	53.9	189
C004	M	21	59.4
C005	F	32	53.4	53	197
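A full join keeps every athlete, including C004, who has no record at the end point. Other joins treat unmatched records differently. The sketch below re-creates the two data sets by hand from the outputs above and, as an illustration, uses inner_join to keep only matched athletes, computes a hypothetical wgt_change variable, and uses anti_join to list athletes without an end record.

```r
library(dplyr)
library(tibble)

# Data re-created by hand from the outputs shown above
dataA <- tribble(
    ~sid,   ~sex, ~age, ~wgtst,
    "C001", "M",  23L,  57.1,
    "C002", "F",  27L,  62.3,
    "C003", "M",  19L,  54.5,
    "C004", "M",  21L,  59.4,
    "C005", "F",  32L,  53.4)

dataB <- tribble(
    ~eid,   ~wgtend, ~dura,
    "C003", 53.9,    189L,
    "C005", 53.0,    197L,
    "C002", 62.2,    201L,
    "C001", 56.8,    209L)

# Keep only athletes present in both data sets, then compute the change
matched <- 
    dataA %>% 
    inner_join(dataB, by = join_by(sid == eid)) %>% 
    mutate(wgt_change = wgtend - wgtst)

# Athletes in dataA with no matching record in dataB
no_end_record <- 
    dataA %>% 
    anti_join(dataB, by = join_by(sid == eid))

matched
no_end_record
```

Which join is appropriate depends on whether unmatched athletes should remain in the analysis data.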

5.8 Reading in data

Code

dataF <-
    readxl::read_xlsx("C:/Dataset/SBPDATA.xlsx") %>% 
    janitor::clean_names() %>% 
    rename(
        ageyrs = a3_how_old_are_you_years,
        dxs_class = disease_class,
        gender = a1_gender
        ) %>% 
    mutate(
        dxs_class = factor(dxs_class),
        gender = factor(
            gender, 
            levels = c(0, 1), 
            labels = c("Male", "Female")))

dataF %>% select(1:5) %>% head()

sid	dxs_class	sbp_0	sbp_2	sbp_4
1	HPT	139	124	130
2	DM+HPT	155
3	HPT	109	123	109
4	HPT	130
5	HPT	124	120	146
6	DM+HPT	140	114	163

Code

dat <- 
    tribble(
        ~"name", ~"day", ~"month", ~"year", ~"bp",
        "Ama", 12, 05, 2020, "120/80",
        "Kwame", 14, 02, 2019, "132/66",
        "Akosua", 21, 12, 2010, "110/76",
        "Yaw", 13, 03, 1982, "144/98",
        "Yaa", 19, 08, 2000, "117/77")

dat

name	day	month	year	bp
Ama	12	5	2.02e+03	120/80
Kwame	14	2	2.02e+03	132/66
Akosua	21	12	2.01e+03	110/76
Yaw	13	3	1.98e+03	144/98
Yaa	19	8	2e+03	117/77

5.9 `arrange`

Code

dat %>% arrange(name, desc(day))

name	day	month	year	bp
Akosua	21	12	2.01e+03	110/76
Ama	12	5	2.02e+03	120/80
Kwame	14	2	2.02e+03	132/66
Yaa	19	8	2e+03	117/77
Yaw	13	3	1.98e+03	144/98

5.10 `unite`

Code

dat %>% 
    unite(col = "dob", c(day, month, year), sep="/")

name	dob	bp
Ama	12/5/2020	120/80
Kwame	14/2/2019	132/66
Akosua	21/12/2010	110/76
Yaw	13/3/1982	144/98
Yaa	19/8/2000	117/77

5.11 `separate`

Code

dat %>% 
    separate(col = bp, into = c("sbp", "dbp"), sep = "/")

name	day	month	year	sbp	dbp
Ama	12	5	2.02e+03	120	80
Kwame	14	2	2.02e+03	132	66
Akosua	21	12	2.01e+03	110	76
Yaw	13	3	1.98e+03	144	98
Yaa	19	8	2e+03	117	77

Code

dat %>% 
    separate(col = bp, into = c("sbp", "dbp"), sep = "/") %>% 
    unite(col = "dob", c(day, month, year), sep="/") %>% 
    mutate(dob_new = lubridate::dmy(dob))

name	dob	sbp	dbp	dob_new
Ama	12/5/2020	120	80	2020-05-12
Kwame	14/2/2019	132	66	2019-02-14
Akosua	21/12/2010	110	76	2010-12-21
Yaw	13/3/1982	144	98	1982-03-13
Yaa	19/8/2000	117	77	2000-08-19

5.12 `relocate`

Code

dataF %>% 
    relocate(ageyrs, gender, .before = sbp_0) %>% 
    select(1:8) %>% 
    slice_head(n=10)

sid	dxs_class	ageyrs	gender	sbp_0	sbp_2	sbp_4	sbp_6
1	HPT	75	Male	139	124	130	130
2	DM+HPT	60	Male	155
3	HPT	62	Male	109	123	109	126
4	HPT	70	Male	130
5	HPT	72	Male	124	120	146	144
6	DM+HPT	56	Male	140	114	163	117
7	DM+HPT	51	Male	137	135	132	147
8	DM	73	Male	160	130
9	HPT	61	Female	153	218
10	HPT	59	Male	135	130	118	150

Code

dataF %>% 
    select(1:4) %>% 
    relocate(sid, .after = last_col()) %>% 
    slice_head(n=10)

dxs_class	sbp_0	sbp_2	sid
HPT	139	124	1
DM+HPT	155		2
HPT	109	123	3
HPT	130		4
HPT	124	120	5
DM+HPT	140	114	6
DM+HPT	137	135	7
DM	160	130	8
HPT	153	218	9
HPT	135	130	10

Code

dataF %>% 
    select(1:7) %>% 
    relocate(where(is.numeric)) %>% 
    slice_head(n=10)

sid	sbp_0	sbp_2	sbp_4	sbp_6	sbp_8	dxs_class
1	139	124	130	130	104	HPT
2	155					DM+HPT
3	109	123	109	126	108	HPT
4	130					HPT
5	124	120	146	144	157	HPT
6	140	114	163	117	124	DM+HPT
7	137	135	132	147	130	DM+HPT
8	160	130				DM
9	153	218				HPT
10	135	130	118	150		HPT

Code

dataF %>% 
    select(1:6) %>% 
    relocate(contains("sbp")) %>% 
    slice_head(n=10)

sbp_0	sbp_2	sbp_4	sbp_6	sid	dxs_class
139	124	130	130	1	HPT
155				2	DM+HPT
109	123	109	126	3	HPT
130				4	HPT
124	120	146	144	5	HPT
140	114	163	117	6	DM+HPT
137	135	132	147	7	DM+HPT
160	130			8	DM
153	218			9	HPT
135	130	118	150	10	HPT

5.13 `reframe` & `across`

Code

dataF %>% 
    drop_na(dxs_class) %>% 
    reframe(
        across(
            sbp_2:sbp_8, 
            list(
                "Average" = ~mean(.x, na.rm=T),
                "Std" = ~sd(.x, na.rm=T)),
            .names = "{.fn}_{.col}"), 
        .by = dxs_class)

dxs_class	Average_sbp_2	Std_sbp_2	Average_sbp_4	Std_sbp_4	Average_sbp_6	Std_sbp_6	Average_sbp_8	Std_sbp_8
HPT	140	22.9	137	22.4	137	21.5	136	21.6
DM+HPT	144	24	144	24.5	143	23.9	143	24.5
DM	128	19.5	128	20.3	128	19.5	126	19.2

Code

dataF %>% 
    na.omit() %>% 
    select(dxs_class, sbp_0:sbp_6) %>% 
    group_by(dxs_class) %>% 
    reframe(across(where(is.numeric), ~quantile(.x)))

dxs_class	sbp_0	sbp_2	sbp_4	sbp_6
DM	81	70	84	94
DM	114	116	113	114
DM	124	125	125	126
DM	134	139	138	140
DM	189	194	199	187
DM+HPT	98	81	88	70
DM+HPT	129	127	125	126
DM+HPT	142	141	142	139
DM+HPT	158	157	158	155
DM+HPT	216	231	231	234
HPT	90	71	78	88
HPT	126	124	120	120
HPT	138	135	132	131
HPT	151	150	147	146
HPT	219	221	220	209

5.14 Distinct observations

Code

dataF %>% 
    reframe(across(where(is.numeric), n_distinct))

sid	sbp_0	sbp_2	sbp_4	sbp_6	sbp_8	sbp_10	sbp_12	sbp_14	sbp_16	sbp_18	ageyrs
3296	138	141	145	138	135	134	134	133	130	127	77

5.15 list of functions

Code

dataF %>% 
    filter(!is.na(dxs_class)&!is.na(gender)) %>% 
    group_by(dxs_class, gender) %>%
    reframe(
        across(
            starts_with("sbp"), 
            list(
                AVG = mean, 
                SD = sd, 
                N_missing = ~sum(is.na(.x), na.rm=TRUE))))

dxs_class	gender	sbp_0_AVG	sbp_0_SD	sbp_0_N_missing	sbp_2_N_missing	sbp_4_N_missing	sbp_6_N_missing	sbp_8_N_missing	sbp_10_N_missing	sbp_12_N_missing	sbp_14_N_missing	sbp_16_N_missing	sbp_18_N_missing
DM	Male			1	57	70	75	74	79	79	85	94	110
DM	Female	128	19.8	0	17	23	29	30	29	32	36	50	55
DM+HPT	Male	145	23	0	84	113	119	145	153	153	186	204	247
DM+HPT	Female	147	22	0	26	36	39	45	43	55	57	68	84
HPT	Male			3	237	309	354	401	414	473	522	639	741
HPT	Female	144	20.9	0	74	116	131	147	145	165	173	216	241

5.16 Summarizing by anonymous functions

Code

dataF %>% 
    filter(!is.na(dxs_class)) %>% 
    group_by(dxs_class) %>%
    reframe(
        across(
            .cols = c(sbp_0), 
            .fns = list(
                "Mean"    = ~mean(.x, na.rm=T), 
                "UpperCI" = ~mean(
                    .x, na.rm=T) + 1.96*sd(.x, na.rm=T)/sqrt(n()) ,
                "LowerCI" = ~mean(
                    .x, na.rm=T) - 1.96*sd(.x, na.rm=T)/sqrt(n()))))

dxs_class	sbp_0_Mean	sbp_0_UpperCI	sbp_0_LowerCI
DM	126	128	124
DM+HPT	145	147	144
HPT	142	143	141

5.17 `expand`

Code

dataF %>% 
    filter(!is.na(dxs_class) & !is.na(gender)) %>% 
    expand(dxs_class, gender)

dxs_class	gender
DM	Male
DM	Female
DM+HPT	Male
DM+HPT	Female
HPT	Male
HPT	Female

5.18 `crossing`

Code

dataF %>% 
    filter(!is.na(dxs_class) & !is.na(gender)) %>% 
    select(dxs_class, gender) %>% 
    crossing()

dxs_class	gender
DM	Male
DM	Female
DM+HPT	Male
DM+HPT	Female
HPT	Male
HPT	Female

5.19 Adding a running id

Code

dataF %>% 
    filter(!is.na(dxs_class) & !is.na(gender)) %>% 
    select(dxs_class, gender) %>%
    mutate(running_id = row_number()) %>% 
    slice_head(n=10)

dxs_class	gender	running_id
HPT	Male	1
DM+HPT	Male	2
HPT	Male	3
HPT	Male	4
HPT	Male	5
DM+HPT	Male	6
DM+HPT	Male	7
DM	Male	8
HPT	Female	9
HPT	Male	10

5.20 `pivot_longer` & `pivot_wider`

Code

dataF_long <-
    dataF %>% 
    select(gender, dxs_class, sbp_0:sbp_18) %>% 
    pivot_longer(
        cols = starts_with("sbp"),
        names_to = "measure",
        values_to = "sbp",
        values_drop_na = TRUE)

dataF_long %>% 
    slice_head(n=10)

gender	dxs_class	measure	sbp
Male	HPT	sbp_0	139
Male	HPT	sbp_2	124
Male	HPT	sbp_4	130
Male	HPT	sbp_6	130
Male	HPT	sbp_8	104
Male	HPT	sbp_10	129
Male	HPT	sbp_12	80
Male	HPT	sbp_14	129
Male	HPT	sbp_16	126
Male	HPT	sbp_18	135

Code

dataF %>% 
    select(dxs_class, gender, sbp_0, sbp_4) %>%
    na.omit() %>% 
    group_by(dxs_class) %>% 
    pivot_wider(
        names_from = gender, 
        values_from = c(sbp_0, sbp_4), 
        values_fn = ~mean(.x, na.rm = TRUE)) %>% 
    ungroup()

dxs_class	sbp_0_Male	sbp_0_Female	sbp_4_Male	sbp_4_Female
HPT	141	144	136	141
DM+HPT	145	147	144	146
DM	126	130	125	135

5.21 `tidyquant` tabulation

Code

dataF %>% 
    select(dxs_class, gender) %>% 
    na.omit() %>% 
    tidyquant::pivot_table(
        .rows = gender, .columns = dxs_class, .values = ~n()
    )

gender	DM	DM+HPT	HPT
Male	308	777	1434
Female	114	226	430

Code

dataF %>% 
    select(dxs_class, gender, sbp_0, sbp_2) %>% 
    na.omit() %>% 
    tidyquant::pivot_table(
        .rows = gender, 
        .columns = dxs_class, 
        .values = ~quantile(sbp_0)) %>% 
    unnest(cols = c("DM","HPT","DM+HPT"))

gender	DM	DM+HPT	HPT
Male	90	95	70
Male	113	128	126
Male	124	142	140
Male	134	158	154
Male	182	228	224
Female	81	98	98
Female	117	131	129
Female	127	145	141
Female	138	161	159
Female	208	220	210

5.22 `rowwise` manipulations

Code

dataF %>% 
    rowwise() %>% 
    mutate(
        sbp_mean = mean(
            c(sbp_0,sbp_2,sbp_4,sbp_6,sbp_8, sbp_10, sbp_12,
              sbp_14,sbp_16, sbp_18), na.rm=T),
        sbp_sd = sd(
            c(sbp_0,sbp_2,sbp_4,sbp_6,sbp_8, sbp_10, sbp_12,
              sbp_14,sbp_16, sbp_18), na.rm=T),
        n = n()) %>% 
    ungroup() %>% 
    select(sid, dxs_class, sbp_mean, sbp_sd, 
        sbp_0:sbp_4) %>% 
    slice_head(n=10)

sid	dxs_class	sbp_mean	sbp_sd	sbp_0	sbp_2	sbp_4
1	HPT	123	17.6	139	124	130
2	DM+HPT	155		155
3	HPT	116	9.25	109	123	109
4	HPT	130		130
5	HPT	131	13.4	124	120	146
6	DM+HPT	125	16.8	140	114	163
7	DM+HPT	135	7.75	137	135	132
8	DM	145	21.2	160	130
9	HPT	196	37.5	153	218
10	HPT	133	11.5	135	130	118

5.23 `str_glue`

Code

x <- c("Ama", "is", "a", "Girl")
cat(x)

Ama is a Girl

Code

name <- "Fred"
str_glue('My name is {name}.')

My name is Fred.

Code

stringr_fcn <- "`stringr::str_glue()`"
glue_fcn    <- "`glue::glue()`"

str_glue('{stringr_fcn} is essentially an alias for {glue_fcn}.')

`stringr::str_glue()` is essentially an alias for `glue::glue()`.

Code

name <- "Fred"
age <- 50
anniversary <- as.Date("1991-10-12")
str_glue('My name is {name},',
  ' my age next year is {age + 1},',
  ' my anniversary is {format(anniversary, "%A, %B %d, %Y")}.')

My name is Fred, my age next year is 51, my anniversary is Saturday, October 12, 1991.

Code

str_glue('My name is {name},',
  ' my age next year is {age + 1},',
  ' my anniversary is {format(anniversary, "%A, %B %d, %Y")}.',
  name = "Joe",
  age = 40,
  anniversary = as.Date("2001-10-12"))

My name is Joe, my age next year is 41, my anniversary is Friday, October 12, 2001.

Code

mtcars %>% 
    head()

mpg	cyl	disp	hp	drat	wt	qsec	vs	am	gear	carb
21	6	160	110	3.9	2.62	16.5	0	1	4	4
21	6	160	110	3.9	2.88	17	0	1	4	4
22.8	4	108	93	3.85	2.32	18.6	1	1	4	1
21.4	6	258	110	3.08	3.21	19.4	1	0	3	1
18.7	8	360	175	3.15	3.44	17	0	0	3	2
18.1	6	225	105	2.76	3.46	20.2	1	0	3	1

Code

head(mtcars) %>% 
    glue::glue_data("{rownames(.)} has {hp} hp")

Mazda RX4 has 110 hp
Mazda RX4 Wag has 110 hp
Datsun 710 has 93 hp
Hornet 4 Drive has 110 hp
Hornet Sportabout has 175 hp
Valiant has 105 hp

Code

head(iris) %>%
  mutate(
      description = str_glue(
          "This {Species} has a petal length of {Petal.Length}"
          )
      )

Sepal.Length	Sepal.Width	Petal.Length	Petal.Width	Species	description
5.1	3.5	1.4	0.2	setosa	This setosa has a petal length of 1.4
4.9	3	1.4	0.2	setosa	This setosa has a petal length of 1.4
4.7	3.2	1.3	0.2	setosa	This setosa has a petal length of 1.3
4.6	3.1	1.5	0.2	setosa	This setosa has a petal length of 1.5
5	3.6	1.4	0.2	setosa	This setosa has a petal length of 1.4
5.4	3.9	1.7	0.4	setosa	This setosa has a petal length of 1.7

Code

str_glue("
    A formatted string
    Can have multiple lines
      with additional indention preserved
    ")

A formatted string
Can have multiple lines
  with additional indention preserved

Code

str_glue("

  leading or trailing newlines can be added explicitly

  ")


leading or trailing newlines can be added explicitly

Code

str_glue("
    A formatted string \\
    can also be on a \\
    single line
    ")

A formatted string can also be on a single line

Code

name <- "Fred"
str_glue("My name is {name}, not {{name}}.")

My name is Fred, not {name}.

Code

one <- "1"
str_glue(
    "The value of $e^{2\\pi i}$ is $<<one>>$.", 
    .open = "<<", 
    .close = ">>")

The value of $e^{2\pi i}$ is $1$.

Code

dataF %>% 
    filter(!is.na(sbp_0)) %>% 
    ggplot(aes(x=sbp_0)) +
    geom_histogram(col = "grey", fill = "wheat") +
    labs(title = str_glue(
        "Histogram with Mean = {mean_sbp0}mmHg and \\
         Standard Deviation = {sd_sbp0}",
        mean_sbp0 = mean(dataF$sbp_0, na.rm=T) %>% 
            round(1),
        sd_sbp0 = sd(dataF$sbp_0,   na.rm=T) %>% 
            round(1)),
         x = "Systolic Blood Pressure (mmHg)",
         y = "Frequency") +
    theme_light(base_size = 12, base_family = "serif")

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
