align_to_baseline: Align case tracking locations (for example) to a common...
In seandavi/sars2pack: COVID-19 data resources and analysis tools

align_to_baseline

R Documentation

Align case tracking locations (for example) to a common baseline

Description

When comparing epidemic curves (cases vs date, for example), particularly in graphical displays, it is helpful to set a "time baseline" so that all the curves start at the same point.

Usage

align_to_baseline(df, filter_criteria, date_column = "date", group_vars)

Arguments

`df`	data.frame that includes a date column and at least one other column for filtering, typically a case count.
`filter_criteria`	an expression as would normally be specified directly to `dplyr::filter()`.
`date_column`	character(1) name of the column used to order the data and define the "beginning" of the curve. It is called a "date column", but anything with a natural ordering will likely work.
`group_vars`	optional character() column name(s) that specify grouping done before calculating minimum `date`s. Concretely, if the goal is to compare several countries and `df` has a column called `country`, then use group_vars='country'.
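
The argument conventions above (an unquoted filter expression plus character column names) can be illustrated with a small dplyr sketch. The helper `first_match()` and the toy data are hypothetical and not part of sars2pack; this page does not show how align_to_baseline() handles its arguments internally, so this is only one plausible way to accept the same kinds of input.

```r
library(dplyr)

# Made-up data: two locations whose counts pass 100 on different dates
toy <- data.frame(
  country = rep(c("A", "B"), each = 4),
  date    = rep(as.Date("2020-03-01") + 0:3, times = 2),
  count   = c(50, 120, 300, 700, 10, 40, 150, 400)
)

# Hypothetical helper (an assumption, not the sars2pack implementation).
# It accepts the same argument shapes as align_to_baseline() and returns
# the first value of `date_column` per group that passes the filter.
first_match <- function(df, filter_criteria, date_column = "date",
                        group_vars = character()) {
  # {{ }} forwards the unquoted expression to dplyr::filter()
  out <- dplyr::filter(df, {{ filter_criteria }})
  if (length(group_vars) > 0) {
    # all_of() selects columns named in a character vector
    out <- dplyr::group_by(out, dplyr::across(dplyr::all_of(group_vars)))
  }
  # .data[[...]] looks up a column by its name held in a string
  dplyr::summarize(out, first = min(.data[[date_column]]), .groups = "drop")
}

res <- first_match(toy, count > 100, group_vars = "country")
res
```

Here `count > 100` is written bare, exactly as it would be inside `dplyr::filter()`, while the grouping and date columns are given as strings.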

Details

This function takes this basic approach:

Filter all data using the filter_criteria, expressed as a dplyr::filter() expression.
Optionally group the dataset.
Find the minimum date left after applying the filter criteria.
"Subtract" the minimum date (on a per-group basis if grouping columns are used).

The result is a dataset in which all the curves are shifted to start at the "same" starting time with respect to the "start" of the pandemic, so that plots of the aligned data overlay directly. For example, in the COVID-19 pandemic, the outbreak in China started much earlier than in the rest of the world. To compare the time course of China versus other countries, setting the time to the point where each country first had 100 cases allows direct comparison of the shapes of the countries' curves.
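
The four steps above can be sketched directly as a dplyr pipeline. This is a minimal illustration on made-up data, not the package's implementation; the column names and the threshold of 100 are assumptions chosen for the example, and the real function may return `index` as a different type.

```r
library(dplyr)

# Made-up data: two locations whose counts pass 100 on different dates
toy <- data.frame(
  country = rep(c("A", "B"), each = 4),
  date    = rep(as.Date("2020-03-01") + 0:3, times = 2),
  count   = c(50, 120, 300, 700, 10, 40, 150, 400)
)

aligned <- toy %>%
  dplyr::filter(count > 100) %>%     # step 1: keep rows meeting the criterion
  dplyr::group_by(country) %>%       # step 2: group, here by country
  dplyr::mutate(
    # steps 3 and 4: per-group minimum date, subtracted from each date;
    # subtracting Dates gives a difference in days
    index = as.numeric(date - min(date))
  ) %>%
  dplyr::ungroup()

aligned
# index is 0, 1, 2 for country A and 0, 1 for country B
```

Both locations now start at index 0 even though A passed the threshold a day before B, which is what lets the curves be compared on a common time axis.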

Value

A data.frame with a new column, `index`, that gives the number of time intervals (typically days) since the baseline criteria were first met, computed per group when grouping is used.

Author(s)

Sean Davis seandavi@gmail.com

Examples

library(dplyr)
library(ggplot2)

# use European CDC dataset
ecdc = ecdc_data()
head(ecdc)
dplyr::glimpse(ecdc)

# get top 10 countries by cumulative
# number of deaths
top_10 = ecdc %>%
    dplyr::filter(subset=='deaths_weekly') %>%
    dplyr::group_by(location_name) %>%
    dplyr::summarize(deaths = max(count)) %>%
    dplyr::arrange(dplyr::desc(deaths)) %>%
    head(10)

top_10

# limit ecdc data to "deaths" and
# top 10 countries

ecdc_top10 = ecdc %>%
    dplyr::filter(location_name %in% top_10[['location_name']] & subset=='deaths_weekly')
plot_epicurve(ecdc_top10, color='location_name', case_column='count')

# align each country's curve so that index counts from the
# first date on which count exceeds 100
ecdc_top10_baseline = align_to_baseline(ecdc_top10, count>100, group_vars='location_name')

plot_epicurve(ecdc_top10_baseline, date_column='index', color='location_name') +
    ggtitle('Deaths over time, aligned to date of 100 deaths per country')

seandavi/sars2pack documentation built on May 13, 2022, 3:41 p.m.