View source: R/align_to_baseline.R
align_to_baseline | R Documentation |
When comparing epidemic curves (cases versus date, for example), particularly in graphical displays, it helps to set a "time baseline" so that all the curves start at the same point.
align_to_baseline(df, filter_criteria, date_column = "date", group_vars)
df |
data.frame that includes a date column and at least one other column for filtering, typically a case count. |
filter_criteria |
an expression as would normally be specified directly to dplyr::filter() |
date_column |
character(1) column name of the column for ordering the data to define a "beginning" of the curve. It is called a "date column", but anything with a natural ordering will likely work. |
group_vars |
optional character() column name(s) that specify the grouping applied before calculating the minimum date |
This function takes this basic approach:
Filter all data using the filter_criteria, expressed as a dplyr::filter() expression.
Optionally group the dataset.
Find the minimum date left after applying the filter criteria.
"Subtract" the minimum date (on a per-group basis if grouping columns are used).
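The steps above can be sketched with plain dplyr verbs. This is a minimal illustration, not the package source: the toy data frame, its column names (location, date, count), and the threshold of 100 are all made up for the example, and the real function may differ in detail (for instance, in the type of the index column it returns).

```r
library(dplyr)

# Toy data, invented for illustration: counts for two locations over 5 days
toy <- data.frame(
  location = rep(c("A", "B"), each = 5),
  date     = rep(as.Date("2020-03-01") + 0:4, times = 2),
  count    = c(20, 80, 150, 300, 600, 0, 5, 40, 120, 250)
)

# Hypothetical re-creation of the four steps (an assumption, not the
# actual align_to_baseline() implementation)
aligned <- toy %>%
  dplyr::filter(count > 100) %>%                           # 1. filter
  dplyr::group_by(location) %>%                            # 2. group
  dplyr::mutate(index = as.numeric(date - min(date))) %>%  # 3-4. subtract per-group minimum date
  dplyr::ungroup()

aligned
```

In this sketch, location A first exceeds 100 on 2020-03-03 and location B on 2020-03-04, so both groups start at index 0 even though their calendar dates differ.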
When plotted, the result shifts all the curves to start at the "same" time relative to the "start" of the pandemic in each group. For example, during the COVID-19 pandemic, the outbreak in China started much earlier than in the rest of the world. To compare the time course in China with that in other countries, setting time zero to the point where each country had 100 cases allows direct comparison of the shapes of the countries' curves.
A data.frame with a new column, index, that gives the number of time intervals (typically days) from when the baseline counts are first encountered, done by group.
Sean Davis seandavi@gmail.com
Other case-tracking: beoutbreakprepared_data(), bulk_estimate_Rt(), combined_us_cases_data(), coronadatascraper_data(), covidtracker_data(), ecdc_data(), estimate_Rt(), jhu_data(), nytimes_county_data(), owid_data(), plot_epicurve(), test_and_trace_data(), usa_facts_data(), who_cases()
Other plotting: plot_epicurve()
library(dplyr)
library(ggplot2)

# use European CDC dataset
ecdc = ecdc_data()
head(ecdc)
dplyr::glimpse(ecdc)

# get top 10 countries by cumulative
# number of deaths
top_10 = ecdc %>%
    dplyr::filter(subset=='deaths_weekly') %>%
    dplyr::group_by(location_name) %>%
    dplyr::summarize(deaths = max(count)) %>%
    dplyr::arrange(dplyr::desc(deaths)) %>%
    head(10)
top_10

# limit ecdc data to "deaths" and
# top 10 countries
ecdc_top10 = ecdc %>%
    dplyr::filter(location_name %in% top_10[['location_name']] &
                  subset=='deaths_weekly')
plot_epicurve(ecdc_top10, color='location_name', case_column='count')

ecdc_top10_baseline = align_to_baseline(ecdc_top10, count>100,
                                        group_vars='location_name')
plot_epicurve(ecdc_top10_baseline, date_column='index', color='location_name') +
    ggtitle('Deaths over time, aligned to date of 100 deaths per country')