add_incidence_column: Add daily incidence to cumulative case counts data.frame

View source: R/add_incidence_column.R

add_incidence_column    R Documentation

Add daily incidence to cumulative case counts data.frame

Description

For a data.frame that contains cumulative case counts over time, optionally with extra columns such as location, this function adds a column holding the corresponding daily incidence counts.

Usage

add_incidence_column(
  df,
  date_column = "date",
  count_column = "count",
  incidence_col_name = "inc",
  grouping_columns = c()
)

Arguments

df

a data.frame with at least two columns: a date (or other ordered quantity) column and a cumulative count column. Data of this form typically come from case-count datasets such as those listed under See Also.

date_column

character(1) giving the name of the date column in the dataset

count_column

character(1) giving the name of the cumulative count column in the dataset

incidence_col_name

character(1) giving the name of the incidence column to add

grouping_columns

character() vector of column names to group by when calculating incidence. See the examples for details. Be very careful to include every column that distinguishes one cumulative series from another (for example, location), or the results will be misleading.
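To illustrate why grouping matters, here is a minimal base R sketch on made-up data. It does not call add_incidence_column(). The data frame `mixed` and its values are hypothetical, and the "ungrouped" result shows what a naive row-to-row difference would give, which is an assumption about the failure mode rather than a statement about this function's internals.

```r
# Hypothetical toy data: two regions, each with its own cumulative series.
mixed <- data.frame(
  date   = rep(as.Date("2020-03-01") + 0:2, times = 2),
  region = rep(c("A", "B"), each = 3),
  count  = c(5, 8, 12, 1, 1, 2)
)

# Sort by date so rows from regions A and B are interleaved.
mixed <- mixed[order(mixed$date, mixed$region), ]

# Naive difference ignoring region: mixes the two series and
# produces negative "incidence" values.
ungrouped <- c(NA, diff(mixed$count))

# Difference taken within each region: one series at a time.
grouped <- ave(mixed$count, mixed$region,
               FUN = function(x) c(NA, diff(x)))

print(data.frame(mixed, ungrouped, grouped))
```

In the ungrouped column, each value is the difference between counts from different regions, so it has no epidemiological meaning. The grouped column is non-negative because each cumulative series is differenced on its own.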

Details

Several datasets follow the cumulative-count form: a date column and a column of cumulative case counts over time. Other columns may be present.

This function groups the data by grouping_columns and then, within each group, subtracts the previous day's cumulative count from the current day's. The result is the number of new cases for each day.
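The computation described above can be sketched in base R as follows. This is an illustration only, not the sars2pack implementation: `sketch_incidence` is a hypothetical helper, and setting the first day of each group to NA is an assumption, since the text above does not say how the first observation is handled.

```r
# Hypothetical helper illustrating grouped differencing of cumulative counts.
sketch_incidence <- function(df,
                             date_column = "date",
                             count_column = "count",
                             incidence_col_name = "inc",
                             grouping_columns = character(0)) {
  # One key per group; a single constant key when no grouping is requested.
  key <- if (length(grouping_columns) > 0) {
    interaction(df[grouping_columns], drop = TRUE)
  } else {
    factor(rep("all", nrow(df)))
  }

  # Sort by group, then by date, so "previous row" means "previous day".
  ord <- order(key, df[[date_column]])
  df  <- df[ord, , drop = FALSE]
  key <- key[ord]

  # Within each group, subtract the previous cumulative count.
  # Assumption: the first day of a group has no predecessor, so it is NA.
  df[[incidence_col_name]] <- ave(
    df[[count_column]], key,
    FUN = function(x) c(NA, diff(x))
  )
  df
}

# Made-up cumulative counts for two regions.
toy <- data.frame(
  date   = rep(as.Date("2020-03-01") + 0:2, times = 2),
  region = rep(c("A", "B"), each = 3),
  count  = c(1, 3, 6, 10, 10, 15)
)

res <- sketch_incidence(toy, grouping_columns = "region")
print(res)
```

For region A the cumulative counts 1, 3, 6 give daily values NA, 2, 3. For region B the counts 10, 10, 15 give NA, 0, 5.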

Value

a data.frame containing the input data plus the added incidence column, named according to incidence_col_name

Author(s)

Sean Davis seandavi@gmail.com

See Also

jhu_data(), covidtracker_data(), jhu_us_data() and others for datasets that are appropriate for passing into here.

Examples

library(ggplot2)
library(dplyr)

j = jhu_data()
head(j)
colnames(j)

add_incidence_column(j, grouping_columns=c('CountryRegion','ProvinceState'))

# Get the top 10 countries by cumulative
# number of deaths
j_top_10 = j %>%
    filter(subset=='deaths') %>%
    dplyr::group_by(CountryRegion) %>%
    dplyr::summarize(count = max(count)) %>%
    dplyr::arrange(dplyr::desc(count)) %>%
    head(10)

j_top_10

# The JHU data divides some countries into
# regions, so we can collapse to countries
# by simply summing over date/country
j = j %>% filter(CountryRegion %in% j_top_10[['CountryRegion']] & subset=='deaths') %>%
    dplyr::group_by(date, CountryRegion) %>%
    dplyr::summarize(count = sum(count))

j

# Add an incidence column to the cumulative dataset
j_inc = add_incidence_column(j, grouping_columns='CountryRegion')

j_inc

j_inc %>%
    dplyr::filter(count>0) %>%
    plot_epicurve(color='CountryRegion', case_column='inc') +
        geom_smooth() +
        ggtitle('Daily death counts in the top 10 most infected countries')


# Hospitalizations by day in Maryland
covidtracker_data() %>%
    filter(state=='MD') %>%
    add_incidence_column(count_column='hospitalized') %>%
    ggplot(aes(x=date,y=inc)) + geom_smooth() +
    ylab("New Hospitalizations per day") +
    ggtitle('Hospitalizations in Maryland', subtitle = 'From covidtracker')




seandavi/sars2pack documentation built on May 13, 2022, 3:41 p.m.