The sars2pack R package provides one-line access to over 40 COVID-related datasets. Datasets are accessed in real time directly from their sources and then transformed to tidy-data form where possible and applicable. The result of each dataset accessor is a ready-to-use R dataset, often a dataframe. Documentation includes dataset descriptions, sources and references, and examples.
Online documentation is available in two locations:
# If you do not have BiocManager installed:
install.packages('BiocManager')
# Then, if sars2pack is not already installed:
BiocManager::install('seandavi/sars2pack')
After the one-time installation, load the package to get started.
library(sars2pack)
Up-to-date tracking of city, county, state, national, and international confirmed cases, deaths, and testing is critical to driving policy, implementing interventions, and measuring their effectiveness. Case-tracking datasets include a date, a count of cases, and usually several other fields, such as the reporting location.
Case-tracking datasets are typically accessed with one function per dataset. The example here uses data from the European Centre for Disease Prevention and Control (ECDC).
ecdc = ecdc_data()
Get a quick overview of the dataset.
head(ecdc)
## # A tibble: 6 x 8
## # Groups:   location_name, subset [6]
##   date       location_name iso2c iso3c population_2019 continent subset    count
##   <date>     <chr>         <chr> <chr>           <dbl> <chr>     <chr>     <dbl>
## 1 2019-12-31 Afghanistan   AF    AFG          38041757 Asia      confirmed     0
## 2 2019-12-31 Afghanistan   AF    AFG          38041757 Asia      deaths        0
## 3 2019-12-31 Algeria       DZ    DZA          43053054 Africa    confirmed     0
## 4 2019-12-31 Algeria       DZ    DZA          43053054 Africa    deaths        0
## 5 2019-12-31 Armenia       AM    ARM           2957728 Europe    confirmed     0
## 6 2019-12-31 Armenia       AM    ARM           2957728 Europe    deaths        0
The ecdc dataset is just a data.frame (actually, a tibble), so standard R or tidyverse functionality can answer basic questions with little code. The next code block generates top10, a table of the countries with the most deaths recorded to date. Note that if you run this on your own computer, the values will reflect the data available on that day.
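library(dplyr)
top10 = ecdc %>%
  filter(subset == 'deaths') %>%
  group_by(location_name) %>%
  # keep each country's row(s) at its maximum count; a maximum reported on several dates keeps several rows
  filter(count == max(count)) %>%
  arrange(desc(count)) %>%
  head(10) %>%
  select(-starts_with('iso'), -continent, -subset) %>%
  # deaths per 100,000 population
  mutate(rate_per_100k = 1e5 * count / population_2019)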
Finally, present a nice table of those countries:
knitr::kable(
  top10,
  caption = "Reported COVID-19-related deaths in ten most affected countries.",
  format = 'pandoc')
Table: Reported COVID-19-related deaths in ten most affected countries.

date        location_name             population_2019   count  rate_per_100k
----------  ------------------------  ---------------  ------  -------------
2020-07-06  United_States_of_America        329064917  129947      39.489776
2020-07-06  Brazil                          211049519   64867      30.735441
2020-07-06  United_Kingdom                   66647112   44220      66.349462
2020-07-06  Italy                            60359546   34861      57.755570
2020-07-06  Mexico                          127575529   30639      24.016361
2020-07-04  France                           67012883   29893      44.607841
2020-07-05  France                           67012883   29893      44.607841
2020-07-06  France                           67012883   29893      44.607841
2020-05-24  Spain                            46937060   28752      61.256500
2020-07-06  India                          1366417756   19693       1.441214
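Because `filter(count == max(count))` keeps every row that ties for a country's maximum, a country whose peak count is reported on several dates (France in the table above) takes up more than one of the ten slots. The sketch below shows one possible way to keep a single row per country, using only `arrange()` and `slice()` from dplyr. It runs on a small made-up tibble, `toy`, that merely mimics the column layout of `ecdc`; the country names and numbers are invented for illustration and are not part of sars2pack.

```r
library(dplyr)

# Toy data mimicking the layout of `ecdc` (all values are invented).
toy <- tibble::tibble(
  date = as.Date(c("2020-07-04", "2020-07-05", "2020-07-06",
                   "2020-07-05", "2020-07-06")),
  location_name = c("A", "A", "A", "B", "B"),
  population_2019 = c(1e6, 1e6, 1e6, 2e6, 2e6),
  subset = "deaths",
  count = c(50, 50, 50, 10, 20)
)

one_per_country <- toy %>%
  filter(subset == "deaths") %>%
  group_by(location_name) %>%
  # Within each country, put the highest count first; break ties by latest date.
  arrange(desc(count), desc(date), .by_group = TRUE) %>%
  # Keep exactly one row per country.
  slice(1) %>%
  ungroup() %>%
  arrange(desc(count)) %>%
  # Deaths per 100,000 population, as in the top10 table.
  mutate(rate_per_100k = 1e5 * count / population_2019)

print(one_per_country)
```

Applied to `ecdc` in place of `toy`, followed by `head(10)`, the same pattern should yield ten distinct countries, though the exact result depends on the data available on the day it is run.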
Examine the spread of the pandemic throughout the world by plotting cumulative deaths reported for the top 10 countries above.
ecdc_top10 = ecdc %>%
  filter(location_name %in% top10$location_name & subset == 'deaths')
plot_epicurve(ecdc_top10,
              filter_expression = count > 10,
              color = 'location_name')
Comparing the features of disease spread is easiest if all curves are shifted to “start” at the same absolute level. In this case, shift the origin for all countries to the first time point at which more than 100 cumulative deaths had been reported. Note how some curves cross others, which suggests less effective infection control at the same relative time in the pandemic for that country (e.g., Brazil).
ecdc_top10 %>%
  align_to_baseline(count > 100, group_vars = c('location_name')) %>%
  plot_epicurve(date_column = 'index', color = 'location_name')
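To make the idea of aligning curves to a common baseline concrete, here is a rough sketch in plain dplyr. It is not the implementation of `align_to_baseline()`, whose internals are not shown here; it only illustrates one plausible reading of the step: within each country, keep the days on which the criterion holds and count days elapsed since the first of them. The column name `index` is borrowed from the call above, and the `toy` data are invented.

```r
library(dplyr)

# Toy cumulative death counts for two hypothetical countries (invented values).
toy <- tibble::tibble(
  date = rep(as.Date("2020-03-01") + 0:4, times = 2),
  location_name = rep(c("A", "B"), each = 5),
  count = c(20, 80, 150, 300, 600,
            0, 0, 40, 120, 250)
)

aligned <- toy %>%
  group_by(location_name) %>%
  # Keep only the days on which the baseline criterion holds ...
  filter(count > 100) %>%
  # ... and number them by days elapsed since the first such day in each group.
  mutate(index = as.numeric(date - min(date))) %>%
  ungroup()

print(aligned)
```

Plotting `count` against `index` rather than `date` then places every country's curve on a shared time axis that begins when it first crossed the threshold.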
Pull requests are gladly accepted on GitHub.
See the Adding new datasets vignette.