```r
knitr::opts_chunk$set(collapse = TRUE, comment = "")
library(SummarizedActigraphy)
```
The data is from https://github.com/THLfi/read.gt3x/files/3522749/GT3X%2B.01.day.gt3x.zip. It is a single-day GT3X file from an ActiGraph device.
Let's download the data:
```r
url = "https://github.com/THLfi/read.gt3x/files/3522749/GT3X%2B.01.day.gt3x.zip"
destfile = tempfile(fileext = ".zip")
dl = utils::download.file(url, destfile = destfile)
gt3x_file = utils::unzip(destfile, exdir = tempdir())
gt3x_file = gt3x_file[!grepl("__MACOSX", gt3x_file)]
path = gt3x_file
```
This data represents sub-second level accelerations in the X, Y, and Z directions. Devices can also measure additional information, such as temperature or light, but we will focus only on the accelerometry data. The `GGIR::g.calibrate` function is a method to calibrate the ENMO values [@GGIR_calibrate]. Other types of activity data include activity counts, step counts, or previously summarized data; such data are commonly calculated using proprietary methods or algorithms.
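To make the calibration step concrete, here is a minimal sketch of how estimated calibration coefficients are applied. GGIR-style autocalibration estimates a per-axis offset and scale, and the corrected signal is `scale * value + offset`; the coefficient values below are hypothetical, purely for illustration:

```r
# Hypothetical calibration coefficients (per-axis offset and scale);
# real values would come from an autocalibration routine such as
# GGIR::g.calibrate. The correction is linear: scale * value + offset.
offset = c(X = 0.01, Y = -0.02, Z = 0.005)
scale = c(X = 1.001, Y = 0.998, Z = 1.002)
calibrate_axis = function(values, axis) {
  scale[axis] * values + offset[axis]
}
# e.g., calibrated X values for a few raw accelerations (in g):
calibrate_axis(c(0.98, 1.02, 1.00), "X")
```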
We will use the `read_actigraphy` function to read these files into an `AccData` object:
```r
x = read_actigraphy(path)
class(x)
names(x)
```
The `read_actigraphy` function uses `read.gt3x::read.gt3x` for GT3X files, and uses functions from the `GGIR` package [@GGIR].
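For comparison, here is a sketch of reading the file directly with `read.gt3x`; the result is a matrix-like object of X/Y/Z samples that `read_actigraphy` wraps with extra metadata:

```r
# Read the raw GT3X file directly, then coerce to a data.frame for
# inspection; read_actigraphy adds header/frequency metadata on top.
if (requireNamespace("read.gt3x", quietly = TRUE)) {
  raw_gt3x = read.gt3x::read.gt3x(path)
  head(as.data.frame(raw_gt3x))
}
```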
The output has a data matrix in the `data` element, which has `X`, `Y`, and `Z` columns, along with a `time` column, which is a date/time column. The `header` element contains additional metadata about the recording:
```r
x$header
```
The sampling frequency is embedded in the header, but is also found in the `freq` element:

```r
x$freq
```
In this case, there are `r x$freq` samples per second.
The `summarize_daily_actigraphy` function summarizes an `AccData` object into second-level data for each day. The output is a `tsibble`:
```r
daily = summarize_daily_actigraphy(x)
head(daily)
```
The process is as follows: we use `floor_date` from `lubridate` so that each time is rounded down to the nearest second. This rounding is what allows us to use `group_by` on the `time` variable (which is really a date/time variable) and summarize the data:
```r
library(dplyr)
data = x$data
data = data %>%
  mutate(
    time = lubridate::floor_date(time, unit = "1 seconds"),
    enmo = sqrt(X^2 + Y^2 + Z^2) - 1
  )
head(data)
```
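To see what this rounding does, here is a small standalone illustration with made-up timestamps:

```r
# Made-up sub-second timestamps: floor_date drops the fractional part,
# so all samples within the same second map to the same time stamp.
times = as.POSIXct("2000-01-01 00:00:00", tz = "UTC") + c(0, 0.25, 0.5, 1.1)
lubridate::floor_date(times, unit = "1 seconds")
```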
In `summarize_daily_actigraphy`, the following process is done:
```r
data = data %>%
  group_by(time) %>%
  summarize(
    mad = (mad(X, na.rm = TRUE) + mad(Y, na.rm = TRUE) + mad(Z, na.rm = TRUE)) / 3,
    ai = sqrt((var(X, na.rm = TRUE) + var(Y, na.rm = TRUE) + var(Z, na.rm = TRUE)) / 3),
    n_values = sum(!is.na(enmo)),
    enmo = mean(enmo, na.rm = TRUE)
  )
head(data)
```
This assumes that all non-`NA` values are valid; no wear-time estimation is done. Non-wear periods could, however, be estimated from this summarized data. The `accelerometry::weartime` function estimates wear time, but it is based on count values.
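As a hedged sketch of what count-based wear-time detection looks like, assuming the `weartime(counts, window)` interface of the `accelerometry` package (the count values here are simulated, not derived from this file):

```r
# Simulated minute-level counts: zeros for non-wear, Poisson counts
# during wear. weartime flags long runs of consecutive zeros
# (at least `window` minutes) as non-wear.
if (requireNamespace("accelerometry", quietly = TRUE)) {
  set.seed(1)
  counts = as.integer(c(rep(0, 90), rpois(240, lambda = 50), rep(0, 90)))
  wear = accelerometry::weartime(counts = counts, window = 60)
  table(wear)
}
```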
This daily-level data is useful for thresholding activity into categories such as vigorous activity. It is also used for estimating sedentary and active bouts, and the transition probabilities between them.
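For example, here is a minimal thresholding sketch on the second-level `data` built above. The cut points are hypothetical, chosen for illustration only; validated thresholds depend on the device, wear location, and population:

```r
# Hypothetical ENMO cut points (in g units) for illustration only.
data %>%
  mutate(
    category = cut(
      enmo,
      breaks = c(-Inf, 0.03, 0.1, 0.4, Inf),
      labels = c("sedentary", "light", "moderate", "vigorous")
    )
  ) %>%
  count(category)
```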
Many of the following statistics are based on the Euclidean Norm of the three axes:
$$ r_i = \sqrt{X_{i}^2 + Y_{i}^2 + Z_{i}^2} $$
One of the statistics calculated is ENMO, the Euclidean Norm Minus One, which is:
$$
ENMO_{t} = \frac{1}{n_{t}} \sum_{i=1}^{n_t} \left(r_i - 1\right) \cdot \mathbb{1} \left(r_i > 1\right)
$$
where $t$ is the time point, rounded to one second. In this case, $i$ represents each sample, so there are `r x$freq` ENMO measures per second, which means the number of samples for that time point, $n_{t}$, is `r x[["freq"]]`. In most cases, however, ENMO and the other statistics are reported at the 1-second level. So `summarize_daily_actigraphy` computes $r_i - 1$ for each sample, then takes the mean over each 1-second interval to obtain $ENMO_{t}$. We will use $ENMO_{t}$ to represent this second-level data, leaving off any bar or hat, but in truth it is an average. The subtraction of $1$ removes one gravity unit (g) from the data, so that the values represent acceleration not due to gravity.
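Note that the indicator $\mathbb{1}\left(r_i > 1\right)$ in the formula truncates negative deviations at zero, whereas the step-by-step code above computed `sqrt(X^2 + Y^2 + Z^2) - 1` without truncation. A truncated version can be computed with `pmax`:

```r
# Truncate negative deviations at zero, matching the indicator in the
# ENMO formula; untruncated values go negative whenever the measured
# norm falls below 1g.
raw = x$data
enmo_truncated = pmax(sqrt(raw$X^2 + raw$Y^2 + raw$Z^2) - 1, 0)
summary(enmo_truncated)
```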
The Activity Index (AI) was introduced by @bai. It is calculated by: $$ AI_{t} = \sqrt{\frac{Var(X_{t}) + Var(Y_{t}) + Var(Z_{t})}{3}} $$
where
$$ Var(X_{t}) = \frac{1}{n_{t}} \sum_{i=1}^{n_t} \left(X_{i} - \bar{X}_{t}\right)^2 $$
and
$$ \bar{X}_{t} = \frac{1}{n_{t}} \sum_{i=1}^{n_t} X_{i}. $$
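Note that $Var(X_{t})$ as written divides by $n_{t}$ (the population variance), while R's `var` divides by $n_{t} - 1$ (the sample variance); for typical per-second sample counts the difference is small. A quick numeric check on simulated data:

```r
# Population vs. sample variance on one second of simulated samples;
# the gap shrinks as the number of samples per second grows.
set.seed(1)
x_axis = rnorm(30, mean = 0, sd = 0.1)
var_pop = mean((x_axis - mean(x_axis))^2)  # divides by n
var_samp = var(x_axis)                     # divides by n - 1
c(population = var_pop, sample = var_samp)
```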
The Mean Absolute Deviation (MAD) is calculated by:
$$ MAD_{t} = \frac{1}{n_{t}} \sum_{i=1}^{n_t} \left|r_i - \sum_{j=1}^{n_t}\frac{r_j}{n_{t}} \right| $$

The Median Absolute Deviation (MEDAD) is calculated by:

$$ MEDAD_{t} = \text{median}\Bigg\{ \left|r_i - \sum_{j=1}^{n_t}\frac{r_j}{n_{t}} \right|\Bigg\}_{i=1}^{n_t} $$
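As a small illustration of these two definitions, here they are applied to simulated Euclidean norms for one second of data (the values are hypothetical):

```r
# Simulated norms centered near 1g for one second of samples.
set.seed(1)
r = sqrt(rnorm(30, 0, 0.1)^2 + rnorm(30, 0, 0.1)^2 + rnorm(30, 1, 0.1)^2)
mad_t = mean(abs(r - mean(r)))      # mean absolute deviation
medad_t = median(abs(r - mean(r)))  # median absolute deviation from the mean
c(mad = mad_t, medad = medad_t)
```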
The `summarize_actigraphy` function summarizes an `AccData` object into second-level data for an "average" day, or average profile. For each second of the day, a summary statistic is taken across days, either the mean or the median. `summarize_actigraphy` gives both, and the output is a `tsibble`:
```r
average_day = summarize_actigraphy(x, .fns = list(mean = mean, median = median))
head(average_day)
```
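As a sketch of what this averaging does, assuming the `hms` package for time-of-day conversion, the same idea can be written by hand on the second-level `data` from above: collapse each timestamp to its time of day, then summarize across days.

```r
# Group by time of day (ignoring the date), then average across days.
# With a single day of data, the "average" equals the original values.
if (requireNamespace("hms", quietly = TRUE)) {
  avg_by_hand = data %>%
    mutate(time_of_day = hms::as_hms(time)) %>%
    group_by(time_of_day) %>%
    summarize(
      enmo_mean = mean(enmo, na.rm = TRUE),
      enmo_median = median(enmo, na.rm = TRUE)
    )
  head(avg_by_hand)
}
```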
The number of rows is 86400, which is 60 seconds per minute, 60 minutes per hour, 24 hours per day:
```r
nrow(average_day)
range(average_day$time)
```
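A quick sanity check of that arithmetic:

```r
# 60 seconds * 60 minutes * 24 hours
60 * 60 * 24
```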
These values represent an "average" day, where average is determined by mean and median.
Here we plot the AI values for each second of the average day, summarized by mean and median:
```r
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  library(magrittr)
  # print() explicitly so both plots render from inside the if block
  print(
    average_day %>%
      dplyr::rename_with(tolower) %>%
      ggplot(aes(x = time, y = ai_mean)) +
      geom_line()
  )
  print(
    average_day %>%
      dplyr::rename_with(tolower) %>%
      ggplot(aes(x = time, y = ai_median)) +
      geom_line()
  )
}
```