```r
knitr::opts_chunk$set(collapse = TRUE, comment = "")
library(SummarizedActigraphy)
```
The data is from https://github.com/THLfi/read.gt3x/files/3522749/GT3X%2B.01.day.gt3x.zip. It is a single-day GT3X file from an ActiGraph device.
Let's download the data:
```r
url = "https://github.com/THLfi/read.gt3x/files/3522749/GT3X%2B.01.day.gt3x.zip"
destfile = tempfile(fileext = ".zip")
dl = utils::download.file(url, destfile = destfile)
gt3x_file = utils::unzip(destfile, exdir = tempdir())
gt3x_file = gt3x_file[!grepl("__MACOSX", gt3x_file)]
path = gt3x_file
```
This data represents sub-second level accelerations in the X, Y, and Z directions. Devices can also measure additional information, such as temperature or light, but we will focus only on the accelerometry data. The `GGIR::g.calibrate` function is a method to calibrate the ENMO values [@GGIR_calibrate]. Other types of activity data include activity counts, step counts, or previously summarized data; such data are commonly calculated using proprietary methods or algorithms.
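To make the calibration step concrete, here is a minimal sketch of how estimated calibration coefficients are applied. GGIR-style autocalibration estimates a per-axis offset and scale, and the corrected signal is `scale * value + offset`; the coefficient values below are hypothetical, purely for illustration:

```r
# Hypothetical calibration coefficients (per-axis offset and scale);
# real values would come from an autocalibration routine such as
# GGIR::g.calibrate. The correction is linear: scale * value + offset.
offset = c(X = 0.01, Y = -0.02, Z = 0.005)
scale = c(X = 1.001, Y = 0.998, Z = 1.002)
calibrate_axis = function(values, axis) {
  scale[axis] * values + offset[axis]
}
# e.g., calibrated X values for a few raw accelerations (in g):
calibrate_axis(c(0.98, 1.02, 1.00), "X")
```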
We will use the `read_actigraphy` function to read these files into an `AccData` object:
```r
x = read_actigraphy(path)
class(x)
names(x)
```
The `read_actigraphy` function uses `read.gt3x::read.gt3x` for GT3X files, and uses functions from the `GGIR` package [@GGIR].
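For comparison, here is a sketch of reading the file directly with `read.gt3x`; the result is a matrix-like object of X/Y/Z samples that `read_actigraphy` wraps with extra metadata:

```r
# Read the raw GT3X file directly, then coerce to a data.frame for
# inspection; read_actigraphy adds header/frequency metadata on top.
if (requireNamespace("read.gt3x", quietly = TRUE)) {
  raw_gt3x = read.gt3x::read.gt3x(path)
  head(as.data.frame(raw_gt3x))
}
```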
The output has a data matrix in the `data` element, which has `X`, `Y`, and `Z` columns, along with a `time` column, which is a date/time column. The `header` element contains additional metadata about the recording:
```r
x$header
```
The sampling frequency is embedded in the header, but is also found in the `freq` element:

```r
x$freq
```
In this case, there are `r x$freq` samples per second.
The `summarize_daily_actigraphy` function summarizes an `AccData` object into second-level data for each day. The output is a `tsibble`:
```r
daily = summarize_daily_actigraphy(x)
head(daily)
```
The process is as follows: we use `floor_date` from `lubridate` so that each time is rounded down to the nearest second. This rounding is what allows us to use `group_by` on the `time` variable (which is really a date/time variable) and summarize the data:
```r
library(dplyr)
data = x$data
data = data %>%
  mutate(
    time = lubridate::floor_date(time, unit = "1 seconds"),
    enmo = sqrt(X^2 + Y^2 + Z^2) - 1
  )
head(data)
```
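To see what this rounding does, here is a small standalone illustration with made-up timestamps:

```r
# Made-up sub-second timestamps: floor_date drops the fractional part,
# so all samples within the same second map to the same time stamp.
times = as.POSIXct("2000-01-01 00:00:00", tz = "UTC") + c(0, 0.25, 0.5, 1.1)
lubridate::floor_date(times, unit = "1 seconds")
```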
In `summarize_daily_actigraphy`, the following process is done:
```r
data = data %>%
  group_by(time) %>%
  summarize(
    mad = (mad(X, na.rm = TRUE) + mad(Y, na.rm = TRUE) + mad(Z, na.rm = TRUE)) / 3,
    ai = sqrt((var(X, na.rm = TRUE) + var(Y, na.rm = TRUE) + var(Z, na.rm = TRUE)) / 3),
    n_values = sum(!is.na(enmo)),
    enmo = mean(enmo, na.rm = TRUE)
  )
head(data)
```
This assumes that all non-`NA` values are valid; no wear-time estimation is done. Non-wear periods could, however, be estimated from this summarized data. The `accelerometry::weartime` function estimates wear time, but it is based on count values.
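As a hedged sketch of what count-based wear-time detection looks like, assuming the `weartime(counts, window)` interface of the `accelerometry` package (the count values here are simulated, not derived from this file):

```r
# Simulated minute-level counts: zeros for non-wear, Poisson counts
# during wear. weartime flags long runs of consecutive zeros
# (at least `window` minutes) as non-wear.
if (requireNamespace("accelerometry", quietly = TRUE)) {
  set.seed(1)
  counts = as.integer(c(rep(0, 90), rpois(240, lambda = 50), rep(0, 90)))
  wear = accelerometry::weartime(counts = counts, window = 60)
  table(wear)
}
```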
This daily-level data is useful for thresholding activity into categories such as vigorous activity. It is also used for estimating sedentary and active bouts, and the transition probabilities between them.
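For example, here is a minimal thresholding sketch on the second-level `data` built above. The cut points are hypothetical, chosen for illustration only; validated thresholds depend on the device, wear location, and population:

```r
# Hypothetical ENMO cut points (in g units) for illustration only.
data %>%
  mutate(
    category = cut(
      enmo,
      breaks = c(-Inf, 0.03, 0.1, 0.4, Inf),
      labels = c("sedentary", "light", "moderate", "vigorous")
    )
  ) %>%
  count(category)
```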
Many of the following statistics are based on the Euclidean Norm of the three axes:
$$ r_i = \sqrt{X_{i}^2 + Y_{i}^2 + Z_{i}^2} $$
One of the statistics calculated is ENMO, the Euclidean Norm Minus One, which is:
$$
ENMO_{t} = \frac{1}{n_{t}} \sum_{i=1}^{n_t} \left(r_i - 1\right) \cdot \mathbb{1} \left(r_i > 1\right)
$$
where $t$ is the time point, rounded to one second. In this case, $i$ represents each sample, so there are `r x$freq` ENMO measures per second, which means the number of samples for that time point, $n_{t}$, is `r x[["freq"]]`. In most cases, however, ENMO and the other statistics are reported at the 1-second level. So `summarize_daily_actigraphy` computes $r_i - 1$ for each sample, then takes the mean over each 1-second interval to obtain $ENMO_{t}$. We will use $ENMO_{t}$ to represent this second-level data, leaving off any bar or hat, but in truth it is an average. The subtraction of $1$ removes one gravity unit (g) from the data, so that the values represent acceleration not due to gravity.
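Note that the indicator $\mathbb{1}\left(r_i > 1\right)$ in the formula truncates negative deviations at zero, whereas the step-by-step code above computed `sqrt(X^2 + Y^2 + Z^2) - 1` without truncation. A truncated version can be computed with `pmax`:

```r
# Truncate negative deviations at zero, matching the indicator in the
# ENMO formula; untruncated values go negative whenever the measured
# norm falls below 1g.
raw = x$data
enmo_truncated = pmax(sqrt(raw$X^2 + raw$Y^2 + raw$Z^2) - 1, 0)
summary(enmo_truncated)
```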
The Activity Index (AI) was introduced by @bai. It is calculated by: $$ AI_{t} = \sqrt{\frac{Var(X_{t}) + Var(Y_{t}) + Var(Z_{t})}{3}} $$
where
$$ Var(X_{t}) = \frac{1}{n_{t}} \sum_{i=1}^{n_t} \left(X_{i} - \bar{X}_{t}\right)^2 $$
and
$$ \bar{X}_{t} = \frac{1}{n_{t}} \sum_{i=1}^{n_t} X_{i}. $$
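Note that $Var(X_{t})$ as written divides by $n_{t}$ (the population variance), while R's `var` divides by $n_{t} - 1$ (the sample variance); for typical per-second sample counts the difference is small. A quick numeric check on simulated data:

```r
# Population vs. sample variance on one second of simulated samples;
# the gap shrinks as the number of samples per second grows.
set.seed(1)
x_axis = rnorm(30, mean = 0, sd = 0.1)
var_pop = mean((x_axis - mean(x_axis))^2)  # divides by n
var_samp = var(x_axis)                     # divides by n - 1
c(population = var_pop, sample = var_samp)
```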
The Mean Absolute Deviation (MAD) is calculated by:
$$ MAD_{t} = \frac{1}{n_{t}} \sum_{i=1}^{n_t} \left|r_i - \sum_{j=1}^{n_t}\frac{r_j}{n_{t}} \right| $$

The Median Absolute Deviation (MEDAD) is calculated by:

$$ MEDAD_{t} = \text{median}\Bigg\{ \left|r_i - \sum_{j=1}^{n_t}\frac{r_j}{n_{t}} \right|\Bigg\}_{i=1}^{n_t} $$
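As a small illustration of these two definitions, here they are applied to simulated Euclidean norms for one second of data (the values are hypothetical):

```r
# Simulated norms centered near 1g for one second of samples.
set.seed(1)
r = sqrt(rnorm(30, 0, 0.1)^2 + rnorm(30, 0, 0.1)^2 + rnorm(30, 1, 0.1)^2)
mad_t = mean(abs(r - mean(r)))      # mean absolute deviation
medad_t = median(abs(r - mean(r)))  # median absolute deviation from the mean
c(mad = mad_t, medad = medad_t)
```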
The `summarize_actigraphy` function summarizes an `AccData` object into second-level data for an "average" day, or average profile. For each second of the day, a summary statistic is taken across days, either the mean or the median. `summarize_actigraphy` gives both, and the output is a `tsibble`:
```r
average_day = summarize_actigraphy(x, .fns = list(mean = mean, median = median))
head(average_day)
```
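As a sketch of what this averaging does, assuming the `hms` package for time-of-day conversion, the same idea can be written by hand on the second-level `data` from above: collapse each timestamp to its time of day, then summarize across days.

```r
# Group by time of day (ignoring the date), then average across days.
# With a single day of data, the "average" equals the original values.
if (requireNamespace("hms", quietly = TRUE)) {
  avg_by_hand = data %>%
    mutate(time_of_day = hms::as_hms(time)) %>%
    group_by(time_of_day) %>%
    summarize(
      enmo_mean = mean(enmo, na.rm = TRUE),
      enmo_median = median(enmo, na.rm = TRUE)
    )
  head(avg_by_hand)
}
```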
The number of rows is 86400, which is 60 seconds per minute, 60 minutes per hour, 24 hours per day:
```r
nrow(average_day)
range(average_day$time)
```
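A quick sanity check of that arithmetic:

```r
# 60 seconds * 60 minutes * 24 hours
60 * 60 * 24
```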
These values represent an "average" day, where average is determined by mean and median.
Here we plot the AI values for each second of the average day, summarized by mean and median:
```r
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  library(magrittr)
  # print() explicitly so both plots render from inside the if block
  print(
    average_day %>%
      dplyr::rename_with(tolower) %>%
      ggplot(aes(x = time, y = ai_mean)) +
      geom_line()
  )
  print(
    average_day %>%
      dplyr::rename_with(tolower) %>%
      ggplot(aes(x = time, y = ai_median)) +
      geom_line()
  )
}
```