knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) require(aniMotum)
This vignette is an extended set of examples to highlight aniMotum
's functionality. Please, do NOT interpret these examples as instructions for conducting analysis of animal movement data. Numerous essential steps in a proper analysis have been left out of these vignettes. It is your job to understand your data, ensure you are asking the right questions of your data, and that the analyses you undertake appropriately reflect those questions. We can not do this for you!
This is an overview of how to use aniMotum
to filter animal track locations obtained via the Argos satellite system or (possibly) another tracking system. aniMotum
provides three state-space models (SSM's), two for filtering (ie. estimating "true" locations and associated movement model parameters, while accounting for error-prone observations), and another for filtering and estimating move persistence - an index of movement behaviour:
rw
crw
mp
the models are continuous-time models, that is, they account for the time intervals between successive observations, thereby naturally accounting for the often irregularly-timed nature of animal tracking data. We won't dwell on the details of the models here (see Jonsen et al. 2020 for details on the crw
model), except to say there may be advantages to choosing one over the other in certain circumstances. The Random Walk model tends not to deal well with small to moderate gaps (relative to a specified time step) in observed locations and can over-fit to particularly noisy data. The Correlated Random Walk model can often deal better with these small to moderate data gaps and appropriately smooth through noisy data but tends to estimate nonsensical (i.e., 'looping' artefacts) movement through longer data gaps, especially when animals are mostly stationary during those gaps. The Move Persistence model can be more robust to data gaps up to a point, and estimates time-varying move persistence $\gamma_t$ along the track, which provides an index of how an animal's movement behaviour varies in space and time based on the autocorrelation of successive movements. If you have Argos location data and your goal is to infer changes in movement behaviour along tracks then estimating locations and move persistence simultaneously within a state-space model is the preferred approach, as location uncertainty is accounted for in the move persistence estimates.
Additionally, aniMotum
provides separate models (mpm
, jmpm
) for estimating move persistence after fitting either the rw
or crw
SSM (see Jonsen et al. 2019 for details). Using the fit_mpm
function, the mpm
is fit to individual tracks, whereas the jmpm
is fit to multiple tracks simultaneously with a variance parameter that is estimated jointly across the tracks. This latter model may better resolve subtle changes in movement behaviour along tracks that lack much contrast in movements. Both models can be fit to time-regularized locations or to time-irregular locations. See Auger-Méthé et al. (2017) for an example of the latter. These models can be fit to animal tracks, regardless of the geolocation technology used, but they assume that locations are known without error.
aniMotum
can accept input data in several possible formats. Here, we outline the default formats but users are not restricted to using these. Input data with variable names and/or column ordering that differs from the defaults can be used, provided the essential minimum data elements are present to allow the SSM's to be fitted: a unique animal/track identifier, date-time of each observation, location quality class, and longitude and latitude (or projected x and y) coordinates. We expand on these and additional variables in the following sections.
A pre-processing function format_data()
can be used to correctly structure the input data for use in aniMotum
. This function checks for the presence of the essential variables, converts date-time to POSIX
format, maps custom variable names onto the default names, and puts the variables into the default column order expected by aniMotum
. Users may choose to call format_data()
themselves, prior to fitting an SSM with fit_ssm()
, or they can allow fit_ssm()
to format the data automatically. We illustrate these approaches in the "Fitting a model" section, below.
We'll start out with the original Argos data, CLS Argos' Least-Squares-based (LS) locations. These data contain the minimum information required by aniMotum
's SSM's. A minimal input data.frame
or tibble
, containing only the information required to fit the SSM, looks like this:
data(sese, package = "aniMotum") head(data.frame(sese))
The (default) column names represent the following:
- id
a unique identifier for each animal (or track) in the data.frame
- date
a date-time variable in the form YYYY-mm-dd HH:MM:SS
(or YYYY/mm/dd HH:MM:SS
). These can be text strings, in which case aniMotum
converts them to POSIX
format and assumes the timezone is UTC
. If the date-times are from a local timezone then you must specify this via the tz
argument to format_data()
. A list of valid timezone names can be viewed via OlsonNames()
.
- lc
the location class variable common to Argos data, with classes in the set: 3, 2, 1, 0, A, B, Z.
- lon
the longitude variable
- lat
the latitude variable
The lc
values determine the measurement error variances (based on independent data, see Jonsen et al. 2020) used in the SSM's for each observation.
Since 2011, the default Argos location data uses CLS Argos' Kalman Filter (KF) algorithm. These data include error ellipse information for each observed location in the form of 3 variables: ellipse semi-major axis length, ellipse semi-minor axis length, and ellipse orientation. A minimal input data.frame
or tibble
for Argos KF data looks like this:
data(ellie, package = "aniMotum") head(data.frame(ellie))
The column names follow those for Argos LS data, with the following additions:
- smaj
the Argos error ellipse semi-major axis length (m)
- smin
the Argos error ellipse semi-minor axis length (m)
- eor
the Argos error ellipse ellipse orientation (degrees from N)
Here, the error ellipse parameters for each observation define the measurement error variances used in the SSM's (Jonsen et al. 2020). Missing error ellipse values are allowed, in which case, those observations are treated as Argos LS data.
The aniMotum
SSM's can be fit to GPS location data, for example to deal with (rare) extreme locations and/or to time-regularise locations. The input data format is the same as for Argos LS data, but the lc
class is set to G
for all GPS locations:
data.frame( id = rep("F02-B-17", 5), date = seq(Sys.time(), by = "1 hours", length.out = 5), lc = rep("G", 5), lon = seq(70.1, by = 0.5, length = 5), lat = seq(-49.2, by = 1, length = 5) )
The aniMotum
SSM's can be fit to processed light-level geolocation data, when they include longitude and latitude error SD's (lonerr
, laterr
; units in degrees). In this case, the lc
class is set to GL
for all geolocation observations:
data.frame( id = rep(54632, 5), date = seq(Sys.time(), by = "12 hours", length.out = 5), lc = rep("GL", 5), lon = seq(100, by = 0.5, length = 5), lat = seq(-55, by = 1, length = 5), lonerr = rexp(5, 1 / 0.5), laterr = rexp(5, 1 / 1.5) )
We caution users, that light-level geolocation errors can be (highly) non-Gaussian, depending on the species, tag technology, and algorithm used to estimate location from the light-level data. For example, we have observed strongly non-Gaussian location uncertainty in tracks of highly migratory tunas that traverse very different water masses in both the vertical and horizontal directions. Because aniMotum
's SSM's assume bi-variate Normal location errors, fitting to geolocations with non-Gaussian errors would not be appropriate and likely result in highly biased location estimates. Geolocation data collected from birds may be less prone to this issue, but we urge users to explore both the uncertainty in their tracking data and the algorithm used to estimate locations from light-level data.
Users can provide projected location data as an sf-tibble
or sf data.frame
. By default, aniMotum
's SSM's fit to locations that are transformed to the Mercator projection (EPSG 3395) and estimated locations are back-transformed to spherical coordinates (lon, lat). When an sf
-projected data set is supplied, the aniMotum
SSM's are fit to locations on the supplied projection (aniMotum
ensures that distance units are set to km). Here is an example sf-tibble
Argos data set in a polar stereographic projection:
data(ellie, package = "aniMotum") foo <- sf::st_as_sf(ellie, coords=c("lon","lat"), crs = "+proj=longlat +ellps=WGS84 +no_defs") foo <- sf::st_transform(foo, crs = "+proj=stere +lat_0=-60 +lon_0=85 +ellps=WGS84 +units=km +no_defs") head(foo)
Users can fit SSM's to data where a combination of Argos, GPS or light-level geolocation observations are inter-mixed. Below is an example of mixed Argos KF and GPS observations, as one might obtain from a double-tagged individual:
data.frame( id = rep("F02-B-17", 5), date = c("2017-09-17 05:20:00", "2017-10-04 14:35:01", "2017-10-05 04:03:25", "2017-10-05 06:28:20", "2017-10-05 10:21:18"), lc = c("G","2","G","A","B"), lon = c(70.1, 70.2, 70.1, 71.1, 70.8), lat = c(-49.2, -49.1, -49.3, -48.7, -48.5), smaj = c(NA, 1890, NA, 28532, 45546), smin = c(NA, 45, NA, 1723, 3303), eor = c(NA, 77, NA, 101, 97) )
These mixtures are possible because the aniMotum
SSM's apply the most appropriate measurement error model for each observation, based on its location class and error information (i.e., numeric values vs NA's).
Model fitting for quality control of locations is comprised of 3 steps: a data formatting step, a pre-filtering step, and the actual model fitting step. A number of checks are made on the input data during the formatting and pre-filtering steps, including applying the trip::sda
filter to identify extreme observations (see ?fit_ssm
for details). The pre-filter step is fully automated, although various arguments can be used to modify it's actions, and called via the fit_ssm
function:
## format, prefilter and fit Random Walk SSM using a 24 h time step fit <- fit_ssm( x = ellie, model = "rw", time.step = 24 )
These are the minimum arguments required: the input data, the model (rw
, crw
, or mp
) and the time.step (in h) to which locations are predicted. Additional control can be exerted over the data formatting and pre-filtering steps, via the id
, date
, lc
, coord
, epar
and sderr
variable name arguments, and via the vmax
, ang
, distlim
, min.dt
, and spdf
pre-filtering arguments (see ?fit_ssm
for details). The defaults for these arguments are quite conservative (for non-flying species), usually leading to relative few observations being flagged to be ignored by the SSM. Additional control over the optimization can also be exerted via the control = ssm_control()
argument, see see vignette('SSM_fitting')
and ?ssm_control
for more details.
Users do not need to adhere to the default aniMotum
data formatting presented above, so long as their input data contain the essential variables. The variables can have arbitrary names and be in any order. Here is an example using a custom formatted data.frame
of Argos LS data as input:
data(sese2_n)
sese2_n
Note the columns are in a different order than the default expected by aniMotum
, and the first three variables have different names: longitude
, latitude
and time
. There are two approaches to format these data. The first, is to call format_data
as a data pre-processing step prior to fitting an SSM. The data are linked to the expected names via the following arguments to format_data
: id
, date
, lc
, coord
, epar
and sderr
. In this example, we only need to use the date
and coord
arguments:
d <- format_data(sese2_n, date = "time", coord = c("longitude", "latitude")) d
We then fit the SSM to the formatted data d
via fit_ssm
:
fit <- fit_ssm(d, model = "crw", time.step = 24)
Alternatively, we can use a shortcut and have fit_ssm
format the sese2_n
data by adding the variable name arguments to the call:
fit <- fit_ssm(sese2_n, date = "time", coord = c("longitude","latitude"), model = "crw", time.step = 24)
Original variable names are not preserved in the output object fit
but rather transformed to the default expected names. The grab
function can be used to access the data and the SSM estimates (see Access Results, below).
fit_ssm
can be applied to single or multiple tracks, without modification. The specified SSM is fit to each individual separately and the resulting output is a compound tibble
with rows corresponding to each individual ssm_df
fit object. The converged
column indicates whether each model fit converged successfully.
## fit to data with two individuals fit <- fit_ssm(sese2, model = "crw", time.step=24, control = ssm_control(verbose = 0)) ## list fit outcomes for both seals fit
load("data/overview/fit.rda") fit
Individual id
is displayed in the 1st column, all fit output resides in a list (ssm
) in the 2nd column, convergence
status (whether the optimizer found a global minimum) of each model fit is displayed in the 3rd column, whether the Hessian matrix was positive-definite and could be solved to obtain parameter standard errors (pdHess
) is displayed in the 4th column, and the specified process model (rw
, crw
, or mp
) in the 5th column. In some cases, the optimizer will converge but the Hessian matrix is not positive-definite, which typically indicates the optimizer converged on a local minimum. In this case, some standard errors may be calculated but not all. One possible solution is to try specifying a longer time.step
or set time.step = NA
to turn off predictions and return only fitted values (location estimates at the pre-filtered observation times). If pdHess = FALSE
persists then careful inspection of the supplied data is warranted to determine if suspect observations not identified by prefilter
are present. The excellent glmmTMB troubleshooting vignette may also provide hints at solutions. Convergence failures should be examined for potential data issues, however, in some cases changes to the optimization parameters via ssm_control()
(see ?fit_ssm
and ?ssm_control
on usage) may overcome mild issues (see ?nlminb
or ?optim
for details on optimization control parameters).
Summary information about the fit can be obtained via the summary
function:
summary(fit)
The summary table lists information about the fit, including the number of observations in the input data (n.obs
), the number of observation flagged to be ignored by the SSM (n.filt
), the number of fitted location estimates (n.fit
), the number of predicted location estimates (n.pred
), the number of rerouted location estimates (if present, n.rr
), model convergence status, and AICc. When fitting to multiple individuals, these statistics are repeated on separate lines for each individual. Separate tables of SSM parameter estimates and their SE's are also printed for each individual. The parameter estimates displayed vary depending on the SSM process model selected by the user (rw
, crw
, or mp
) and the automatically chosen measurement model(s). Here, sigma_x
and sigma_y
are the process error standard deviations in the x and y directions, rho_p
is the correlation parameter in the covariance term. The Std. Error
column lists the standard errors, calculated via the Delta method (see TMB documentation for details), for each estimated parameter.
fit_ssm
usually returns two sets of estimated locations in the model fit object: fitted values and predicted values. The fitted values occur at the times of the observations to which the SSM was fit (i.e., the observations that passed the pre-filter step). The predicted values occur at the regular time intervals specified by the time.step
argument. If time.step = NA
, then no predicted values are estimated or returned in the model fit object.
Users can obtain the fitted or predicted locations as a data.frame by using grab()
:
## grab fitted locations floc <- grab(fit, what = "fitted") floc[1:5,] ## grab predicted locations in projected form ploc <- grab(fit, what = "predicted", as_sf = TRUE) ploc[1:5,]
Here, the output from the crw
SSM is returned as a fitted location data.frame (floc) that includes individual id, date, longitude, latitude, x and y (typically from the default Mercator projection) and their standard errors (x.se
, y.se
in km), u
, v
(and their standard errors, u.se
, v.se
in km/h) are estimates of signed velocity in the x and y directions. The u
, v
velocities should generally be ignored as their estimation uses time intervals between consecutive locations, whether they are observation times or prediction times. The columns s
and s.se
provide a more reliable 2-D velocity estimate, although standard error estimation is turned off by default as this generally increases computation time for the crw
SSM. Standard error estimation for s
can be turned on via the control
argument to fit_ssm
(i.e. control = ssm_control(se = TRUE)
, see ?ssm_control
for futher details).
The predicted location data.frame (ploc) is an sf-tibble
with geometry and Coordinate Reference System information. This sf
output format can be useful for custom mapping or calculating derived variables from the estimated locations.
The formatted and prefiltered version of the input data can also be extracted from the output:
fp.data <- grab(fit, what = "data") fp.data[1:5,]
Here, fp.data
is in the form that is passed to the crw
SSM. The first 5 columns (id
, date
, lc
, lon
, lat
) are preserved from the formatted input data, and the error ellipse parameter columns (smaj
, smin
, eor
) are appended and filled with NA's, if missing from the input data. The observation type, obs.type
, is determined for each observation during the prefiltering stage based on the combination of lc
value and the presence of error ellipse parameters with non-NA values. The keep
column indicates whether each record passed the pre-filtering stage (see ?prefilter
for details), observations with keep = FALSE
are ignored by the SSM. The x
,y
columns are the Mercator-projected coordinates (in km) that fitted to by the SSM. The emf.x
and emf.y
columns are the error multiplication factors used to scale the measurement error variances, used by the SSM, for each Argos Least-Squares location class - these are relevant only for obs.type = 'LS'
and for GPS observations.
A generic plot
(see ?plot.ssm_df
) method allows a quick visual of the SSM fit to the data:
# plot time-series of the fitted values plot(fit, what = "fitted", type = 1, pages = 1)
{width=95%}
Here, the fitted values (state estimates corresponding to the time of each observation; orange points) are plotted on top of the observations that passed the prefilter
stage (blue points and blue rug at bottom) and as separate time-series for the x and y coordinates by default. Uncertainty in the estimates is displayed as 2 x SE intervals (orange-filled ribbon). Observations that failed the prefilter
stage are also displayed (black x's and black rug at bottom).
A 2-D time series plot of the track is invoked by the argument type = 2
:
# plot fitted values as a 2-d track plot(fit, what = "predicted", type = 2, pages = 1)
{width=95%}
The predicted values (orange points) are the state estimates predicted at regular time intervals, specified by time.step
(here a 24 h interval). 95 % confidence ellipses (orange-filled ellipses) around the predicted values are also displayed, but can be faded away by choosing a low alpha
value (e.g., plot(fit, what = "predicted", type = 2, alpha = 0.05)
). Observations that failed the prefilter
stage are displayed (black x's) by default but can be turned off with the argument outlier = FALSE
).
Auger-Méthé M, Albertsen CM, Jonsen ID, Derocher AE, Lidgard DC, Studholme KR, Bowen WD, Crossin GT, Flemming JM (2017) Spatiotemporal modelling of marine movement data using Template Model Builder (TMB). Marine Ecology Progress Series 565:237-249.
Jonsen ID, McMahon CR, Patterson TA, Auger-Méthé M. Harcourt R, Hindel MA, Bestley S (2019) Movement responses to environment: fast inference of variation among southern elephant seals with a mixed effect model. Ecoloogy 100:e02566.
Jonsen ID, Patterson TA, Costa DP, Doherty PD, Godley BJ, Grecian WJ, Guinet C, Hoenner X, Kienle SS, Robinson PW, Votier SC, Whiting S, Witt MJ, Hindel MA, Harcourt RG, McMahon CR (2020) A continuous-time state-space model for rapid quality control of Argos locations from animal-borne tags. Movement Ecology 8:31.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.