| Title: | Multivariate Spatio-Temporal Models using Structural Equations |
|---|---|
| Description: | Fits a wide variety of multivariate spatio-temporal models with simultaneous and lagged interactions among variables (including vector autoregressive spatio-temporal ('VAST') dynamics) for areal, continuous, or network spatial domains. It includes time-variable, space-variable, and space-time-variable interactions using dynamic structural equation models ('DSEM') as expressive interface, and the 'mgcv' package to specify splines via the formula interface. See Thorson et al. (2025) <doi:10.1111/geb.70035> for more details. |
| Authors: | James T. Thorson [aut, cre] (ORCID: <https://orcid.org/0000-0001-7415-1010>), Sean C. Anderson [aut] (ORCID: <https://orcid.org/0000-0001-9563-1937>) |
| Maintainer: | James T. Thorson <[email protected]> |
| License: | GPL-3 |
| Version: | 1.6.0 |
| Built: | 2026-05-19 04:05:17 UTC |
| Source: | https://github.com/vast-lib/tinyvast |
Interpolates covariate values from a data frame to mesh vertices using inverse distance weighting (IDW). Uses gstat for exact IDW interpolation by default, with an optional high-performance RANN method for very large datasets.
add_mesh_covariates( mesh, data, covariates, coords, power = 2, method = c("gstat", "rann"), k = 10, barrier = NULL )add_mesh_covariates( mesh, data, covariates, coords, power = 2, method = c("gstat", "rann"), k = 10, barrier = NULL )
mesh |
A mesh object from fmesher or sdmTMB (for
|
data |
A data frame with coordinate columns and covariate columns, or an sf object. |
covariates |
Character vector of covariate column names to interpolate. |
coords |
Character vector of coordinate column names. Ignored if data is an sf object. |
power |
Numeric power parameter for inverse distance weighting (default
|
method |
Interpolation method. Options: |
k |
Number of nearest neighbours to use for |
barrier |
Optional sf polygon object defining barrier regions. If
provided, adds a logical |
Modified mesh object with vertex_covariates and triangle_covariates
elements added and class vertex_cov added. The vertex_covariates data frame
contains covariate values interpolated at mesh vertices, and triangle_covariates
contains covariate values interpolated at triangle centers.
library(sdmTMB) library(sf) # Regular data frame mesh <- fmesher::fm_mesh_2d(pcod[, c("X", "Y")], cutoff = 10) mesh_with_covs <- add_mesh_covariates( mesh, data = qcs_grid, covariates = c("depth"), coords = c("X", "Y") ) head(mesh_with_covs$vertex_covariates) # Visualize what we've done: if (requireNamespace("ggplot2", quietly = TRUE)) { library(ggplot2) df <- as.data.frame(mesh_with_covs$loc[,1:2]) df <- cbind(df, mesh_with_covs$vertex_covariates) ggplot() + geom_raster(data = qcs_grid, aes(X, Y, fill = depth), alpha = 0.7) + geom_point( data = df, aes(V1, V2, fill = depth), colour = "#00000010", pch = 21) + scale_fill_viridis_c(option = "G", trans = "log", direction = -1) df_tri <- mesh_with_covs$triangle_covariates ggplot() + geom_raster( data = qcs_grid, aes(X, Y, fill = depth), alpha = 0.7) + geom_point( data = df_tri, aes(x = .x_triangle, y = .y_triangle, fill = depth), colour = "#00000010", pch = 21) + scale_fill_viridis_c(option = "G", trans = "log", direction = -1) } # Piped version mesh_with_covs <- fmesher::fm_mesh_2d(pcod[, c("X", "Y")], cutoff = 10) |> add_mesh_covariates( qcs_grid, covariates = c("depth_scaled", "depth_scaled2"), coords = c("X", "Y") ) # With sf objects (coords automatically extracted) pcod_sf <- st_as_sf(pcod, coords = c("X", "Y")) grid_sf <- st_as_sf(qcs_grid, coords = c("X", "Y")) mesh_sf <- fmesher::fm_mesh_2d(pcod_sf, cutoff = 10) |> add_mesh_covariates(grid_sf, c("depth")) # With sdmTMB mesh (coordinate names and mesh automatically detected) mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 10) |> add_mesh_covariates(qcs_grid, c("depth")) # Use RANN method for very large datasets (much faster) mesh_fast <- fmesher::fm_mesh_2d(pcod[, c("X", "Y")], cutoff = 10) |> add_mesh_covariates( qcs_grid, covariates = c("depth_scaled", "depth_scaled2"), coords = c("X", "Y"), method = "rann", k = 15 )library(sdmTMB) library(sf) # Regular data frame mesh <- fmesher::fm_mesh_2d(pcod[, c("X", "Y")], cutoff = 10) mesh_with_covs <- add_mesh_covariates( mesh, data = qcs_grid, covariates = c("depth"), coords = c("X", "Y") ) head(mesh_with_covs$vertex_covariates) # Visualize what we've done: if (requireNamespace("ggplot2", quietly = TRUE)) { library(ggplot2) df <- as.data.frame(mesh_with_covs$loc[,1:2]) df <- cbind(df, mesh_with_covs$vertex_covariates) ggplot() + geom_raster(data = qcs_grid, aes(X, Y, fill = depth), alpha = 0.7) + geom_point( data = df, aes(V1, V2, fill = depth), colour = "#00000010", pch = 21) + scale_fill_viridis_c(option = "G", trans = "log", direction = -1) df_tri <- mesh_with_covs$triangle_covariates ggplot() + geom_raster( data = qcs_grid, aes(X, Y, fill = depth), alpha = 0.7) + geom_point( data = df_tri, aes(x = .x_triangle, y = .y_triangle, fill = depth), colour = "#00000010", pch = 21) + scale_fill_viridis_c(option = "G", trans = "log", direction = -1) } # Piped version mesh_with_covs <- fmesher::fm_mesh_2d(pcod[, c("X", "Y")], cutoff = 10) |> add_mesh_covariates( qcs_grid, covariates = c("depth_scaled", "depth_scaled2"), coords = c("X", "Y") ) # With sf objects (coords automatically extracted) pcod_sf <- st_as_sf(pcod, coords = c("X", "Y")) grid_sf <- st_as_sf(qcs_grid, coords = c("X", "Y")) mesh_sf <- fmesher::fm_mesh_2d(pcod_sf, cutoff = 10) |> add_mesh_covariates(grid_sf, c("depth")) # With sdmTMB mesh (coordinate names and mesh automatically detected) mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 10) |> add_mesh_covariates(qcs_grid, c("depth")) # Use RANN method for very large datasets (much faster) mesh_fast <- fmesher::fm_mesh_2d(pcod[, c("X", "Y")], cutoff = 10) |> add_mesh_covariates( qcs_grid, covariates = c("depth_scaled", "depth_scaled2"), coords = c("X", "Y"), method = "rann", k = 15 )
Given user-provided newdata, expand the object tmb_data
to include predictions corresponding to those new observations
add_predictions(object, newdata, remove_origdata = FALSE)add_predictions(object, newdata, remove_origdata = FALSE)
object |
Output from |
newdata |
New data-frame of independent variables used to predict the response. |
remove_origdata |
Whether to remove original-data to allow faster evaluation.
|
the object fit$tmb_inputs$tmb_data representing data used during fitting,
but with updated values for slots associated with predictions, where this
updated object can be recompiled by TMB to provide predictions
Data used in Thorson et al. 2025 and copied from Zenodo to allow inclusion as a package vignette. This includes bottom-trawl samples of rockfish and flatfish from the Alaska Fisheries Science Center in the Gulf of Alaska and Aleutian Islands, and drop-camera samples of rockfish, flatfish, sponges, and corals from the Alaska Fisheries Science Center and funded by the Deep Sea Coral Initiative.
data(alaska_sponge_coral_fish)data(alaska_sponge_coral_fish)
Data used to demonstrate and test seasonal spatio-temporal index standardization. Data are from doi:10.1093/icesjms/fsaa074
data(atlantic_yellowtail)data(atlantic_yellowtail)
Shapefile defining the spatial domain for the eastern and northern Bering Sea bottom trawl surveys.
data(bering_sea)data(bering_sea)
Data to estimate probabilistic forecasting, which bridges uncertainty
between short-term (decadal) projections and long-term (end-of-century)
forecasts. Here we use capelin sampling from a survey trawl survey
in the eastern and northern Bering Sea, as well as annual
sea surface temperature anomalies. The data set includes data frames
for both the fitted data Data and the projected values New_Data
data(bering_sea_capelin_forecasts)data(bering_sea_capelin_forecasts)
Data used to demonstrate and test model-based age expansion, using density= dependence corrected survey catch rates after first=stage expansion from the bottom trawl survey for ages 1-15, conducted by by the Alaska Fisheries Science Center, including annual surveys in the eastern Bering Sea 1982-2019 and 2021-2023, as well as the northern Bering Sea in 1982/85/88/91 and 2010/17/18/19/21/22/23.
data(bering_sea_pollock_ages)data(bering_sea_pollock_ages)
Estimated proporrtion-at-age for Alaska pollock using the package VAST, for comparison with output using tinyVAST.
data(bering_sea_pollock_vast)data(bering_sea_pollock_vast)
Calculates the conditional Akaike Information criterion (cAIC).
cAIC(object)cAIC(object)
object |
Output from |
cAIC is designed to optimize the expected out-of-sample predictive performance for new data that share the same random effects as the in-sample (fitted) data, e.g., spatial interpolation. In this sense, it should be a fast approximation to optimizing the model structure based on k-fold cross-validation.
By contrast, AIC() calculates the marginal Akaike Information Criterion,
which is designed to optimize expected predictive performance for new data
that have new random effects, e.g., extrapolation, or inference about
generative parameters.
Both cAIC and EDF are calculated using Eq. 6 of Zheng, Cadigan, and Thorson (2024).
For models that include profiled fixed effects, these profiles are turned off.
cAIC value
Zheng, N., Cadigan, N., & Thorson, J. T. (2024). A note on numerical evaluation of conditional Akaike information for nonlinear mixed-effects models (arXiv:2411.14185). arXiv. doi:10.48550/arXiv.2411.14185
data( red_snapper ) red_snapper = droplevels(subset(red_snapper, Data_type=="Biomass_KG")) # Define mesh mesh = fmesher::fm_mesh_2d( red_snapper[,c('Lon','Lat')], cutoff = 1 ) # define formula with a catchability covariate for gear formula = Response_variable ~ factor(Year) + offset(log(AreaSwept_km2)) # make variable column red_snapper$var = "logdens" # fit using tinyVAST fit = tinyVAST( data = red_snapper, formula = formula, space_term = "logdens <-> logdens, sd_space", space_columns = c("Lon",'Lat'), spatial_domain = mesh, family = tweedie(link="log"), variable_column = "var", control = tinyVASTcontrol( getsd = FALSE, profile = "alpha_j" ) ) cAIC(fit) # conditional AIC AIC(fit) # marginal AICdata( red_snapper ) red_snapper = droplevels(subset(red_snapper, Data_type=="Biomass_KG")) # Define mesh mesh = fmesher::fm_mesh_2d( red_snapper[,c('Lon','Lat')], cutoff = 1 ) # define formula with a catchability covariate for gear formula = Response_variable ~ factor(Year) + offset(log(AreaSwept_km2)) # make variable column red_snapper$var = "logdens" # fit using tinyVAST fit = tinyVAST( data = red_snapper, formula = formula, space_term = "logdens <-> logdens, sd_space", space_columns = c("Lon",'Lat'), spatial_domain = mesh, family = tweedie(link="log"), variable_column = "var", control = tinyVASTcontrol( getsd = FALSE, profile = "alpha_j" ) ) cAIC(fit) # conditional AIC AIC(fit) # marginal AIC
classify_variables is copied from sem:::classifyVariables
classify_variables(model)classify_variables(model)
model |
syntax for structural equation model |
Copied from package sem under licence GPL (>= 2) with permission from John Fox
Tagged-list defining exogenous and endogenous variables
Data used to demonstrate and test a bivariate model for morphometric condition (i.e., residuals in a weight-at-length relationship) and density for fishes, using the same example as was provided as a wiki example for VAST. Data are from doi:10.3354/meps13213
data(condition_and_density)data(condition_and_density)
Generates samples from a Gaussian Markov random field (GMRF) conditional upon fixed values for some elements.
conditional_gmrf( Q, observed_idx, x_obs, n_sims = 1, what = c("simulate", "predict") )conditional_gmrf( Q, observed_idx, x_obs, n_sims = 1, what = c("simulate", "predict") )
Q |
precision for a zero-centered GMRF. |
observed_idx |
integer vector listing rows of |
x_obs |
numeric vector with fixed values for indices |
n_sims |
integer listing number of simulated values |
what |
Whether to simulate from the conditional GMRF, or predict the mean and precision |
A matrix with n_sims columns and a row for every row of Q not in
observed_idx, with simulations for those rows
deviance_explained fits a null model, calculates the deviance relative to
a saturated model for both the original and the null model, and uses these
to calculate the proportion of deviance explained.
This implementation conditions upon the maximum likelihood estimate of fixed effects and the empirical Bayes ("plug-in") prediction of random effects. It can be described as "conditional deviance explained". A state-space model that estimates measurement error variance approaching zero (i.e., collapses to a process-error-only model) will have a conditional deviance explained that approaches 1.0
For several families (tweedie, negbin1, negbin2, and student), the null model is fitted using the MLE for an overdispersion parameter from the full model. This is done because, e.g., the negbin1 and negbin2 only belong to the exponential family when the overdispersion parameter is fixed, and the deviance relative to a saturated model is only defined for the exponential family.
deviance_explained(x, null_formula, null_delta_formula = ~1)deviance_explained(x, null_formula, null_delta_formula = ~1)
x |
output from |
null_formula |
formula for the null model. If missing, it uses
|
null_delta_formula |
formula for the null model for the delta component.
If missing, it uses
|
the proportion of conditional deviance explained.
Additional families compatible with tinyVAST().
delta_lognormal(link1, link2 = "log", type = c("standard", "poisson-link")) delta_gamma(link1, link2 = "log", type = c("standard", "poisson-link"))delta_lognormal(link1, link2 = "log", type = c("standard", "poisson-link")) delta_gamma(link1, link2 = "log", type = c("standard", "poisson-link"))
link1 |
Link for first part of delta/hurdle model. |
link2 |
Link for second part of delta/hurdle model. |
type |
Delta/hurdle family type. |
link |
Link. |
A list with elements common to standard R family objects including family,
link, linkfun, and linkinv. Delta/hurdle model families also have
elements delta (logical) and type (standard vs. Poisson-link).
Poisson-link delta families:
Thorson, J.T. 2018. Three problems with the conventional delta-model for biomass sampling data, and a computationally efficient alternative. Canadian Journal of Fisheries and Aquatic Sciences, 75(9), 1369-1382. doi:10.1139/cjfas-2017-0266
Poisson-link delta families:
Thorson, J.T. 2018. Three problems with the conventional delta-model for biomass sampling data, and a computationally efficient alternative. Canadian Journal of Fisheries and Aquatic Sciences, 75(9), 1369-1382. doi:10.1139/cjfas-2017-0266
S3 generic from package insight, used for crossvalidation
## S3 method for class 'tinyVAST' get_data(x, ...)## S3 method for class 'tinyVAST' get_data(x, ...)
x |
output from |
... |
not used |
S3 generic from package cv, used for crossvalidation
## S3 method for class 'tinyVAST' GetResponse(model, ...)## S3 method for class 'tinyVAST' GetResponse(model, ...)
model |
output from |
... |
not used |
Calculates an estimator for a derived quantity by summing across multiple predictions. This can be used to approximate an integral when estimating area-expanded abundance, abundance-weighting a covariate to calculate distribution shifts, and/or weighting one model variable by another.
integrate_output( object, newdata, area, block = rep(1, nrow(newdata)), type = rep(1, nrow(newdata)), weighting_index, covariate, getsd = TRUE, bias.correct = TRUE, apply.epsilon = FALSE, intern = FALSE )integrate_output( object, newdata, area, block = rep(1, nrow(newdata)), type = rep(1, nrow(newdata)), weighting_index, covariate, getsd = TRUE, bias.correct = TRUE, apply.epsilon = FALSE, intern = FALSE )
object |
Output from |
newdata |
New data-frame of independent variables used to predict the response,
where a total value is calculated by combining across these individual predictions.
If these locations are randomly drawn from a specified spatial domain, then
|
area |
vector of values used for area-weighted expansion of
estimated density surface for each row of |
block |
vector of integers, indicating blocks of predictions that are combined into one or more derived quantities. This can be used to compute area-expanded indices for more than one year or category using a single call, and might be substantially faster for large models (because it avoids extra model builds for each derived quantity) |
type |
Integer-vector indicating what type of expansion to apply to
each row of
|
weighting_index |
integer-vector used to indicate a previous row
that is used to calculate a weighted average that is then
applied to the given row of |
covariate |
numeric-vector used to provide a covariate
that is used in expansion, e.g., to provide positional
coordinates when calculating the abundance-weighted centroid with respect
to that coordinate. Only used for when |
getsd |
logical indicating whether to get the standard error, where
|
bias.correct |
logical indicating if bias correction should be applied using
standard methods in |
apply.epsilon |
Apply epsilon bias correction using a manual calculation rather than using the conventional method in TMB::sdreport? See details for more information. |
intern |
Do Laplace approximation on C++ side? Passed to |
Analysts will often want to calculate some value by combining the predicted response at multiple
locations, and potentially from multiple variables in a multivariate analysis.
This arises in a univariate model, e.g., when calculating the integral under a predicted
density function, which is approximated using a midpoint or Monte Carlo approximation
by calculating the linear predictors at each location newdata,
applying the inverse-link-trainsformation,
and calling this predicted response mu_g. Total abundance is then be approximated
by multiplying mu_g by the area associated with each midpoint or Monte Carlo
approximation point (supplied by argument area),
and summing across these area-expanded values.
In more complicated cases, an analyst can then use covariate
to calculate the weighted average
of a covariate for each midpoint location. For example, if the covariate is
positional coordinates or depth/elevation, then type=2
measures shifts in the average habitat utilization with respect to that covariate.
Alternatively, an analyst fitting a multivariate model might weight one variable
based on another using weighting_index, e.g.,
to calculate abundance-weighted average condition, or
predator-expanded stomach contents.
In practice, spatial integration in a multivariate model requires two passes through the rows of
newdata when calculating a total value. In the following, we
write equations using C++ indexing conventions such that indexing starts with 0,
to match the way that integrate_output expects indices to be supplied.
Given inverse-link-transformed predictor ,
function argument type as
function argument area as ,
function argument covariate as ,
function argument weighting_index as \eqn{ h_g }
function argument weighting_index as \eqn{ h_g }
the first pass calculates:
where the total value from this first pass is calculated as:
The second pass then applies a further weighting, which depends upon ,
and potentially upon and .
If then
If then
If then
If then
If then
Finally, the total value from this second pass is calculated as:
and is outputted by integrate_output,
along with a standard error and potentially using
the epsilon bias-correction estimator to correct for skewness and retransformation
bias.
Standard bias-correction using bias.correct=TRUE can be slow, and in
some cases it might be faster to do apply.epsilon=TRUE and intern=TRUE.
However, that option is somewhat experimental, and a user might want to confirm
that the two options give identical results. Similarly, using bias.correct=TRUE
will still calculate the standard-error, whereas using
apply.epsilon=TRUE and intern=TRUE will not.
A vector containing the plug-in estimate, standard error, the epsilon bias-corrected
estimate if available, and the standard error for the bias-corrected estimator.
Depending upon settings, one or more of these will be NA values, and the
function can be repeatedly called to get multiple estimators and/or statistics.
Extract the (marginal) log-likelihood of a tinyVAST model
## S3 method for class 'tinyVAST' logLik(object, ...)## S3 method for class 'tinyVAST' logLik(object, ...)
object |
output from |
... |
not used |
object of class logLik with attributes
val |
log-likelihood |
df |
number of parameters |
make_dsem_ram converts SEM arrow notation to ram describing SEM parameters
make_dsem_ram( dsem, times, variables, covs = NULL, quiet = FALSE, remove_na = TRUE )make_dsem_ram( dsem, times, variables, covs = NULL, quiet = FALSE, remove_na = TRUE )
dsem |
dynamic structural equation model structure,
passed to either |
times |
A character vector listing the set of times in order |
variables |
A character vector listing the set of variables |
covs |
optional: a character vector of one or more elements, with each element
giving a string of variable names, separated by commas. Variances and covariances
among all variables in each such string are added to the model. For confirmatory
factor analysis models specified via |
quiet |
Boolean indicating whether to print messages to terminal |
remove_na |
Boolean indicating whether to remove NA values from RAM (default) or not.
|
RAM specification using arrow-and-lag notation
Each line of the RAM specification for make_dsem_ram consists of four (unquoted) entries,
separated by commas:
This is a simple formula, of the form
A -> B or, equivalently, B <- A for a regression
coefficient (i.e., a single-headed or directional arrow);
A <-> A for a variance or A <-> B for a covariance
(i.e., a double-headed or bidirectional arrow). Here, A and
B are variable names in the model. If a name does not correspond
to an observed variable, then it is assumed to be a latent variable.
Spaces can appear freely in an arrow specification, and
there can be any number of hyphens in the arrows, including zero: Thus,
e.g., A->B, A --> B, and A>B are all legitimate
and equivalent.
An integer specifying whether the linkage
is simultaneous (lag=0) or lagged (e.g., X -> Y, 1, XtoY
indicates that X in time T affects Y in time T+1), where
only one-headed arrows can be lagged. Using positive values to indicate lags
then matches the notational convention used in package dynlm.
The name of the regression coefficient, variance,
or covariance specified by the arrow. Assigning the same name to two or
more arrows results in an equality constraint. Specifying the parameter name
as NA produces a fixed parameter.
start value for a free parameter or value of a fixed parameter.
If given as NA (or simply omitted), the model is provide a default
starting value.
Lines may end in a comment following #. The function extends code copied from package
sem under licence GPL (>= 2) with permission from John Fox.
Simultaneous autoregressive process for simultaneous and lagged effects
This text then specifies linkages in a multivariate time-series model for variables
with dimensions for times and variables.
make_dsem_ram then parses this text to build a path matrix with
dimensions , where
represents the impact of on , where
and . This path matrix defines a simultaneous equation
where is a matrix of exogenous errors with covariance ,
where is the Cholesky of exogenous covariance. This
simultaneous autoregressive (SAR) process then results in having covariance:
Usefully, it is also easy to compute the inverse-covariance (precision) matrix :
Example: univariate and first-order autoregressive model
This simultaneous autoregressive (SAR) process across variables and times
allows the user to specify both simultaneous effects (effects among variables within
year ) and lagged effects (effects among variables among years ).
As one example, consider a univariate and first-order autoregressive process where .
with independent errors. This is specified by passing dsem = X -> X, 1, rho; X <-> X, 0, sigma to make_dsem_ram.
This is then parsed to a RAM:
| heads | to | from | paarameter | start |
| 1 | 2 | 1 | 1 | NA |
| 1 | 3 | 2 | 1 | NA |
| 1 | 4 | 3 | 1 | NA |
| 2 | 1 | 1 | 2 | NA |
| 2 | 2 | 2 | 2 | NA |
| 2 | 3 | 3 | 2 | NA |
| 2 | 4 | 4 | 2 | NA |
Rows of this RAM where heads=1 are then interpreted to construct the path matrix :
\deqn{ \mathbf P = \begin{bmatrix}
0 & 0 & 0 & 0 \
\rho & 0 & 0 & 0 \
0 & \rho & 0 & 0 \
0 & 0 & \rho & 0\
\end{bmatrix} }
While rows where heads=2 are interpreted to construct the Cholesky of exogenous covariance :
\deqn{ \mathbf \Gamma = \begin{bmatrix}
\sigma & 0 & 0 & 0 \
0 & \sigma & 0 & 0 \
0 & 0 & \sigma & 0 \
0 & 0 & 0 & \sigma\
\end{bmatrix} }
with two estimated parameters . This then results in covariance:
\deqn{ \mathrm{Cov}(\mathbf X) = \sigma^2 \begin{bmatrix}
1 & \rho^1 & \rho^2 & \rho^3 \
\rho^1 & 1 & \rho^1 & \rho^2 \
\rho^2 & \rho^1 & 1 & \rho^1 \
\rho^3 & \rho^2 & \rho^1 & 1\
\end{bmatrix} }
Similarly, the arrow-and-lag notation can be used to specify a SAR representing a conventional structural equation model (SEM), cross-lagged (a.k.a. vector autoregressive) models (VAR), dynamic factor analysis (DFA), or many other time-series models.
A reticular action module (RAM) describing dependencies
# Univariate AR1 dsem = " X -> X, 1, rho X <-> X, 0, sigma " make_dsem_ram( dsem=dsem, variables="X", times=1:4 ) # Univariate AR2 dsem = " X -> X, 1, rho1 X -> X, 2, rho2 X <-> X, 0, sigma " make_dsem_ram( dsem=dsem, variables="X", times=1:4 ) # Bivariate VAR dsem = " X -> X, 1, XtoX X -> Y, 1, XtoY Y -> X, 1, YtoX Y -> Y, 1, YtoY X <-> X, 0, sdX Y <-> Y, 0, sdY " make_dsem_ram( dsem=dsem, variables=c("X","Y"), times=1:4 ) # Dynamic factor analysis with one factor and two manifest variables # (specifies a random-walk for the factor, and miniscule residual SD) dsem = " factor -> X, 0, loadings1 factor -> Y, 0, loadings2 factor -> factor, 1, NA, 1 X <-> X, 0, NA, 0 # No additional variance Y <-> Y, 0, NA, 0 # No additional variance " make_dsem_ram( dsem=dsem, variables=c("X","Y","factor"), times=1:4 ) # ARIMA(1,1,0) dsem = " factor -> factor, 1, rho1 # AR1 component X -> X, 1, NA, 1 # Integrated component factor -> X, 0, NA, 1 X <-> X, 0, NA, 0 # No additional variance " make_dsem_ram( dsem=dsem, variables=c("X","factor"), times=1:4 ) # ARIMA(0,0,1) dsem = " factor -> X, 0, NA, 1 factor -> X, 1, rho1 # MA1 component X <-> X, 0, NA, 0 # No additional variance " make_dsem_ram( dsem=dsem, variables=c("X","factor"), times=1:4 )# Univariate AR1 dsem = " X -> X, 1, rho X <-> X, 0, sigma " make_dsem_ram( dsem=dsem, variables="X", times=1:4 ) # Univariate AR2 dsem = " X -> X, 1, rho1 X -> X, 2, rho2 X <-> X, 0, sigma " make_dsem_ram( dsem=dsem, variables="X", times=1:4 ) # Bivariate VAR dsem = " X -> X, 1, XtoX X -> Y, 1, XtoY Y -> X, 1, YtoX Y -> Y, 1, YtoY X <-> X, 0, sdX Y <-> Y, 0, sdY " make_dsem_ram( dsem=dsem, variables=c("X","Y"), times=1:4 ) # Dynamic factor analysis with one factor and two manifest variables # (specifies a random-walk for the factor, and miniscule residual SD) dsem = " factor -> X, 0, loadings1 factor -> Y, 0, loadings2 factor -> factor, 1, NA, 1 X <-> X, 0, NA, 0 # No additional variance Y <-> Y, 0, NA, 0 # No additional variance " make_dsem_ram( dsem=dsem, variables=c("X","Y","factor"), times=1:4 ) # ARIMA(1,1,0) dsem = " factor -> factor, 1, rho1 # AR1 component X -> X, 1, NA, 1 # Integrated component factor -> X, 0, NA, 1 X <-> X, 0, NA, 0 # No additional variance " make_dsem_ram( dsem=dsem, variables=c("X","factor"), times=1:4 ) # ARIMA(0,0,1) dsem = " factor -> X, 0, NA, 1 factor -> X, 1, rho1 # MA1 component X <-> X, 0, NA, 0 # No additional variance " make_dsem_ram( dsem=dsem, variables=c("X","factor"), times=1:4 )
make_eof_ram converts SEM arrow notation to ram describing SEM parameters
make_eof_ram( times, variables, n_eof, remove_na = TRUE, standard_deviations = "unequal" )make_eof_ram( times, variables, n_eof, remove_na = TRUE, standard_deviations = "unequal" )
times |
A character vector listing the set of times in order |
variables |
A character vector listing the set of variables |
n_eof |
Number of EOF modes of variability to estimate |
remove_na |
Boolean indicating whether to remove NA values from RAM (default) or not.
|
standard_deviations |
One of |
A reticular action module (RAM) describing dependencies
# Two EOFs for two variables make_eof_ram( times = 2010:2020, variables = c("pollock","cod"), n_eof=2 )# Two EOFs for two variables make_eof_ram( times = 2010:2020, variables = c("pollock","cod"), n_eof=2 )
Convert an sf areal model object to a class "nngp_domain" that is recognized
by tinyVAST for nearest neighbors Gaussian process models
make_nngp_domain(sf_areal, nn)make_nngp_domain(sf_areal, nn)
sf_areal |
an sf or sfc object containing "POLYGON" and "MULTIPOLYGON" types |
nn |
Number of nearest neighbors |
a list of NNGP data objects created by tinyVAST:::make_nngp_data, as well as the provided sf object
make_sem_ram converts SEM arrow notation to ram describing SEM parameters
make_sem_ram(sem, variables, quiet = FALSE, covs = variables)make_sem_ram(sem, variables, quiet = FALSE, covs = variables)
sem |
structural equation model structure, passed to either |
variables |
A character vector listing the set of variables |
quiet |
if |
covs |
optional: a character vector of one or more elements, with each element
giving a string of variable names, separated by commas. Variances and covariances
among all variables in each such string are added to the model. For confirmatory
factor analysis models specified via |
An S3-class "sem_ram" containing:
modelOutput from specifyEquations or specifyModel
that defines paths and parameters
ramreticular action module (RAM) describing dependencies
parse_path is copied from sem::parse.path
parse_path(path)parse_path(path)
path |
character string indicating a one-headed or two-headed path in a structural equation model |
Copied from package sem under licence GPL (>= 2) with permission from John Fox
Tagged-list defining variables and direction for a specified path coefficient
Plot nngp_domain
## S3 method for class 'nngp_domain' plot(x, y, ...)## S3 method for class 'nngp_domain' plot(x, y, ...)
x |
output from |
y |
not used |
... |
not used |
Predicts values given new covariates using a tinyVAST model
## S3 method for class 'tinyVAST' predict( object, newdata, remove_origdata = FALSE, what = c("mu_g", "p_g", "p1_g", "palpha1_g", "pgamma1_g", "pepsilon1_g", "pomega1_g", "pdelta1_g", "pxi1_g", "p2_g", "palpha2_g", "pgamma2_g", "pepsilon2_g", "pomega2_g", "pdelta2_g", "pxi2_g"), se.fit = FALSE, bias.correct = FALSE, ... )## S3 method for class 'tinyVAST' predict( object, newdata, remove_origdata = FALSE, what = c("mu_g", "p_g", "p1_g", "palpha1_g", "pgamma1_g", "pepsilon1_g", "pomega1_g", "pdelta1_g", "pxi1_g", "p2_g", "palpha2_g", "pgamma2_g", "pepsilon2_g", "pomega2_g", "pdelta2_g", "pxi2_g"), se.fit = FALSE, bias.correct = FALSE, ... )
object |
Output from |
newdata |
New data-frame of independent variables used to predict the response. |
remove_origdata |
Whether to eliminate the original data from the TMB object, thereby speeding up the TMB object construction. However, this also eliminates information about random-effect variance, and is not appropriate when requesting predictive standard errors or epsilon bias-correction. |
what |
What REPORTed object to output, where
|
se.fit |
Calculate standard errors? |
bias.correct |
whether to epsilon bias-correct the predicted value |
... |
Not used. |
Either a vector with the prediction for each row of newdata, or a named list
with the prediction and standard error (when se.fit = TRUE).
print summary of tinyVAST model
## S3 method for class 'tinyVAST' print(x, ...)## S3 method for class 'tinyVAST' print(x, ...)
x |
output from |
... |
not used |
invisibly returns a named list of key model outputs and summary statements
Projects a fitted model forward in time.
project( object, extra_times, newdata, what = "mu_g", future_var = TRUE, past_var = FALSE, parm_var = FALSE )project( object, extra_times, newdata, what = "mu_g", future_var = TRUE, past_var = FALSE, parm_var = FALSE )
object |
fitted model from |
extra_times |
a vector of extra times, matching values in |
newdata |
data frame including new values for |
what |
What REPORTed object to output, where
|
future_var |
logical indicating whether to simulate future process errors from GMRFs, or just compute the predictive mean |
past_var |
logical indicating whether to re-simulate past process errors from predictive distribution of random effects, thus changing the boundary condition of the forecast |
parm_var |
logical indicating whether to re-sample fixed effects from their predictive distribution, thus changing the GMRF for future process errors |
A vector of values corresponding to rows in newdata
# Convert to long-form set.seed(123) n_obs = 100 rho = 0.9 sigma_x = 0.2 sigma_y = 0.1 x = rnorm(n_obs, mean=0, sd = sigma_x) for(i in 2:length(x)) x[i] = rho * x[i-1] + x[i] y = x + rnorm( length(x), mean = 0, sd = sigma_y ) data = data.frame( "val" = y, "var" = "y", "time" = seq_along(y) ) # Define AR2 time_term time_term = " y -> y, 1, rho1 y -> y, 2, rho2 y <-> y, 0, sd " # fit model mytiny = tinyVAST( time_term = time_term, data = data, times = unique(data$t), variables = "y", formula = val ~ 1, control = tinyVASTcontrol( getJointPrecision = TRUE ) ) # Deterministic projection extra_times = length(x) + 1:100 n_sims = 10 newdata = data.frame( "time" = c(seq_along(x),extra_times), "var" = "y" ) Y = project( mytiny, newdata = newdata, extra_times = extra_times, future_var = FALSE ) plot( x = seq_along(Y), y = Y, type = "l", lty = "solid", col = "black" ) # Stochastic projection with future process errors ## Not run: extra_times = length(x) + 1:100 n_sims = 10 newdata = data.frame( "time" = c(seq_along(x),extra_times), "var" = "y" ) Y = NULL for(i in seq_len(n_sims) ){ tmp = project( mytiny, newdata = newdata, extra_times = extra_times, future_var = TRUE, past_var = TRUE, parm_var = TRUE ) Y = cbind(Y, tmp) } matplot( x = row(Y), y = Y, type = "l", lty = "solid", col = "black" ) ## End(Not run)# Convert to long-form set.seed(123) n_obs = 100 rho = 0.9 sigma_x = 0.2 sigma_y = 0.1 x = rnorm(n_obs, mean=0, sd = sigma_x) for(i in 2:length(x)) x[i] = rho * x[i-1] + x[i] y = x + rnorm( length(x), mean = 0, sd = sigma_y ) data = data.frame( "val" = y, "var" = "y", "time" = seq_along(y) ) # Define AR2 time_term time_term = " y -> y, 1, rho1 y -> y, 2, rho2 y <-> y, 0, sd " # fit model mytiny = tinyVAST( time_term = time_term, data = data, times = unique(data$t), variables = "y", formula = val ~ 1, control = tinyVASTcontrol( getJointPrecision = TRUE ) ) # Deterministic projection extra_times = length(x) + 1:100 n_sims = 10 newdata = data.frame( "time" = c(seq_along(x),extra_times), "var" = "y" ) Y = project( mytiny, newdata = newdata, extra_times = extra_times, future_var = FALSE ) plot( x = seq_along(Y), y = Y, type = "l", lty = "solid", col = "black" ) # Stochastic projection with future process errors ## Not run: extra_times = length(x) + 1:100 n_sims = 10 newdata = data.frame( "time" = c(seq_along(x),extra_times), "var" = "y" ) Y = NULL for(i in seq_len(n_sims) ){ tmp = project( mytiny, newdata = newdata, extra_times = extra_times, future_var = TRUE, past_var = TRUE, parm_var = TRUE ) Y = cbind(Y, tmp) } matplot( x = row(Y), y = Y, type = "l", lty = "solid", col = "black" ) ## End(Not run)
Data to estimate predator-expanded stomach contents (PESC), i.e.,
a multi-prey single-predator model for stomach contents of red grouper
in the Gulf of Mexico that also estimates biomass density for red grouper.
Diet proportions are then the product of predicted diet proportions and prey
specific consumption, normalized across prey categories. Copied from
VAST data set PESC_example_red_grouper
data(red_grouper_diet)data(red_grouper_diet)
Data used to demonstrate and test analysis using multiple data types
data(red_snapper)data(red_snapper)
Spatial extent used for red snapper analysis, derived from Chap-7 of doi:10.1201/9781003410294
data(red_snapper_shapefile)data(red_snapper_shapefile)
reload_model allows a user to save a fitted model, reload it in a new
R terminal, and then relink the DLLs so that it functions as expected.
reload_model(x, check_gradient = TRUE)reload_model(x, check_gradient = TRUE)
x |
Output from |
check_gradient |
Whether to check the gradients of the reloaded model |
Output from tinyVAST with DLLs relinked
Calculate residuals
## S3 method for class 'tinyVAST' residuals(object, type = c("deviance", "response"), ...)## S3 method for class 'tinyVAST' residuals(object, type = c("deviance", "response"), ...)
object |
Output from |
type |
which type of residuals to compute (only option is |
... |
Note used |
a vector residuals, associated with each row of data supplied during fitting
This function provides a random number generator for
the multivariate normal distribution with mean equal
to mu and sparse precision matrix prec.
rmvnorm_prec(prec, n = 1, mu = rep(0, nrow(prec)))rmvnorm_prec(prec, n = 1, mu = rep(0, nrow(prec)))
prec |
sparse precision (inverse-covariance) matrix. |
n |
number of observations. |
mu |
mean vector. |
a matrix with dimension length(mu) by
n, containing realized draws from the specified
mean and precision
Rotate lower-triangle loadings matrix to order factors from largest to smallest variance.
rotate_pca( L_tf, x_sf = matrix(0, nrow = 0, ncol = ncol(L_tf)), order = c("none", "increasing", "decreasing") )rotate_pca( L_tf, x_sf = matrix(0, nrow = 0, ncol = ncol(L_tf)), order = c("none", "increasing", "decreasing") )
L_tf |
Loadings matrix with dimension |
x_sf |
Spatial response with dimensions |
order |
Options for resolving label-switching via reflecting
each factor to achieve a given order across dimension |
List containing the rotated loadings L_tf,
the inverse-rotated response matrix x_sf,
and the rotation H
# Extract covariance for spatial factor analysis (too slow for CRAN) # Simulate settings set.seed(101) theta_xy = 0.4 n_x = n_y = 10 n_c = 3 # Number of species n_f = 1 # Number of factors rho = 0.8 resid_sd = 0.5 # Simulate GMRFs R_s = exp(-theta_xy * abs(outer(1:n_x, 1:n_y, FUN="-")) ) R_ss = kronecker(X=R_s, Y=R_s) delta_fs = mvtnorm::rmvnorm(n_c, sigma=R_ss ) # Simulate loadings for two factors L_cf = matrix( rnorm(n_c^2), nrow=n_c ) L_cf[,seq(from=n_f+1, to=n_c)] = 0 L_cf = L_cf + resid_sd * diag(n_c) # Simulate correlated densities d_cs = L_cf %*% delta_fs # Shape into longform data-frame and add error Data = data.frame( expand.grid(species=1:n_c, x=1:n_x, y=1:n_y), "var"="logn", "z"=exp(as.vector(d_cs)) ) Data$n = rnorm( n=nrow(Data), mean=Data$z, sd=1 ) # make mesh mesh = fmesher::fm_mesh_2d( Data[,c('x','y')] ) # Specify factor model with two factors and additional independent variance with shared SD sem = " # Loadings matrix f1 -> 1, l_1.1 f1 -> 2, l_1.2 f1 -> 3, l_1.3 f2 -> 2, l_2.2 f2 -> 3, l_3.2 # Factor variance = 1 f1 <-> f1, NA, 1 f2 <-> f2, NA, 1 # Shared residual variance 1 <-> 1, sd, 1 2 <-> 2, sd, 1 3 <-> 3, sd, 1 " # fit model vars = c( "f1", "f2", 1:n_c ) out = tinyVAST( space_term = sem, data = Data, formula = n ~ 0 + factor(species), spatial_domain = mesh, variables = vars, space_columns = c("x","y"), variable_column = "species", time_column = "time", distribution_column = "dist", control = tinyVASTcontrol( extra_report = TRUE ) ) # Extract loadings from factor to species L_cf = out$rep$Rho_cc[ match(1:n_c, vars), match(c("f1","f2"), vars), drop = FALSE] # Extract spatial values for factors omega_sf = out$internal$parlist$omega_sc[, match(c("f1","f2"), vars), drop = FALSE] # Apply rotation Rotation = rotate_pca( L_cf, x_sf = omega_sf ) # Extract rotated loadings Lprime_cf = Rotation$L_tf # Extract rotated factor values for each mesh location Oprime_sf = Rotation$x_sf# Extract covariance for spatial factor analysis (too slow for CRAN) # Simulate settings set.seed(101) theta_xy = 0.4 n_x = n_y = 10 n_c = 3 # Number of species n_f = 1 # Number of factors rho = 0.8 resid_sd = 0.5 # Simulate GMRFs R_s = exp(-theta_xy * abs(outer(1:n_x, 1:n_y, FUN="-")) ) R_ss = kronecker(X=R_s, Y=R_s) delta_fs = mvtnorm::rmvnorm(n_c, sigma=R_ss ) # Simulate loadings for two factors L_cf = matrix( rnorm(n_c^2), nrow=n_c ) L_cf[,seq(from=n_f+1, to=n_c)] = 0 L_cf = L_cf + resid_sd * diag(n_c) # Simulate correlated densities d_cs = L_cf %*% delta_fs # Shape into longform data-frame and add error Data = data.frame( expand.grid(species=1:n_c, x=1:n_x, y=1:n_y), "var"="logn", "z"=exp(as.vector(d_cs)) ) Data$n = rnorm( n=nrow(Data), mean=Data$z, sd=1 ) # make mesh mesh = fmesher::fm_mesh_2d( Data[,c('x','y')] ) # Specify factor model with two factors and additional independent variance with shared SD sem = " # Loadings matrix f1 -> 1, l_1.1 f1 -> 2, l_1.2 f1 -> 3, l_1.3 f2 -> 2, l_2.2 f2 -> 3, l_3.2 # Factor variance = 1 f1 <-> f1, NA, 1 f2 <-> f2, NA, 1 # Shared residual variance 1 <-> 1, sd, 1 2 <-> 2, sd, 1 3 <-> 3, sd, 1 " # fit model vars = c( "f1", "f2", 1:n_c ) out = tinyVAST( space_term = sem, data = Data, formula = n ~ 0 + factor(species), spatial_domain = mesh, variables = vars, space_columns = c("x","y"), variable_column = "species", time_column = "time", distribution_column = "dist", control = tinyVASTcontrol( extra_report = TRUE ) ) # Extract loadings from factor to species L_cf = out$rep$Rho_cc[ match(1:n_c, vars), match(c("f1","f2"), vars), drop = FALSE] # Extract spatial values for factors omega_sf = out$internal$parlist$omega_sc[, match(c("f1","f2"), vars), drop = FALSE] # Apply rotation Rotation = rotate_pca( L_cf, x_sf = omega_sf ) # Extract rotated loadings Lprime_cf = Rotation$L_tf # Extract rotated factor values for each mesh location Oprime_sf = Rotation$x_sf
Data used to demonstrate and test multivariate second-order autoregressive models using a simultaneous autoregressive (SAR) process across regions. Data are from doi:10.1002/mcf2.10023
data(salmon_returns)data(salmon_returns)
sample_variable samples from the joint distribution of random (and optionally fixed) effects
to approximate the predictive distribution for a variable.
sample_variable( object, newdata = NULL, variable_name = "mu_i", n_samples = 100, sample_fixed = TRUE, seed = 123456 )sample_variable( object, newdata = NULL, variable_name = "mu_i", n_samples = 100, sample_fixed = TRUE, seed = 123456 )
object |
output from |
newdata |
data frame of new data, used to sample model components for predictions e.g., |
variable_name |
name of variable available in report using |
n_samples |
number of samples from the joint predictive distribution for fixed and random effects. Default is 100, which is slow. |
sample_fixed |
whether to sample fixed and random effects, |
seed |
integer used to set random-number seed when sampling variables, as passed to |
Using sample_fixed=TRUE (the default) in sample_variable propagates variance in both fixed and random effects, while
using sample_fixed=FALSE does not.
Sampling fixed effects will sometimes cause numerical under- or overflow (i.e., output values of NA) in cases when
variance parameters are estimated imprecisely. In these cases, the multivariate normal approximation being used is a poor
representation of the tail probabilities, and results in some samples with implausibly high (or negative) variances,
such that the associated random effects then have implausibly high magnitude.
A matrix with a row for each data supplied during fitting, and
n_samples columns, where each column in a vector of samples
for a requested quantity given sampled uncertainty in fixed and/or random effects
set.seed(101) x = runif(n = 100, min = 0, max = 2*pi) y = 1 + sin(x) + 0.1 * rnorm(100) # Do fit with getJointPrecision=TRUE fit = tinyVAST( formula = y ~ s(x), data = data.frame(x=x,y=y) ) # samples from distribution for the mean # excluding fixed effects due to CRAN checks samples = sample_variable(fit, sample_fixed = FALSE)set.seed(101) x = runif(n = 100, min = 0, max = 2*pi) y = 1 + sin(x) + 0.1 * rnorm(100) # Do fit with getJointPrecision=TRUE fit = tinyVAST( formula = y ~ s(x), data = data.frame(x=x,y=y) ) # samples from distribution for the mean # excluding fixed effects due to CRAN checks samples = sample_variable(fit, sample_fixed = FALSE)
Sanity check of a tinyVAST model
sanity(object, big_sd_log10 = 2, gradient_thresh = 0.001, silent = FALSE)sanity(object, big_sd_log10 = 2, gradient_thresh = 0.001, silent = FALSE)
object |
Fitted model from |
big_sd_log10 |
Value to check size of standard errors against. A value
of 2 would indicate that standard errors greater than |
gradient_thresh |
Gradient threshold to issue warning. |
silent |
Logical: suppress messages? Useful to set to |
An invisible named list of checks.
Data used to demonstrate and test empirical orthogonal function generalized linear latent variable model (EOF-GLLVM)
data(sea_ice)data(sea_ice)
Make sparse matrix to project from stream-network nodes to user-supplied points
sfnetwork_evaluator(stream, loc, tolerance = 0.01)sfnetwork_evaluator(stream, loc, tolerance = 0.01)
stream |
sfnetworks object representing stream network |
loc |
sf object representing points to which are being projected |
tolerance |
error-check tolerance |
the sparse interpolation matrix, with rows for each row of data
supplied during fitting and columns for each spatial random effect.
make an object representing spatial information required
to specify a stream-network spatial domain, similar in usage to
link[fmesher]{fm_mesh_2d} for a 2-dimensional continuous domain
sfnetwork_mesh(stream)sfnetwork_mesh(stream)
stream |
sfnetworks object representing stream network |
An object (list) of class sfnetwork_mesh. Elements include:
The number of random effects used to represent the network
a table containing a description of parent nodes (from), childen nodes (to), and the distance separating them
copy of the stream network object passed as argument
Simulate values from a GMRF using a tail-down (flow-unconnected) exponential model on a stream network
simulate_sfnetwork(sfnetwork_mesh, theta, n = 1, what = c("samples", "Q"))simulate_sfnetwork(sfnetwork_mesh, theta, n = 1, what = c("samples", "Q"))
sfnetwork_mesh |
Output from |
theta |
Decorrelation rate |
n |
number of simulated GMRFs |
what |
Whether to return the simulated GMRF or its precision matrix |
a matrix of simulated values for a Gaussian Markov random field
arising from a stream-network spatial domain, with row for each spatial random
effect and n columns, using the sparse precision matrix
defined in Charsley et al. (2023)
Charsley, A. R., Gruss, A., Thorson, J. T., Rudd, M. B., Crow, S. K., David, B., Williams, E. K., & Hoyle, S. D. (2023). Catchment-scale stream network spatio-temporal models, applied to the freshwater stages of a diadromous fish species, longfin eel (Anguilla dieffenbachii). Fisheries Research, 259, 106583. doi:10.1016/j.fishres.2022.106583
simulate.tinyVAST is an S3 method for producing a matrix of simulations from
a fitted model. It can be used with the DHARMa package
among other uses. Code is modified from the version in sdmTMB
## S3 method for class 'tinyVAST' simulate( object, nsim = 1L, seed = sample.int(1e+06, 1L), type = c("mle-eb", "mle-mvn"), ... )## S3 method for class 'tinyVAST' simulate( object, nsim = 1L, seed = sample.int(1e+06, 1L), type = c("mle-eb", "mle-mvn"), ... )
object |
output from |
nsim |
how many simulations to do |
seed |
random seed |
type |
How parameters should be treated. |
... |
not used |
A matrix with row for each row of data in the fitted model and nsim
columns, containing new samples from the fitted model.
set.seed(101) x = seq(0, 2*pi, length=100) y = sin(x) + 0.1*rnorm(length(x)) fit = tinyVAST( data=data.frame(x=x,y=y), formula = y ~ s(x) ) sims = simulate(fit, nsim=100, type="mle-mvn") if(requireNamespace("DHARMa")){ # simulate new data conditional on fixed effects # and sampling random effects from their predictive distribution y_iz = simulate(fit, nsim=500, type="mle-mvn") # Visualize using DHARMa res = DHARMa::createDHARMa( simulatedResponse = y_iz, observedResponse = y, fittedPredictedResponse = fitted(fit) ) plot(res) }set.seed(101) x = seq(0, 2*pi, length=100) y = sin(x) + 0.1*rnorm(length(x)) fit = tinyVAST( data=data.frame(x=x,y=y), formula = y ~ s(x) ) sims = simulate(fit, nsim=100, type="mle-mvn") if(requireNamespace("DHARMa")){ # simulate new data conditional on fixed effects # and sampling random effects from their predictive distribution y_iz = simulate(fit, nsim=500, type="mle-mvn") # Visualize using DHARMa res = DHARMa::createDHARMa( simulatedResponse = y_iz, observedResponse = y, fittedPredictedResponse = fitted(fit) ) plot(res) }
Extract the approximated spatial correlation between one coordinate and other coordinates using a sparse precision and SPDE mesh
spatial_cor(Q, mesh, coord, pred)spatial_cor(Q, mesh, coord, pred)
Q |
sparse precision matrix |
mesh |
SPDE mesh |
coord |
vector of length-2 with spatial coordinates for focal point |
pred |
matrix with two columns and multiple rows, with location for points to predict correlation |
A vector with length nrow(pred) giving the spatial correlation
summarize parameters from a fitted tinyVAST
## S3 method for class 'tinyVAST' summary( object, what = c("space_term", "time_term", "spacetime_term", "fixed"), predictor = c("one", "two"), ... )## S3 method for class 'tinyVAST' summary( object, what = c("space_term", "time_term", "spacetime_term", "fixed"), predictor = c("one", "two"), ... )
object |
Output from |
what |
What component to summarize, whether |
predictor |
whether to get the 1st or 2nd linear predictor (the latter is only applicable in delta models) |
... |
Not used |
tinyVAST includes three components:
a separable Gaussian Markov random field (GMRF) constructed from a structural equation model (SEM) and a spatial variable
a separable GMRF constructed from a a dynamic SEM (a nonseparable time-variable interaction) and a spatial variable
a generalized additive model (GAM), representing exogenous covariates
Each of these are summarized and interpreted differently, and summary.tinyVAST
facilitates this.
Regarding the DSEM componennt, tinyVAST includes an "arrow and lag"
notation, which specifies the set of
path coefficients and exogenous variance parameters to be estimated. Function tinyVAST
then estimates the maximum likelihood value for those coefficients and parameters
by maximizing the log-marginal likelihood.
However, many users will want to associate individual parameters and standard errors
with the path coefficients that were specified using the "arrow and lag" notation.
This task is complicated in
models where some path coefficients or variance parameters are specified to share a single value a priori,
or were assigned a name of NA and hence assumed to have a fixed value a priori (such that
these coefficients or parameters have an assigned value but no standard error).
The summary function therefore compiles the MLE for coefficients (including duplicating
values for any path coefficients that assigned the same value) and standard error
estimates, and outputs those in a table that associates them with the user-supplied path and parameter names.
It also outputs the z-score and a p-value arising from a two-sided Wald test (i.e.
comparing the estimate divided by standard error against a standard normal distribution).
A data-frame containing the estimate (and standard errors, two-sided Wald-test
z-value, and associated p-value if the standard errors are available) for
model parameters, including the fixed-effects specified via formula,
or the path coefficients for the spatial SEM specified via space_term,
the dynamic SEM specified via time_term, or the spatial dynamic SEM
specified via spacetime_term
Extract the covariance resulting from a specified path structure and estimated parameters for a SEM or DSEM term in tinyVAST
term_covariance( object, what = c("space_term", "time_term", "spacetime_term"), pred = c("one", "two"), n_times = NULL )term_covariance( object, what = c("space_term", "time_term", "spacetime_term"), pred = c("one", "two"), n_times = NULL )
object |
Output from |
what |
Which SEM or DSEM term to extract |
pred |
Extract the term |
n_times |
The number of times to include when calculating covariance for a DSEM
component, i.e., |
tinyVAST constructs the covariance from specified path structure and estimated parameters
The covariance matrix among variables
# Extract covariance for spatial factor analysis (too slow for CRAN) # Simulate settings set.seed(101) theta_xy = 0.4 n_x = n_y = 10 n_c = 3 # Number of species n_f = 1 # Number of factors rho = 0.8 resid_sd = 0.5 # Simulate GMRFs R_s = exp(-theta_xy * abs(outer(1:n_x, 1:n_y, FUN="-")) ) R_ss = kronecker(X=R_s, Y=R_s) delta_fs = mvtnorm::rmvnorm(n_c, sigma=R_ss ) # Simulate loadings for two factors L_cf = matrix( rnorm(n_c^2), nrow=n_c ) L_cf[,seq(from=n_f+1, to=n_c)] = 0 L_cf = L_cf + resid_sd * diag(n_c) # Simulate correlated densities d_cs = L_cf %*% delta_fs # Shape into longform data-frame and add error Data = data.frame( expand.grid(species=1:n_c, x=1:n_x, y=1:n_y), "var"="logn", "z"=exp(as.vector(d_cs)) ) Data$n = rnorm( n=nrow(Data), mean=Data$z, sd=1 ) # make mesh mesh = fmesher::fm_mesh_2d( Data[,c('x','y')] ) # Specify factor model with two factors and additional independent variance with shared SD sem = " # Loadings matrix f1 -> 1, l1 f1 -> 2, l2 f1 -> 3, l3 # Factor variance = 1 f1 <-> f1, NA, 1 # Shared residual variance 1 <-> 1, sd, 1 2 <-> 2, sd, 1 3 <-> 3, sd, 1 " # fit model out = tinyVAST( space_term = sem, data = Data, formula = n ~ 0 + factor(species), spatial_domain = mesh, variables = c( "f1", 1:n_c ), space_columns = c("x","y"), variable_column = "species", time_column = "time", distribution_column = "dist" ) # Extract covariance among species and factors, where # estimated covariance is obtained by ignoring factors V = term_covariance( out, what = "space_term", pred = "one" )# Extract covariance for spatial factor analysis (too slow for CRAN) # Simulate settings set.seed(101) theta_xy = 0.4 n_x = n_y = 10 n_c = 3 # Number of species n_f = 1 # Number of factors rho = 0.8 resid_sd = 0.5 # Simulate GMRFs R_s = exp(-theta_xy * abs(outer(1:n_x, 1:n_y, FUN="-")) ) R_ss = kronecker(X=R_s, Y=R_s) delta_fs = mvtnorm::rmvnorm(n_c, sigma=R_ss ) # Simulate loadings for two factors L_cf = matrix( rnorm(n_c^2), nrow=n_c ) L_cf[,seq(from=n_f+1, to=n_c)] = 0 L_cf = L_cf + resid_sd * diag(n_c) # Simulate correlated densities d_cs = L_cf %*% delta_fs # Shape into longform data-frame and add error Data = data.frame( expand.grid(species=1:n_c, x=1:n_x, y=1:n_y), "var"="logn", "z"=exp(as.vector(d_cs)) ) Data$n = rnorm( n=nrow(Data), mean=Data$z, sd=1 ) # make mesh mesh = fmesher::fm_mesh_2d( Data[,c('x','y')] ) # Specify factor model with two factors and additional independent variance with shared SD sem = " # Loadings matrix f1 -> 1, l1 f1 -> 2, l2 f1 -> 3, l3 # Factor variance = 1 f1 <-> f1, NA, 1 # Shared residual variance 1 <-> 1, sd, 1 2 <-> 2, sd, 1 3 <-> 3, sd, 1 " # fit model out = tinyVAST( space_term = sem, data = Data, formula = n ~ 0 + factor(species), spatial_domain = mesh, variables = c( "f1", 1:n_c ), space_columns = c("x","y"), variable_column = "species", time_column = "time", distribution_column = "dist" ) # Extract covariance among species and factors, where # estimated covariance is obtained by ignoring factors V = term_covariance( out, what = "space_term", pred = "one" )
Fits a vector autoregressive spatio-temporal (VAST) model using a minimal feature-set and a widely used interface.
tinyVAST( formula, data, time_term = NULL, space_term = NULL, spacetime_term = NULL, family = gaussian(), delta_options = list(formula = ~1), spatial_varying = NULL, weights = NULL, spatial_domain = NULL, development = list(), control = tinyVASTcontrol(), space_columns = c("x", "y"), time_column = "time", times = NULL, variable_column = "var", variables = NULL, distribution_column = "dist" )tinyVAST( formula, data, time_term = NULL, space_term = NULL, spacetime_term = NULL, family = gaussian(), delta_options = list(formula = ~1), spatial_varying = NULL, weights = NULL, spatial_domain = NULL, development = list(), control = tinyVASTcontrol(), space_columns = c("x", "y"), time_column = "time", times = NULL, variable_column = "var", variables = NULL, distribution_column = "dist" )
formula |
Formula with response on left-hand-side and predictors on right-hand-side,
parsed by |
data |
Data-frame of predictor, response, and offset variables. Also includes
variables that specify space, time, variables, and the distribution for samples,
as identified by arguments |
time_term |
Specification for time-series structural equation model structure for
constructing a time-variable interaction that defines a time-varying intercept
for each variable (i.e., applies uniformly across space).
|
space_term |
Specification for structural equation model structure for
constructing a space-variable interaction.
|
spacetime_term |
Specification for time-series structural equation model structure
including lagged or simultaneous effects for
constructing a time-variable interaction, which is then combined in
a separable process with the spatial correlation to form a
space-time-variable interaction (i.e., the interaction occurs locally at each site).
|
family |
A function returning a class |
delta_options |
a named list with slots for |
spatial_varying |
a formula specifying spatially varying coefficients (SVC). Note that
using formulas in R, |
weights |
A numeric vector representing optional likelihood weights for the data likelihood. Weights do not have to sum to one and are not internally modified. Thee weights argument needs to be a vector and not a name of the variable in the data frame. |
spatial_domain |
Object that represents spatial relationships, either using
|
development |
Specify options that are under active development. Please do not use these features without coordinating with the package authors. |
control |
Output from |
space_columns |
A string or character vector that indicates
the column(s) of |
time_column |
A character string indicating the column of |
times |
A integer vector listing the set of times in order.
If |
variable_column |
A character string indicating the column of |
variables |
A character vector listing the set of variables.
if |
distribution_column |
A character string indicating the column of |
tinyVAST includes several basic inputs that specify the model structure:
formula specifies covariates and splines in a Generalized Additive Model;
time_term specifies interactions among variables and over time that
are constant across space, constructing the time-variable interaction.
space_term specifies interactions among variables and over time that occur
based on the variable values at each location, constructing
the space-variable interaction.
spacetime_term specifies interactions among variables and over time, constructing
the space-time-variable interaction.
These inputs require defining the domain of the model. This includes:
spatial_domain specifies spatial domain, with determines spatial correlations
times specifies the temporal domain, i.e., sequence of time-steps
variables specifies the set of variables, i.e., the variables that will be modeled
The default spacetime_term=NULL and space_term=NULL turns off all multivariate
and temporal indexing, such that spatial_domain is then ignored, and the model collapses
to a generalized additive model using gam. To specify a univariate spatial model,
the user must specify spatial_domain and either space_term="" or spacetime_term="", where the latter
two are then parsed to include a single exogenous variance for the single variable
| Model type | How to specify |
| Generalized additive model | specify spatial_domain=NULL space_term="" and spacetime_term="", and then use formula to specify splines and covariates |
| Dynamic structural equation model (including vector autoregressive, dynamic factor analysis, ARIMA, and structural equation models) | specify spatial_domain=NULL and use spacetime_term to specify interactions among variables and over time |
| Univariate spatio-temporal model, or multiple independence spatio-temporal variables | specify spatial_domain and spacetime_term="", where the latter is then parsed to include a single exogenous variance for the single variable |
| Multivariate spatial model including interactions | specify spatial_domain and use space_term to specify spatial interactions |
| Vector autoregressive spatio-temporal model (i.e., lag-1 interactions among variables) | specify spatial_domain and use spacetime_term="" to specify interactions among variables and over time, where spatio-temporal variables are constructed via the separable interaction of spacetime_term and spatial_domain
|
Model building notes
binomial familes: A binomial family can be specified in only one way:
the response is the observed proportion (proportion = successes / trials),
and the 'weights' argument is used to specify the Binomial size (trials, N)
parameter (proportion ~ ..., weights = N).
factor models: If a factor model is desired, the factor(s) must be named
and included in the variables. The factor is then modeled for space_term,
time_term, and spacetime_term and it's variance must be fixed a priori
for any term where it is not being used.
An object (list) of class tinyVAST. Elements include:
Data-frame supplied during model fitting
the spatial domain supplied during fitting
the formula specified during model fitting
The TMB object from MakeADFun
The output from nlminb
The report from obj$report()
The output from sdreport
The list of inputs passed to MakeADFun
A record of the function call
Total time to run model
Objects useful for package function, i.e., all arguments passed during the call
output from deviance_explained
Details section of make_dsem_ram() for a summary of the math involved with constructing the DSEM, and doi:10.1111/2041-210X.14289 for more background on math and inference
doi:10.48550/arXiv.2401.10193 for more details on how GAM, SEM, and DSEM components are combined from a statistical and software-user perspective
summary.tinyVAST() to visualize parameter estimates related to SEM and DSEM model components
# Simulate a seperable two-dimensional AR1 spatial process n_x = n_y = 25 n_w = 10 R_xx = exp(-0.4 * abs(outer(1:n_x, 1:n_x, FUN="-")) ) R_yy = exp(-0.4 * abs(outer(1:n_y, 1:n_y, FUN="-")) ) z = mvtnorm::rmvnorm(1, sigma=kronecker(R_xx,R_yy) ) # Simulate nuissance parameter z from oscillatory (day-night) process w = sample(1:n_w, replace=TRUE, size=length(z)) Data = data.frame( expand.grid(x=1:n_x, y=1:n_y), w=w, z=as.vector(z) + cos(w/n_w*2*pi)) Data$n = Data$z + rnorm(nrow(Data), sd=1) # Add columns for multivariate and/or temporal dimensions Data$var = "n" # make SPDE mesh for spatial term mesh = fmesher::fm_mesh_2d( Data[,c('x','y')], n=100 ) # fit model with cyclic confounder as GAM term out = tinyVAST( data = Data, formula = n ~ s(w), spatial_domain = mesh, space_term = "n <-> n, sd_n" ) # Run crossvalidation (too slow for CRAN) CV = cv::cv( out, k = 4 ) CV# Simulate a seperable two-dimensional AR1 spatial process n_x = n_y = 25 n_w = 10 R_xx = exp(-0.4 * abs(outer(1:n_x, 1:n_x, FUN="-")) ) R_yy = exp(-0.4 * abs(outer(1:n_y, 1:n_y, FUN="-")) ) z = mvtnorm::rmvnorm(1, sigma=kronecker(R_xx,R_yy) ) # Simulate nuissance parameter z from oscillatory (day-night) process w = sample(1:n_w, replace=TRUE, size=length(z)) Data = data.frame( expand.grid(x=1:n_x, y=1:n_y), w=w, z=as.vector(z) + cos(w/n_w*2*pi)) Data$n = Data$z + rnorm(nrow(Data), sd=1) # Add columns for multivariate and/or temporal dimensions Data$var = "n" # make SPDE mesh for spatial term mesh = fmesher::fm_mesh_2d( Data[,c('x','y')], n=100 ) # fit model with cyclic confounder as GAM term out = tinyVAST( data = Data, formula = n ~ s(w), spatial_domain = mesh, space_term = "n <-> n, sd_n" ) # Run crossvalidation (too slow for CRAN) CV = cv::cv( out, k = 4 ) CV
Control parameters for tinyVAST
tinyVASTcontrol( opt_loops = 1, newton_loops = 0, eval.max = 10000, iter.max = 10000, getsd = TRUE, silent = getOption("tinyVAST.silent", TRUE), trace = getOption("tinyVAST.trace", 0), verbose = getOption("tinyVAST.verbose", FALSE), profile = c(), tmb_par = NULL, tmb_map = NULL, tmb_random = NULL, gmrf_parameterization = c("separable", "projection"), reml = FALSE, getJointPrecision = FALSE, calculate_deviance_explained = TRUE, run_model = TRUE, suppress_user_warnings = FALSE, get_rsr = FALSE, extra_reporting = FALSE, use_anisotropy = FALSE, sar_adjacency = "queen", barrier_stiffness = 0.01 )tinyVASTcontrol( opt_loops = 1, newton_loops = 0, eval.max = 10000, iter.max = 10000, getsd = TRUE, silent = getOption("tinyVAST.silent", TRUE), trace = getOption("tinyVAST.trace", 0), verbose = getOption("tinyVAST.verbose", FALSE), profile = c(), tmb_par = NULL, tmb_map = NULL, tmb_random = NULL, gmrf_parameterization = c("separable", "projection"), reml = FALSE, getJointPrecision = FALSE, calculate_deviance_explained = TRUE, run_model = TRUE, suppress_user_warnings = FALSE, get_rsr = FALSE, extra_reporting = FALSE, use_anisotropy = FALSE, sar_adjacency = "queen", barrier_stiffness = 0.01 )
opt_loops |
Integer number of times to call nonlinear optimizer. |
newton_loops |
Integer number of Newton steps to do after running
|
eval.max |
Maximum number of evaluations of the objective function
allowed. Passed to |
iter.max |
Maximum number of iterations allowed. Passed to |
getsd |
Boolean indicating whether to call |
silent |
Disable terminal output for inner optimizer? |
trace |
Parameter values are printed every |
verbose |
Output additional messages about model steps during fitting? |
profile |
Character-vector passed to TMB::MakeADFun and see description
there. Fixed effects that are highly correlated with random effects
can often be estimated faster (i.e., with fewer iterations) by adding
them to |
tmb_par |
named list of parameters used as starting values. Elements that
have names that match those constructed internally then replace the
internally constructed starting values. Those that match must have identical
shape to |
tmb_map |
input passed to TMB::MakeADFun as argument |
tmb_random |
input passed to TMB::MakeADFun as argument |
gmrf_parameterization |
Parameterization to use for the Gaussian Markov
random field, where the default |
reml |
Logical: use REML (restricted maximum likelihood) estimation rather than maximum likelihood? Internally, this adds the fixed effects to the list of random effects to integrate over. |
getJointPrecision |
whether to get the joint precision matrix. Passed
to |
calculate_deviance_explained |
whether to calculate proportion of deviance
explained. See |
run_model |
whether to run the model of export TMB objects prior to compilation (useful for debugging) |
suppress_user_warnings |
whether to suppress warnings from package author regarding dangerous or non-standard options |
get_rsr |
Experimental option, whether to report restricted spatial regression (RSR) adjusted estimator for covariate responses |
extra_reporting |
Whether to report a much larger set of quantities via
|
use_anisotropy |
Whether to estimate two parameters representing geometric anisotropy |
sar_adjacency |
Whether to use queen or rook adjacency when defining a Simultaneous Autoregressive spatial precision from a sfc_GEOMETRY (default is queen) |
barrier_stiffness |
The ratio of local stiffness (the scale of diffusion
rate and resulting decorrelation distance) for barriers relative to normal areas
in the SPDE method when using |
An object (list) of class tinyVASTcontrol, containing either default or
updated values supplied by the user for model settings
Bakka, H., Vanhatalo, J., Illian, J., Simpson, D., Rue, H. (2019). Non-stationary Gaussian models with physical barriers. Spatial Statistics, 29, 268-288. doi:10.1016/j.spasta.2019.01.002
extract the covariance of fixed effects, or both fixed and random effects.
## S3 method for class 'tinyVAST' vcov(object, which = c("fixed", "random", "both"), ...)## S3 method for class 'tinyVAST' vcov(object, which = c("fixed", "random", "both"), ...)
object |
output from |
which |
whether to extract the covariance among fixed effects, random effects, or both |
... |
ignored, for method compatibility |
A square matrix containing the estimated covariances among the parameter estimates in the model.
The dimensions dependend upon the argument which, to determine whether fixed, random effects,
or both are outputted.