Title: | Bayesian Marginal Effects for 'brms' Models |
---|---|
Description: | Calculate Bayesian marginal effects, average marginal effects, and marginal coefficients (also called population averaged coefficients) for models fit using the 'brms' package including fixed effects, mixed effects, and location scale models. These are based on marginal predictions that integrate out random effects if necessary (see for example <doi:10.1186/s12874-015-0046-6> and <doi:10.1111/biom.12707>). |
Authors: | Joshua F. Wiley [aut, cre] , Donald Hedeker [aut] |
Maintainer: | Joshua F. Wiley <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.2.0 |
Built: | 2024-11-07 03:58:26 UTC |
Source: | https://github.com/jwiley/brmsmargins |
This function is designed to help calculate marginal effects
including average marginal effects (AMEs) from brms
models.
Arguments are labeled as required when it is required that the
user directly specify the argument. Arguments are labeled as
optional when either the argument is optional or there are
sensible default values so that users do not typically need to specify
the argument.
brmsmargins( object, at = NULL, wat = NULL, add = NULL, newdata = model.frame(object), CI = 0.99, CIType = "HDI", contrasts = NULL, ROPE = NULL, MID = NULL, subset = NULL, dpar = NULL, seed, verbose = FALSE, ... )
brmsmargins( object, at = NULL, wat = NULL, add = NULL, newdata = model.frame(object), CI = 0.99, CIType = "HDI", contrasts = NULL, ROPE = NULL, MID = NULL, subset = NULL, dpar = NULL, seed, verbose = FALSE, ... )
object |
A required argument specifying a fitted |
at |
An optional argument (but note, either |
wat |
An optional list with named elements including one element named,
“ID” with a single character string, the name of the variable
in the model frame that is the ID variable. Additionally,
there should be one or more named elements, named after variables
in the model (and specified in the |
add |
An optional argument (but note, either |
newdata |
An optional argument specifying an object inheriting
from data frame indicating the baseline values to use for predictions and AMEs.
It uses a sensible default: the model frame from the |
CI |
An optional argument with a numeric value specifying the width
of the credible interval. Defaults to |
CIType |
An optional argument, a character string specifying the
type of credible interval (e.g., highest density interval). It is passed down to
|
contrasts |
An optional argument specifying a contrast matrix.
The posterior predictions matrix
is post multiplied by the contrast matrix, so they must be conformable.
The posterior predictions matrix has a separate column for each row in the
|
ROPE |
An optional argument, that can either be left as |
MID |
An optional argument, that can either left as |
subset |
An optional argument, a character string that is a
valid |
dpar |
An optional argument giving the parameter passed on to the |
seed |
An optional argument that controls whether (and if so what) random seed
to use. This does not matter when using fixed effects only. However,
when using Monte Carlo integration to integrate out random effects from
mixed effects models, it is critical if you are looking at a continuous
marginal effect with some small offset value as otherwise the
Monte Carlo error from one set of predictions to another may exceed
the true predicted difference.
If |
verbose |
An optional argument, a logical value whether to print
more verbose messages. Defaults to |
... |
An optional argument, additional arguments passed on to
|
The main parts required for the function are a fitted model object,
(via the object
argument) a dataset to be used for prediction,
(via the newdata
argument which defaults to the model frame),
and a dataset passed to either at
or add
.
The steps are as follows:
Check that the function inputs (model object, data, etc.) are valid.
Take the dataset from the newdata
argument and either
add the values from the first row of add
or replace the values
using the first row of at
. Only variables specified in
at
or add
are modified. Other variables are left as is.
Use the fitted()
function to generate predictions based on
this modified dataset. If effects
is set to “fixedonly”
(meaning only generate predictions using fixed effects)
or to “includeRE”
(meaning generate predictions using fixed and random effects),
then predictions are generated entirely using the fitted()
function and are, typically back transformed to the response scale.
For mixed effects models with fixed and random effects where
effects
is set to “integrateoutRE”, then fitted()
is only used to generate predictions using the fixed effects on the linear
scale. For each prediction generated, the random effects are integrated out
by drawing k
random samples from the model assumed random effect(s)
distribution. These are added to the fixed effects predictions,
back transformed, and then averaged over all k
random samples to
perform numerical Monte Carlo integration.
All the predictions for each posterior draw, after any back transformation has been applied, are averaged, resulting in one, marginal value for each posterior draw. These are marginal predictions. They are average marginal predictions if averaging over the sample dataset, or may be marginal predictions at the means, if the initial input dataset used mean values, etc.
Steps two to four are repeated for each row of at
or add
.
Results are combined into a matrix where the columns are different
rows from at
or add
and the rows are different posterior
draws.
If contrasts were specified, using a contrast matrix, the
marginal prediction matrix is post multiplied by the contrast matrix.
Depending on the choice(s) of add
or at
and the
values in the contrast matrix, these can then be
average marginal effects (AMEs) by using numerical integration
(add
with 0 and a very close to 0 value) or
discrete difference (at
with say 0 and 1 as values)
for a given predictor(s).
The marginal predictions and the contrasts, if specified are summarized.
Although brmsmargins()
is focused on helping to calculate
marginal effects, it can also be used to generate marginal predictions,
and indeed these marginal predictions are the foundation of any
marginal effect estimates. Through manipulating the input data,
at
or add
and the contrast matrix, other types of estimates
averaged or weighting results in specific ways are also possible.
A list with four elements.
Posterior
Posterior distribution of all predictions. These predictions default to fixed effects only, but by specifying options to prediction()
they can include random effects or be predictions integrating out random effects.
Summary
A summary of the predictions.
Contrasts
Posterior distribution of all contrasts, if a contrast matrix was specified.
ContrastSummary
A summary of the posterior distribution of all contrasts, if specified
Pavlou, M., Ambler, G., Seaman, S., & Omar, R. Z. (2015) doi:10.1186/s12874-015-0046-6 “A note on obtaining correct marginal predictions from a random intercepts model for binary outcomes” and Skrondal, A., & Rabe-Hesketh, S. (2009) doi:10.1111/j.1467-985X.2009.00587.x “Prediction in multilevel generalized linear models” and Norton EC, Dowd BE, Maciejewski ML. (2019) doi:10.1001/jama.2019.1954 “Marginal Effects—Quantifying the Effect of Changes in Risk Factors in Logistic Regression Models”
## Not run: #### Testing #### ## sample data and logistic model with brms set.seed(1234) Tx <- rep(0:1, each = 50) ybin <- c(rep(0:1, c(40,10)), rep(0:1, c(10,40))) logitd <- data.frame(Tx = Tx, ybin = ybin) logitd$x <- rnorm(100, mean = logitd$ybin, sd = 2) mbin <- brms::brm(ybin ~ Tx + x, data = logitd, family = brms::bernoulli()) summary(mbin) ## now check AME for Tx tmp <- brmsmargins( object = mbin, at = data.table::data.table(Tx = 0:1), contrasts = matrix(c(-1, 1), nrow = 2), ROPE = c(-.05, +.05), MID = c(-.10, +.10)) tmp$Summary tmp$ContrastSummary ## Tx AME ## now check AME for Tx with bootstrapping the AME population tmpalt <- brmsmargins( object = mbin, at = data.table::data.table(Tx = 0:1), contrasts = matrix(c(-1, 1), nrow = 2), ROPE = c(-.05, +.05), MID = c(-.10, +.10), resample = 100L) tmpalt$Summary tmpalt$ContrastSummary ## Tx AME ## now check AME for continuous predictor, x ## use .01 as an approximation for first derivative ## 1 / .01 in the contrast matrix to get back to a one unit change metric tmp2 <- brmsmargins( object = mbin, add = data.table::data.table(x = c(0, .01)), contrasts = matrix(c(-1/.01, 1/.01), nrow = 2), ROPE = c(-.05, +.05), MID = c(-.10, +.10)) tmp2$ContrastSummary ## x AME if (FALSE) { library(lme4) data(sleepstudy) fit <- brms::brm(Reaction ~ 1 + Days + (1 + Days | Subject), data = sleepstudy, cores = 4) summary(fit, prob = 0.99) tmp <- brmsmargins( object = fit, at = data.table::data.table(Days = 0:1), contrasts = matrix(c(-1, 1), nrow = 2), ROPE = c(-.05, +.05), MID = c(-.10, +.10), CIType = "ETI", effects = "integrateoutRE", k = 5L) tmp$Summary tmp$ContrastSummary } ## End(Not run)
## Not run: #### Testing #### ## sample data and logistic model with brms set.seed(1234) Tx <- rep(0:1, each = 50) ybin <- c(rep(0:1, c(40,10)), rep(0:1, c(10,40))) logitd <- data.frame(Tx = Tx, ybin = ybin) logitd$x <- rnorm(100, mean = logitd$ybin, sd = 2) mbin <- brms::brm(ybin ~ Tx + x, data = logitd, family = brms::bernoulli()) summary(mbin) ## now check AME for Tx tmp <- brmsmargins( object = mbin, at = data.table::data.table(Tx = 0:1), contrasts = matrix(c(-1, 1), nrow = 2), ROPE = c(-.05, +.05), MID = c(-.10, +.10)) tmp$Summary tmp$ContrastSummary ## Tx AME ## now check AME for Tx with bootstrapping the AME population tmpalt <- brmsmargins( object = mbin, at = data.table::data.table(Tx = 0:1), contrasts = matrix(c(-1, 1), nrow = 2), ROPE = c(-.05, +.05), MID = c(-.10, +.10), resample = 100L) tmpalt$Summary tmpalt$ContrastSummary ## Tx AME ## now check AME for continuous predictor, x ## use .01 as an approximation for first derivative ## 1 / .01 in the contrast matrix to get back to a one unit change metric tmp2 <- brmsmargins( object = mbin, add = data.table::data.table(x = c(0, .01)), contrasts = matrix(c(-1/.01, 1/.01), nrow = 2), ROPE = c(-.05, +.05), MID = c(-.10, +.10)) tmp2$ContrastSummary ## x AME if (FALSE) { library(lme4) data(sleepstudy) fit <- brms::brm(Reaction ~ 1 + Days + (1 + Days | Subject), data = sleepstudy, cores = 4) summary(fit, prob = 0.99) tmp <- brmsmargins( object = fit, at = data.table::data.table(Days = 0:1), contrasts = matrix(c(-1, 1), nrow = 2), ROPE = c(-.05, +.05), MID = c(-.10, +.10), CIType = "ETI", effects = "integrateoutRE", k = 5L) tmp$Summary tmp$ContrastSummary } ## End(Not run)
Returns a summary of a posterior distribution for a single
parameter / value. It is based on personal preference. Notably, it does not
only use bayestestR::describe_posterior
, an excellent function,
because of the desire to also describe the percentage of the full posterior
distribution that is at or exceeding the value of a
Minimally Important Difference (MID). MIDs are used in clinical studies with outcome
measures where there are pre-defined differences that are considered clinically
important, which is distinct from the ROPE or general credible intervals capturing
uncertainty.
bsummary(x, CI = 0.99, CIType = "HDI", ROPE = NULL, MID = NULL)
bsummary(x, CI = 0.99, CIType = "HDI", ROPE = NULL, MID = NULL)
x |
The posterior distribution of a parameter |
CI |
A numeric value indicating the desired width of the credible interval.
Defaults to |
CIType |
A character string indicating the type of credible interval, passed on
to the |
ROPE |
Either left as |
MID |
Either left as |
A data.table
with the mean, M
the mean of the posterior samples
the median of the posterior samples
the lower limit of the credible interval
the upper limit of the credible interval
the percentage of posterior samples falling into the ROPE
the percentage of posterior samples falling at or beyond the MID
the width of the credible interval used
the type of credible interval used (e.g., highest density interval)
a label describing the values included in the ROPE
a label describing the values included in the MID
Kruschke, J. K. (2018). doi:10.1177/2515245918771304 “Rejecting or accepting parameter values in Bayesian estimation”
bsummary(rnorm(1000)) bsummary(rnorm(1000), ROPE = c(-.5, .5), MID = c(-1, 1))
bsummary(rnorm(1000)) bsummary(rnorm(1000), ROPE = c(-.5, .5), MID = c(-1, 1))
Used in the process of Monte Carlo integration over multivariate normal random effects. This generates the random draws from the multivariate normal distribution and multiplies these by the data. Not intended to be called directly by most users.
integratemvn(X, k, sd, chol) integratemvnR(X, k, sd, chol)
integratemvn(X, k, sd, chol) integratemvnR(X, k, sd, chol)
X |
A numeric matrix of the data to be multiplied by the random effects |
k |
An integer, the number of random samples to be used for numerical integration |
sd |
A numeric vector of the standard deviations |
chol |
A numeric matrix, which should be the Cholesky decomposition of the correlation matrix of the multivariate normal distribution. |
A numeric matrix with random values
integratemvnR
: Pure R
implementation of integratemvn
integratemvn( X = matrix(1, 1, 2), k = 100L, sd = c(10, 5), chol = chol(matrix(c(1, .5, .5, 1), 2))) integratemvn(matrix(1, 1, 1), 100L, c(5), matrix(1))
integratemvn( X = matrix(1, 1, 2), k = 100L, sd = c(10, 5), chol = chol(matrix(c(1, .5, .5, 1), 2))) integratemvn(matrix(1, 1, 1), 100L, c(5), matrix(1))
Used to conduct Monte Carlo integration over Gaussian random effects. Not intended to be called directly by most users.
integratere(d, sd, L, k, yhat, backtrans) integratereR(d, sd, L, k, yhat, backtrans)
integratere(d, sd, L, k, yhat, backtrans) integratereR(d, sd, L, k, yhat, backtrans)
d |
A list with model matrices for each random effect block. |
sd |
A list with standard deviation matrices for each random effect block where rows are different posterior draws. |
L |
A list with matrices for each random effect block containing the parts of the L matrix, the Cholesky decomposition of the random effect correlation matrix. |
k |
An integer, the number of samples for Monte Carlo integration. |
yhat |
A matrix of the fixed effects predictions |
backtrans |
An integer, indicating the type of back transformation. 0 indicates inverse logit (e.g., for logistic regression). 1 indicates exponential (e.g., for poisson or negative binomial regression or if outcome was natural log transformed). 2 indicates square (e.g., if outcome was square root transformed). 3 indicates inverse (e.g., if outcome was inverse transformed such as Gamma regression) Any other integer results in no transformation. -9 is recommended as the option for no transformation as any future transformations supported will be other, positive integers. |
A numeric matrix with the Monte Carlo integral calculated.
integratereR
: Pure R
implementation of integratere
integratere( d = list(matrix(1, 1, 1)), sd = list(matrix(1, 2, 1)), L = list(matrix(1, 2, 1)), k = 10L, yhat = matrix(0, 2, 1), backtrans = 0L)
integratere( d = list(matrix(1, 1, 1)), sd = list(matrix(1, 2, 1)), L = list(matrix(1, 2, 1)), k = 10L, yhat = matrix(0, 2, 1), backtrans = 0L)
Used to get marginal coefficients off of a generalized linear mixed model.
lmcpp(X, y)
lmcpp(X, y)
X |
A numeric model matrix. If intercept is desired, it must already have been added as a column. |
y |
A numeric matrix. A single column if one response variable or multiple columns where each column is a different response, such as a for marginal coefficients where each column is a different MCMC sample. |
A numeric matrix with the coefficient.
lmcpp(cbind(1, mtcars$hp, mtcars$am), as.matrix(mtcars[, c("mpg", "qsec")]))
lmcpp(cbind(1, mtcars$hp, mtcars$am), as.matrix(mtcars[, c("mpg", "qsec")]))
Calculate marginal coefficients from a brms
generalized linear mixed model using the method proposed by Hedeker (2018).
marginalcoef( object, summarize = TRUE, posterior = FALSE, index, backtrans = c("response", "linear", "identity", "invlogit", "exp", "square", "inverse"), k = 100L, ... )
marginalcoef( object, summarize = TRUE, posterior = FALSE, index, backtrans = c("response", "linear", "identity", "invlogit", "exp", "square", "inverse"), k = 100L, ... )
object |
A fitted brms model object that includes random effects. Required. |
summarize |
A logical value, whether or not to
calculate summaries of the posterior predictions.
Defaults to |
posterior |
A logical value whether or not to
save and return the posterior samples. Defaults
to |
index |
An optional integer vector, giving the posterior draws to be used in the calculations. If omitted, defaults to all posterior draws. |
backtrans |
A character string indicating the type of back transformation to be applied. Can be one of “response” meaning to use the response scale, “linear” or “identity” meaning to use the linear predictor scale, or a specific back transformation desired, from a possible list of “invlogit”, “exp”, “square”, or “inverse”. Custom back transformations should only be needed if, for example, the outcome variable was transformed prior to fitting the model. |
k |
An integer providing the number of random draws to use for
integrating out the random effects. Only relevant when |
... |
Additional arguments passed to |
A list with Summary
and Posterior
.
Some of these may be NULL
depending on the arguments used.
Hedeker, D., du Toit, S. H., Demirtas, H. & Gibbons, R. D. (2018) doi:10.1111/biom.12707 “A note on marginalization of regression parameters from mixed models of binary outcomes”
Calculate marginal predictions from a brms
model.
Marginal predictions average over the input data for each posterior draw.
Marginal predictions for models with random effects will integrate
over random effects.
Arguments are labeled as required when it is required that the
user directly specify the argument. Arguments are labeled as
optional when either the argument is optional or there are
sensible default values so that users do not typically need to specify
the argument.
prediction( object, data, summarize = TRUE, posterior = FALSE, index, dpar = NULL, resample = 0L, resampleseed = FALSE, effects = c("fixedonly", "includeRE", "integrateoutRE"), backtrans = c("response", "linear", "identity", "invlogit", "exp", "square", "inverse"), k = 100L, raw = FALSE, ... )
prediction( object, data, summarize = TRUE, posterior = FALSE, index, dpar = NULL, resample = 0L, resampleseed = FALSE, effects = c("fixedonly", "includeRE", "integrateoutRE"), backtrans = c("response", "linear", "identity", "invlogit", "exp", "square", "inverse"), k = 100L, raw = FALSE, ... )
object |
A required argument specifying a fitted
|
data |
A required argument specifying a data frame or
data table passed to |
summarize |
An optional argument, a logical value, whether
or not to calculate summaries of the posterior predictions.
Defaults to |
posterior |
An optional argument, a logical value whether
or not to save and return the posterior samples. Defaults
to |
index |
An optional argument, an integer vector, giving the posterior draws to be used in the calculations. If omitted, defaults to all posterior draws. |
dpar |
An optional argument, the parameter passed on to the
|
resample |
An optional argument, an integer indicating the
number of bootstrap resamples of the posterior predictions to
use when calculating summaries. Defaults to |
resampleseed |
An optional argument, a seed for random number
generation. Defaults to |
effects |
An optional argument, a character string indicating the type of prediction to be made. Can be one of “fixedonly” meaning only use fixed effects, “includeRE” meaning that random effects should be included in the predictions, or “integrateoutRE” meaning that random effects should be integrated out / over in the predictions. It defaults to “fixedonly” so is not typically required for a user to specify it. |
backtrans |
An optional argument, a character string indicating the type of back transformation to be applied. Can be one of “response” meaning to use the response scale, “linear” or “identity” meaning to use the linear predictor scale, or a specific back transformation desired, from a possible list of “invlogit”, “exp”, “square”, or “inverse”. Custom back transformations should only be needed if, for example, the outcome variable was transformed prior to fitting the model. It defaults to “response” so is not typically required for a user to specify it. |
k |
An optional argument, an integer providing the number of
random draws to use for integrating out the random effects.
Only relevant when |
raw |
An optional argument, a logical value indicating whether to
return the raw output or to average over the Monte Carlo samples.
Defaults to |
... |
An optional argument, additional arguments passed
to |
A list with Summary
and Posterior
.
Some of these may be NULL
depending on the arguments used.
Pavlou, M., Ambler, G., Seaman, S., & Omar, R. Z. (2015) doi:10.1186/s12874-015-0046-6 “A note on obtaining correct marginal predictions from a random intercepts model for binary outcomes” and Skrondal, A., & Rabe-Hesketh, S. (2009) doi:10.1111/j.1467-985X.2009.00587.x “Prediction in multilevel generalized linear models”
This takes a numeric matrix, bootstrap resamples each row, and then calculates the mean. The intended use case is for Bayesian posterior predictions from sample data. Instead of directly calculating the average marginal effect (AME) across all observed values, these can be bootstrapped, so that uncertainty in the target population, and thus the AME in the target population, can be incorporated. Model uncertainty is already assumed to be handled by the different posterior samples, which are assumed to be across rows.
rowBootMeans(x)
rowBootMeans(x)
x |
A numeric matrix |
A numeric vector with the simple bootstrapped row means of the matrix
x <- matrix(1:9, byrow = TRUE, 3) replicate(10, rowBootMeans(x))
x <- matrix(1:9, byrow = TRUE, 3) replicate(10, rowBootMeans(x))
Utility function to convert a row matrix to a square matrix.
Used as the brms
package returns things like the Cholesky
decomposition matrix as separate columns where rows are posterior draws.
Not intended to be called directly by most users.
tab2mat(X) tab2matR(X)
tab2mat(X) tab2matR(X)
X |
a matrix |
A numeric matrix with one row.
tab2matR
: Pure R
implementation of tab2mat
tab2mat(matrix(1:4, 1)) tab2mat(matrix(1:9, 1))
tab2mat(matrix(1:4, 1)) tab2mat(matrix(1:9, 1))