Marginal Effects for Fixed Effects Models

library(knitr)
library(data.table)
#> data.table 1.16.4 using 16 threads (see ?getDTthreads).  Latest news: r-datatable.com
library(brms)
#> Loading required package: Rcpp
#> Loading 'brms' package (version 2.22.0). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
#> 
#> Attaching package: 'brms'
#> 
#> The following object is masked from 'package:stats':
#> 
#>     ar
library(brmsmargins)

This vignette provides a brief overview of how to calculate marginal effects for Bayesian regression models involving only fixed effects and fit using the brms package.

What are marginal effects?

Marginal effects describe how an outcome is predicted to change with a change in a predictor (or predictors). Formally, a marginal effect is a derivative; for convenience, it is typically calculated numerically rather than analytically.

To motivate marginal effects, we start with some regression models fit in a frequentist framework, for simplicity and speed. We use the mtcars dataset built into R. First, we fit a linear regression model with hp as the outcome and am and mpg as predictors, and look at the estimated regression coefficient for mpg.

m.linear <- lm(hp ~ am + mpg, data = mtcars)

coef(m.linear)["mpg"]
#>       mpg 
#> -11.19988

In linear models with no interactions, no nonlinear transformations, and an identity link, the regression coefficient is the predicted change in the outcome for a one unit change in the predictor, regardless of the values of the other predictors or where the change occurs. For example, here we can look at the predicted difference in the outcome for a one unit difference in mpg from 0 to 1, holding am = 0.

yhat <- predict(
  m.linear,
  newdata = data.frame(am = 0, mpg = c(0, 1)),
  type = "response")

diff(yhat)
#>         2 
#> -11.19988

We can look at the same estimate, but moving mpg from 10 to 11 instead of 0 to 1, and holding am = 1.

yhat <- predict(
  m.linear,
  newdata = data.frame(am = 1, mpg = c(10, 11)),
  type = "response")

diff(yhat)
#>         2 
#> -11.19988

All of these quantities are identical. In this case, the regression coefficient can be interpreted as a marginal effect: the expected change in the outcome for a one unit shift in mpg, regardless of the value of am and regardless of the values where mpg is evaluated.

This convenient property does not hold for many types of models. Next consider a logistic regression model. The regression coefficient, shown below, is on the log odds scale, not the probability scale. This is not convenient for interpretation, as the log odds scale is not the same scale as our outcome.

m.logistic <- glm(vs ~ am + mpg, data = mtcars, family = binomial())

coef(m.logistic)["mpg"]
#>       mpg 
#> 0.6809205

We can find predicted differences on the probability scale. Here we move mpg from 10 to 11, holding am = 0.

yhat <- predict(
  m.logistic,
  newdata = data.frame(am = 0, mpg = c(10, 11)),
  type = "response")

diff(yhat)
#>           2 
#> 0.002661989

We can look at the same estimate, but moving mpg from 20 to 21 instead of 10 to 11, again holding am = 0.

yhat <- predict(
  m.logistic,
  newdata = data.frame(am = 0, mpg = c(20, 21)),
  type = "response")

diff(yhat)
#>         2 
#> 0.1175344

We can look at the same estimate moving mpg from 20 to 21 as before, but this time holding am = 1.

yhat <- predict(
  m.logistic,
  newdata = data.frame(am = 1, mpg = c(20, 21)),
  type = "response")

diff(yhat)
#>          2 
#> 0.08606869

All the estimates in this case differ. The association between mpg and the probability of vs is not linear on the probability scale. Marginal effects provide a way to get results on the response scale, which can aid interpretation.

A common type of marginal effect is an average marginal effect (AME). To calculate an AME numerically, we can get predicted probabilities from a model for every observation in the dataset. For continuous variables, we might use a very small difference to approximate the derivative. For categorical variables, we might calculate a discrete difference.

Average Marginal Effect (AME)

Here is an example of a continuous AME. h is a value near zero used for the numerical derivative. We take the values observed in the dataset and calculate a first set of predicted probabilities. Then we add h to the observed values of mpg and calculate a new set of predicted probabilities. The difference, divided by h, approximates the instantaneous rate of change (i.e., the derivative) on the probability scale for a one unit shift in the predictor, mpg, for each person. When we average all of these, we get the AME.
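
In symbols, writing \(\hat{p}_i(\cdot)\) for the model's predicted probability for row \(i\), \(h\) for the small step, and \(n\) for the number of rows, the continuous AME for mpg and the discrete AME for a binary variable such as am (used in the next example) are:

\[
\mathrm{AME}_{\text{mpg}} \approx \frac{1}{n} \sum_{i=1}^{n} \frac{\hat{p}_i(\text{mpg}_i + h) - \hat{p}_i(\text{mpg}_i)}{h},
\qquad
\mathrm{AME}_{\text{am}} = \frac{1}{n} \sum_{i=1}^{n} \bigl[ \hat{p}_i(\text{am} = 1) - \hat{p}_i(\text{am} = 0) \bigr].
\]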

h <- .001

nd.1 <- nd.0 <- model.frame(m.logistic)
nd.1$mpg <- nd.1$mpg + h

yhat.0 <- predict(
  m.logistic,
  newdata = nd.0,
  type = "response")

yhat.1 <- predict(
  m.logistic,
  newdata = nd.1,
  type = "response")

mean((yhat.1 - yhat.0) / h)
#> [1] 0.06922997
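
As a cross-check on the numerical approach, for a logistic model the derivative of a predicted probability with respect to mpg has a closed form: the mpg coefficient times p * (1 - p). Averaging that over the observed data should closely match the numerical AME of roughly 0.069 above. This snippet is an addition for illustration only.

# analytic AME for logistic regression: d p / d mpg = beta_mpg * p * (1 - p),
# averaged over the observed data; compare with the numerical AME above
p.hat <- predict(m.logistic, type = "response")
mean(coef(m.logistic)["mpg"] * p.hat * (1 - p.hat))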

Here is an example of a discrete AME. The variable am only takes two values: 0 or 1. So we calculate predicted probabilities as if everyone had am = 0, and then again as if everyone had am = 1.

nd.1 <- nd.0 <- model.frame(m.logistic)
nd.0$am <- 0
nd.1$am <- 1

yhat.0 <- predict(
  m.logistic,
  newdata = nd.0,
  type = "response")

yhat.1 <- predict(
  m.logistic,
  newdata = nd.1,
  type = "response")

mean(yhat.1 - yhat.0)
#> [1] -0.2618203

In both these examples, we are averaging across the different values observed in the dataset. In a frequentist framework, additional steps (e.g., the delta method or bootstrapping) are needed to calculate uncertainty intervals. In a Bayesian framework, uncertainty intervals can be calculated readily by summarizing the posterior.

AMEs for Logistic Regression

The main function for users is brmsmargins(). Here is an example calculating AMEs for mpg and am. First, we fit the same logistic regression model using brms.

bayes.logistic <- brm(
  vs ~ am + mpg, data = mtcars,
  family = "bernoulli", seed = 1234,
  silent = 2, refresh = 0,
  save_pars = save_pars(group = TRUE, latent = FALSE, all = TRUE),
  chains = 4L, cores = 4L, backend = "cmdstanr")
#> Loading required package: rstan
#> Loading required package: StanHeaders
#> 
#> rstan version 2.32.6 (Stan version 2.32.2)
#> For execution on a local, multicore CPU with excess RAM we recommend calling
#> options(mc.cores = parallel::detectCores()).
#> To avoid recompilation of unchanged Stan programs, we recommend calling
#> rstan_options(auto_write = TRUE)
#> For within-chain threading using `reduce_sum()` or `map_rect()` Stan functions,
#> change `threads_per_chain` option:
#> rstan_options(threads_per_chain = 1)
#> Do not specify '-march=native' in 'LOCAL_CPPFLAGS' or a Makevars file
summary(bayes.logistic)
#>  Family: bernoulli 
#>   Links: mu = logit 
#> Formula: vs ~ am + mpg 
#>    Data: mtcars (Number of observations: 32) 
#>   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
#>          total post-warmup draws = 4000
#> 
#> Regression Coefficients:
#>           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept   -16.12      5.67   -28.84    -7.42 1.00     1379     1959
#> am           -3.83      1.86    -7.97    -0.70 1.00     1568     1772
#> mpg           0.87      0.31     0.40     1.56 1.00     1328     1711
#> 
#> Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).

Now we can use brmsmargins(). We give it the model object, a data.frame of the values to be added to the dataset (first 0, then 0 + h), and a contrast matrix. The default is a 99 percent credible interval, which we override here to 0.95. We use highest density intervals (HDI), the default; we also could have selected "ETI" for equal-tailed intervals. brmsmargins() returns a list with the posterior for each prediction, a summary of the posterior for the predictions, the posterior for the contrasts, and a summary of the posterior for the contrasts. Here we have just one contrast, but multiple could have been specified.

h <- .001
ame1 <- brmsmargins(
  bayes.logistic,
  add = data.frame(mpg = c(0, 0 + h)),
  contrasts = cbind("AME MPG" = c(-1 / h, 1 / h)),
  CI = 0.95, CIType = "HDI")

kable(ame1$ContrastSummary, digits = 3)
M | Mdn | LL | UL | PercentROPE | PercentMID | CI | CIType | ROPE | MID | Label
0.071 | 0.07 | 0.054 | 0.092 | NA | NA | 0.95 | HDI | NA | NA | AME MPG
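
The contrast weights c(-1 / h, 1 / h) are what turn the two sets of predictions into a numerical derivative: within each posterior draw they compute (prediction at mpg + h minus prediction at mpg) divided by h. A tiny arithmetic sketch makes this explicit; the prediction values below are made up purely for illustration.

h  <- .001
p0 <- 0.40000  # hypothetical average prediction at the observed mpg (made up)
p1 <- 0.40007  # hypothetical average prediction at mpg + h (made up)
sum(c(-1 / h, 1 / h) * c(p0, p1))  # (p1 - p0) / h = 0.07, the same form as the AME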

Now we can look at how to calculate a discrete AME. This time we use the at argument instead of the add argument, because we want to hold am at specific values rather than add 0 and 1 to the observed am values. Because 0 and 1 are meaningful values of am, we also look at the summary of the posterior for the predictions. These predictions average across the observed values of mpg.

ame2 <- brmsmargins(
  bayes.logistic,
  at = data.frame(am = c(0, 1)),
  contrasts = cbind("AME am" = c(-1, 1)),
  CI = 0.95, CIType = "HDI")

kable(ame2$Summary, digits = 3)
M | Mdn | LL | UL | PercentROPE | PercentMID | CI | CIType | ROPE | MID
0.544 | 0.547 | 0.430 | 0.662 | NA | NA | 0.95 | HDI | NA | NA
0.283 | 0.277 | 0.175 | 0.408 | NA | NA | 0.95 | HDI | NA | NA
kable(ame2$ContrastSummary)
M | Mdn | LL | UL | PercentROPE | PercentMID | CI | CIType | ROPE | MID | Label
-0.2607899 | -0.2649507 | -0.4289198 | -0.0891496 | NA | NA | 0.95 | HDI | NA | NA | AME am
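
To make the earlier point about summarizing the posterior concrete, here is a manual sketch of the same discrete AME using brms::posterior_epred(): predicted probabilities for every row with am forced to 0 and then to 1, averaged within each posterior draw, differenced, and summarized. This is only an illustration of the idea, not the exact internals of brmsmargins(), and it uses equal-tailed quantiles rather than an HDI, so the limits will be close to, but not identical to, the summary above.

nd0 <- nd1 <- mtcars
nd0$am <- 0
nd1$am <- 1

# draws x observations matrices of predicted probabilities
ep0 <- posterior_epred(bayes.logistic, newdata = nd0)
ep1 <- posterior_epred(bayes.logistic, newdata = nd1)

# AME posterior: average over observations within each draw, then difference
ame.post <- rowMeans(ep1) - rowMeans(ep0)
c(M = mean(ame.post), quantile(ame.post, c(.025, .975)))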

Note that by default, brmsmargins() uses the model frame from the model object as the dataset. This can be overridden: you can give it any valid dataset, and it will add to or hold at the chosen values and average across the predictions from the different rows of that dataset, as sketched below.
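
For example, here is a sketch of computing the AME of mpg only over the manual-transmission cars. The dataset argument is assumed here to be newdata (whose default is the model frame); check ?brmsmargins for the exact argument name in your version.

# sketch: AME of mpg averaged only over rows with am == 1
# assumes the dataset is supplied via the `newdata` argument; see ?brmsmargins
ame.am1 <- brmsmargins(
  bayes.logistic,
  newdata = subset(mtcars, am == 1),
  add = data.frame(mpg = c(0, 0 + h)),
  contrasts = cbind("AME MPG (am = 1)" = c(-1 / h, 1 / h)),
  CI = 0.95, CIType = "HDI")
ame.am1$ContrastSummary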

AMEs for Poisson Regression

Here is a short example of Poisson regression, which is used for count outcomes. We use a dataset drawn from https://stats.oarc.ucla.edu/r/dae/poisson-regression/.


d <- fread("https://stats.oarc.ucla.edu/stat/data/poisson_sim.csv")
d[, prog := factor(prog, levels = 1:3, labels = c("General", "Academic", "Vocational"))]

bayes.poisson <- brm(
  num_awards ~ prog + math, data = d,
  family = "poisson", seed = 1234,
  silent = 2, refresh = 0,
  save_pars = save_pars(group = TRUE, latent = FALSE, all = TRUE),
  chains = 4L, cores = 4L, backend = "cmdstanr")
summary(bayes.poisson)
#>  Family: poisson 
#>   Links: mu = log 
#> Formula: num_awards ~ prog + math 
#>    Data: d (Number of observations: 200) 
#>   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
#>          total post-warmup draws = 4000
#> 
#> Regression Coefficients:
#>                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept         -5.32      0.68    -6.68    -4.01 1.00     2444     2292
#> progAcademic       1.15      0.38     0.48     1.92 1.00     1949     1571
#> progVocational     0.39      0.47    -0.48     1.34 1.00     1808     1427
#> math               0.07      0.01     0.05     0.09 1.00     2729     2349
#> 
#> Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).

AME for a continuous variable, using the default credible interval width (99%) and interval type (HDI).

h <- .001
ame1.p <- brmsmargins(
  bayes.poisson,
  add = data.frame(math = c(0, 0 + h)),
  contrasts = cbind("AME math" = c(-1 / h, 1 / h)))

kable(ame1.p$ContrastSummary, digits = 3)
M | Mdn | LL | UL | PercentROPE | PercentMID | CI | CIType | ROPE | MID | Label
0.044 | 0.044 | 0.025 | 0.065 | NA | NA | 0.99 | HDI | NA | NA | AME math

AME for a categorical variable. Here we calculate pairwise contrasts among all three program types. The predictions are the expected number of awards for each program type.

ame2.p <- brmsmargins(
  bayes.poisson,
  at = data.frame(
    prog = factor(1:3,
                  labels = c("General", "Academic", "Vocational"))),
  contrasts = cbind(
    "AME General v Academic" = c(1, -1, 0),
    "AME General v Vocational" = c(1, 0, -1),
    "AME Academic v Vocational" = c(0, 1, -1)))
    
kable(ame2.p$Summary, digits = 3)
M | Mdn | LL | UL | PercentROPE | PercentMID | CI | CIType | ROPE | MID
0.262 | 0.253 | 0.077 | 0.522 | NA | NA | 0.99 | HDI | NA | NA
0.780 | 0.776 | 0.575 | 0.992 | NA | NA | 0.99 | HDI | NA | NA
0.380 | 0.368 | 0.158 | 0.707 | NA | NA | 0.99 | HDI | NA | NA
kable(ame2.p$ContrastSummary, digits = 3)
M | Mdn | LL | UL | PercentROPE | PercentMID | CI | CIType | ROPE | MID | Label
-0.518 | -0.522 | -0.806 | -0.135 | NA | NA | 0.99 | HDI | NA | NA | AME General v Academic
-0.118 | -0.115 | -0.510 | 0.248 | NA | NA | 0.99 | HDI | NA | NA | AME General v Vocational
0.400 | 0.409 | 0.011 | 0.759 | NA | NA | 0.99 | HDI | NA | NA | AME Academic v Vocational

AMEs for Negative Binomial Regression

Here is a short example of negative binomial regression, which is also used for count outcomes.

d <- fread("https://stats.oarc.ucla.edu/stat/data/fish.csv")
d[, nofish := factor(nofish, levels = 0:1, labels = c("no fish", "fish"))]

bayes.nb <- brm(
  count ~ nofish + xb, data = d,
  family = "negbinomial", seed = 1234,
  silent = 2, refresh = 0,
  save_pars = save_pars(group = TRUE, latent = FALSE, all = TRUE),
  chains = 4L, cores = 4L, backend = "cmdstanr")
summary(bayes.nb)
#>  Family: negbinomial 
#>   Links: mu = log; shape = identity 
#> Formula: count ~ nofish + xb 
#>    Data: d (Number of observations: 250) 
#>   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
#>          total post-warmup draws = 4000
#> 
#> Regression Coefficients:
#>            Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept     -1.38      0.21    -1.78    -0.97 1.00     2936     2491
#> nofishfish    -0.08      0.25    -0.56     0.42 1.00     3667     2776
#> xb             1.24      0.10     1.06     1.43 1.00     3223     3165
#> 
#> Further Distributional Parameters:
#>       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> shape     0.72      0.12     0.52     0.99 1.00     3464     2956
#> 
#> Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).

AME for a continuous variable, using the default credible interval width (99%) and interval type (HDI).

h <- .001
ame1.nb <- brmsmargins(
  bayes.nb,
  add = data.frame(xb = c(0, 0 + h)),
  contrasts = cbind("AME xb" = c(-1 / h, 1 / h)))

kable(ame1.nb$ContrastSummary, digits = 3)
M | Mdn | LL | UL | PercentROPE | PercentMID | CI | CIType | ROPE | MID | Label
5.623 | 5.296 | 2.654 | 11.735 | NA | NA | 0.99 | HDI | NA | NA | AME xb

AME for a categorical variable. Here we calculate the contrast of no fish versus fish. The predictions are the expected number of fish caught.

ame2.nb <- brmsmargins(
  bayes.nb,
  at = data.frame(
    nofish = factor(0:1,
                  labels = c("no fish", "fish"))),
  contrasts = cbind(
    "AME No Fish v Fish" = c(1, -1)))
    
kable(ame2.nb$Summary, digits = 3)
M | Mdn | LL | UL | PercentROPE | PercentMID | CI | CIType | ROPE | MID
4.523 | 4.367 | 2.485 | 7.975 | NA | NA | 0.99 | HDI | NA | NA
4.274 | 4.043 | 1.695 | 8.816 | NA | NA | 0.99 | HDI | NA | NA
kable(ame2.nb$ContrastSummary, digits = 3)
M | Mdn | LL | UL | PercentROPE | PercentMID | CI | CIType | ROPE | MID | Label
0.25 | 0.343 | -3.065 | 3.026 | NA | NA | 0.99 | HDI | NA | NA | AME No Fish v Fish

References

These references may be useful.

  • Norton, E. C., Dowd, B. E., & Maciejewski, M. L. (2019). Marginal effects—quantifying the effect of changes in risk factors in logistic regression models. JAMA, 321(13), 1304-1305.
  • Mize, T. D., Doan, L., & Long, J. S. (2019). A general framework for comparing predictions and marginal effects across models. Sociological Methodology, 49(1), 152-189.