Bayesian SEM with BRMS

Note: for the most up to date version of this, see this post on rpubs

Background

So, it looks like `brms` version 2.0 implements multivariate responses – and hence piecewise Structural Equation Modeling in a Bayesian framework. After https://github.com/paul-buerkner/brms/issues/303, it became clear that while one cannot do tests of D-separation, WAIC, LOO, etc. are all available. Let’s take a peek with a familiar example.

As a note, `blavaan` also implements Bayesian SEM, but `brms` is so gorram flexible with model forms that I think it’s an excellent way to approach the problem, one that gives a lot more breathing room. And it’s in Stan, not JAGS!

The Keeley Data

OK, the venerable Keeley et al. 2005 data, with extensions in Grace and Keeley 2006. The teaching example that a number of us use looks at how fire severity affects plant species richness, as mediated by plant cover.

```r
### Getting the data from piecewiseSEM
#devtools::install_github("jslefche/piecewiseSEM@2.0") #for the keeley data set
library(piecewiseSEM)
data(keeley)

##Otherwise, on the web
```

Using covariance based modeling, we would fit the following:

```r
library(lavaan)

k_mod <- "
rich ~ firesev + cover
cover ~ firesev"

k_fit_lavaan <- sem(k_mod, data = keeley)
```

We can fit this in `piecewiseSEM` as well:

```r
k_fit_psem <- psem(
  lm(rich ~ firesev + cover, data = keeley),
  lm(cover ~ firesev, data = keeley),
  data = keeley
)
```

Yep, the same.

brms and SEM

In the new `brms` you can build these models with `mvbrmsformula()` or just by adding multiple `brmsformula` objects together. We’ll use `set_rescor(FALSE)` so we don’t model the correlation between response variables (although we could, to represent residual correlations, I think!). So, here we have

```r
library(brms)
```

```
## Loading required package: Rcpp
## Loading required package: ggplot2
## Loading 'brms' package (version 2.0.0). Useful instructions
## can be found by typing help('brms'). A more detailed introduction
## to the package is available through vignette('brms_overview').
## Run theme_set(theme_default()) to use the default bayesplot theme.
```

```r
rich_mod <- bf(rich ~ firesev + cover)
cover_mod <- bf(cover ~ firesev)

k_fit_brms <- brm(rich_mod +
                    cover_mod +
                    set_rescor(FALSE),
                  data = keeley,
                  cores = 4, chains = 2)
```

```
## Compiling the C++ model
## Start sampling
```

Looks the same! How did the model do?

```r
plot(k_fit_brms)
```

One thing we can do is look at the WAIC or LOO which we can see is the sum of its component pieces. Let’s start with fitting those pieces.

```r
rich_fit <- brm(rich_mod,
                data = keeley,
                cores = 2, chains = 2)
```

```
## Compiling the C++ model
## Start sampling
```

```r
cover_fit <- brm(cover_mod,
                 data = keeley,
                 cores = 2, chains = 2)
```

```
## Compiling the C++ model
## Start sampling
```

Now, let’s look at the whole and the parts.

```r
WAIC(k_fit_brms)
```

```
##    WAIC    SE
##  768.47 15.52
```

```r
WAIC(rich_fit)
```

```
##    WAIC   SE
##  734.05 9.35
```

```r
WAIC(cover_fit)
```

```
##   WAIC    SE
##  34.43 12.61
```
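Because the multivariate likelihood factors across the two responses, the joint WAIC should be, up to Monte Carlo error, the sum of its pieces. A quick check (the `$waic` accessor is my assumption about the object `WAIC()` returns; your numbers will wobble a bit between MCMC runs):

```r
# from the fits above: 734.05 + 34.43 = 768.48, within
# Monte Carlo error of the joint model's 768.47
WAIC(rich_fit)$waic + WAIC(cover_fit)$waic
```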

Nice!

Comparison across all three forms

Broadly, the figures above show similar results. But let’s dig deeper. In `lavaan` we get

```r
summary(k_fit_lavaan)
```

```
## lavaan (0.6-1.1170) converged normally after  32 iterations
##
##   Number of observations                            90
##
##   Estimator                                         ML
##   Model Fit Test Statistic                       0.000
##   Degrees of freedom                                 0
##
## Parameter Estimates:
##
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##   Standard Errors                             Standard
##
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   rich ~
##     firesev          -2.531    0.976   -2.593    0.010
##     cover             9.911    5.083    1.950    0.051
##   cover ~
##     firesev          -0.084    0.018   -4.611    0.000
##
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .rich            187.212   27.908    6.708    0.000
##    .cover             0.081    0.012    6.708    0.000
```

We can see the `rich ~ cover` path has p = 0.051. Which, you know, some people are OK with, and some might wonder about. `piecewiseSEM` produces broadly similar results.

```r
summary(k_fit_psem, intercept=TRUE)
```

```
##
## Structural Equation Model of k_fit_psem
##
## Call:
##   rich ~ firesev + cover
##   cover ~ firesev
##
##     AIC      BIC
##  14.000   31.499
##
## ---
## Tests of directed separation:
##
##  data frame with 0 columns and 0 rows
##
## Global goodness-of-fit:
##
##   Fisher's C = 0 with P-value = 1 and on 0 degrees of freedom
##
## ---
## Coefficients:
##
##   Response   Predictor Estimate Std.Error DF Crit.Value P.Value
##       rich (Intercept)  53.9358    7.0437 87     7.6573  0.0000
##       rich     firesev  -2.5308    0.9926 87    -2.5496  0.0125
##       rich       cover   9.9105    5.1701 87     1.9169  0.0585
##      cover (Intercept)   1.0744    0.0893 88    12.0299  0.0000
##      cover     firesev  -0.0839    0.0184 88    -4.5594  0.0000
##   Std.Estimate Std.SE
##         0.0000 0.0000 ***
##        -0.2768 0.1086   *
##         0.2081 0.1086
##         0.0000 0.0000 ***
##        -0.4371 0.0959 ***
##
##   Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05
##
## Individual R-squared:
##
##   Response method R.squared
##       rich     NA      0.17
##      cover     NA      0.19
```

Note that the p-value goes up! Huh. We also now have intercepts, which we can compare to…

```r
summary(k_fit_brms)
```

```
##  Family: MV(gaussian, gaussian)
##   Links: mu = identity; sigma = identity
##          mu = identity; sigma = identity
## Formula: rich ~ firesev + cover
##          cover ~ firesev
##    Data: keeley (Number of observations: 90)
## Samples: 2 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 2000
##     ICs: LOO = NA; WAIC = NA; R2 = NA
##
## Population-Level Effects:
##                 Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## rich_Intercept     54.16      6.92    40.46    67.73       2000 1.00
## cover_Intercept     1.07      0.09     0.90     1.24       2000 1.00
## rich_firesev       -2.57      0.99    -4.52    -0.58       2000 1.00
## rich_cover          9.84      5.05    -0.30    19.59       2000 1.00
## cover_firesev      -0.08      0.02    -0.12    -0.05       2000 1.00
##
## Family Specific Parameters:
##             Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma_rich     14.02      1.07    12.04    16.27       2000 1.00
## sigma_cover     0.29      0.02     0.25     0.34       2000 1.00
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```

Nice effective sample sizes and convergence. Note that the `rich_cover` credible interval is pretty wide now, and we can see that it overlaps 0. So, broadly, similar results again!
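If we want to put a number on that overlap, we can pull the posterior draws and ask what fraction put the `rich ~ cover` path above zero. A sketch, assuming the draws are labeled `b_rich_cover`, following the summary names above:

```r
# posterior probability that the rich ~ cover coefficient is positive
post <- posterior_samples(k_fit_brms)
mean(post$b_rich_cover > 0)
```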

OK, what can we do with this? From an SEM framework, we want to compare this to an alternate model. I’ve seen it suggested that we can calculate posterior p-values for the basis set and go from there for a traditional d-sep test, but that really does mix frequentist and Bayesian paradigms, so I’m still pondering it – although evaluating each independence claim isn’t a bad idea. Might need to write some code to do that.

Model comparison is a snap, however.

Mediation in BRMS

Let’s say we have the following alternate model where fire severity’s effect is fully mediated via cover.

OK, let’s fit this model

```r
rich_mod_fullmed <- bf(rich ~ cover)

fit_brms_fullmed <- brm(rich_mod_fullmed +
                          cover_mod +
                          set_rescor(FALSE),
                        data = keeley,
                        cores = 4, chains = 2)
```

```
## Compiling the C++ model
## Start sampling
```

First, if we wanted to evaluate the one missing claim, we could have looked at the `rich ~ firesev` relationship in the full model.

```r
fixef(rich_fit)
```

```
##            Estimate Est.Error    2.5%ile   97.5%ile
## Intercept 54.205538  7.197803 40.1750869 68.8146233
## firesev   -2.558515  1.021273 -4.6423633 -0.6419759
## cover      9.783821  5.276849 -0.8659768 19.9671382
```

We can see the 95% credible interval doesn’t overlap 0. That’s our first clue that we need this path.

Second, let’s look at WAIC and LOO

```r
WAIC(k_fit_brms, fit_brms_fullmed)
```

```
##                                 WAIC    SE
## k_fit_brms                    768.47 15.52
## fit_brms_fullmed              773.11 16.89
## k_fit_brms - fit_brms_fullmed  -4.64  5.57
```

```r
LOO(k_fit_brms, fit_brms_fullmed)
```

```
##                                LOOIC    SE
## k_fit_brms                    768.48 15.52
## fit_brms_fullmed              773.14 16.90
## k_fit_brms - fit_brms_fullmed  -4.66  5.57
```

So, either way, both say that the partial mediation model is better, but the difference between the two overlaps zero. This is subtly different from the AIC results we’d get from frequentist approaches – not sure what’s going on here, but super interesting. I’d still say the credible interval from the basis set suggests that the partial mediation model is the way to go, however, if this were a confirmatory model analysis.
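To see that overlap concretely, put the WAIC difference on the scale of its standard error using the table above:

```r
# the difference is well under the usual ~2 SE rule of thumb
-4.64 / 5.57
## [1] -0.8330341
```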

Prediction

This is one of the things I’m most excited about here. Let’s say we start with a fire severity value of 5. What does this predict for plant species richness? How do we bring uncertainty into the picture? There’s no automation here, so we have to calculate the paths out ourselves – something I’m hoping to handle in `piecewiseSEM` and can maybe port over here.
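One piece of that hand calculation we can do already: the indirect effect of fire severity on richness through cover, which is the product of the two path coefficients, computed draw by draw. A sketch (the `b_cover_firesev` and `b_rich_cover` column names are my assumption about how `brms` labels the draws; check `names(post)` on your own fit):

```r
# indirect effect of firesev on rich via cover, one value per posterior draw
post <- posterior_samples(k_fit_brms)
indirect <- post$b_cover_firesev * post$b_rich_cover

median(indirect)
quantile(indirect, c(0.025, 0.975))
```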

We start simply by getting cover. Note, non-terminal endogenous variables need to have an `NA` as their values.

```r
newdata = data.frame(firesev = 5, cover = NA)

cover_pred <- fitted(k_fit_brms, newdata = newdata,
                     resp = "cover", nsamples = 1000,
                     summary = FALSE)

median(cover_pred)
```

```
## [1] 0.6528961
```

```r
posterior_interval(cover_pred)
```

```
##             5%       95%
## [1,] 0.5966115 0.7087388
```

Boom! A prediction and the credible interval given coefficient error. OK, let’s carry this through to richness. Note, we’re going to use `expand.grid` to incorporate uncertainty, and when we get a matrix back, we need to group by the initial exogenous variables. In this case we only have one, so we can just `as.vector` the whole thing, but for more complex newdata you’ll need to do some more processing.

OK, so let’s get the terminal endogenous variable. Note, the `as.matrix(diag())` bit is there to keep the number of simulations consistent. Otherwise we’d just kick up our number of sims quite a bit and muck about with how we handle uncertainty.

```r
newdata2 <- expand.grid(firesev = newdata$firesev, cover = as.vector(cover_pred))

rich_pred <- fitted(k_fit_brms, newdata = newdata2,
                    resp = "rich", nsamples = 1000,
                    summary = FALSE)

#to minimize excess uncertainty
rich_pred <- as.matrix(diag(rich_pred))

#visualize
plot(density(as.vector(rich_pred)))
```