Bayesian SEM with BRMS

Note: for the most up-to-date version of this, see this post on rpubs

Background

So, it looks like brms version 2.0 implements multivariate responses – and hence piecewise Structural Equation Modeling in a Bayesian framework. After https://github.com/paul-buerkner/brms/issues/303, it became clear that while one cannot do tests of D-separation, WAIC, LOO, etc., are all available. Let’s take a peek with a familiar example.

As a note, blavaan also implements Bayesian SEM, but brms is so gorram flexible with model forms that I think it’s an excellent way to approach the problem, one that gives a lot more breathing room. And it’s in Stan, not JAGS!

The Keeley Data

OK, the venerable Keeley et al. 2005 and extensions in Grace and Keeley 2006. The teaching example that a number of us use looks at how the effect of fire severity on plant species richness is mediated by plant cover.

### Getting the data from piecewiseSEM
#devtools::install_github("jslefche/piecewiseSEM@2.0") #for the keeley data set
library(piecewiseSEM) 
data(keeley)

##Otherwise, on the web
#keeley <- read.csv("http://byrneslab.net/classes/lavaan_materials/Keeley_rawdata_select4.csv")

Using covariance based modeling, we would fit the following:

library(lavaan)

k_mod <- "
  rich ~ firesev + cover
  cover ~ firesev"

k_fit_lavaan <- sem(k_mod, data = keeley)

We can fit this in piecewiseSEM as well:

k_fit_psem <- psem(
  lm(rich ~ firesev + cover, data=keeley),
  lm(cover ~ firesev, data=keeley),
  data = keeley
)

Yep, the same.

brms and SEM

In the new brms, you can build these models with mvbrmsformula or just by adding multiple brmsformula objects together. We’ll use set_rescor(FALSE) so that we don’t model a correlation between the response variables (although we could, to represent residual correlations, I think!). So, here we have

library(brms)
## Loading required package: Rcpp
## Loading required package: ggplot2
## Loading 'brms' package (version 2.0.0). Useful instructions
## can be found by typing help('brms'). A more detailed introduction
## to the package is available through vignette('brms_overview').
## Run theme_set(theme_default()) to use the default bayesplot theme.
rich_mod <- bf(rich ~ firesev + cover)
cover_mod <- bf(cover ~ firesev)

k_fit_brms <- brm(rich_mod +
                  cover_mod + 
                  set_rescor(FALSE), 
                data=keeley,
                cores=4, chains = 2)
## Compiling the C++ model
## Start sampling

Looks the same! How did the model do?

plot(k_fit_brms)
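Beyond the trace plots, we might also eyeball fit with posterior predictive checks. A quick sketch – I’m assuming the resp argument of pp_check selects a response in multivariate fits in this version of brms, so check your version:

```r
library(brms)

# Posterior predictive checks for each submodel; resp = picks which
# response of the multivariate fit to check (assumed to work here)
pp_check(k_fit_brms, resp = "rich")
pp_check(k_fit_brms, resp = "cover")
```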

One thing we can do is look at the WAIC or LOO, which we can see is the sum of its component pieces. Let’s start by fitting those pieces.

rich_fit <- brm(rich_mod,
                data=keeley,
                cores=2, chains = 2)
## Compiling the C++ model
## Start sampling
cover_fit <- brm(cover_mod,
                data=keeley,
                cores=2, chains = 2)
## Compiling the C++ model
## Start sampling

Now, let’s look at the whole and the parts

WAIC(k_fit_brms)
##    WAIC    SE
##  768.47 15.52
WAIC(rich_fit)
##    WAIC   SE
##  734.05 9.35
WAIC(cover_fit)
##   WAIC    SE
##  34.43 12.61

Nice!
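As a quick sanity check on that “sum of its pieces” claim, we can add up the component WAICs by hand (values copied from the output above; yours will differ slightly due to Monte Carlo error):

```r
# WAIC values from the output above
waic_joint <- 768.47
waic_rich  <- 734.05
waic_cover <- 34.43

# The component models sum to the joint model's WAIC, up to rounding
waic_rich + waic_cover
```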

Comparison across all three forms

Broadly, the figures above show similar results. But let’s dig deeper. In lavaan we get

summary(k_fit_lavaan)
## lavaan (0.6-1.1170) converged normally after  32 iterations
## 
##   Number of observations                            90
## 
##   Estimator                                         ML
##   Model Fit Test Statistic                       0.000
##   Degrees of freedom                                 0
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##   Standard Errors                             Standard
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   rich ~                                              
##     firesev          -2.531    0.976   -2.593    0.010
##     cover             9.911    5.083    1.950    0.051
##   cover ~                                             
##     firesev          -0.084    0.018   -4.611    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .rich            187.212   27.908    6.708    0.000
##    .cover             0.081    0.012    6.708    0.000

We can see the rich ~ cover path has p=0.051. Which, you know, some people are OK with, some might wonder about. piecewiseSEM produces broadly similar results.

summary(k_fit_psem, intercept=TRUE)
## 
## Structural Equation Model of k_fit_psem 
## 
## Call:
##   rich ~ firesev + cover
##   cover ~ firesev
## 
##     AIC      BIC
##  14.000   31.499
## 
## ---
## Tests of directed separation:
## 
##  data frame with 0 columns and 0 rows
## 
## Global goodness-of-fit:
## 
##   Fisher's C = 0 with P-value = 1 and on 0 degrees of freedom
## 
## ---
## Coefficients:
## 
##   Response   Predictor Estimate Std.Error DF Crit.Value P.Value
##       rich (Intercept)  53.9358    7.0437 87     7.6573  0.0000
##       rich     firesev  -2.5308    0.9926 87    -2.5496  0.0125
##       rich       cover   9.9105    5.1701 87     1.9169  0.0585
##      cover (Intercept)   1.0744    0.0893 88    12.0299  0.0000
##      cover     firesev  -0.0839    0.0184 88    -4.5594  0.0000
##   Std.Estimate Std.SE    
##         0.0000 0.0000 ***
##        -0.2768 0.1086   *
##         0.2081 0.1086    
##         0.0000 0.0000 ***
##        -0.4371 0.0959 ***
## 
##   Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05
## 
## Individual R-squared:
## 
##   Response method R.squared
##       rich     NA      0.17
##      cover     NA      0.19

Note that the p-value goes up! Huh. We also now have intercepts, which we can compare to…

summary(k_fit_brms)
##  Family: MV(gaussian, gaussian) 
##   Links: mu = identity; sigma = identity
##          mu = identity; sigma = identity 
## Formula: rich ~ firesev + cover 
##          cover ~ firesev 
##    Data: keeley (Number of observations: 90) 
## Samples: 2 chains, each with iter = 2000; warmup = 1000; thin = 1; 
##          total post-warmup samples = 2000
##     ICs: LOO = NA; WAIC = NA; R2 = NA
##  
## Population-Level Effects: 
##                 Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## rich_Intercept     54.16      6.92    40.46    67.73       2000 1.00
## cover_Intercept     1.07      0.09     0.90     1.24       2000 1.00
## rich_firesev       -2.57      0.99    -4.52    -0.58       2000 1.00
## rich_cover          9.84      5.05    -0.30    19.59       2000 1.00
## cover_firesev      -0.08      0.02    -0.12    -0.05       2000 1.00
## 
## Family Specific Parameters: 
##             Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma_rich     14.02      1.07    12.04    16.27       2000 1.00
## sigma_cover     0.29      0.02     0.25     0.34       2000 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

Nice effective sample sizes and convergence. Note that the rich_cover credible interval is pretty big now, and we can see how it overlaps 0. So, broadly, similar results again!

OK, what can we do with this? From an SEM framework, we want to compare this to an alternate model. I’ve seen it suggested that we can calculate posterior p-values for the basis set and go from there for a traditional D-sep test, but that really does mix frequentist and Bayesian paradigms, so I’m still pondering it – although evaluating each independence claim isn’t a bad idea. Might need to write some code to do that.
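As a rough sketch of what evaluating an independence claim might look like, we could pull the posterior for the rich ~ firesev coefficient from the fitted multivariate model and see how much of it sits on either side of zero (I’m assuming posterior_samples() with pars = works on multivariate fits as it does elsewhere in brms):

```r
library(brms)

# Posterior draws for the rich ~ firesev path in the full model
post <- posterior_samples(k_fit_brms, pars = "b_rich_firesev")

# Proportion of posterior mass above zero - a "posterior p-value" of sorts
mean(post[, 1] > 0)

# And a 95% credible interval for the claim
quantile(post[, 1], probs = c(0.025, 0.975))
```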

Model comparison is a snap, however.

Mediation in BRMS

Let’s say we have the following alternate model where fire severity’s effect is fully mediated via cover.

OK, let’s fit this model

rich_mod_fullmed <- bf(rich ~  cover)

fit_brms_fullmed <- brm(rich_mod_fullmed +
                  cover_mod + 
                  set_rescor(FALSE), 
                data=keeley,
                cores=4, chains = 2)
## Compiling the C++ model
## Start sampling

First, if we wanted to evaluate the one missing claim, we could have looked at the rich ~ firesev relationship in the full model.

fixef(rich_fit)
##            Estimate Est.Error    2.5%ile   97.5%ile
## Intercept 54.205538  7.197803 40.1750869 68.8146233
## firesev   -2.558515  1.021273 -4.6423633 -0.6419759
## cover      9.783821  5.276849 -0.8659768 19.9671382

We can see the 95% credible interval doesn’t overlap 0. First clue that we need it.

Second, let’s look at WAIC and LOO

WAIC(k_fit_brms, fit_brms_fullmed)
##                                 WAIC    SE
## k_fit_brms                    768.47 15.52
## fit_brms_fullmed              773.11 16.89
## k_fit_brms - fit_brms_fullmed  -4.64  5.57
LOO(k_fit_brms, fit_brms_fullmed)
##                                LOOIC    SE
## k_fit_brms                    768.48 15.52
## fit_brms_fullmed              773.14 16.90
## k_fit_brms - fit_brms_fullmed  -4.66  5.57

So, either way, both say that the partial mediation model is better, but the difference between the two overlaps zero. This is subtly different from AIC results in frequentist approaches – I’m not sure what’s going on here, but it’s super interesting. I’d still say that the credible interval from the basis set suggests the partial mediation model is the way to go, however, if this were a confirmatory model analysis.
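One way to put the WAIC comparison on a more interpretable scale is pseudo-Akaike weights computed from the WAIC values above. This is a hand-rolled sketch using values copied from the output (newer brms versions have a model_weights() function, if yours does):

```r
# Relative support, Akaike-weight style, from the WAICs above
waics <- c(partial_med = 768.47, full_med = 773.11)
delta <- waics - min(waics)
weights <- exp(-delta / 2) / sum(exp(-delta / 2))
round(weights, 3)
```

Most of the weight lands on the partial mediation model, consistent with the raw WAIC difference.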

Prediction

This is one of the things I’m most excited about here. Let’s say we start with a fire severity value of 5. How does this predict plant species richness? How do we bring uncertainty into the picture? There’s no automation here, so we have to calculate paths out ourselves – something I’m hoping to handle in piecewiseSEM and can maybe port over here.

We start simply by getting cover. Note that non-terminal endogenous variables need NA as their value in the new data.

newdata = data.frame(firesev = 5, cover=NA)

cover_pred <- fitted(k_fit_brms, newdata=newdata,
             resp = "cover", nsamples = 1000, 
             summary = FALSE)

median(cover_pred)
## [1] 0.6528961
posterior_interval(cover_pred)
##             5%       95%
## [1,] 0.5966115 0.7087388

Boom! A prediction and its credible limits given coefficient error. OK, let’s let this flow on to richness. Note, we’re going to use expand.grid to incorporate uncertainty, and so when we get a matrix back, we need to group by the initial exogenous variables. In this case, we only have one, so we can just as.vector the whole thing, but, for more complex newdata, you’ll need to do some more processing.

OK, so, let’s get the terminal endogenous variable. Note, the as.matrix(diag()) bit is to keep the number of simulations consistent. Otherwise we’re just kicking up our number of sims quite a bit and mucking about with how we handle uncertainty.

newdata2 <- expand.grid(firesev = newdata$firesev, cover = as.vector(cover_pred))

rich_pred <- fitted(k_fit_brms, newdata=newdata2,
             resp = "rich", nsamples = 1000, 
             summary = FALSE)

#to minimize excess uncertainty
rich_pred <- as.matrix(diag(rich_pred))

#visualize
plot(density(as.vector(rich_pred)))
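As with cover, we can summarize the propagated richness prediction numerically rather than just plotting its density:

```r
# Median prediction and 90% credible interval for richness at firesev = 5,
# with uncertainty propagated through the cover path
median(rich_pred)
posterior_interval(rich_pred)
```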