I want to know what *YOU* think about review, preprints, and publication

As part of the OpenPub project, we’re soliciting folks to send us videos about their experiences with the scholarly publication process. We want to use these to help crowdfund the development of OpenPub – our preprint server with robust tools for discussion and interaction. Interested? Check out the full request over here and/or email me!

Filtering Out Exogenous Pairs of Variables from a Basis Set

Sometimes in an SEM for which you're calculating a test of d-separation, you want all exogenous variables to covary. In that case, independence claims between pairs of exogenous variables don't belong in your basis set, but the basis-set machinery generates them anyway, and if you have a large model with a number of exogenous variables, pulling them out by hand can be a pain, particularly with the ggm library. Here's a solution – a function I'm calling filterExoFromBasiSet.


# Takes a basis set list from basiSet in ggm and a vector of
# names of exogenous variables. Each element of a basis set is a
# character vector whose first two entries are the non-adjacent
# pair and whose remaining entries are the conditioning set; we
# drop an element only when BOTH members of its pair are exogenous.

filterExoFromBasiSet <- function(set, exo) {
    # one row per claim: the two members of each pair
    pairSet <- t(sapply(set, function(alist) c(alist[1], alist[2])))

    # indices of claims whose first/second member is exogenous
    colA <- which(pairSet[, 1] %in% exo)
    colB <- which(pairSet[, 2] %in% exo)

    # claims appearing in both lists are entirely exogenous pairs
    both <- intersect(colA, colB)

    # guard: set[-integer(0)] would wrongly return an empty list
    if (length(both) == 0) return(set)

    set[-both]
}

How does it work? Let's say we have the following model:

y1 <- x1 + x2

Now, since x1 and x2 are both exogenous and allowed to covary, we should have no basis set. But…

library(ggm)

modA <- DAG(y1 ~ x1 + x2)
basiSet(modA)
## [[1]]
## [1] "x2" "x1"

Oops – there's a basis set! Now, instead, let's filter it:

basisA <- basiSet(modA)
filterExoFromBasiSet(basisA, c("x1", "x2"))
## list()

Yup, we get back an empty list.

This function can come in handy. For example, let's say we're testing a model with an exogenous variable that does not connect to any endogenous variable, such as

y <- x1
x2 (which is exogenous and isolated)

Now –


modB <- DAG(y ~ x1, 
               x2 ~ x2)

basisB <- basiSet(modB)
filterExoFromBasiSet(basisB, c("x1", "x2"))
## [[1]]
## [1] "x2" "y"  "x1"

So, we have the correct basis set with only one element: the claim that x2 is independent of y conditional on x1.

What if we also have an endogenous variable that has no paths to it?


modC <- DAG(y1 ~ x1, 
               x2 ~ x2, 
               y2 ~ y2)

basisC <- basiSet(modC)

filterExoFromBasiSet(basisC, c("x1", "x2"))
## [[1]]
## [1] "y2" "x2"
## 
## [[2]]
## [1] "y2" "x1"
## 
## [[3]]
## [1] "y2" "y1" "x1"
## 
## [[4]]
## [1] "x2" "y1" "x1"

This yields the correct four-element basis set.
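From there, you'd feed the filtered basis set into the actual d-sep test. Here's a minimal sketch – assuming a hypothetical data frame dat with numeric columns y1, y2, x1, and x2 matching modC – that tests each remaining claim using pcor and pcor.test from ggm. Remember that within each element, the first two entries are the pair and the rest is the conditioning set.

# hypothetical: dat is a data frame with numeric columns
# y1, y2, x1, and x2
basisC.filtered <- filterExoFromBasiSet(basisC, c("x1", "x2"))

pvals <- sapply(basisC.filtered, function(claim) {
    # partial correlation of the first two variables
    # given the remaining conditioning variables
    r <- pcor(claim, var(dat))

    # p-value for that partial correlation; the second
    # argument is the size of the conditioning set
    pcor.test(r, length(claim) - 2, n = nrow(dat))$pvalue
})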

Extracting p-values from different fit R objects

Let's say you want to extract a p-value from a linear or generalized linear model – mixed or not! – and save it as a variable for future use. This is something you might want to do if, say, you were calculating Fisher's C from an equation-level structural equation model (there's a quick sketch of C itself at the end of this post). Here's how to extract the p-value for the effect of a variable from several different kinds of fit models. We'll start with a data set with x, y, z, and a block effect (we'll see why in a moment).


# note: no seed is set, so your numbers will differ
# slightly from the output shown below
x <- rep(1:10, 2)
y <- rnorm(20, x, 3)
block <- c(rep("a", 10), rep("b", 10))

mydata <- data.frame(x = x, y = y, block = block, z = rnorm(20))

Now, how would you extract the p-value for the parameter fit for z from a linear model object? Simply put, use the coefficient matrix from the lm object's summary:

alm <- lm(y ~ x + z, data = mydata)

summary(alm)$coefficients
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   1.1833     1.3496  0.8768 0.392840
## x             0.7416     0.2190  3.3869 0.003506
## z            -0.4021     0.8376 -0.4801 0.637251

# Note that this is a matrix.  
# The third row, fourth column is the p value
# you want, so...

p.lm <- summary(alm)$coefficients[3, 4]

p.lm
## [1] 0.6373

That's a linear model; what about a generalized linear model?

aglm <- glm(y ~ x + z, data = mydata)

summary(aglm)$coefficients
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   1.1833     1.3496  0.8768 0.392840
## x             0.7416     0.2190  3.3869 0.003506
## z            -0.4021     0.8376 -0.4801 0.637251

# Again, is a matrix.  
# The third row, fourth column is the p value you
# want, so...

p.glm <- summary(aglm)$coefficients[3, 4]

p.glm
## [1] 0.6373

That's a generalized linear model; what about nonlinear least squares fit with nls?


anls <- nls(y ~ a * x + b * z, data = mydata, 
     start = list(a = 1, b = 1))

summary(anls)$coefficients
##   Estimate Std. Error t value  Pr(>|t|)
## a   0.9118     0.1007   9.050 4.055e-08
## b  -0.4651     0.8291  -0.561 5.817e-01

# Again, is a matrix.  
# The second row, fourth column is the p value you
# want, so...

p.nls <- summary(anls)$coefficients[2, 4]

p.nls
## [1] 0.5817
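As an aside: for all three of these – lm, glm, and nls – coef() applied to the summary returns that same coefficient matrix, so you can write one little extractor instead of memorizing row and column numbers. A minimal sketch (the helper name pval is mine, not part of any package):

# works for lm, glm, and nls fits alike; the rows of the
# coefficient matrix are named by term, so we can index by
# name rather than position
# (for glm families tested on z, the p-value column is
# "Pr(>|z|)" rather than "Pr(>|t|)")
pval <- function(fit, term, column = "Pr(>|t|)") {
    coef(summary(fit))[term, column]
}

pval(alm, "z")   # same as p.lm above
pval(anls, "b")  # same as p.nls above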

Great. Now, what if we were running a mixed model? First, let's look at the nlme package. Here, the relevant part of the summary object is the tTable:

library(nlme)
alme <- lme(y ~ x + z, random = ~1 | block, data = mydata)

summary(alme)$tTable
##               Value Std.Error DF t-value  p-value
## (Intercept)  1.1833    1.3496 16  0.8768 0.393592
## x            0.7416    0.2190 16  3.3869 0.003763
## z           -0.4021    0.8376 16 -0.4801 0.637630

# Again, is a matrix.  
# But now the third row, fifth column is the p value
# you want, so...

p.lme <- summary(alme)$tTable[3, 5]

p.lme
## [1] 0.6376

Last, what about lme4? For a linear lmer object, you cannot get a p-value. But if this is a generalized linear mixed model, you are good to go (as in Shipley 2009). Let's try that here. (Note that this uses the pre-1.0 lme4 API, where the coefficient table lives in the @coefs slot of the summary and lmer takes a family argument; in current lme4 you'd use glmer and coef(summary(...)) instead.)

library(lme4)

almer <- lmer(y ~ x + z + (1 | block), data = mydata)

# no p-value column – just estimates, standard errors,
# and t values!
summary(almer)@coefs
##             Estimate Std. Error t value

# but, for a generalized linear mixed model –
# and yes, I know this is a
# bad model but, you know, demonstration!

aglmer <- lmer(y + 5 ~ x + z + (1 | block), 
        data = mydata, family = poisson(link = "log"))

summary(aglmer)@coefs
##             Estimate Std. Error z value  Pr(>|z|)
## (Intercept)  1.90813    0.16542  11.535 8.812e-31
## x            0.07247    0.02471   2.933 3.362e-03
## z           -0.03193    0.09046  -0.353 7.241e-01

# matrix again!  Third row, fourth column
p.glmer <- summary(aglmer)@coefs[3, 4]

p.glmer
## [1] 0.7241
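And, as promised, here's where those extracted p-values go. Fisher's C combines the p-values from the k independence claims in a basis set as C = -2 * sum(log(p)), which is compared against a chi-squared distribution with 2k degrees of freedom (Shipley 2009). A minimal sketch – fisherC is my own illustrative function, and in practice you'd feed it one p-value per claim:

# Fisher's C: C = -2 * sum(log(p)) is distributed as
# chi-squared with 2k degrees of freedom under the null,
# where k is the number of independence claims
fisherC <- function(pvals) {
    C <- -2 * sum(log(pvals))
    df <- 2 * length(pvals)
    c(C = C, df = df, p.value = 1 - pchisq(C, df))
}

# purely illustrative – these two values come from the same
# effect above, not from separate claims
fisherC(c(p.lme, p.glmer))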

#SciO13 and Beyond

I’ve been active in using online spaces for scientific activities – blogging, tweeting, crowdfunding, and much, much more – for a looong time, and I’ve found it’s benefited me greatly as a scientist. I’ve also formed a deep love for the community I’ve found in the online science world.

And yet, until this year, I’d never been to Science Online before.

This year, I finally remedied that. And it was indeed amazing. I’ll be posting the notes from my own session on how science online can – and has – changed the peer review process, but I wanted to share this picture of (many of) the marine bloggers who were at the conference, and issue a challenge.

First – the challenge.

HEY MARINE SCIENTISTS WHO READ THIS BLOG (you’re quiet, but I see you in my hitlog)!!! I understand that #SciO13 may have been a bit overwhelming for you, or too broad, or something, so you didn’t register. It’s ok. Becoming more engaged with the science online world can seem like a lot. But aren’t you a little bit curious? Well, if you are, in October, David Shiffman is setting up an amazing opportunity for you – Science Online Oceans. Go read his post, and block that weekend off on your calendar. Right now! Then come to Miami next October, and be prepared to have your world blown open as you interact with a much broader community that will help you realize the full potential of this internet thingamajig and how it can help you as a scientist.

(oh, and everyone reading should consider coming to Science Online 2014 as well)

And now, the picture, which should serve as some extra enticement. It shows just a few of the marine bloggers who were at the conference, so it’s only a small flavor of the awesomeness that was there, and of the great connections and conversations that resulted. But I think you get the point.

OceanBloggers at #SciO13