Extracting p-values from different fit R objects

Let's say you want to extract a p-value from a linear or generalized linear model – mixed or not! – and save it as a variable for future use. This is something you might want to do if, say, you were calculating Fisher's C from an equation-level Structural Equation Model. Here's how to extract the p-value for the effect of a variable from several different kinds of fit model objects. We'll start with a data set with x, y, z, and a block effect (we'll see why in a moment).

x <- rep(1:10, 2)
y <- rnorm(20, x, 3)
block <- c(rep("a", 10), rep("b", 10))

mydata <- data.frame(x = x, y = y, block = block, z = rnorm(20))

Now, how would you extract the p-value for the parameter fit for z from a linear model object? Simply put, use the coefficient table (the t-table) from the lm object's summary:

alm <- lm(y ~ x + z, data = mydata)

##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   1.1833     1.3496  0.8768 0.392840
## x             0.7416     0.2190  3.3869 0.003506
## z            -0.4021     0.8376 -0.4801 0.637251

# Note that this is a matrix.  
# The third row, fourth column is the p value
# you want, so...

p.lm <- summary(alm)$coefficients[3, 4]

## [1] 0.6373
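One caveat: positional indexing like `[3, 4]` silently breaks if you add or reorder predictors. A sturdier sketch (same kind of model as above, with simulated data so it runs on its own) indexes the coefficient matrix by name instead:

```r
# simulate data like the example above
set.seed(42)
x <- rep(1:10, 2)
mydata <- data.frame(x = x, y = rnorm(20, x, 3), z = rnorm(20))

alm <- lm(y ~ x + z, data = mydata)

# coef(summary(alm)) is the same matrix as
# summary(alm)$coefficients, but indexing by row and
# column name survives any reordering of predictors
p.lm <- coef(summary(alm))["z", "Pr(>|t|)"]
```

The same name-based trick works on any of the coefficient matrices below.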

That's a linear model, what about a generalized linear model?

aglm <- glm(y ~ x + z, data = mydata)

##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   1.1833     1.3496  0.8768 0.392840
## x             0.7416     0.2190  3.3869 0.003506
## z            -0.4021     0.8376 -0.4801 0.637251

# Again, it's a matrix.  
# The third row, fourth column is the p value you
# want, so...

p.glm <- summary(aglm)$coefficients[3, 4]

## [1] 0.6373

That's a generalized linear model, but what about nonlinear least squares?

anls <- nls(y ~ a * x + b * z, data = mydata, 
     start = list(a = 1, b = 1))

##   Estimate Std. Error t value  Pr(>|t|)
## a   0.9118     0.1007   9.050 4.055e-08
## b  -0.4651     0.8291  -0.561 5.817e-01

# Again, it's a matrix.  
# The second row, fourth column is the p value you
# want, so...

p.nls <- summary(anls)$coefficients[2, 4]

## [1] 0.5817

Great. Now, what if we were running a mixed model? First, let's look at the nlme package. Here, the relevant part of the summary object is the tTable

alme <- lme(y ~ x + z, random = ~1 | block, data = mydata)

##               Value Std.Error DF t-value  p-value
## (Intercept)  1.1833    1.3496 16  0.8768 0.393592
## x            0.7416    0.2190 16  3.3869 0.003763
## z           -0.4021    0.8376 16 -0.4801 0.637630

# Again, it's a matrix.  
# But now the third row, fifth column is the p value
# you want, so...

p.lme <- summary(alme)$tTable[3, 5]

## [1] 0.6376

Last, what about lme4? For a linear lmer object, you cannot get a p-value. But if this is a generalized linear mixed model, you are good to go (as in Shipley 2009). Let's try that here.


almer <- lmer(y ~ x + z + (1 | block), data = mydata)

# no p-value!  The fixed effects table of the summary
# has Estimate, Std. Error, and t value columns, but
# no Pr(>|t|) column.

# but, for a generalized linear mixed model –
# and yes, I know this is a
# bad model but, you know, demonstration!

# (newer versions of lme4 use glmer() rather than
# lmer() for models with a family argument)
aglmer <- lmer(y + 5 ~ x + z + (1 | block), 
        data = mydata, family = poisson(link = "log"))

##             Estimate Std. Error z value  Pr(>|z|)
## (Intercept)  1.90813    0.16542  11.535 8.812e-31
## x            0.07247    0.02471   2.933 3.362e-03
## z           -0.03193    0.09046  -0.353 7.241e-01

# matrix again!  Third row, fourth column
# (in newer versions of lme4, summary() is no longer an
# S4 object with a @coefs slot; use
# coef(summary(aglmer))[3, 4] instead)
p.glmer <- summary(aglmer)@coefs[3, 4]

## [1] 0.7241
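With p-values like these in hand, the Fisher's C statistic mentioned at the top is just arithmetic: C = -2 Σ ln(p), compared against a chi-squared distribution with 2k degrees of freedom for k independence claims (Shipley 2009). A minimal sketch, reusing the p-values extracted above (these three aren't really independence claims from one SEM – it's just the arithmetic being demonstrated):

```r
# p-values extracted from the fits above
pvals <- c(p.lm = 0.6373, p.nls = 0.5817, p.lme = 0.6376)

# Fisher's C = -2 * sum of the log p-values
fisher.C <- -2 * sum(log(pvals))

# compare against a chi-squared distribution with 2k
# degrees of freedom, where k is the number of claims
k <- length(pvals)
p.overall <- 1 - pchisq(fisher.C, df = 2 * k)
```

A large p.overall means no independence claim is badly violated, i.e., the model structure is consistent with the data.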

#SciO13 and Beyond

I’ve been active in using online spaces for scientific activities – blogging, tweeting, crowdfunding, and much, much more – for a looong time, and I’ve found it’s benefitted me greatly as a scientist. I’ve also formed a deep love for the community I’ve found in the online science world.

And yet, until this year, I’d never been to Science Online before.

This year, I finally remedied that. And it was indeed amazing. I’ll be posting the notes from my own session on how science online can and has changed the peer review process, but first I wanted to share this picture of (many of) the marine bloggers who were at the conference, and issue a challenge.

First – the challenge.

HEY MARINE SCIENTISTS WHO READ THIS BLOG (you’re quiet, but I see you in my hitlog)!!! I understand that #SciO13 may have been a bit overwhelming for you, or too broad, or something, so you didn’t register. It’s ok. Becoming more engaged with the science online world can seem like a lot. But, aren’t you a little bit curious? Well, if you are, in October, David Shiffman is setting up an amazing opportunity for you – Science Online Oceans. Go read his post, and block that weekend off on your calendar. Right now! Then come to Miami next October, and be prepared to have your world blown open as you interact with a much broader community that will help you realize the full potential of this internet thingamajig and how it can help you as a scientist.

(oh, and everyone reading should consider coming to Science Online 2014 as well)

And now, the picture, which should serve as some extra enticement. It’s only a few of the marine bloggers who were at the conference, so it’s only a small flavor of the awesomeness that was there, and the great connections and conversations that resulted. But I think you get the point.

OceanBloggers at #SciO13

Crowdfunding in the Peer Reviewed Literature

(x-posted at the #SciFund Blog)

The final version of Wheat et al.’s paper Raising money for scientific research through crowdfunding is out in Trends in Ecology and Evolution. A more hefty piece of #SciFund analysis is behind it (slowed down in no small part by my sloooow processing of new data in fancy models). The Wheat et al. paper is a lovely short piece that Rachel and Yiwei (who crowdfunded the excellent Alaska Predator Research Expedition, which has its new website over here) were gracious enough to ask Jai and me to participate in. In it, we cover the basics of crowdfunding for the academic sciences – what is it? What are the platforms you might use? What are some strategies for success?

Overall, this is a nice, gentle introduction that you should send to any colleague who appears interested in crowdfunding, is curious as to what it is, or is highly skeptical of the entire enterprise.

So go check it out!

Wheat R.E., Wang Y., Byrnes J.E. & Ranganathan J. (2013). Raising money for scientific research through crowdfunding. Trends in Ecology & Evolution 28(2):71–72.

Linkage: A field guide to privilege in marine science

As someone who, admittedly, benefitted a great deal from privilege growing up (it definitely lowered the barriers to my becoming a successful marine scientist), who knows this is true of MANY Ecology and Evolutionary Biology folk, and who now thinks a good deal about how to lower those barriers for the students and mentees who come under my aegis, I urge you all to go over and check out Miriam Goldstein’s A field guide to privilege in marine science: some reasons why we lack diversity.

A Quick Note in Weighting with nlme

I’ve been doing a lot of meta-analytic things lately. More on that anon. But one quick thing that came up was variance weighting with mixed models in R, and after a few web searches, I wanted to post this, more as a note-to-self and others than anything. Now, in a simple linear model, weighting by variance or sample size is straightforward.

#inverse variance
lm(y ~ x, data = dat, weights = 1/v)

#sample size
lm(y ~ x, data = dat, weights = n)

You can use the same sort of weights argument with lmer. But what if you’re using nlme? There are reasons to do so. Things change a bit, as nlme uses a wide array of weighting functions for the variance, which gives it some wonderful flexibility – indeed, it’s a reason to use nlme in the first place! But for such a simple case, to get the equivalent of the above, here’s the tricky little difference. I’m using gls, generalized least squares, but this should work for lme as well.

#variance proportional to v
gls(y ~ x, data = dat, weights = ~v)

#sample size
gls(y ~ x, data = dat, weights = ~1/n)
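For what it’s worth, a one-sided formula handed to nlme’s weights argument is shorthand for a varFixed variance structure: the residual variance is modeled as proportional to the covariate. So the variance-weighted call above can be written more explicitly like this (a self-contained sketch with simulated data – dat, v, and the coefficients here are stand-ins, not from a real analysis):

```r
library(nlme)

# simulated data: v holds a known variance for each observation
set.seed(1)
dat <- data.frame(x = 1:20, v = runif(20, 0.5, 2))
dat$y <- 2 * dat$x + rnorm(20, sd = sqrt(dat$v))

# weights = ~v is shorthand for this varFixed structure:
# residual variance proportional to v
mod <- gls(y ~ x, data = dat, weights = varFixed(~v))

# the one-sided formula version gives an identical fit
mod2 <- gls(y ~ x, data = dat, weights = ~v)
```

Writing varFixed out longhand makes it clearer why `~v` here matches `weights = 1/v` in lm: lm takes weights, nlme takes a model of the variance.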

OK, end note to self. Thanks to John Griffin for prompting this.


Everyone has been pretty shocked by the devastation wreaked by Sandy. Here in New England, we also got a Nor’easter following a few days later. That’s a lot of intense storm action in a short period of time.

So I was quite curious, as I ventured out into the field last weekend, to see how things looked. I went on a potential field site scouting trip to UMB’s field station on Nantucket. Nantucket of course got a good dose of Sandy, although the storm largely passed to the southwest. The Nor’easter may have been worse.

What I found while just walking about on the shoreline was pretty incredible. It was Scallapocalypse.

Let me include a video here of what one saw looking across the beach so you can get a sense of what was going on.

This was taken in Madaket. It was a bit more dramatic in other parts of the island – because scallop fishermen had come on shore, scooped up the scallops (many of which were the seed for next year, and too small for now) and taken them back out to the scallop grounds. Here’s what things looked like by the lab.

All over, the scallop grounds had come to shore.

But the huge flux of biomass onto shore was impressive. And it wasn’t just scallops, but a ton of seagrass as well, much of which was matting over fringing salt marshes.

Still, the huge amount of energy and nutrients coming into the shoreline ecosystem driven by storms gave me a lot of pause. I mean, those scallops that weren’t saved did end up in the coastal foodweb. Birds were definitely looking fat and happy, and we’d find piles like this with flocks of birds nearby:

The whole thing really got my brain going, with two big questions:

1) So, what is the fate of all of this influx of stuff into the shoreline? How will the influx of energy alter the structure and dynamics of the food web? Will the smothering of the marsh matter? It is winter, when things are slower. How quickly will everything be decomposed? Will the effects be lagged until the springtime? Or will they affect the system now? I think of Gary Polis’s work on how food web structure is shaped by the influx of energy on small islands. I know this is a BIG island, but, still, the point stands: this is a big flux of biomass and nitrogen. And it’s not just plant matter, but animal protein.

2) How will climate change alter the frequency of this subsidy? What would be the consequences of a regime with regular small subsidies and occasional big ones, versus regular big subsidies? This stems largely from my thinking about the increase in the size of the ‘largest storm of the year’ in California coastal systems that’s been the basis of my previous work. But models and analysis from the Knutson group seem to show that, while hurricanes and cyclones in the Atlantic aren’t getting more frequent, the size of each one is getting bigger. So, similar pattern. If small subsidies are coming in every year now due to the occasional passing hurricane or Nor’easter, but the size of those same storms in the future is going to get larger, then this kind of big Scallapocalypse/subsidy could get more frequent. Particularly as northern Atlantic waters get warmer (which they are – Nixon 2004), this could be an interesting and perhaps not so well investigated climate effect – the increased strength of coupling between marine and terrestrial food webs.

Oh, and random 3) What role will invasive algae play in increasing the impacts of storms on the amount of material coming on land? This may lead nowhere, but I noticed a lot of material (not scallops) that had washed on land had the invasive Codium fragile attached to it. I know that subtidal kelps can do this to mussels as well (Witman’s work), but there’s no kelp here. Is Codium becoming a drag (har har) and increasing the energy and nutrient flow from sea to land?

All in all, an interesting trip with a lot to chew on for future research. And a great setting!

Knutson, T. R., J. L. McBride, J. Chan, K. Emanuel, G. Holland, C. Landsea, I. Held, J. P. Kossin, A. K. Srivastava, and M. Sugi. 2010. Tropical cyclones and climate change. Nature Geoscience 3:157–163.

Nixon, S. W., S. Granger, B. A. Buckley, M. Lamont, and B. Rowell. 2004. A one hundred and seventeen year coastal water temperature record from Woods Hole, Massachusetts. Estuaries 27:397–404.

Polis, G. A., and S. D. Hurd. 1995. Extraordinarily high spider densities on islands: flow of energy from the marine to terrestrial food webs and the absence of predation. Proceedings of the National Academy of Sciences, USA 92:4382–4386.

Polis, G. A., W. B. Anderson, and R. D. Holt. 1997. Toward an integration of landscape and food web ecology: the dynamics of spatially subsidized food webs. Annual Review of Ecology and Systematics 28:289–316.

Witman, J. D., and T. H. Suchanek. 1984. Mussels in Flow – Drag and Dislodgement by Epizoans. Marine Ecology Progress Series 16:259–268.

Why I’m Teaching Computational Data Analysis for Biology

This is an x-post from the blog I’ve set up for my course. As my first class at UMB, I’m teaching An Introduction to Computational Data Analysis for Biology – basically mixing the teaching of statistics and basic programming. It’s something I’ve thought a long time about teaching – although watching the rubber meet the road has been fascinating.

As part of the course, I’m adapting an exercise that I learned while taking English courses – in particular from a course on Dante’s Divine Comedy. I ask that students write 1 page weekly to demonstrate that they are having a meaningful interaction with the material. I give them a few pages from this book as a prompt, but really they can write about anything. One student will post on the blog per week (and I’m encouraging them to use the blog for posting other materials as well – we shall see, it’s an experiment). After they post, I hope that it will start a conversation, at least amongst participants in the class. I also think this post might pair well with some of Brian McGill’s comments on statistical machismo to show you a brief sketch of my own evolution as a data analyst.

I’ll be honest, I’m excited. I’m excited to be teaching Computational Data Analysis to a fresh crop of graduate students. I’m excited to try and take what I have learned over the past decade of work in science, and share that knowledge. I am excited to share lessons learned and help others benefit from the strange explorations I’ve had into the wild world of data.

I’m ready to get beyond the cookbook approach to data. When I began learning data analysis, way back in an undergraduate field course, it was all ANOVA all the time (with brief diversions to regression or ANCOVA). There was some point and click software that made it easy, so long as you knew the right recipe for the shape of your data. The more complex the situation, the more creative you had to be in getting an accurate sample, and then in determining what was the right incantation of sums of squares to get a meaningful test statistic. And woe be it if your p value from your research was 0.051.

I think I enjoyed this because it was fitting a puzzle together. That, and I love to cook, so, who doesn’t want to follow a good recipe?

Still, there was something that always nagged me. This approach – which I encountered again and again – seemed stale. The body of analysis was beautiful, but it seemed divorced from the data sets that I saw starting to arrive on the horizon – data sets that were so large, or so chock-full of different variables, that something seemed amiss.

The answer rippled over me in waves. First, comments from an editor – Ram Myers – on a paper of mine began to lift the veil. I had done all of my analyses as taught (and indeed even used for a class): ANOVA, regression, multiple comparisons, etc., in the classic vein. Ram asked why, particularly given that the biological processes that generated my data should in no way generate something with a normal – or even log-normal – distribution. While he agreed that the approximation was good enough, he made me go back and jump off the cliff into the world of generalized linear models. It was bracing. But he walked me through it – over the phone, even.

So, a new recipe, yes? But it seemed like something more was looming.

Then the expiration of a JMP site license, with one week left on a paper revision, left me bereft. The only free tool I could turn to that seemed to do what I needed was R.

Wonderful, messy, idiosyncratic R.

I jumped in and learned the bare minimum of what I needed to know to do my analysis…and lost myself.

I had taken computer science in college, and even written the backend of a number of websites in Perl (also wonderful, messy, and idiosyncratic). What I enjoyed most about programming was that you could not hide from how you manipulated information. Programming has a functional aspect at its core, where an input must be translated into a meaningful output according to the rules that you craft.

Working with R, I was crafting rules to generate meaningful statistical output. But what were those rules but my assumptions about how nature worked? The fundamentals of what I was doing all along – fitting a line to data with an error distribution that should be based in biology, not arbitrary assumptions – were laid all the more bare. Some grumblingly lovely help from the statistical denizens of the R help boards brought this into sharp focus.

So, I was ready when, for whatever reason, fate thrust me into a series of workshops on Bayesian statistics, AIC analysis, hierarchical modeling, time series analysis, data visualization, meta-analysis, and last – Structural Equation Modeling.

I was delighted to learn more and more of how statistical analysis had grown beyond what I had been taught. I drank deeply of it. I know, that’s pretty nerdy, but, there you have it.

The new techniques all shared a common core – that they were engines of inference about biological processes. How I, as the analyst, made assumptions about how the world worked was up to me. Once I had a model of how my system worked in mind – sketched out, filled with notes on error distributions, interactions, and more, I could sit back and think about what inferential tools would give me the clearest answers I needed.

Instead of finding the one right recipe in a giant cookbook, I had moved to choosing the right tools out of a toolbox – and then using the tools of computer science (optimizing algorithms, thinking about efficient data storage, and so on) to let those tools bring data and biological models together.

It’s exciting. And that’s the core philosophy I’m trying to convey in this semester. (N.B. the spellchecker tried to change convey to convert – there’s something there).

Think about biology. Think about a line. Think about a probability distribution. Put them together, and find out what stories your data can tell you about the world.