I’ve Got the Power!

There is nothing like that pain of taking a first look at fresh precious new data from a carefully designed experiment which took months of planning and construction, 8 divers working full days, and a lot of back-and-forth with colleagues, and then you find absolutely no patterns of interest.

Yup, it’s a real takes-your-breath-away-hand-me-a-beer kind of moment. Fortunately, I was on my way to a 4th of July dinner with my family up in the mountains of Colorado, so, I wasn’t quite going to commit hari-kari on the spot. Still, the moment of vertigo and neausea was somewhat impressive.

Because, let’s admit it, as much as science is about repeated failure until you get an experiment right (I mean, the word is “experiment” not “interesting-meaningful-data-generating-excersise”), it still never feels good.

So, round one of my experiment to test for feedbacks between sessile species diversity and urchin grazing showed bupkiss. Not only was there no richness effect, but, well, there wasn’t even a clear effect that urchins mattered. High density urchin treatments lost about as much cover and diversity of algae and sessile inverts as treatments with low or no urchins in them. What had gone wrong? And can this project be salvaged?

Yeah, I can see a pattern here.  Sure...

Yeah, I can see a pattern here. Sure...

Rather than immediately leaping to a brilliant biological conclusion (that was frankly not in my head) I decided to jump into something called Power Analysis. Now, to most experimental scientists, power analysis is that thing you were told to do in your stats class to tell you how big your sample size should be, but, you know, sometimes you do it, but not always. Because, really, it is often taught as a series of formulae with no seeming rhyme nor reason (although you know it has something to do with statistical significance). Most folk use a few canned packages for ANOVA or regression, but the underlying mechanics seem obscure. And hence, somewhat fear-provoking.

Well, here’s the dirty little secret of power analysis. Really, at the end of the day, it’s just running simulations of a model of what you THINK is going on, adding in some error, and then seeing how often your experimental design of choice will give you easily interpretable (or even correct) results.

The basic mechanics are this. Have some underlying deterministic model. In my case, a model that described how the cover and diversity of sessile species should change given an initial diversity, cover, and number of urchins. Then, add in random error – either process error in each effect itself, or just noise from the environment (e.g., some normal error term with mean 0 and a standard deviation). Use this to get made-up data for your experiment. Run your states and get parameter estimates, p values, etc. Then, do the same thing all over again. Over many simulations of a particular design, you can get an average coefficient estimates, the number of times you get a significant result, etc. “Power” for a given effect is determined by the number of “significant” results (where you define if you’re going with p=0.05 or whatnot) divided by the number of simulations.

So that’s just what I did. I had coefficient estimates (albeit nonsignificant ones). So, what would I need to detect them? I started by playing with the number of replicates. What if I had just put out more cages? Nope. The number of replicates I’d need to get the power into the 0.8 range alone (and you want as close to 1 as possible) was mildly obscene.

So, what about number of urchins? What if instead of having 0 and 16 as my lowest and highest densities?

Bingo.

Shown above are two plots. On the left, you see the p value for the species richness * urchin density interaction for each simulated run at a variety of maximum urchin densities. On the right you see the power (# of simulations where p<0.05 / total number of simulations). Note that, for the densities I worked with, power is around 0.3? And if I kicked it up to 50, well, things get much nicer.

As soon as I saw this graph, the biology of the the Southern California Bight walked up behind me and whapped me over the head. Of course. I had been using a maximum density that corresponded to your average urchin barren. The density of urchins that can hang out, and graze down any new growth. But that's not the density of urchins that CREATED the barren in the first place. Sure enough, looking at our long-term data, this new value of 50 corresponded to a typical front of urchins that would move through an area and lay waste to the kelp forest around it.

Which is what I had been interested in the first place.

*headdesk*

So, round two is up and running now. The cages are stocked with obscenely high densities of the little purple spikey dudes. And we shall see what happens! I am quite excited, and hopeful. Because, I've got the Power!

(and so should you!)


Simple Data Visualization

OK, so, I know I already raved about one Hadley Wickham project and how it has changed my life last week. But what can I say, the man is a genius. And if you are using R (and let’s face it, you should be) and you want simple sexy graphs made quick, the man has you covered once again.

I first learned about ggplot2 while scanning through some slides of the LA Area RUG meetings (that I missed – I still haven’t been to any!) by the folks from Michael Driscoll.

And I liked what I saw – ggplot2 and lattice (which I admit, I had kind of avoided) seemed more intuitive than I thought. Then I ran into a series of articles on ggplot2 from the Learning R blog and I was hooked. Still am. And why I ask?

Let’s consider a task – you have some data split into four groups. For each group, you want to plot a fitted regression between two covariates. You want this split into panels, with common scales, and nicely placed axis labels. Also, you want it to be purty. While you can do this with the standard graphics package (and, I admit, I sometimes like the austerity of the standard graphics), it would take a for loop, tons of formatting instructions, and a number of steps where you could easily mess the whole thing up. Not to mention that you might have to change a good bit if you want to add another group.

Here is how easy it is with ggplot2. Note, there are only two lines of actual graphing code. The rest is just making up the data.

library(ggplot2)

#create data - x is a predictor, g is a group, y is the response
x<-1:100
g<-as.factor(c("A", "B", "C", "D"))

#i love expand grid, as it easily creates full
#matrices of all combinations of multiple variables
z<-data.frame(expand.grid(x=x,g=g))

#add a little error in to our results
z$y<- rnorm(length(z$x),z$x*c(3,15,29, 50)[unclass(z$g)]+5, sd=200)

#this first bit just sets up the basic plotting, with colors and panels
a<-qplot(x,y,group=g, colour=g, data=z, facets=~g)

#now, add a fitted line with the error around it shown.
# Also, change the theme.  And plot the whole thing!
a+stat_smooth(method="lm", se=T, group=g, fill="black")+theme_bw()

All of which yields the following pretty figure:

Nice, eh?

And that stat_smooth statement can take lots of other arguments - e.g., glms (I've tried, and it looks great!)

So check it out - even for just casual data exploration, there's some real clarity to be found. And I look forward to trying out other products by Prof. Wickham!

Lytechinus: Pack Wolf of the Sea

ResearchBlogging.orgSo, you know, I’m cruising along, trying to determine the diet of the white urchin, Lytechinus anamesus, from the literature. There’s your usual “It eats kelp” papers, a few red algae papers, and nothing else special and then – A PAPER ON LYTICHINUS EATING OTHER SPECIES OF URCHINS.

That’s right, baby, urchin on urchin predation. Could the ocean get much weirder?

A red urchin mobbed by the ravening horde.

A red urchin mobbed by the ravening horde.


Turns out, on urchin barrens (i.e., areas with such ridonculous numbers of urchins – purples, reds, and even whites – that all of the surrounding edible algae has been grazed away) when white urchins get hungry enough, they turn into little pack animals. They mob red and purple urchins (but not each other) and gnaw away at their spines and test until they reach their gooey center.

Or, as gooey as the center of an echinoderm can be.

I cannot resist giving lolcaptions to figures from scientific papers.  I clearly have a problem.

I cannot resist giving lolcaptions to figures from scientific papers. I clearly have a problem.

How intense is this predation? In lab trials, purples were mobbed within 20 minutes. 1/3 were dead by the end of the day. 90% of big fat red urchins tested were severely damaged.

These guys are not kidding around.

Perhaps more strangely, injured whites were left alone. Cannibalism for white urchins, apparently, is not on the menu.

So, dear reader, if you should find yourself hanging out in an urchin barren one day, just remember, don’t turn your back or the urchins will eat you!

MUAHAHAHAHAHAHAHA!

(Also of note, the first author of this paper was the man who turned me into a scientific diver. Huh!)

COYER, J., ENGLE, J., AMBROSE, R., & NELSON, B. (1987). Utilization of purple and red sea urchins (Strongylocentrotus purpuratus Stimpson and S. franciscanus Agassiz) as food by the white sea urchin (Lytechinus anamesus Clark) in the field and laboratory Journal of Experimental Marine Biology and Ecology, 105 (1), 21-38 DOI: 10.1016/S0022-0981(87)80027-8

Sometimes, you just need to use a plyr

I haven’t posted anything about R-nerdery in quite some time. But I have to pause for a moment, and sing the praises of a relatively new package that has made my life exponentially easier. The plyr package.

R has the capability to apply a single function to a vector or list using apply or mapply, or their various derivatives. This returns another vector or list.

This is great in principal, but in practice, with indexing, odd return objects, and difficulties in using multiple arguments, it can get out of hand for complex functions. Hence, one often resorts to a for loop.

Let me give you an example. Let’s say I have some data from a simple experiment where I wanted to look at the effect of adding urchins, lobsters, or both on a diverse community of sessile invertebrates – mussels, squirts, etc. Heck, let’s say, I had one gazillion species whose responses I was interested in. Now let’s say I want a simple summary table of the change in the density each species – and my data has before and after values. So, my data would look like this.

Urchins Lobsters Before.After Replicate Species Density
Yes No Before 1 1 30
Yes No After 1 1 10
Yes No Before 1 2 45
Yes No After 1 2 23
.
.
.
.
.
Yes Yes Before 15 Gazillion 10
Yes Yes After 15 Gazillion 9

So, each row represents the measurement for one species in one replicate either before or after the experiment. Now, previously, to get an output table with the mean change for each species for each treatment, I would have had to do something like this:

#first, create blank list of vectors waiting to be filled
#as well as blank vectors to remember what treatment 
#combinations we're looking at
mean.list<-list()
urchin.vec<-vector()
lobster.vec<-vector()
for(i in 1:gazillion) mean.list[[paste("sp",i,sep=".")]]==vector()

#then, a for loop for each combination
for (i in levels(my.data$Urchins)){
  for (j in levels(my.data$Lobsters)){
    urchin.vec<-c(urchin.vec,i)
    lobster.vec<-c(lobster.vec,j)
    #get the right subset of the data
    sub.data<-subset(my.data, my.data$Urchins==i & my.data$Lobster==j)

    #now loop over all species
    for (k in 1:gazillion){ 
    sub.data<-subset(sub.data, sub.data$Species==k)
    before.data<-subset(sub.data, sub.data$Before.After=="Before")
    after.data<-subset(sub.data, sub.data$Before.After=="After")
    mean.list[[paste("sp",i-3,sep=".")]]<-
	c(mean.list[[paste("sp",i-3,sep=".")]], mean(after.data[k]-before.data[k])

    }
  }
}

#then turn it into a data frame to match back up to the right values
mean.data<-as.data.frame(mean.list)
mean.data$urchins<-as.factor(urchin.vec)
mean.data$lobsters<-as.factor(lobster.vec)

Ok, did that exhaust you as much it did me? Now, here's how to get the same result using ddply (more on why it's called ddply in a moment)

#a function where, if you give it a data frame with species 1 - gazillion
#will give you the mean for each species.  Note, the which statement
#lets me look at before and after data separately
#Also note the Change= in the return value.  That gives a new column name
#for multiple return columns, just return a vector, with each 
#new column getting it's own name

tab.means<-function(df){ return(Change = 
			df$Density[which($Before.After=="After")]
			- df$Density[which($Before.After=="Before")]) }

#now use ddply
mean.data<-ddply(df, c("Urchins", "Lobsters", "Species"), tab.means))

#ok, in fairness, each species doesn't have it's own column.  But, we can 
#just use reshape
mean.data<-reshape(mean.data, timevar="Species", idvar=c("Urchins", "Lobsters", direction="wide)

Um. Whoah. Notice a small difference there. Now, perhaps the earlier code could have been cleaned up a bit with reshape, but still, 6 lines of code versus 18 - and most of that 18 was bookkeeping code, not something doing real useful computation.

Soooo.... Check out plyr! It will make your life a happier, more relaxed place. Enjoy!

(p.s. I'm noticing wordpress is ignoring all of my code indenting - anyone have any quick fixes for that?)

Mapping the Sasquatch

ResearchBlogging.orgI love modeling! I love modeling! Modeling will solve everything!

Let’s model the spatial distribution of Bigfoot!

WAIT, WHAT?!

Figure 1 from the paper. Foots denote sighting of Sasquatch footprints. Circles for just visual/auditory sightings. I ask, how does one know what Bigfoot sounds like?

Yes, it sounds silly, but in the current issue of the Journal of Biogegraphy, Lozier et al give us their stunning contribution Predicting the distribution of Sasquatch in western North America: anything goes with ecological niche modelling. Finally, all will be revealed. And for those wondering:

Sasquatch belongs to a large primate lineage descended from the extinct Asian species Gigantopithicus blacki, but see Milinkovich et al. (2004) and Coltman & Davis (2005) for phylogenetic analyses indicating possible membership in the ungulate clade.

They do this to prove a point – that Ecological Niche Models for determining species ranges are amazing – invaluable conservation tools, really. But if the taxonomy on the data that goes into them are shoddy (like, say, calling a Black Bear a Sasquatch), the results will be, well, interesting.

They use data on sightings (see Fig. 1 above) from… the Bigfoot Field Research Organization
and then used the latest and greatest in Ecological Niche Modeling to determine, given environmental parameters, just where does Bigfoot live? And, under current climate change scenarios, where might we find Sasquatch in the future?

So cryptozoologists take note! Here is a veritable treasure trove of information as to where to place your next tripwire camera!


Where will bigfoot be in the future after climate change? Panel A shows current Sasquatch Distribution. Panel B shows its projected distribution under climate change.

In fairness, the authors use this dubious analysis to point out that, when we have a record of species occurrences that seem tidy and orderly, we often don’t question their taxonomic validity. The output of these models, vital to some conservation efforts, will only be as good as their input. Indeed, in this case, the authors find striking overlap with the (far less frequently observed) Black Bear (yes, people report sightings of Sasquatch more than that of Black Bears). It’s a real problem, and the assessment of data uncertainty is a real pressing issue for any method that attempts to draw inference from sparse data.

But, really, in the end, this is an Ig-Nobel award winner in the making. Bravo.

Lozier, J., Aniello, P., & Hickerson, M. (2009). Predicting the distribution of Sasquatch in western North America: anything goes with ecological niche modelling Journal of Biogeography DOI: 10.1111/j.1365-2699.2009.02152.x