Day 1: Back in the field!

Well, today was it. After a two-year hiatus behind my computer screen, my skin now softer and more delicate from its soothing rays, I hopped back in the water FOR SCIENCE! Not only that, but it was my return to dry suit diving. Oh, dry suit diving. I really didn’t miss you and your weird buoyancy tricks (note to self, get ankle weights, as old ones have disintegrated).

It was a great day, despite feeling kinda like the first day back at school. Ted (lab manager) and I came out to SML to work with a fantastic undergrad I’m advising for the summer, as well as do some site scouting of our own. The weather report said that things would turn awful in the afternoon/early evening (and was right), so, a morning dive on the backside of Appledore was just the thing.

This truck is loaded WITH SCIENCE!

Ted ready to go

We were there to help Sarah census the invasive Heterosiphonia japonica to look at its distribution with respect to a depth and exposure gradient (see also this kick butt paper by Christine Newton et al.). And the backside of Appledore is quite exposed. The transects we were sampling are next to a spot in the intertidal that got wiped bare by this winter’s storms.

Photo by Ted of Sarah (L) and me about to plop down our first transect.

Two things intrigued me about the transects. I remember this area being super-kelpy. Indeed, the wave exposure limited Codium back when that alga was the big bad (I’ve heard it’s quite rare here, and in a quick checkout at The Cribs, I only saw 3 individuals). But now…well, there are some kelps, but, mostly, it’s red and brown filamentous algae. A lot of Desmarestia viridis to be sure, but a lot of red and brown puffiness. That, and a lot more Chondrus. I mean, we were fairly shallow, but, still. Interesting.

Photo by Sarah of one of her transects

Not a lot of Heterosiphonia, though. It’s going to be interesting to see how Sarah’s project turns out. Some sites have been heavy with it, some light.

I’m just glad to be back in, and seeing how the place has changed. Steneck’s Flips & Locks paper was an interesting update on how this system is doing. I’m going to be curious to browse around and try and take the big whole community perspective I honed at the SBC LTER and see what there is to see here.

A great photo by Sarah of a school I totally missed.

Living a Dream: Back to SML

So, today, I’m going to catch a ferry out to the Shoals Marine Lab. I’m just going to be out for a day to meet with the undergrad intern I’m mentoring. I’ll be back later this summer to work with her and set up some permanent monitoring transects.

I have to be honest, this is one of those moments in my life where I am watching a dream come true. I went to SML in the summer of 1999 to take some classes. It changed everything for me. I cannot recount the number of paths that opened up due to that summer that I have run down, higgledy-piggledy. To return now as a mentor and researcher? To have the chance to really learn the secrets of the sea around the Isles of Shoals? To come back with new eyes after a decade of developing as a scientist? I’m having a hard time expressing my excitement and joy.

Rather than kvell any more about the place and my excitement, here’s a video from the participants of the Underwater Research Course at SML (which was, also, totally formative to who I am as a scientist – thanks, Jim!). I think it conveys a lot of what I could say, but in images and video that are much more telling.

Best Peer Review Experience Ever

So, I recently submitted a piece regarding the future of scholarly publishing in Ecology & Evolutionary Biology. Simultaneous to posting, I put up a preprint in PeerJ Preprints and also put it on Google Docs for line by line commentary (which you are welcome to give!). I asked in both places that commenters identify themselves, unless they felt deeply uncomfortable.

OMG the experience has been amazing!

At PeerJ you can comment on the main page of the article, and others can rate it – which is fantastic – and I’ve gotten some wonderful feedback there (thanks Lars!)

The Google Doc experience has been even more fascinating, given the ability to put in line by line comments.

One of our reviewers is using the Google Doc for their comments. It has made it easy to see what they are saying, respond to things that I think are relevant (or I’ll just change some of the text in the next draft for bigger changes), and have an interactive experience with the reviewer. It is absolutely fabulous.

I’ve been really fascinated by the idea of how collaboration can improve peer review ever since reading Leek et al.’s 2011 piece Cooperation between Referees and Authors Increases Peer Review Accuracy. I’m delighted that one of our reviewers has embraced that ethos, and in so doing, I can see how this will really help with future publications if not just Ross Mounce, but everyone embraced this model. Very cool!

A Preprint Experiment: Four Pillars and a Foundation for the Future of Scholarly Publishing

x-post from the OpenPub Project blog

So, we got together, had two working group meetings to discuss the future of scholarly publishing in Ecology, Evolutionary Biology, and the Earth and Ocean Sciences. What were we thinking that entire time?

We’ve just submitted a piece that brings together our broad ideas (some of which have been seen before), but, simultaneous to publication, we’ve also decided to put up a preprint. Why? Simply put, immediate access is one of our four pillars of the future of scholarly publication. Once you feel something is ready for public consumption, put it out there! We’ve been delighted to watch the evolution of PeerJ Preprints, so we’ve placed our piece there.

Byrnes et al. (2013) The four pillars of scholarly publishing: The future and a foundation. PeerJ PrePrints 1:e11

This immediate access to the piece goes hand in hand with another of our four pillars. Open Review. We want to know what you think. And now. We hope you give us feedback over at the preprint. Or, if you want to give us more detailed annotated comment, we’ve put it in a comment-open Google doc. Highlight something you disagree with. Argue with us. We welcome it! We’d ask that you put your name with the comment. We want a discussion, as discussion will improve this manuscript and help us shape our argument rather than just one-way commenting. This will also allow *you* to get full recognition for your comments, and we will include this in future acknowledgements.

So, enjoy the piece – our commentary is not a straight experiment-analysis-discussion piece, but rather part of a broader ecosystem of scholarly products that we feel are important to get out there. We look forward to hearing what you think of the piece!

Favorite Wave Sensor?

So, Internet, I’m setting up a number of monitoring sites this summer. I’m hoping to get good wave height measurements from them to look at disturbance. I have yet to find something like the CDIP swell height model for the Gulf of Maine (although, I’d love to hear that this is due to a failure of my google-fu). So, I’m casting about for some good wave height sensors.

The problem I face is that I’d like to deploy a lot of them, and in some highly variable conditions, secured to subtidal transects. Possibly for 6 months. Or maybe even up to a year. So, I’m trying to see if a lower cost smaller profile solution is available. Something like a tidbit for wave heights.

I’ve found a few that kinda sorta fit the bill, although not quite. There’s the venerable SeaBird, products from RBR, the SEAGUARD tide and wave recorder, and the MIDAS from Valeport.

I’m a little worried that these might be too large or expensive given the conditions I’m deploying in. Or I might be asking for a pipe dream.

Internet – any thoughts, recommendations, or experiences you’d care to share? Am I being ridiculous?

More on Bacteria and Groups

Continuing with bacterial group-a-palooza

I followed Ed’s suggestions and tried both a binomial distribution and a Poisson distribution for abundance, such that the density A_{rgs} of one species s in one group g in one plot r, where there are S_g species in group g and A_{rg} is the total density of group g in plot r, is distributed as

A_{rgs} ~ Poisson(\frac{A_{rg}}{S_g})

In the analysis I’m doing, interestingly, the results do change a bit, such that the original network-only results are confirmed.

I am having one funny thing, though, which I can’t lock down. Namely, the no-group option always has the lowest AIC once I include abundances – and this is true both for binomial and Poisson distributions. Not sure what that is about. I’ve put the code for all of this here and made a sample script below. This doesn’t reproduce the behavior, but, still. Not quite sure what this blip is about.

For the sample script, we have five species and three possible grouping structures. It looks like this, where red nodes are species or groups and blue nodes are sites:

[Figure: network diagram of the five species, the candidate groups (red nodes), and the sites (blue nodes)]

And the data looks like this

  low med high  1   2   3
1   1   1    1 50   0   0
2   2   1    1 45   0   0
3   3   2    2  0 100   1
4   4   2    2  0 112   7
5   5   3    2  0  12 110

So, here’s the code:

And the results:

> aicdf
     k LLNet LLBinomNet  LLPoisNet   AICpois  AICbinom AICnet
low  5     0    0.00000  -20.54409  71.08818  60.00000     30
med  3     0  -18.68966  -23.54655  65.09310  73.37931     18
high 2     0 -253.52264 -170.73361 353.46723 531.04527     12

We see that the two different estimations disagree, with the binomial favoring disaggregation and the Poisson favoring moderate aggregation. Interesting. Also, the naive network-only approach favors complete aggregation. Interesting. Thoughts?
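For anyone who wants to poke at the abundance likelihoods without the original script, here is a minimal Python sketch (not the R code behind the table) that recomputes the two abundance log-likelihood columns from the toy data. It assumes that S_g is the number of species assigned to a group, that the group total A_rg is summed within each site, and that 0 * log(0) counts as 0; the helper names (`xlogy`, `pois_logpmf`, `binom_logpmf`, `abundance_logliks`) are made up for illustration.

```python
import math

# Toy data from the table above: one row per species (1-5),
# one column per site (1-3).
abund = [
    [50, 0, 0],
    [45, 0, 0],
    [0, 100, 1],
    [0, 112, 7],
    [0, 12, 110],
]

# Group label of each species under the three candidate groupings.
groupings = {
    "low": [1, 2, 3, 4, 5],
    "med": [1, 1, 2, 2, 3],
    "high": [1, 1, 2, 2, 2],
}


def xlogy(x, y):
    """x * log(y), using the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def pois_logpmf(x, lam):
    """Log of the Poisson probability of count x with mean lam."""
    return xlogy(x, lam) - lam - math.lgamma(x + 1)


def binom_logpmf(x, n, p):
    """Log of the binomial probability of x successes in n trials."""
    log_choose = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    return log_choose + xlogy(x, p) + xlogy(n - x, 1 - p)


def abundance_logliks(groups):
    """Return (Poisson LL, binomial LL) summed over species and sites."""
    ll_pois = ll_binom = 0.0
    for g in set(groups):
        members = [i for i, label in enumerate(groups) if label == g]
        s_g = len(members)  # assumption: S_g = species assigned to the group
        for site in range(len(abund[0])):
            total = sum(abund[i][site] for i in members)  # A_rg
            for i in members:
                ll_pois += pois_logpmf(abund[i][site], total / s_g)
                ll_binom += binom_logpmf(abund[i][site], total, 1 / s_g)
    return ll_pois, ll_binom


results = {name: abundance_logliks(groups) for name, groups in groupings.items()}
for name, (ll_pois, ll_binom) in results.items():
    print(f"{name:4s}  LLPois = {ll_pois:10.5f}  LLBinom = {ll_binom:10.5f}")
```

Under those assumptions the printed values should line up with the LLPoisNet and LLBinomNet columns above. The AIC columns add a parameter penalty on top, which this sketch deliberately leaves out.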

Groupapalooza: Adapting Food Web Trophic Group Methods for Defining Bacterial “Species”

The following are some notes on a technique I’m developing for a cool collaboration between me, Jen Bowen, and David Weisman. I think it has some generality to it, and I’d love any feedback from the more mathematical crowd…I also wrote it to make sure I knew what I was doing – translating scribbled equations to code to results – so it does freeflow a bit. It may change based on feedback – consider this a working document.

So. Away we go.

What do food webs and determining the identity of bacterial species based on sequences and co-occurrence data have in common? How can bacterial ‘species’ advance basic food web research?

Networks. And AIC scores.

Let me explain.

I’ve long been a huge fan of Allesina and Pascual’s 2009 paper on deriving trophic groups de novo from food web networks. In short, they say that if you have a simple binary network (a eats b, or a doesn’t eat b), you can use information theory to determine trophic groups within a network. I’ve applied their methods in the past to kelp forests, and seen some interesting things, and Ed Baskerville has a great paper on using the technique for Serengeti food webs.

So how does this connect to bacteria?

I’m working on an analysis where my collaborators have surveyed bacterial communities at a number of different sites. We want to know the abundance of different species at different sites. However, how to define a bacterial ‘species’ is a tricky question. OK – let me poorly explain my understanding of bacterial taxonomic definitions (don’t kill me, Jen!) Let’s say you amplify and sequence a sample. You may get a number of different representative sequences from that sample. And you can get a measure of the abundance of each sequence type.

Now, on to species – looking at any pair of sequences (looong sequences of many base pairs), you may find two that are, say, one base pair different from each other. Are these two ‘sequences’ independent species or not? What if they differed by 2 base pairs? What about 3? 4? Now, a researcher can define an ‘operational taxonomic unit’ or OTU as all sequences that are no more than X% different from each other – and X is up to them. Thus, once you define your percent similarity, you can sum up all of the sequences in each OTU, and get the abundance of each “species” in each plot.

This is somewhat unsatisfying. I mean, what if you had two sequences that were 98% similar, but all of sequence A was in one plot, and all of sequence B was in another plot. Now you tell me – is this one species or two?

Let’s take that one step further. Let’s suppose A and B are both in a plot. But sequence A has 10x the abundance of sequence B. Furthermore, in a second plot, both are present, but sequence B is 10x more abundant. Again, one species or two?

The approach I want to lay out here answers this using a slight modification of Allesina and Pascual’s framework. Namely, we’re going to look at patterns of association, sequence similarity, and abundances to define OTUs.

The Association Part
At the core of Allesina and Pascual’s framework is the following observation. Let’s say you are dealing with a food web. You’ve got all sorts of directed connections of species A eating species B. Now, let’s say you want to define two trophic groups. Definitions of predator, prey, etc., are not important here. Just that in each group, you’ll have one set of species that eats species in the other group, and vice-versa. Like in this diagram:

[Figure: diagram of a food web divided into two trophic groups, with species in each group eating species in the other]

So far, so good, yes? Now, the question is, which of these is a better descriptor of the structure of the network, after penalizing for complexity. I.e., we want a general schema. Is the amount of information lost by grouping things a-ok, given that we’ve reduced the complexity of our model of how the world works?

A&P derive a wonderful formula for this. It involves two pieces. First, for each A -> B connection between groups we’ve made, we can derive a probability of producing that particular graph with those species assigned into exactly those groups. L(ab) is the number of links going from species in A to species in B, and S(i) is, say, the number of species in group i. We define p(ab) as L(ab) / [S(a)S(b)]. Writing L for L(ab), the probability of the observed set of A -> B links given p(ab) can be defined as

p(network | p(ab)) = p(ab)^L * (1-p(ab))^(S(a)S(b) – L)

Which implies that the likelihood of p(ab) given the network is the same.

Likelihood(p(ab) | network) = p(ab)^L * (1-p(ab))^(S(a)S(b) – L)

Log-Likelihood = L*log(p(ab)) + (S(a)S(b) – L)*log(1-p(ab))

Cool, right?
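To make the log-likelihood above concrete, here is a minimal Python sketch for a single A -> B block. The function name `block_loglik` and the example numbers are made up for illustration, and it assumes that a block with no links or with every possible link contributes zero (taking 0 * log(0) as 0).

```python
import math


def block_loglik(n_links, s_a, s_b):
    """Log-likelihood of n_links observed out of s_a * s_b possible links
    from group a to group b, evaluated at p(ab) = n_links / (s_a * s_b)."""
    n_possible = s_a * s_b
    p = n_links / n_possible
    ll = 0.0
    if n_links > 0:  # skip the 0 * log(0) term
        ll += n_links * math.log(p)
    if n_links < n_possible:  # skip the 0 * log(0) term
        ll += (n_possible - n_links) * math.log(1 - p)
    return ll


# Three species in group a, four in group b.
half_linked = block_loglik(6, 3, 4)    # 6 of the 12 possible links exist
fully_linked = block_loglik(12, 3, 4)  # every possible link exists
print(half_linked, fully_linked)
```

A fully linked (or fully empty) block loses no information when its species are lumped together, so its log-likelihood is zero; the half-linked block is the worst case for a block of that size.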

Let’s call one of those LLs LL(a->b). Now, the Log-Likelihood of a given network configuration with groups is just

LL(all p(ij) | whole network) = LL(a->b) + LL(b -> a) + LL(a -> a) + LL(b -> b)

where LL(a->b) is one of those log-likelihood calculations above. We’ll call this LL(network) for future use.

Now, what about this comparison and penalty for complexity? Here’s where things get even better. We know that there are S total species, and k^2 probabilities, where k is the number of groups. So, voila, we have an AIC for a group structure’d network

AIC = -2 * LL(network) + 2S + 2k^2

and as each AIC for each configuration captures information about information lost by a particular network, we can directly compare different grouping schemas. Note that the AIC for the baseline network is just 2S + 2k^2, since with every species in its own group each p(ab) is 0 or 1 and LL(network) is 0.
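Here is a minimal Python sketch of that comparison on a tiny made-up food web, assuming the network is stored as a binary adjacency matrix where `adj[i][j] = 1` means species i eats species j. The function names and the four-species web are hypothetical; only the AIC formula comes from the text above.

```python
import math


def block_loglik(n_links, n_possible):
    """Binomial log-likelihood of one group -> group block at its own p."""
    ll = 0.0
    if 0 < n_links < n_possible:
        p = n_links / n_possible
        ll = n_links * math.log(p) + (n_possible - n_links) * math.log(1 - p)
    return ll


def grouped_aic(adj, groups):
    """AIC = -2 * LL(network) + 2S + 2k^2 for a given group assignment."""
    labels = sorted(set(groups))
    n_species, k = len(adj), len(labels)
    ll = 0.0
    for a in labels:
        rows = [i for i, g in enumerate(groups) if g == a]
        for b in labels:
            cols = [j for j, g in enumerate(groups) if g == b]
            n_links = sum(adj[i][j] for i in rows for j in cols)
            ll += block_loglik(n_links, len(rows) * len(cols))
    return -2 * ll + 2 * n_species + 2 * k ** 2


# Hypothetical web: species 0 and 1 each eat species 2 and 3.
adj = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

aic_ungrouped = grouped_aic(adj, [0, 1, 2, 3])   # every species its own group
aic_two_groups = grouped_aic(adj, [0, 0, 1, 1])  # consumers vs. resources
aic_one_group = grouped_aic(adj, [0, 0, 0, 0])   # everything lumped together
print(aic_ungrouped, aic_two_groups, aic_one_group)
```

In this toy web the two-group structure wins: it reproduces the links perfectly (LL of 0) with far fewer probabilities than the ungrouped baseline, while lumping everything into one group pays for its simplicity with a worse likelihood.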

So what does this have to do with bacteria?!?!

OK, ok, hold your horses. Let’s think about sequences and their associations with a site as a link. Let’s consider both sequences and sites as nodes in a network. So, if one sequence associates with one site, that’s a directed link from sequence to site. It’s a bipartite graph. Now, instead of searching through all possible group structures, our groups are defined by OTUs that are created from different levels of sequence similarity. We can calculate the LL for each group -> site association the same as we calculated the LL for A -> B before. The difference is, however, that there are fewer probabilities over the whole network. Instead of there being k^2 probabilities, there are k*r where r is the number of replicate plots we’ve sampled. So

AIC = -2 * LL(OTU network) + 2S + 2k*r
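The bipartite version can be sketched the same way in Python. This assumes a presence/absence matrix `present[i][q]` (1 if sequence i was found in plot q) and a list of OTU labels per sequence; both names, and the four-sequence example, are made up for illustration.

```python
import math


def block_loglik(n_links, n_possible):
    """Binomial log-likelihood of one OTU -> plot block at its own p."""
    ll = 0.0
    if 0 < n_links < n_possible:
        p = n_links / n_possible
        ll = n_links * math.log(p) + (n_possible - n_links) * math.log(1 - p)
    return ll


def otu_network_aic(present, otus):
    """AIC = -2 * LL(OTU network) + 2S + 2k*r for one OTU assignment."""
    n_seqs, n_plots = len(present), len(present[0])
    labels = set(otus)
    ll = 0.0
    for g in labels:
        members = [i for i, label in enumerate(otus) if label == g]
        for q in range(n_plots):
            # Each member sequence has one possible link to plot q.
            n_links = sum(present[i][q] for i in members)
            ll += block_loglik(n_links, len(members))
    return -2 * ll + 2 * n_seqs + 2 * len(labels) * n_plots


# Hypothetical data: four sequences surveyed in two plots.
present = [
    [1, 0],
    [1, 0],
    [1, 1],
    [0, 1],
]

aic_split = otu_network_aic(present, [0, 1, 2, 3])   # every sequence its own OTU
aic_pairs = otu_network_aic(present, [0, 0, 1, 1])   # two OTUs of two sequences
aic_lumped = otu_network_aic(present, [0, 0, 0, 0])  # one big OTU
print(aic_split, aic_pairs, aic_lumped)
```

In a real analysis the candidate label vectors would come from clustering at each sequence-similarity threshold rather than being typed in by hand, and the one with the smallest AIC would be kept.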

The beauty of this approach is that instead of having to search through group structures, we have 1 grouping per degree of sequence similarity. Granted, we can have tens of thousands of groups, so, it’s still a moderately heinous calculation (go-go mclapply!), but it’s not so bad.

But, what about that abundance problem?

So, until now, I’ve been talking about binary networks, where links are either 1 or 0. As far as I know, no one has derived a weighted-network analog of the A&P approach. On the other hand, here, our network weights are real abundances. Because of this, we can calculate a Likelihood of species with some set of abundances in a plot being part of the same group. Then,

LL(OTU group A -> 1 Plot) = LL(network) + LL(sequences having the observed pattern of abundances in that plot if they are in the same group)

I’m making this jump because the probability of species in one group being in that group and connecting to one plot = probability of species connecting to plot * probability of species having that pattern of densities.

p(network & abundance) = p(network) * p(abundances)

OK, so, how do we get that p(abundances) aka L(parameters | observed abundances)?

I’m going to throw out a proposal. I’m totally game to hear others, but I think this is reasonable.

If two sequences are indeed the same OTU, they should respond in similar ways to environmental variation. Thus, you should have an equal probability, if you were to sample random individuals from a group in one plot, of drawing either species. So, in the figure below, on the left, the two sequences (in red), even though they both associate with this one site, are different OTUs. Or, rather, it is highly unlikely they are from the same OTU. On the right, they are likely from the same OTU.

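[Figure: two sequences (in red) associating with one site; left, abundances unlikely to come from the same OTU; right, abundances likely from the same OTU]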

This is great, as we now have a parameter for each group-plot combination: the probability of drawing an individual with one of the sequences within a group. And we’re defining that probability as 1/number of sequences in a group. It’s rolling a die. And we’re rolling it as many times as we have total ‘individuals’ observed. So, for each sequence, we have a probability of drawing it, and a number of dice rolls…and we should be able to calculate a p(sequence | p(i in j in plot q)) which is the same as Likelihood(p(i in j in plot q) | sequence). I’ll call this Likelihood(abundance ijq). Here a(iq) is the abundance of species i in plot q, A(jq) is the abundance of all species in group j in plot q, and S(jq) is the number of sequence types in group j in plot q:

Likelihood(abundance ijq) = dbinom(a(iq), size=A(jq), prob=1/S(jq))

Log that, sum over all species in all plots, and we get LL(abundance).
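As a quick illustration of how this behaves, here is a minimal Python sketch for two sequences in a single plot, once with equal abundances and once with the 10x difference from the earlier thought experiment. The function names and the abundances are made up, and `binom_logpmf` is a hand-rolled stand-in for the log of R's `dbinom`.

```python
import math


def xlogy(x, y):
    """x * log(y), using the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def binom_logpmf(x, n, p):
    """Log of the binomial probability of x successes in n trials."""
    log_choose = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    return log_choose + xlogy(x, p) + xlogy(n - x, 1 - p)


def group_abundance_loglik(abundances):
    """LL(abundance) for one group in one plot: every sequence is assumed
    equally likely to be drawn, so p = 1 / number of sequences."""
    total = sum(abundances)
    p = 1 / len(abundances)
    return sum(binom_logpmf(a, total, p) for a in abundances)


even = group_abundance_loglik([55, 55])     # similar abundances
skewed = group_abundance_loglik([100, 10])  # one sequence 10x the other
print(even, skewed)
```

The skewed pair gets a far lower log-likelihood than the even pair, which is the behavior we want: lumping two sequences with very different abundances into one OTU is penalized heavily.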

We’ve added k*r more parameters (one per group-plot combination), which adds 2*k*r to the penalty, so, now,

AIC = -2 * LL(OTU network) -2 * LL(OTU abundances) + 2S + 4k*r

Aaand…. that’s it. I think. We should be able to use this to scan across all OTU structures based on sequence similarity, calculate an AIC for each, and then use the OTU structure with the smallest AIC as our ‘species’.

Now, we could of course add additional information. For example, what if we knew some environmental information about plots, etc. We could probably use that to create groups of plots, rather than just use individual plots.

I also wonder if this can be related to a more general solution for weighted networks, and get back to A&P’s original formulation for food webs. Perhaps assuming that all interaction strengths are drawn from the same distribution with the same mean and variance. That should do it, and be relatively simple to implement. Heck, one could even try different distributional assumptions.

Allesina, S. & Pascual, M. (2009). Food web models: a plea for groups. Ecol. Lett., 12, 652–662. 10.1111/j.1461-0248.2009.01321.x

Baskerville, E.B., Dobson, A.P., Bedford, T., Allesina, S., Anderson, T.M. & Pascual, M. (2011). Spatial Guilds in the Serengeti Food Web Revealed by a Bayesian Group Model. PLoS Comp Biol, 7, e1002321. 10.1371/journal.pcbi.1002321


Everyone has been pretty shocked by the devastation wreaked by Sandy. Here in New England, we also got a Nor’easter following a few days later. That’s a lot of intense storm action in a short period of time.

So I was quite curious as I ventured out into the field last weekend to see how things looked. I went on a potential field site scouting trip to UMB’s field station in Nantucket. Nantucket of course got a good dose of Sandy, although it largely passed southwest. The Nor’easter may have been worse.

What I found while just walking about on the shoreline was pretty incredible. It was Scallapocalypse.

Let me include a video here of what one saw looking across the beach so you can get a sense of what was going on.

This was taken in Madaket. It was a bit more dramatic in other parts of the island – because scallop fishermen had come on shore, scooped up the scallops (many of which were the seed for next year, and too small for now) and taken them back out to the scallop grounds. Here’s what things looked like by the lab.

All over, the scallop grounds had come to shore.

But the huge flux of biomass onto shore was impressive. And it wasn’t just scallops, but a ton of seagrass as well, much of which was matting over fringing salt marshes.

Still, the huge amount of energy and nutrients coming into the shoreline ecosystem driven by storms gave me a lot of pause. I mean, those scallops that weren’t saved did end up in the coastal foodweb. Birds were definitely looking fat and happy, and we’d find piles like this with flocks of birds nearby:

The whole thing really got my brain going, with two big questions

1) So, what is the fate of all of this influx of stuff into the shoreline? How will the influx of energy alter the structure and dynamics of the food web? Will the smothering of the marsh matter? It is winter, when things are slower. How quickly will everything be decomposed? Will the effects be lagged until the springtime? Or will they affect the system now? I think of Gary Polis’s work on how food web structure is shaped by the influx of energy on small islands. I know this is a BIG island, but, still, the point stands, this is a big flux of biomass and nitrogen. And it’s not just plant matter, but animal protein.

2) How will climate change alter the frequency of this subsidy? What would the consequences of a regime with regular small subsidies and occasional big ones versus regular big subsidies be? This stems largely from my thinking about the increase in the size of the ‘largest storm of the year’ in California coastal systems that’s been the basis of my previous work. But, models and analysis from the Knutson group seem to show that, while hurricanes and cyclones in the Atlantic aren’t getting more frequent, the size of each one is getting bigger. So, similar pattern. If small subsidies are coming in every year now due to the occasional passing hurricane or Nor’easter, but the size of those same storms in the future is going to get larger, then having this kind of big Scallapocalypse/subsidy could get more frequent. Particularly as northern Atlantic waters get warmer (which they are – Nixon 2004), this could be an interesting and perhaps not so well investigated climate effect – the increased strength of coupling between marine and terrestrial food webs.

Oh, and random 3) What role will invasive algae play in increasing the impacts of storms on the amount of material coming on land? This may lead nowhere, but I noticed a lot of material (not scallops) that had washed on land had the invasive Codium fragile attached to it. I know that subtidal kelps can do this to mussels as well (Witman’s work), but there’s no kelp here. Is Codium becoming a drag (har har) and increasing the energy and nutrient flow from sea to land?

All in all, an interesting trip with a lot to chew on for future research. And a great setting!

Knutson, T. R., J. L. McBride, J. Chan, K. Emanuel, G. Holland, C. Landsea, I. Held, J. P. Kossin, A. K. Srivastava, and M. Sugi. 2010. Tropical cyclones and climate change. Nature Geoscience 3:157–163.

Nixon, S. W., S. Granger, B. A. Buckley, M. Lamont, and B. Rowell. 2004. A one hundred and seventeen year coastal water temperature record from Woods Hole, Massachusetts. Estuaries 27:397–404.

Polis, G. A., and S. D. Hurd. 1995. Extraordinarily high spider densities on islands: flow of energy from the marine to terrestrial food webs and the absence of predation. Proceedings of the National Academy of Sciences, USA 92:4382–4386.

Polis, G. A., W. B. Anderson, and R. D. Holt. 1997. Toward an integration of landscape and food web ecology: the dynamics of spatially subsidized food webs. Annual Review of Ecology and Systematics 28:289–316.

Witman, J. D., and T. H. Suchanek. 1984. Mussels in Flow – Drag and Dislodgement by Epizoans. Marine Ecology Progress Series 16:259–268.