Introducing the Open Derby (guest post)

Ensuring openness and reproducibility in ecology is an ongoing challenge, even though open tools (e.g., RStudio, GitHub) are free to use and user support is everywhere. To illustrate how ecology collaborations can fit within an open science framework, we had the idea of running an ‘Open Derby’. Similar to a research derby, where grad students and post-docs from different academic departments and universities collaborate to solve a conservation problem, our idea extends the derby concept to working fully openly: learning version control in GitHub, writing and analysing data in RMarkdown, and publishing all project development in an open notebook. Our end goal is to examine a conservation issue using open access (OA) data, and publish our findings and code in an OA journal. Ultimately, we hope to share Open Derby with other graduate students and academics across the life sciences, with the hope that we can provide a model for switching entire lab groups into open and reproducible research machines.

If you are not familiar with open science tools, RMarkdown is a wonderful way of weaving code and text into documents that can be formatted as manuscripts, reports, and presentations. We use RMarkdown in tandem with GitHub, a service that provides version control for your data, analyses, and manuscripts, and is incredibly useful for managing input between collaborators. Learning these tools can be daunting, but learning with friends makes it easier and more fun to build these skills. We are six graduate students from four institutions in three countries, who are present or past members of Dr. Julia Baum’s marine ecology lab at the University of Victoria and have come together to demo the Open Derby concept:

James Robinson: PhD student in the Baum lab, studying macroecology of tropical Pacific coral reefs (defending April 2017) and leader of the Open Derby project

Danielle Claar: PhD student in the Baum Lab, studying the dynamics of coral symbioses in response to environmental stressors

Easton White: PhD student at the University of California, Davis. He uses mathematical and statistical techniques to address various ecological and conservation questions.

Jillian Dunic: PhD student at Simon Fraser University, studying the effect of multiple stressors on tipping points in eelgrass. (Recently graduated from Jarrett Byrnes’ lab)

Jamie McDevitt-Irwin: MSc student in the Baum lab evaluating how stressors, specifically human disturbance and bleaching stress from an El Niño event, influence the community structure of the coral microbiome (defending April 2017).

Geoffrey Osgood: PhD student in the Baum lab, using data and models to study shark populations in South Africa and on Pacific reefs

In January 2017, we met (virtually) to decide on a research question for Open Derby. Our aim was to address a question that would be accessible under an open science platform (e.g. open source data). And **SPOILER**, we failed in our first attempt. In the interests of openness, and publishing null results, here is what we did:

Our workflow

Tools used: RMarkdown, GitHub, Slack, Etherpad, Skype

As mentioned earlier, we used RMarkdown and GitHub to manage our workflow. We met once every one or two weeks on Skype to discuss project goals and delegate tasks, taking notes on Skype calls in a publicly accessible Etherpad. These meetings, combined with GitHub, allowed us to divide and conquer without duplicating each other’s work. The goal was to have everyone spend only 1-3 hours on the project per week (after all, we each have dissertations to write). Everyone contributed to coding in R—whether it was data cleaning, analysis, or visualizations—and writing the initial manuscript. We also used Slack to regularly chat about the project in between Skype meetings. Slack eliminates the need for endless email chains and also integrates with GitHub, allowing participants to be notified when a repository change has occurred. In the future, we might take notes in a markdown file that can be easily version controlled, and may make use of GitHub’s wiki and project planning features.

Open datasets – oil spills and fisheries catch

We extracted oil spill data from the Emergency Response Division of the National Oceanic and Atmospheric Administration (NOAA), which recorded all major spills that occurred between 1967 and 2016 in marine environments on the west coasts of the U.S.A. and Canada (Figure 1). However, we were unable to find oil spill data from British Columbia (BC). After extensive web searches, emails, and even a freedom of information request, we were told there is no existing electronic record of BC coastal oil spills.


Figure 1 Oil spill location and size across the west coast of North America from 1967-2016, included in NOAA’s Incident News dataset.

We extracted fisheries catch data from the Sea Around Us Project (SaU), which collates global, spatially-explicit estimates of annual fisheries catch at the national level (Pauly & Zeller 2015; Zeller et al. 2016). The SaU dataset uses international and national catch statistics to generate taxon-level catch estimates at a 0.5 degree resolution, spanning 1950-2013, and provided us with California, Oregon, Washington, BC, and Alaska fisheries catch. Yet, as promising as these data appeared at first, we were advised not to use them at spatial scales below the exclusive economic zone (EEZ) level, as the data are aggregated. We also contacted Canadian and US fisheries departments to request port-level catch data.

Sadly, data resolution issues sank our project. With the fishing catch data, smaller spatial aggregations of the SaU dataset were simply EEZ-level trends with catch estimates scaled downwards (Figure 2), and we could not reasonably expect to detect oil spill impacts at the EEZ level. We were also unable to locate consistent port-level data spanning BC, California, or Alaska. Similarly, the oil spill incidents were not consistently reported, and it was unclear how spill estimates were generated – a contact from NOAA warned us that there was no established methodology for estimating the size of an oil spill, and many reported incident volumes involve some amount of guesswork.

Figure 2 Catch estimates for the USA west coast EEZ (top panel) and 3 randomly selected 0.5 degree resolution grid cells (bottom 3 panels, labelled with the lat-lon centroid of each grid cell) from 1970-2010.

Beyond data issues, the ecological impact of an oil spill is highly context-dependent, and will be affected by the environmental conditions, clean-up response, and type of oil contaminant. As a result, our understanding of oil spill impacts on marine ecosystems is, sadly, limited to large, catastrophic events. We urge more openness and data collection of spill characteristics, so that researchers may begin to quantify the cumulative impacts of smaller, more frequent spills on marine ecosystems.

Future Work

You can find our ideas, data, and progress in our public GitHub repository. We welcome any suggestions that might help us to resurrect this project, and hope to inspire others to write up their failed projects. Our project lead – James Robinson – is currently developing Open Derby for general consumption as a member of Mozilla’s Open Science Leadership training programme.

But, we are now brainstorming conservation questions for our next attempt. Ecologists, please, send us your unanswered conservation problems, unanalyzed datasets, and suggestions for advancing open ecology! This time we will each bring a dataset and question to the brainstorming session. We will let you know how we progress!

Which library to use to teach linear models & likelihood

I’ve been teaching likelihood to grad students for the past few years using the wonderful bbmle package from Ben Bolker. I really love its flexibility and the way you can get students into fitting anything their imagination can countenance. But there’s a small cost to that. First, their imagination, when they first meet likelihood, is… not ready. I teach linear models using least squares, then likelihood, then Bayes, in order to show the full range of inferential possibilities. In the past, when I got to likelihood, not only did I showcase the possibilities for fitting a straight line with a normal error distribution, but I also inserted the possibility of other error distributions and at least hinted at the possibility of other shapes of lines.

I also taught them the canned “formula” interface to the mle2 function as well as how to write a function for a model – thus introducing them to the wild world of optimizers.
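For concreteness, here is a sketch of what that formula interface looks like – the data frame, parameter names, and start values are all made up for illustration:

```r
library(bbmle)

# illustrative data (names and values are assumptions)
set.seed(31415)
adf <- data.frame(x = runif(100, 0, 10))
adf$y <- rnorm(100, 5 * adf$x, 9)

# mle2's "canned" formula interface: state the likelihood directly,
# supplying start values for every parameter, including the SD
fit <- mle2(y ~ dnorm(mean = b0 + b1 * x, sd = s),
            data = adf,
            start = list(b0 = 0, b1 = 1, s = 10))

summary(fit)
```

Writing the same model as your own negative log-likelihood function is the next step, which is where students first meet optimizers and their quirks.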

In this year’s iteration, I’ve been trying to pare a lot back, and move from teaching loops and the like to teaching dplyr and a more functional programming style. We hadn’t even hit functions by the time we hit likelihood (and didn’t ever – but that’s partially due to this being v 2.0 of my intro class, and functions will be hit in v 2.1). So I only taught straight lines with a normal error distribution using the formula interface.

It…went OK. Given that we move next into rstanarm for Bayes, the lack of a common syntax with both least squares and our Bayesian efforts was a bit more jarring this year. Particularly given the need to supply start values, coefficient names, and the issues one quickly runs into when you need to put limits on parameters, like standard deviations.

So in thinking about what to do, I went back to glm as an easier way to introduce fitting linear models with likelihood – particularly as it’s then an easy slide to stan_glm. But bbmle provides some nice tools – its implementation of profile is so straightforward – although I make students do that by hand anyway. In glm, the resulting profile plots are kinda wonky and take too much explanation.

But… then someone pointed out the offset argument, which lets you fix a coefficient at a set value. So, let’s say you have the following data frame.

# note: the x column here is an assumption; its original definition was lost
adf <- data.frame(x = runif(100, 0, 10)) %>%
  mutate(y = rnorm(100, 5*x, 90))

You can fit this with glm and check out the coefficients

summary(glm(y ~ x, data=adf, family=gaussian()))

But, what if you want to get the profile likelihood for the intercept? Using offset, you can fix a coefficient value. Say, to 1 or 2, for the slope.

glm(y ~ 1, offset=2*x, data=adf)
glm(y ~ 1, offset=1*x, data=adf)


Now, we can wrap this in with dplyr. I’m going to do this inefficiently (fitting the model twice) as I’m trying to figure out how to do this without map. That also was not a huge success – I need to give it more time. Functions and map: these are what need to be hit a bit more solidly next year.

### Fit a bunch of slope coefficients to make a profile
### for the intercept
# the range of candidate slopes is an assumption; the original was lost
coefFrame <- data.frame(b1 = seq(3, 7, 0.1)) %>%
  rowwise() %>%
  mutate(int = coef(glm(y ~ 1, offset=b1*x, data=adf)),
         LL = logLik(glm(y ~ 1, offset=b1*x, data=adf))[1])

plot(LL ~ b1, data=coefFrame)


Works great!

Now we can get the upper and lower 95% CI.

# critical value: max log-likelihood minus half the 95% chi-squared cutoff
critVal <- max(coefFrame$LL) - qchisq(0.95, df = 1)/2

coefFrame %>%
  ungroup() %>%
  filter(LL > critVal) %>%
  summarise(lwr = min(b1), upr = max(b1))

Again, straightforward.


Given that I’m not using the full power of bbmle, and that getting into the nitty gritty of algorithms, etc., slows things down – and that by jettisoning that weight I think I can jump right into AIC rather than waiting until we hit multiple linear regression – this simple example makes me think that next year… yeah, it’s time to introduce glm sooner and leave bbmle for semester 2.

Note, you can do this with lm and an offset argument as well. And as I ponder this, I think that perhaps introducing it this way means we can begin to think about non-normality earlier? Maybe? I need to ponder this.
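A minimal sketch of that lm version – the data frame here just mirrors the adf example above:

```r
library(dplyr)

# stand-in data, mirroring the earlier example
set.seed(42)
adf <- data.frame(x = runif(100, 0, 10)) %>%
  mutate(y = rnorm(100, 5 * x, 90))

# offset fixes the slope at 2; lm then estimates only the intercept
fit_fixed <- lm(y ~ 1, offset = 2 * x, data = adf)

coef(fit_fixed)    # a single coefficient: the intercept
logLik(fit_fixed)  # log-likelihood of the constrained model
```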

Code in

Linear Model Power Analysis with Dplyr

I really liked the power analysis example I gave in class today. I feel like I’m finally firing on all cylinders with the tidyverse. So I thought I would post it here, in case it was of use to anyone!


OK, we can fit and evaluate linear regression models, but what about their power? Let’s look at our seal fit again – maybe we want to know what would have happened with a lower sample size?

Note, I’m going to use a knitr function, kable, to neaten up the table for markdown. Try it in your own homework!

| term        | estimate    | std.error | statistic | p.value |
|-------------|-------------|-----------|-----------|---------|
| (Intercept) | 115.7667505 | 0.1763751 | 656.3668  | 0       |
| age.days    | 0.0023706   | 0.0000447 | 53.0645   | 0       |
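For reference, a table like that can be generated with broom’s tidy() piped into kable – here with a stand-in lm fit, since the seal data itself isn’t shown in this post:

```r
library(broom)
library(knitr)

# stand-in fit; the post's real model was a seal length ~ age.days regression
fit <- lm(mpg ~ wt, data = mtcars)

# tidy() returns the coefficient table as a data frame;
# kable() formats it as a markdown table
kable(tidy(fit))
```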

What about the residual SD?

| r.squared | adj.r.squared | sigma  |
|-----------|---------------|--------|
| 0.2256493 | 0.2255691     | 5.6805 |
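Those model-level values come from broom’s glance(), which returns one row of fit summaries – again with a stand-in model:

```r
library(broom)
library(dplyr)
library(knitr)

fit <- lm(mpg ~ wt, data = mtcars)  # stand-in for the seal fit

# glance() gives one-row model summaries; sigma is the residual SD
glance(fit) %>%
  select(r.squared, adj.r.squared, sigma) %>%
  kable()
```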

All right – there are our target values. Now, we can change a lot of things. The effect size (slope), range of values, sigma, and more. But let’s try sample size.

1 Make a Table of Parameters, some of which vary

The first step of a power analysis is to setup a data frame with all of the different possibilities that you might want to assess for power. For linear regression, we have the following parameters, all of which might vary:

  1. Slope
  2. Intercept
  3. Residual variance
  4. Sample Size
  5. Range of X values

To do a power analysis via simulation for a linear regression, we begin by building our simulation data frame with the parameters and information that varies. In this case, sample size.



simPopN <- data.frame(slope = 0.00237,      # from the seal fit above
                      intercept = 115.767,
                      sigma = 5.6805,
                      n = 10:100)           # the varying parameter

Note, if we wanted to vary more than one parameter, we’d first create a data frame where only one parameter varied, then add a second that varied using crossing in tidyr, like so:


simPopN <- data.frame(slope = 0.00237, 
                      intercept = 115.767,
                      sigma = 2:6) %>%
  crossing(n = 10:100)

2 Expand to have a number of rows for each simulated sample with each parameter combination

OK, once we have everything in place, including sample size, we need to expand this out to have some number of samples for each n. For that, we can use expand() from tidyr (the same library as crossing).


simPopN <- simPopN %>%
  group_by(slope, intercept, sigma, n) %>%
  expand(reps = 1:n) %>%
  ungroup()

3 Expand to create repeated simulated data sets for each combination of parameters

Now, if we want to simulate each of these, say, 100 times, we need to assign unique sim numbers, so that for each n and sim number we have a unique data set. We can use crossing to replicate each combination of variables above some number of times. Note – 100 is really low, but it doesn’t eat your processor! Use low numbers of simulations for development, then crank them up for the final analysis.

simPopN <- simPopN  %>%
  crossing(sim = 1:100)

4 Simulate the data

Great – almost ready to go! Now we just need to add in fitted values. Fortunately, as rnorm() works with vectors, we can just use a mutate here. We’ll also need to simulate random draws of ages, but that’s just another random number.

simPopN <- simPopN %>%
  mutate(age.days = runif(n(), 1000, 8500)) %>%
  mutate(y = rnorm(n(), intercept + slope*age.days, sigma))  # response name lost in formatting; 'y' assumed

Yahtzee! Ready to run!

5 Fit models and extract coefficients

First, we now need to generate a lot of fitted models. Dplyr doesn’t take kindly to holding model objects directly, so we can use two powerful functions here – first, nest() and unnest() allow us to collapse grouped data down into little pieces and re-expand them.

fits <- simPopN %>%
    group_by(slope, intercept, sigma, n, sim) %>%
    nest()

## # A tibble: 9,100 × 6
##      slope intercept  sigma     n   sim              data
##      <dbl>     <dbl>  <dbl> <int> <int>            <list>
## 1  0.00237   115.767 5.6805    10     1 <tibble [10 × 3]>
## 2  0.00237   115.767 5.6805    10     2 <tibble [10 × 3]>
## 3  0.00237   115.767 5.6805    10     3 <tibble [10 × 3]>
## 4  0.00237   115.767 5.6805    10     4 <tibble [10 × 3]>
## 5  0.00237   115.767 5.6805    10     5 <tibble [10 × 3]>
## 6  0.00237   115.767 5.6805    10     6 <tibble [10 × 3]>
## 7  0.00237   115.767 5.6805    10     7 <tibble [10 × 3]>
## 8  0.00237   115.767 5.6805    10     8 <tibble [10 × 3]>
## 9  0.00237   115.767 5.6805    10     9 <tibble [10 × 3]>
## 10 0.00237   115.767 5.6805    10    10 <tibble [10 × 3]>
## # ... with 9,090 more rows

Second, the map function in the purrr library allows us to iterate over different levels of grouped data frames and perform some function. In this case, we’ll fit a model and get its coefficients using broom. This is a weird new set of functions. What’s odd about map is that the first argument is a column. That argument is then referred to as . within the rest of the function call, in combination with the ~ notation. What ~ does is say that, from this point forward, . refers to the first argument given to map.

In essence, what map does is iterate over each element of a list given to it. Once we nest, the data column is now a list, with each element of the list its own unique data frame. So, map works with lists to apply a function to each element. The output of the model fitting is another list – now called mod. A list of models. We then iterate, using map, over that list to generate another list of data frames – this time of coefficients.


fits <- fits %>%
    mutate(mod = map(data, ~lm(y ~ age.days, data=.))) %>%  # response name lost in formatting; 'y' assumed
    mutate(coefs = map(mod, ~tidy(.)))

## # A tibble: 9,100 × 8
##      slope intercept  sigma     n   sim              data      mod
##      <dbl>     <dbl>  <dbl> <int> <int>            <list>   <list>
## 1  0.00237   115.767 5.6805    10     1 <tibble [10 × 3]> <S3: lm>
## 2  0.00237   115.767 5.6805    10     2 <tibble [10 × 3]> <S3: lm>
## 3  0.00237   115.767 5.6805    10     3 <tibble [10 × 3]> <S3: lm>
## 4  0.00237   115.767 5.6805    10     4 <tibble [10 × 3]> <S3: lm>
## 5  0.00237   115.767 5.6805    10     5 <tibble [10 × 3]> <S3: lm>
## 6  0.00237   115.767 5.6805    10     6 <tibble [10 × 3]> <S3: lm>
## 7  0.00237   115.767 5.6805    10     7 <tibble [10 × 3]> <S3: lm>
## 8  0.00237   115.767 5.6805    10     8 <tibble [10 × 3]> <S3: lm>
## 9  0.00237   115.767 5.6805    10     9 <tibble [10 × 3]> <S3: lm>
## 10 0.00237   115.767 5.6805    10    10 <tibble [10 × 3]> <S3: lm>
## # ... with 9,090 more rows, and 1 more variables: coefs <list>

Last, we clean up – we unnest, which takes the list-columns from above and expands them out into full data frames that get slotted back into our original data frame. Nice trick, no?

We’ll also filter for just the slope coefficient.

fits <- fits %>%
  unnest(coefs) %>%
  ungroup() %>%
  filter(term == "age.days")

## # A tibble: 9,100 × 10
##      slope intercept  sigma     n   sim     term     estimate    std.error
##      <dbl>     <dbl>  <dbl> <int> <int>    <chr>        <dbl>        <dbl>
## 1  0.00237   115.767 5.6805    10     1 age.days 0.0004283944 0.0010318274
## 2  0.00237   115.767 5.6805    10     2 age.days 0.0021239104 0.0005736028
## 3  0.00237   115.767 5.6805    10     3 age.days 0.0033531879 0.0009399613
## 4  0.00237   115.767 5.6805    10     4 age.days 0.0032566962 0.0008342080
## 5  0.00237   115.767 5.6805    10     5 age.days 0.0028332982 0.0009062293
## 6  0.00237   115.767 5.6805    10     6 age.days 0.0028501936 0.0005176723
## 7  0.00237   115.767 5.6805    10     7 age.days 0.0021677718 0.0004457464
## 8  0.00237   115.767 5.6805    10     8 age.days 0.0034636535 0.0006116823
## 9  0.00237   115.767 5.6805    10     9 age.days 0.0012147324 0.0012945056
## 10 0.00237   115.767 5.6805    10    10 age.days 0.0039061730 0.0011307423
## # ... with 9,090 more rows, and 2 more variables: statistic <dbl>,
## #   p.value <dbl>

6 Calculate your Power

Notice that we do indeed have p-values, so we can use these fits to get power for each sample size. We can now do our normal process – in this case grouping by sample size – to get power. And then we can plot the result!

pow <- fits %>%
    group_by(n) %>%
    summarise(power = 1-sum(p.value>0.05)/n())

qplot(n, power,  data=pow, geom=c("point", "line")) +
  theme_bw(base_size=17) +
  geom_hline(yintercept=0.8, lty=2)

7 Examples

Let’s take a look at the whole workflow, this time trying a bunch of standard deviation values.

##setup parameters
simSD <- data.frame(slope = 0.00237, 
                    intercept = 115.767,
                    sigma = seq(2, 6, length.out = 5),
                    n = 100) %>%
  group_by(slope, intercept, sigma, n) %>%

  #add sample sizes
  expand(reps = 1:n) %>%
  ungroup() %>%
  #add sims
  crossing(sim = 1:100) %>%
  #add fake data
  mutate(age.days = runif(n(), 1000, 8500)) %>%
  mutate(y = rnorm(n(), intercept + slope*age.days, sigma))  # response name lost in formatting; 'y' assumed

##make your fits
fitsSD <- simSD %>%
  group_by(slope, intercept, sigma, n, sim) %>%
  nest() %>%
  mutate(mod = map(data, ~lm(y ~ age.days, data=.))) %>%
  mutate(coefs = map(mod, ~tidy(.))) %>%
  #unnest and filter
  unnest(coefs) %>%
  ungroup() %>%
  filter(term == "age.days")

##calculate and plot power
powSD <- fitsSD %>%
  group_by(sigma) %>%   # n is fixed at 100 here, so we group by the varying sigma
  summarise(power = 1-sum(p.value>0.05)/n())

qplot(sigma, power, data=powSD, geom=c("point", "line")) +
  theme_bw(base_size=17) +
  geom_hline(yintercept=0.8, lty=2)

You Complete Me

There’s a feature that is either new or I haven’t noticed before in the new release of tidyr that is making me giddy with giddiness. The complete function.

This seems odd. Why would just one function make you giddy, Jarrett? Why not all of them?

It’s making me so happy as it solves a problem that took a substantial degree of thought and waaaay too many lines of code to solve in a massive meta-analysis of kelp abundances I’m working on with colleagues. See, when folk go out and sample multiple species, they often behave in funny ways. Sometimes, when they don’t see any individuals of a species, they’ll mark it off as a 0. Sometimes they won’t, and will just not record even the name of the species.

The former case is great. The latter case can be a pain in the touchus to fix, as all other identifying information for, say, a plot, site, year, etc. needs to be kept constant for the new blank value you are inserting. It took a few tries, but I ended up with a kludgy function that fixed the issue; it was slow, though, particularly for data sets with hundreds of plots and 30-40 species (we were looking at other algae, too).

But along comes complete to make it all quick, painless, and only take a line or so of code.

So, let’s see an example of a ragged dataset where zeroes are just not included.

kelpdf <- data.frame(
  Year = c(1999, 2000, 2004, 1999, 2004),
  Taxon = c("Saccharina", "Saccharina", "Saccharina", "Agarum", "Agarum"),
  Abundance = c(4,5,2,1,8)
)

##   Year      Taxon Abundance
## 1 1999 Saccharina         4
## 2 2000 Saccharina         5
## 3 2004 Saccharina         2
## 4 1999     Agarum         1
## 5 2004     Agarum         8

So, Agarum wasn’t recorded in 2000. Maybe it was NA, maybe it was 0 – ask your data provider! Assuming it was 0, how do we fill it in?


kelpdf %>% complete(Year, Taxon)
## Source: local data frame [6 x 3]
##    Year      Taxon Abundance
##   (dbl)     (fctr)     (dbl)
## 1  1999     Agarum         1
## 2  1999 Saccharina         4
## 3  2000     Agarum        NA
## 4  2000 Saccharina         5
## 5  2004     Agarum         8
## 6  2004 Saccharina         2

OK, what happened there? Well, complete looked at all possible combinations of Year and Taxon. When there was a missing combination, the remaining columns got an NA.


But, as I said before, we wanted to have zeroes, not NAs. Well, for that, we have a fill argument.

kelpdf %>% complete(Year, Taxon, fill = list(Abundance = 0))
## Source: local data frame [6 x 3]
##    Year      Taxon Abundance
##   (dbl)     (fctr)     (dbl)
## 1  1999     Agarum         1
## 2  1999 Saccharina         4
## 3  2000     Agarum         0
## 4  2000 Saccharina         5
## 5  2004     Agarum         8
## 6  2004 Saccharina         2

Great! We can have different kinds of fills for different columns. This is perfect, as we often had 2-4 measurements of abundance, but only some should be zero. Others should have been NA, as they were never recorded.
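A sketch of that multi-column fill, with a hypothetical second measurement column (Biomass) added to the example data:

```r
library(tidyr)

# the kelp example again, plus a hypothetical Biomass column
kelp2 <- data.frame(
  Year = c(1999, 2000, 2004, 1999, 2004),
  Taxon = c("Saccharina", "Saccharina", "Saccharina", "Agarum", "Agarum"),
  Abundance = c(4, 5, 2, 1, 8),
  Biomass = c(40, 52, 18, 9, 70)
)

# Abundance gets a true zero for missing combinations;
# Biomass is left as the default NA, since it was never recorded
kelp2 %>% complete(Year, Taxon, fill = list(Abundance = 0))
```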

But, there’s another problem complete solves. Sometimes, we deal with subsets of data – say we’ve only pulled out certain taxa – but we know that there is more data there. Let’s say in 2001-2003, these sites were sampled, but there was no Saccharina. Yes, in this subset of a larger data frame we have no rows – but we know we should!

For that, we can use the full_seq function in combination with complete. full_seq takes a vector of numbers, sorts them, and then based on the period you specify, fills out the sequence. In our case, this is 1 year. So, to fill in those zeroes…

kelpdf %>% complete(Year = full_seq(Year, period = 1), Taxon,
                   fill = list(Abundance = 0))
## Source: local data frame [12 x 3]
##     Year      Taxon Abundance
##    (dbl)     (fctr)     (dbl)
## 1   1999     Agarum         1
## 2   1999 Saccharina         4
## 3   2000     Agarum         0
## 4   2000 Saccharina         5
## 5   2001     Agarum         0
## 6   2001 Saccharina         0
## 7   2002     Agarum         0
## 8   2002 Saccharina         0
## 9   2003     Agarum         0
## 10  2003 Saccharina         0
## 11  2004     Agarum         8
## 12  2004 Saccharina         2

Note we had to define (or overwrite, really) our Year column, but no worries there. Now look at all of those lovely zeroes. Ah, so nice. I wish this had been around for the last two years… so much head bashing saved.

Other Writings about Parenting & Climate Change by Scientists

OK, so, I spoke too soon. There are at least a few stabs by scientists writing about being a parent who studies climate/environmental change. As folk send me more, I’ll try and keep this list updated. While there’s some great stuff here, I feel like this is an empty niche for some great writing. Maybe you want to contribute, too? Towards that end, anyone who has a yen, I will be happy to post, repost, or link your own writings here under my science parent category.

OK, the list:

A World for my Daughter by Alejandro Frid (hat tip to Jenn Burt)

A Rational Optimist’s Reluctant Case for Panic by Tom Webb

Scaring our children with the future by CJA Bradshaw

Should Climate Change Stop Us from Having Babies? by Eric Holthaus

Parenting as a Scientist in the Age of Climate Change

An old friend with a new child asked me if I know any climate change scientists or global change biologists who are parents and have written cogently about being parents in the era of climate change. Off the top of my head, I couldn’t think of any (prove me wrong, internet), but, heck, I’m one myself. Here I am, staring down the barrel of hitting one year since HD’s arrival.

I’ve actually had a lot of conversations about climate change and our daughter with my wife – ethically, what does it mean to have children knowing what we know about the future of their earth? How many children is it fair (to them!) to have? And within myself, I ask, what should I be doing? Both what should I do to try to make the future not suck as much as it might for her and how do I help her prepare for the world that will look so very different than the one my own parents handed me.

I’ll admit, I don’t have many good answers. Particularly not as the parent of a kiddo just shy of the end of her first year. Why is that?

One reason that climate science communication is oft said to fail is the huge scale of imagination needed to grapple with the implications of climate change in the future. Climate scientists often possess some key personality traits that help them with this understanding – more so than the general public[1]. Add the daily insanity of managing a tiny human and what it does to your brain, and my brain-box is just not always up to wrestling with these issues until the moment that they become pressing. Granted, this may be due to being the parent of a babe (with the power), and scientist parents of older children might have some better answers. I hope so, as this is definitely a looming challenge that I have not yet been able to navigate with forethought.

But that kind of punting things down the road is not fair – not fair to her.

So, what are the issues to consider? Here are the ones that always rise to the top of my mind.

First – is it ethical to be having children at all? For me, that one was straightforward. Yes. I do want to pass on the legacy of the family that gave rise to me. I think that there is some great culture and history that has brought me to where I am today. And I think some of that moxie, honestly, will be useful to the world in helping navigate the way forward. I mean, damn, that’s an egotistical thing to say, but, I want my genes to sally forth and help create someone who will help heal the world in some way. The alternative is to fold, and go home. As a species. Not an option. Or see this as someone else’s kid’s problem, not mine. That’s the kind of ethical stance that got us into this problem in the first place. So, also, nope.

Second – what can I do to make a better future? Welp, as someone who studies global change biology, I hope I’m doing it. I hope that I’m building the ship of knowledge
that we need to sail forward as a species. I hope that the choices I make – personally and politically – all help improve the world and our path into the future. I don’t see myself doing anything differently or with more or less urgency than I did before. But that’s because of the values I hold about the importance of my actions for how they impact the whole world and future generations.

And finally – how do I prepare my daughter for the world she is inheriting? Well, here’s where I feel completely inadequate to the task. I want to imbue her with the same love of the sea (and nature in general) that I have so that she appreciates the possible. I want to help her embrace the philosophy of helping to heal the world that was such a strong part of my philosophical upbringing. But, again, I would have done these things climate change or no. When do we need to talk about why this problem exists in the first place? How do I teach her about making proactive choices that lessen her own risk to the dangers of climate change? How is this different than how I would have raised her before? What are the lurking issues that I am missing, that I know my mind slips around, avoiding them because they are so large that I cannot see them when it comes to my own child? To her future, once I am gone? Should I be helping her learn more skills for self-reliance that I might not have otherwise? Should I be giving her more cynicism and skepticism of the systems around her that have failed us all, or the pragmatism to know that they can be the solution? Are there actual tangible skills or ideas I should make sure to teach her?

Or should I just muddle along as a normal parent, confronting these issues as they arise with the best judgement and sense of ethical responsibility that I can teach as I would have anyway?

As you can see, I have nothing but questions.

But in writing this, I do feel comforted that a lot of my own answers come down to just raising a pragmatic ethical kid who loves the natural world and embraces the responsibility of being part of a species that constantly seeks to better itself.

Is that enough?

Some Interesting Literature Cited
[1] Weiler, C.S., Keller, J.K., Olex, C., 2012. Personality type differences between Ph. D. climate researchers and the general public: implications for effective communication. Clim. Change. doi:10.1007/s10584-011-0205-7

Rafe Sagarin and Passion in The Life Scientific

How to process something like Rafe Sagarin’s senseless passing? For those who never encountered him, Rafe was a wonderful marine ecologist whose body of work was as varied and eclectic as can be, from classic papers on the long-term effects of climate change in Science and Ecological Monographs, to books on how the natural world can help us deal with risks like terrorist attacks, to working at Biosphere 2. He’s even got a wonderful YouTube channel that works, as he always did, outside of the box.

Rafe was one of those vivacious dynamic iconoclastic people that you are always delighted to encounter in life. I was fortunate to know him through the Western Society of Naturalists where his charm and vim earned him the annual slot of auctioneer to raise funds for students. I really only knew him as colleague at meetings, and yet he is someone who has served as an example for myself and so many other early career scientists.

It was his passion. Every talk I heard him give, every conversation we ever had – scientific, personal, or with others about the difficulties of being early career – every interaction I saw him have with anyone around him, all of it was infused with an infectious passion. His passion for the natural world was infectious. It infused everything. It was in his voice. It was in his worldview. It was in how he shaped his talks and storytelling. It was in that damned silly shirt he wore at every WSN.

Rafe’s uncompromising passion for science and nature was an inspiration. I looked forward to seeing what he thought of next because there is no joy in the world like watching someone follow their passion. And now I will not be able to again.

But that passion. That is what I will always hold onto. Because the life scientific can get you down as it grinds forward. But holding onto our passion, our vim, our vigor, our joie de vivre, rooted in our love of the natural world – that is what can provide an endless wellspring of joy in every moment. It’s something that has helped me forge onwards in my own career and in my life in general.

Rafe will always serve as an example of that passion and the myriad of directions it can lead. I will hold onto that. I hope that his passion will long serve as an example for our whole community. It will for me.

Spicier Analyses with Interactive R Leaflet Maps

Who wants to make a kickass, public-friendly, dynamic, online appendix with a map for their papers? ME! (and you, right?) Let’s talk about a cool way to make your data sing to a much broader audience than a static image.

Last time, I mentioned that I had also been playing with RStudio’s Leaflet package. For those who don’t know, Leaflet is a javascript library for generating interactive maps. I don’t know javascript (really), but I do know R, so this library is incredibly powerful for someone like me – particularly when paired with RStudio’s HTMLwidgets or Shiny for publishing things.

So, what is leaflet? How does it work? Let’s start with a simple example. First, install the thing!

if (!require('devtools')) install.packages('devtools')
#at the time of writing, leaflet lived on GitHub; it is now also on CRAN
devtools::install_github('rstudio/leaflet')
library(leaflet)

Now, at its core, leaflet makes maps. And then puts data on them. It uses magrittr pipes as its syntax, which may be a bit daunting at first, but I think you’ll see that it’s all pretty straightforward.

Let’s start with a super-basic simple map.

leaflet() %>% addTiles()

Oh! Map! Scrollable! Easy to use! And you can zoom down to streets. What’s going on here?

First, leaflet() starts the whole process. You can feed it arguments of data frames and such for later use. Second, addTiles says “OK, go to a map server that feeds up image tiles, and grab ’em!” It defaults to OpenStreetMap, but there are others. For example, ESRI has a tile server with satellite imagery.

#for example, ESRI's World Imagery tile server
leaflet() %>% 
  addTiles(urlTemplate = "https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{z}/{y}/{x}")

There are a number of arguments and functions you could string after addTiles to zoom around, start in a certain area, manipulate the basic map widget in different ways, but I’d like to focus on two common applications for folk like us. First, site maps! Let’s say you had some awesome data from over 100 sites around the world, and you wanted to show it off. Maybe you wanted people to really be able to zoom in and see what the environment was like around your data (i.e., why I went and found the ESRI tile server). How would you do this?

Well, leaflet provides a few different ways to put this stuff on a map. You can add circles, markers, circlemarkers, and much more. Let’s give an example on a basic map. Note the formula interface for supplying values to latitude and longitude.

#fake some data
myData <- data.frame(Latitude = runif(100, -80,80), 
                     Longitude = runif(100, -180,180),
                     measurement1 = rnorm(100, -3, 2), 
                     Year = round(runif(100, 1975,2015)))
#map it!
leaflet() %>% 
  addTiles() %>%
  addCircles(data=myData, lat = ~Latitude, lng = ~Longitude)

Awesome! But we know there’s information in that thar data. There’s a Year and a Measurement. How can we communicate this information via our map? Well, there are a few ways. Note those circles had a color and a size? We can use that. Circles and markers are also clickable. So, we can add some additional information to our data and feed that into the map. Let’s start with a popup message.

#round for readability
myData$info <- paste0("Year: ", myData$Year, 
                      "<br> Value: ", round(myData$measurement1, 3))

Note the use of some HTML in there. That’s because leaflet displays things in a web browser, so, we need to use HTML tags for text formatting and line breaks.

OK, great. Now what about a color? This can get a bit trickier with continuous values, as you’ll need to roll your own gradients. Kind of a pain in the ass. If you just wanted one or two colors (say, positive or negative), you can just do something simple, or even feed a single value to the color argument – say, “red” – but let’s get a little fancy. The classInt library is perfect for gradient construction. Here’s an example. We’ll start with grabbing a palette and making it a color ramp.
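Before getting fancy, here’s a minimal sketch of that simpler one-or-two-color route, keying color to the sign of the measurement. The data frame and column names here are made up to keep the example self-contained:

```r
library(leaflet)

#fake a small data frame mirroring the myData example above (hypothetical names)
df <- data.frame(Latitude = runif(20, -80, 80),
                 Longitude = runif(20, -180, 180),
                 measurement1 = rnorm(20))

#two-color scheme: red for positive values, blue for negative
df$col <- ifelse(df$measurement1 > 0, "red", "blue")

#feed the color column to addCircles via the formula interface
leaflet() %>%
  addTiles() %>%
  addCircles(data = df, lat = ~Latitude, lng = ~Longitude, color = ~col)
```

That gets you a quick positive/negative map with no gradient machinery at all.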

#grab a palette
library(RColorBrewer)
pal <- brewer.pal(11, "Spectral")

#now make it more continuous 
#as a colorRamp
pal <- colorRampPalette(pal)

Great! Now we can use classInt to map the palette to our measurement values.

#now, map it to your values
library(classInt)
palData <- classIntervals(myData$measurement1, style="quantile")

#note, we use pal(100) for a smooth palette here
#but you can play with this
myData$colors <- findColours(palData, pal(100))

Now that we have info, colors, and, well, let’s say we want to make the circles bigger – let’s put it all together!

#Put it on a map!
newMap <- leaflet() %>% 
  addTiles() %>%
  addCircleMarkers(data=myData, lat = ~Latitude, lng = ~Longitude,
             color = ~colors, popup = ~info, radius = 8)
newMap


Very cool! With different properties for zooming, filling, sizing points, different styles of markers, and more, I think the possibilities for online appendices for papers – or outreach objects – are pretty endless here. Go forth, ye, and play!

Also, check out the code in this gist!

MEOW! It’s Marine Ecoregions in R!

So, I’m on paternity leave (yay! more on that another day – but HOMG, you guys, my daughter is amazing!) and while my daughter is amazing, there are many hours in the day and wee morning hours where I find myself rocking slowly back and forth, knowing that if I stop even for a second, this little bundle of cute sleeping away in the wrap will wake and howl.

So, what’s a guy on leave to do? Well, why not learn some new R tricks that have been on my list for, in some cases, years, but I have not had time to do otherwise. In particular, time to ramp up some of my geospatial skills. As, really, I can type while rocking. And need to do something to stay awake. (One can only watch Clone Wars for so long – and why is it so much better than the prequels?)

In particular, I’ve really wanted to become a more proficient map-maker. I’ve been working on a few global projects lately, and wanted to visualize the results in some more powerful, intuitive ways. Many of these projects have used Spalding et al.’s 2007 Marine Ecoregions of the World classification (or MEOW) as a basis. So, wouldn’t it be cool if we could make some nice plots of those regions, and maybe fill them in with colors according to some result?

Where to start? Well, to begin, how does one get the geographic information into R? Fortunately, there’s a shapefile available from The Nature Conservancy.

Actually, heck, there are a LOT of marine region-like shapefiles over there that we might all want to use for different maps. And everything I’m about to say can generalize to any of those other shapefiles!

Oh, for those of you as ignorant as me, a shapefile is a geospatial file that has information about polygons with some additional data attached to them. To futz with them, we need to first load a few R libraries
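To make that polygons-plus-data structure concrete, here’s a toy sketch using the sp package: a single made-up square polygon with one row of attribute data attached (all names here are hypothetical):

```r
library(sp)

#one square polygon (coordinates close back on themselves)
sq <- Polygon(cbind(c(0, 0, 1, 1, 0), c(0, 1, 1, 0, 0)))
polys <- SpatialPolygons(list(Polygons(list(sq), ID = "unit1")))

#attach a row of attribute data; row.names must match the polygon IDs
spdf <- SpatialPolygonsDataFrame(polys,
                                 data = data.frame(name = "toy region",
                                                   row.names = "unit1"))
```

A real shapefile is the same idea at scale: many polygons, each with a matching row of attributes.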

#for geospatial tools
library(rgdal)
library(maptools)
library(rgeos)

#for some data reduction
library(plyr) #for join() later
library(dplyr)

#for plotting later on
library(ggplot2)
library(maps)
library(RColorBrewer)
These are wonderful tools that will help you load and manipulate your shapefiles. Note that I’ve also loaded up dplyr, which I’ve been playing with and finally learning. I’m a huge fan of ye olde plyr library, but dplyr has really upped my game, as it’s weirdly more intuitive – particularly with pipes. There are plenty of good dplyr tutorials online – and perhaps I’ll write more about it later.

OK, so, libraries aside, how do we deal with the file? Well, once we download and unzip it into a folder, we can take a look at what’s inside

#Let's see what's in this shapefile!
#Note - the paths here are relative to where I
#am working in this file - you may have to change them
ogrInfo("../../MEOW-TNC", "meow_ecos")
## Source: "../../MEOW-TNC", layer: "meow_ecos"
## Driver: ESRI Shapefile; number of rows: 232 
## Feature type: wkbPolygon with 2 dimensions
## Extent: (-180 -89.9) - (180 86.9194)
## CRS: +proj=longlat +datum=WGS84 +no_defs  
## LDID: 87 
## Number of fields: 9 
##         name type length typeName
## 1   ECO_CODE    2     19     Real
## 2  ECOREGION    4     50   String
## 3  PROV_CODE    2     19     Real
## 4   PROVINCE    4     40   String
## 5   RLM_CODE    2     19     Real
## 6      REALM    4     40   String
## 7   ALT_CODE    2     19     Real
## 8 ECO_CODE_X    2     19     Real
## 9   Lat_Zone    4     10   String

OK, cool! We can see that it’s an ESRI shapefile with 232 rows – that’s 232 ecoregions. Each row of data has a number of properties – Province, Realm (which are both higher order geospatial classifications), some numeric IDs, and information about latitudinal zone. We can also see that it’s in the WGS84 projection – more on projections another time – and that it’s chock full of polygons.

OK, that’s all well and good, but let’s load and plot the sucker!

#get an ecoregions shapefile, and from that make a province and realm shapefile
regions <- readOGR("../../MEOW-TNC", "meow_ecos")
## OGR data source with driver: ESRI Shapefile 
## Source: "../../MEOW-TNC", layer: "meow_ecos"
## with 232 features and 9 fields
## Feature type: wkbPolygon with 2 dimensions

plot(regions)


WHOAH! Cool. Regions! In the ocean! Nice! What a beautiful simple little plot. Again, well and good. But…..what can we do with this? Well, a quick call to class shows us that regions is a SpatialPolygonsDataFrame. Which of course has its own plotting methods, and such. So, you could fill some things, make some borders, overlay – the sky’s the limit. But, there are two things I want to show you how to do to make your life more flexible.

Higher Order Geographic Regions

One might be to look at Provinces and Realms. In general, when you have shapefiles, if you want to make aggregated polygons, you have to go through a few steps. Let’s say we want to look at Provinces. A province is composed of many ecoregions. Fortunately, there’s a function to unite SpatialPolygons (that’s the class we’re dealing with here that’s part of the SpatialPolygonsDataFrame) given some identifier.

#Unite the spatial polygons for each region into one
provinces <- unionSpatialPolygons(regions, regions$PROVINCE)

OK, great. But we still need to add some data to that: provinces is just a SpatialPolygons object. To do that, let’s make a new, reduced data frame using dplyr.

#Make a data frame that will have Province level info and above
prov_data <- regions@data %>%
  group_by(PROVINCE) %>%
  summarise(PROV_CODE = PROV_CODE[1], REALM = REALM[1], RLM_CODE=RLM_CODE[1], Lat_Zone=Lat_Zone[1])

Bueno. We now have a much smaller data frame that is for Provinces only. The last step is to make a new SpatialPolygonsDataFrame by joining the data and the polygons. There are two tricks here. First, make sure the right rows in the data are joined to the right polygons. For that, we’ll use a join statement. Second, the new data frame has to have row names matching the names of the polygons. I don’t often use this, but in making a data frame, you can supply row names. So, here we go:

#merge the polygons with the new data file
#note the row.names argument to make sure they map to each other
provinces <- SpatialPolygonsDataFrame(provinces, 
  data=data.frame(join(data.frame(PROVINCE=names(provinces)), prov_data),
                  row.names=row.names(provinces)))
## Joining by: PROVINCE

Not gorgeous, but it gets the job done. We can of course do this for realms as well.

#make realms shapefile
#make spatial polygons for realms
realms <- unionSpatialPolygons(regions, regions$REALM)

#make new data
realm_data <- regions@data %>%
  group_by(REALM) %>%
  summarise(RLM_CODE = RLM_CODE[1],  Lat_Zone=Lat_Zone[1])

#merge the two!
realms <- SpatialPolygonsDataFrame(realms, 
  data=data.frame(join(data.frame(REALM=names(realms)), realm_data),
                  row.names=row.names(realms)))
## Joining by: REALM

Excellent. So – did it work? And how different are these three different spatial things anyway? Well, let’s plot them!

#########Plot them all
par(mfrow=c(2,2), mar=c(0,0,0,0))
plot(regions, main="Ecoregion", cex.main=5)
plot(provinces, lwd=2, border="red", main="Province")
plot(realms, lwd=2, border="blue", main="Realm")



ggplot ’em! I admit, I’m a ggplot2 junkie. I just find it the fastest way to make publication quality graphs with little fuss or muss. Or make something quick and dirty to send to colleagues. But, you can’t just go and plot a SpatialPolygonsDataFrame in ggplot2 with ease and then use it as you will. So what’s a guy to do?

I will admit, I’m shamelessly gacking the following approach. It’s a three-step process where what you do, essentially, is turn the whole mess into a data frame, with the polygons providing points for plotting geom_path or geom_polygon pieces.

Step 1, you need an ID column in your data. Let’s do this for both ecoregions and provinces

regions@data$id = rownames(regions@data)
provinces@data$id = rownames(provinces@data)

OK – step 2 is the fortify function. Fortify converts an R object into a data frame for ggplot2. In this case –

#fortify defaults to using the polygon IDs (the rownames we just set) as "id"
regions.points = fortify(regions)
## Regions defined for each Polygons
provinces.points = fortify(provinces)
## Regions defined for each Polygons

Great! Now that we have these two new fortified data frames that describe the points that we’ll be plotting, the last thing to do is to join the points with the actual, you know, data! For that, I like to use join:

regions.df = join(regions.points, regions@data, by="id")
provinces.df = join(provinces.points, provinces@data, by="id")

What’s great about this is that, from now on, I can decorate these data frames with any other information keyed to an Ecoregion or Province. For example, say I ran a linear model where Ecoregion was a fixed effect, got a coefficient for each Ecoregion, and turned the coefficient table into a data frame with Ecoregion as one of the columns. As long as the identifying column has the same name in my new data frame and my data frame for plotting, I can use join to add the coefficients as a new column to regions.df or provinces.df for plotting.
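That join step can be sketched with made-up numbers – here plot.df stands in for a fortified plotting frame and coef.df for a hypothetical per-ecoregion coefficient table (both invented for illustration):

```r
library(plyr)

#a miniature stand-in for a fortified plotting data frame
plot.df <- data.frame(ECOREGION = rep(c("A", "B"), each = 3),
                      long = runif(6), lat = runif(6))

#a fake coefficient table, one row per ecoregion
coef.df <- data.frame(ECOREGION = c("A", "B"), coef = c(0.5, -1.2))

#join adds the coef column, matched by ECOREGION, preserving plotting order
plot.df2 <- join(plot.df, coef.df, by = "ECOREGION")
```

plyr’s join keeps the row order of its first argument, which matters here – the polygon points must stay in drawing order.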

But, for now, I can show you how these would plot out in ggplot2. To do this, we use geom_polygon to define an area that we can fill as we want, and geom_path to stroke the outside of the areas and do with them what you will.

#####Make some ggplots for later visualization
base_ecoregion_ggplot <- ggplot(regions.df) + theme_bw() +
  aes(long,lat,group=group) + 
  geom_polygon(fill=NA) +
  geom_path(color="black") +
  coord_equal()

base_province_ggplot <- ggplot(provinces.df) + theme_bw() +
  aes(long,lat,group=group) + 
  geom_polygon(fill=NA) +
  geom_path(color="black") +
  coord_equal()

Notice the fill=NA argument? That’s where I could put something like a coefficient from that joined data, or temperature, or whatever I’ve tacked on to the whole shebang. Let’s see what they look like in ggplot2.

base_ecoregion_ggplot + ggtitle("Ecoregions")


base_province_ggplot + ggtitle("Provinces")


So what’s the advantage of putting them into ggplot? Well, besides using all of the graphical aesthetics for your polygon fills and paths, you can add points (say, sites), lines, or whatnot. One example could be, let’s say you wanted to visualize the borders of land (and countries!) on the map with ecoregions. Cool! Let’s get the worldmap, turn it into a data frame, and then add a geom_path with the world map on it.

worldmap <- map('world', plot=F)
worldmap.df <- data.frame(longitude = worldmap$x, latitude = worldmap$y)

base_ecoregion_ggplot +
  geom_path(data=worldmap.df, aes(x=longitude, y=latitude, group=NULL), color="darkgreen")


The possibilities really are endless at this point for cool visualizations.

EDIT – OK, here’s a cool example with filled polygons using a random ‘score’ to determine fill and RColorBrewer for pretty colors!

#let's make some fancy colors
library(RColorBrewer)

#Make a data frame with Ecoregion as an identifier
thing <- data.frame(ECOREGION = regions$ECOREGION,
                    score = runif(nrow(regions), -100, 100))

#merge the score data with the regions data frame
regions.df2 <- merge(regions.df, thing)

#merge() scrambles row order, so restore the polygon drawing order
#using the order column that fortify provides
regions.df2 <- regions.df2[order(regions.df2$order),]

ggplot(regions.df2) + theme_bw() +
       aes(long,lat,group=group) + 
       geom_polygon(mapping=aes(fill=score)) +
       geom_path(color="black") +
       coord_equal() +
       scale_fill_gradientn(colours=brewer.pal(11, "Spectral"))


Next time, Leaflet! If I can figure out how to post its output. And for those of you who don't know leaflet, prepare to be wowed.

Also, all code for this example is in a gist over here!

Space and SEMs


One question that comes up time and time again when I teach my SEM class is, “What do I do if I have spatially structured data?” Maybe you have data that was sampled on a grid, and you know there are spatial gradients. Maybe your samples are clustered across a landscape. Or at separate sites. A lot of it boils down to worrying about the hidden spatial wee beasties that lurk in the background.

I’m going to stop for a moment and suggest that before we go any further you read Brad Hawkins’s excellent Eight (and a half) deadly sins of spatial analysis where he warns of the danger of throwing out the baby with the bathwater. Remember, in any modeling technique, you want to ensure that you’re capturing as much biological signal as is there, and then adjust for remaining spatial correlation. Maybe your drivers vary in a spatial pattern. That’s OK! They’re still your drivers.

That said, ignoring residual spatial autocorrelation essentially causes you to act as if you have a larger sample size than you actually do (remember the assumption of independent data points), and as such your standard errors are too tight, and you may well produce overconfident results.
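That shrinkage can be sketched directly. Using the Moran’s I (≈0.083) and sample size (533) that show up in the boreal example below, the effective sample size works out like so:

```r
#effective sample size under spatial autocorrelation:
#n_eff = n * (1 - I) / (1 + I), using values from the boreal example below
n <- 533
I <- 0.08265236
n_eff <- n * (1 - I) / (1 + I)
round(n_eff, 1)
# → 451.6
```

So even a modest Moran’s I of ~0.08 quietly costs you about 80 data points’ worth of information.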

To deal with this in a multivariate Structural Equation Modeling context, we have a few options. First, use something like Jon Lefcheck’s excellent piecewiseSEM package and fit your models with mixed model or generalized least squares tools that can accommodate spatial correlation matrices as part of the model. If you have non-spatial information about structure, I’ve started digging into the lavaan.survey package, which has been fun (and is teaching me a lot about survey statistics).
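As a hedged sketch of that GLS route (on simulated data; the variable names are made up), nlme can attach an exponential spatial correlation structure to a regression via corExp:

```r
library(nlme)

#simulate a small spatial dataset (hypothetical names throughout)
set.seed(31415)
dat <- data.frame(x = runif(50), y = runif(50))
dat$pred <- rnorm(50)
dat$resp <- dat$pred + rnorm(50)

#GLS with spatial correlation declining exponentially with distance in x, y
gls_fit <- gls(resp ~ pred,
               correlation = corExp(form = ~ x + y),
               data = dat)
```

The piecewise approach lets every equation in the SEM carry a correlation structure like this one.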

But, what if you just want to go with a model you’ve fit using covariance matrices and maximum likelihood, like you do, using lavaan in R? It should be simple, right?

Well, I’ve kind of tossed this out as a suggestion in the ‘advanced topics’ portion of my class for years, but never implemented it. This year, I got off of my duff, and have been working this up, and have both a solid example, and a function that should make your lives easier – all wrapped up over at github. And I’d love any comments or thoughts on this, as, to be honest, spatial statistics is not where I spend a lot of time. Although I seem to be spending more and more time there these days… silly spatially structured observational datasets…that I seem to keep creating.

Anyway, let’s use as an example the Boreal Vegetation dataset from Zuur et al.’s Mixed Effects Models and Extensions in Ecology with R. The data shows vegetation NDVI from satellite data, as well as a number of other covariates – information on climate (days where the temperature passed some threshold, I believe), wetness, and species richness. And space. Here’s what the data look like, for example:

# Boreality data from
# Mixed Effects Models and Extensions in Ecology with R (2009). 
# Zuur, Ieno, Walker, Saveliev and Smith. Springer
boreal <- read.table("./Boreality.txt", header=T)

#For later
library(ape) #for Moran.I
library(ggplot2) #for qplot

#Let's look at the spatial structure

qplot(x, y, data=boreal, size=Wet, color=NDVI) +
  theme_bw(base_size=18) + 
  scale_size_continuous("Index of Wetness", range=c(0,10)) + 
  scale_color_gradient("NDVI", low="lightgreen", high="darkgreen")


So, there are both clear associations of variables, but also a good bit of spatial structure. Ruh roh! Well, maybe it’s all in the drivers. Let’s build a model where NDVI is affected by species richness (nTot), wetness (Wet), and climate (T61) and richness is itself also affected by climate.


library(lavaan)
## This is lavaan 0.5-17
## lavaan is BETA software! Please report any bugs.

# A simple model where NDVI is determined
# by nTot, temperature, and Wetness
# and nTot is related to temperature
borModel <- '
  NDVI ~ nTot + T61 + Wet 
  nTot ~ T61
'

#note meanstructure=T to obtain intercepts
borFit <- sem(borModel, data=boreal, meanstructure=T)

OK, great, we have a fit model – but we fear that the SEs may be too small! Is there any spatial structure in the residuals? Let’s look.

# residuals are key for the analysis
borRes <- as.data.frame(residuals(borFit, "casewise"))

#raw visualization of NDVI residuals
qplot(x, y, data=boreal, color=borRes$NDVI, size=I(5)) +
  theme_bw(base_size=17) + 
  scale_color_gradient("NDVI Residual", low="blue", high="yellow")


Well…sort of. A clearer way that I like is just to look at the signs of the residuals.

#raw visualization of sign of residuals
qplot(x, y, data=boreal, color=borRes$NDVI>0, size=I(5)) +
  theme_bw(base_size=17) + 
  scale_color_manual("NDVI Residual >0", values=c("blue", "red"))


OK, we can clearly see the positive residuals clustering on the corners, and negative ones more prevalent in the middle. Sort of. Are they really? Well, we can correct for them once we know the degree of spatial autocorrelation, Moran’s I. To do this, there are a few steps. First, calculate the spatial weight matrix – essentially, the inverse of the distance between any pair of points. Distant points should have a lower weight in the resulting analyses than nearer points.

#Evaluate Spatial Residuals
#First create a distance matrix
distMat <- as.matrix(dist(cbind(boreal$x, boreal$y)))

#invert this matrix for weights
distsInv <- 1/distMat
diag(distsInv) <- 0

OK, that done, we can determine whether there was any spatial autocorrelation in the residuals. Let’s just focus on NDVI.

#calculate Moran's I just for NDVI
mi.ndvi <- Moran.I(borRes$NDVI, distsInv)

## $observed
## [1] 0.08265236
## $expected
## [1] -0.001879699
## $sd
## [1] 0.003985846
## $p.value
## [1] 0

Yup, it’s there. We can then use this correlation to calculate a spatially corrected sample size, which will be smaller than our initial sample size.

#What is our corrected sample size?
n.ndvi <- nrow(boreal)*(1-mi.ndvi$observed)/(1+mi.ndvi$observed)

And given that we can get parameter variances and covariances from the vcov matrix, it’s a snap to calculate new SEs, remembering that the variance of a parameter has the sample size in the denominator.

#Where did we get the SE from?
sqrt(diag(vcov(borFit)))

##    NDVI~nTot     NDVI~T61     NDVI~Wet     nTot~T61   NDVI~~NDVI 
## 1.701878e-04 2.254616e-03 1.322207e-01 5.459496e-01 1.059631e-04 
##   nTot~~nTot       NDVI~1       nTot~1 
## 6.863893e+00 6.690902e-01 1.617903e+02

#New SE
ndvi.var <- diag(vcov(borFit))[1:3]
ndvi.se <- sqrt(ndvi.var*nrow(boreal)/n.ndvi)
ndvi.se

##    NDVI~nTot     NDVI~T61     NDVI~Wet 
## 0.0001848868 0.0024493462 0.1436405689

#compare to old SE
sqrt(ndvi.var)

##    NDVI~nTot     NDVI~T61     NDVI~Wet 
## 0.0001701878 0.0022546163 0.1322207383

Excellent. From there, it’s a hop, skip, and a jump to calculating a z-score and ensuring that this parameter is still different from zero (or not!)

#new z values
z <- coef(borFit)[1:3]/ndvi.se

2*pnorm(abs(z), lower.tail=F)

##     NDVI~nTot      NDVI~T61      NDVI~Wet 
##  5.366259e-02  1.517587e-47 3.404230e-194

summary(borFit, standardized=T)

## lavaan (0.5-17) converged normally after  62 iterations
##   Number of observations                           533
##   Estimator                                         ML
##   Minimum Function Test Statistic                1.091
##   Degrees of freedom                                 1
##   P-value (Chi-square)                           0.296
## Parameter estimates:
##   Information                                 Expected
##   Standard Errors                             Standard
##                    Estimate  Std.err  Z-value  P(>|z|)   Std.lv  Std.all
## Regressions:
##   NDVI ~
##     nTot             -0.000    0.000   -2.096    0.036   -0.000   -0.044
##     T61              -0.035    0.002  -15.736    0.000   -0.035   -0.345
##     Wet              -4.270    0.132  -32.295    0.000   -4.270   -0.706
##   nTot ~
##     T61               1.171    0.546    2.144    0.032    1.171    0.092
## Intercepts:
##     NDVI             10.870    0.669   16.245    0.000   10.870  125.928
##     nTot           -322.937  161.790   -1.996    0.046 -322.937  -30.377
## Variances:
##     NDVI              0.002    0.000                      0.002    0.232
##     nTot            112.052    6.864                    112.052    0.991

See! Just a few simple steps! Easy-peasy! And a few changes – the effect of species richness is no longer so clear, for example.

OK, I lied. That’s a lot of steps. But, they’re repetitive. So, I whipped up a function that should automate this, and produce useful output for each endogenous variable. I need to work on it a bit, and I’m sure issues will come up with latents, composites, etc. But, just keep your eyes peeled on the github for the latest update.

lavSpatialCorrect(borFit, boreal$x, boreal$y)

## $Morans_I
## $Morans_I$NDVI
##     observed     expected          sd p.value    n.eff
## 1 0.08265236 -0.001879699 0.003985846       0 451.6189
## $Morans_I$nTot
##     observed     expected          sd p.value    n.eff
## 1 0.03853411 -0.001879699 0.003998414       0 493.4468
## $parameters
## $parameters$NDVI
##             Parameter      Estimate    n.eff      Std.err   Z-value
## NDVI~nTot   NDVI~nTot -0.0003567484 451.6189 0.0001848868  -1.92955
## NDVI~T61     NDVI~T61 -0.0354776273 451.6189 0.0024493462 -14.48453
## NDVI~Wet     NDVI~Wet -4.2700526589 451.6189 0.1436405689 -29.72734
## NDVI~~NDVI NDVI~~NDVI  0.0017298286 451.6189 0.0001151150  15.02696
## NDVI~1         NDVI~1 10.8696158663 451.6189 0.7268790958  14.95382
##                  P(>|z|)
## NDVI~nTot   5.366259e-02
## NDVI~T61    1.517587e-47
## NDVI~Wet   3.404230e-194
## NDVI~~NDVI  4.889505e-51
## NDVI~1      1.470754e-50
## $parameters$nTot
##             Parameter    Estimate    n.eff     Std.err   Z-value
## nTot~T61     nTot~T61    1.170661 493.4468   0.5674087  2.063171
## nTot~~nTot nTot~~nTot  112.051871 493.4468   7.1336853 15.707431
## nTot~1         nTot~1 -322.936937 493.4468 168.1495917 -1.920534
##                 P(>|z|)
## nTot~T61   3.909634e-02
## nTot~~nTot 1.345204e-55
## nTot~1     5.479054e-02

Happy coding, and I hope this helps some of you out. If you’re more of a spatial guru than I, and have any suggestions, feel free to float them in the comments below!