Linear Model Power Analysis with Dplyr

I really liked the power analysis example I gave in class today. I feel like I’m finally firing on all cylinders with the tidyverse. So I thought I would post it here, in case it was of use to anyone!



Power

OK, we can fit and evluate linear regression models, but what about their power? Let’s look at our seal fit again – maybe we want to know what would have happened with a lower sample size?

Note, I’m going to use a knitr function, kable to neaten up the table for markdown. Try it in your own homeworks!

knitr::kable(tidy(seal_lm))
term estimate std.error statistic p.value
(Intercept) 115.7667505 0.1763751 656.3668 0
age.days 0.0023706 0.0000447 53.0645 0

What about the residual SD?

knitr::kable(glance(seal_lm)[,1:3])
r.squared adj.r.squared sigma
0.2256493 0.2255691 5.6805

All right – there are our target values. Now, we can change a lof ot things. The effect size (slope), range of values, sigma, and more. But let’s try sample size.

1 Make a Table of Parameters, some of which vary

The first step of a power analysis is to setup a data frame with all of the different possibilities that you might want to assess for power. For linear regression, we have the following parameters, all of which might vary:

  1. Slope
  2. Intercept
  3. Residual variance
  4. Sample Size
  5. Range of X values

To do a power analysis via simulation for a linear regression, we begin by building our simulation data frame with the parameters and information that varies. In this case, sample size.

library(dplyr)

set.seed(607)

simPopN <- data.frame(slope = 0.00237, 
                      intercept=115.767,
                      sigma = 5.6805,
                      n=10:100) 

Note, if we wanted to vary more than one parameter, we’d first create a data frame where only one parameter varied, then add a second that varried using crossing in tidyr, like so:

library(tidyr)

simPopN <- data.frame(slope = 0.00237, 
                      intercept=115.767,
                      sigma = 2:6) %>%
  crossing(n=10:100) 
2 Expand to have a number of rows for each simulated sample with each parameter combination

OK, once we have everything in place, including sample size, we need to expand this out to have some number of samples for each n. For that, we can use the function in tidyr (same library as crossings), expand().

library(tidyr)

simPopN <- simPopN %>%
  group_by(slope, intercept, sigma, n) %>%
  expand(reps = 1:n) %>%
  ungroup()
3 Expand to create repeated simulated data sets for each combination of parameters

Now, if we want to simulate each of these, say, 100 times, we need to assign unique sim numbers, so for each n and sim number we have a unique data set. We can use crossing to replicate each combination of variables above some number of times. Note – 100 is really low, but it doesn’t eat your processor! Use low numbers of simulation for development, then crank them up for final analysis.

simPopN <- simPopN  %>%
  crossing(sim = 1:100)
3 Simulate the data

Great – almost ready to go! Now we just need to add in fitted values. Fortunately, as rnorm() works with vectors, we can just use a mutate here. We’ll also need to simulate random draws of ages, but that’s just another random number.

simPopN <- simPopN %>%
  mutate(age.days = runif(n(), 1000, 8500)) %>%
  mutate(length.cm = rnorm(n(), intercept + slope*age.days, sigma))

Yatzee! Ready to run!

4 Fit models and extract coefficients

First, we now need to generate a lot of fit models. Dplyr doesn’t take too kindly to including fit things, so, we can use two powerful functions here – first, nest and unnest() allow us to collapse grouped data down into little pieces and re-expand it.

fits <- simPopN %>%
    group_by(slope, intercept, sigma, n, sim) %>%
    nest()

fits
## # A tibble: 9,100 × 6
##      slope intercept  sigma     n   sim              data
##      <dbl>     <dbl>  <dbl> <int> <int>            <list>
## 1  0.00237   115.767 5.6805    10     1 <tibble [10 × 3]>
## 2  0.00237   115.767 5.6805    10     2 <tibble [10 × 3]>
## 3  0.00237   115.767 5.6805    10     3 <tibble [10 × 3]>
## 4  0.00237   115.767 5.6805    10     4 <tibble [10 × 3]>
## 5  0.00237   115.767 5.6805    10     5 <tibble [10 × 3]>
## 6  0.00237   115.767 5.6805    10     6 <tibble [10 × 3]>
## 7  0.00237   115.767 5.6805    10     7 <tibble [10 × 3]>
## 8  0.00237   115.767 5.6805    10     8 <tibble [10 × 3]>
## 9  0.00237   115.767 5.6805    10     9 <tibble [10 × 3]>
## 10 0.00237   115.767 5.6805    10    10 <tibble [10 × 3]>
## # ... with 9,090 more rows

Second, the map function in the purrr library allows us to iterate over different levels or grouped data frames, and perform some function. In this case, we’ll fit a model, get it’s coefficients using broom. This is a new weird set of functions. What’s odd about map is that the first argument is a column. But that argument, for the rest of the arguments of the function, is now called . and we also use the ~ notation. What ~ does is says that, from this point forward, . refers to the first argument given to map.

In essence, what map does is iterate over each element of a list given to it. Once we nest, the data column is now a list, with each element of the list it’s own unique data frame. So, map works with lists to apply a function to each element. The output of the model fitting is another list – now called mod. A list of models. We then iterate, using map over that list to generate another list of data frames – this time of coefficients.

library(purrr)
library(broom)

fits <- fits %>%
    mutate(mod = map(data, ~lm(length.cm ~ age.days, data=.))) %>%
    mutate(coefs = map(mod, ~tidy(.)))

fits  
## # A tibble: 9,100 × 8
##      slope intercept  sigma     n   sim              data      mod
##      <dbl>     <dbl>  <dbl> <int> <int>            <list>   <list>
## 1  0.00237   115.767 5.6805    10     1 <tibble [10 × 3]> <S3: lm>
## 2  0.00237   115.767 5.6805    10     2 <tibble [10 × 3]> <S3: lm>
## 3  0.00237   115.767 5.6805    10     3 <tibble [10 × 3]> <S3: lm>
## 4  0.00237   115.767 5.6805    10     4 <tibble [10 × 3]> <S3: lm>
## 5  0.00237   115.767 5.6805    10     5 <tibble [10 × 3]> <S3: lm>
## 6  0.00237   115.767 5.6805    10     6 <tibble [10 × 3]> <S3: lm>
## 7  0.00237   115.767 5.6805    10     7 <tibble [10 × 3]> <S3: lm>
## 8  0.00237   115.767 5.6805    10     8 <tibble [10 × 3]> <S3: lm>
## 9  0.00237   115.767 5.6805    10     9 <tibble [10 × 3]> <S3: lm>
## 10 0.00237   115.767 5.6805    10    10 <tibble [10 × 3]> <S3: lm>
## # ... with 9,090 more rows, and 1 more variables: coefs <list>

Last, we cleanup – we unnest, which takes list-columns from above, and expands them out as into full data frames that get slotted back into our original data frame. Nice trick, no?

We’ll also filter for just the slope coefficient.

fits <- fits %>%
  unnest(coefs) %>%
  ungroup() %>%
  filter(term == "age.days")

fits
## # A tibble: 9,100 × 10
##      slope intercept  sigma     n   sim     term     estimate    std.error
##      <dbl>     <dbl>  <dbl> <int> <int>    <chr>        <dbl>        <dbl>
## 1  0.00237   115.767 5.6805    10     1 age.days 0.0004283944 0.0010318274
## 2  0.00237   115.767 5.6805    10     2 age.days 0.0021239104 0.0005736028
## 3  0.00237   115.767 5.6805    10     3 age.days 0.0033531879 0.0009399613
## 4  0.00237   115.767 5.6805    10     4 age.days 0.0032566962 0.0008342080
## 5  0.00237   115.767 5.6805    10     5 age.days 0.0028332982 0.0009062293
## 6  0.00237   115.767 5.6805    10     6 age.days 0.0028501936 0.0005176723
## 7  0.00237   115.767 5.6805    10     7 age.days 0.0021677718 0.0004457464
## 8  0.00237   115.767 5.6805    10     8 age.days 0.0034636535 0.0006116823
## 9  0.00237   115.767 5.6805    10     9 age.days 0.0012147324 0.0012945056
## 10 0.00237   115.767 5.6805    10    10 age.days 0.0039061730 0.0011307423
## # ... with 9,090 more rows, and 2 more variables: statistic <dbl>,
## #   p.value <dbl>
5 Calculate your Power

Notice that we do indeed have p-values, so we can use these fits to get power for each sample size. We can now do our normal process – in this case grouping by sample size – to get power. And then we can plot the result!

pow <- fits %>%
    group_by(n) %>%
    summarise(power = 1-sum(p.value>0.05)/n()) %>%
    ungroup() 


qplot(n, power,  data=pow, geom=c("point", "line")) +
  theme_bw(base_size=17) +
  geom_hline(yintercept=0.8, lty=2)

6. Examples

Let’s take a look at the whole workflow, this time trying a bunch of standard deviation values.

##setup parameters
simSD <- data.frame(slope = 0.00237, 
                      intercept=115.767,
                      sigma = seq(2:6, lengthout=5),
                      n=100) %>%
  group_by(slope, intercept, sigma, n) %>%

  #add sample sizes
  expand(reps = 1:n) %>%
  
  #add sims
  crossing(sim = 1:100) %>%
  
  #add fake data
  mutate(age.days = runif(n(), 1000, 8500)) %>%
  mutate(length.cm = rnorm(n(), intercept + slope*age.days, sigma))


##make your fits
fitsSD <- simSD %>%
    group_by(slope, intercept, sigma, n, sim) %>%
    nest()%>%
  
  #mapping
  mutate(mod = map(data, ~lm(length.cm ~ age.days, data=.))) %>%
  mutate(coefs = map(mod, ~tidy(.)))%>%
  
  #unnest and filter
  unnest(coefs) %>%
  ungroup() %>%
  filter(term == "age.days")

##calculate and plot power
powSD <- fitsSD %>% group_by(n) %>%
    summarise(power = 1-sum(p.value>0.05)/n()) %>%
    ungroup() 


qplot(n, power,  data=powSD, geom=c("point", "line")) +
  theme_bw(base_size=17) +
  geom_hline(yintercept=0.8, lty=2)


I’ve Got the Power!

There is nothing like that pain of taking a first look at fresh precious new data from a carefully designed experiment which took months of planning and construction, 8 divers working full days, and a lot of back-and-forth with colleagues, and then you find absolutely no patterns of interest.

Yup, it’s a real takes-your-breath-away-hand-me-a-beer kind of moment. Fortunately, I was on my way to a 4th of July dinner with my family up in the mountains of Colorado, so, I wasn’t quite going to commit hari-kari on the spot. Still, the moment of vertigo and neausea was somewhat impressive.

Because, let’s admit it, as much as science is about repeated failure until you get an experiment right (I mean, the word is “experiment” not “interesting-meaningful-data-generating-excersise”), it still never feels good.

So, round one of my experiment to test for feedbacks between sessile species diversity and urchin grazing showed bupkiss. Not only was there no richness effect, but, well, there wasn’t even a clear effect that urchins mattered. High density urchin treatments lost about as much cover and diversity of algae and sessile inverts as treatments with low or no urchins in them. What had gone wrong? And can this project be salvaged?

Yeah, I can see a pattern here.  Sure...

Yeah, I can see a pattern here. Sure...

Rather than immediately leaping to a brilliant biological conclusion (that was frankly not in my head) I decided to jump into something called Power Analysis. Now, to most experimental scientists, power analysis is that thing you were told to do in your stats class to tell you how big your sample size should be, but, you know, sometimes you do it, but not always. Because, really, it is often taught as a series of formulae with no seeming rhyme nor reason (although you know it has something to do with statistical significance). Most folk use a few canned packages for ANOVA or regression, but the underlying mechanics seem obscure. And hence, somewhat fear-provoking.

Well, here’s the dirty little secret of power analysis. Really, at the end of the day, it’s just running simulations of a model of what you THINK is going on, adding in some error, and then seeing how often your experimental design of choice will give you easily interpretable (or even correct) results.

The basic mechanics are this. Have some underlying deterministic model. In my case, a model that described how the cover and diversity of sessile species should change given an initial diversity, cover, and number of urchins. Then, add in random error – either process error in each effect itself, or just noise from the environment (e.g., some normal error term with mean 0 and a standard deviation). Use this to get made-up data for your experiment. Run your states and get parameter estimates, p values, etc. Then, do the same thing all over again. Over many simulations of a particular design, you can get an average coefficient estimates, the number of times you get a significant result, etc. “Power” for a given effect is determined by the number of “significant” results (where you define if you’re going with p=0.05 or whatnot) divided by the number of simulations.

So that’s just what I did. I had coefficient estimates (albeit nonsignificant ones). So, what would I need to detect them? I started by playing with the number of replicates. What if I had just put out more cages? Nope. The number of replicates I’d need to get the power into the 0.8 range alone (and you want as close to 1 as possible) was mildly obscene.

So, what about number of urchins? What if instead of having 0 and 16 as my lowest and highest densities?

Bingo.

Shown above are two plots. On the left, you see the p value for the species richness * urchin density interaction for each simulated run at a variety of maximum urchin densities. On the right you see the power (# of simulations where p<0.05 / total number of simulations). Note that, for the densities I worked with, power is around 0.3? And if I kicked it up to 50, well, things get much nicer. As soon as I saw this graph, the biology of the the Southern California Bight walked up behind me and whapped me over the head. Of course. I had been using a maximum density that corresponded to your average urchin barren. The density of urchins that can hang out, and graze down any new growth. But that's not the density of urchins that CREATED the barren in the first place. Sure enough, looking at our long-term data, this new value of 50 corresponded to a typical front of urchins that would move through an area and lay waste to the kelp forest around it. Which is what I had been interested in the first place. *headdesk* So, round two is up and running now. The cages are stocked with obscenely high densities of the little purple spikey dudes. And we shall see what happens! I am quite excited, and hopeful. Because, I've got the Power! (and so should you!)