Why I’m Teaching Computational Data Analysis for Biology

This is a cross-post from the blog I’ve set up for my course. As my first class at UMB, I’m teaching An Introduction to Computational Data Analysis for Biology – basically a mix of teaching statistics and basic programming. It’s something I’ve thought about teaching for a long time – although the rubber meeting the road has been fascinating.

As part of the course, I’m adapting an exercise that I learned while taking English courses – in particular from a course on Dante’s Divine Comedy. I ask that students write 1 page weekly to demonstrate that they are having a meaningful interaction with the material. I give them a few pages from this book as a prompt, but really they can write about anything. One student will post on the blog per week (and I’m encouraging them to use the blog for posting other materials as well – we shall see, it’s an experiment). After they post, I hope that it will start a conversation, at least amongst participants in the class. I also think this post might pair well with some of Brian McGill’s comments on statistical machismo to show you a brief sketch of my own evolution as a data analyst.

I’ll be honest, I’m excited. I’m excited to be teaching Computational Data Analysis to a fresh crop of graduate students. I’m excited to try and take what I have learned over the past decade of work in science, and share that knowledge. I am excited to share lessons learned and help others benefit from the strange explorations I’ve had into the wild world of data.

I’m ready to get beyond the cookbook approach to data. When I began learning data analysis, way back in an undergraduate field course, it was all ANOVA all the time (with brief diversions to regression or ANCOVA). There was some point-and-click software that made it easy, so long as you knew the right recipe for the shape of your data. The more complex the situation, the more creative you had to be in getting an accurate sample, and then in determining the right incantation of sums of squares to get a meaningful test statistic. And woe betide you if the p value from your research was 0.051.

I think I enjoyed this because it was fitting a puzzle together. That, and I love to cook, so, who doesn’t want to follow a good recipe?

Still, there was something that always nagged me. This approach – which I encountered again and again – seemed stale. The body of analysis was beautiful, but it seemed divorced from the data sets I saw appearing on the horizon – data sets so large, or chock-full of so many different variables, that something seemed amiss.

The answer rippled over me in waves. First, comments from an editor – Ram Meyers – on a paper of mine began to lift the veil. I had done all of my analyses as taught (and indeed even used for a class) using ANOVA and regression, multiple comparisons, etc. etc. in the classic vein. Ram asked why, particularly given that the biological processes that generated my data should in no way generate something with a normal – or even log-normal – distribution. While he agreed that the approximation was good enough, he made me go back and jump off the cliff into the world of generalized linear models. It was bracing. But he walked me through it – over the phone, even.

So, a new recipe, yes? But it seemed like something more was looming.

Then, an expiration of a JMP site license with one week left on a paper revision left me bereft. The only free tool I could turn to that seemed to do what I wanted it to do was R.

Wonderful, messy, idiosyncratic R.

I jumped in and learned the bare minimum of what I needed to know to do my analysis…and lost myself.

I had taken computer science in college, and even written the backend of a number of websites in PERL (also wonderful, messy, and idiosyncratic). What I enjoyed most about programming was that you could not hide from how you manipulated information. Programming has a functional aspect at the core where an input must be translated into a meaningful output according to the rules that you craft.

Working with R, I was crafting rules to generate meaningful statistical output. But what were those rules but my assumptions about how nature worked? The fundamentals of what I had been doing all along – fitting a line to data with an error distribution that should be based in biology, not arbitrary assumptions – were laid all the more bare. Some grumblingly lovely help from statistical denizens on the R help boards brought this into sharp focus.

So, I was ready when, for whatever reason, fate thrust me into a series of workshops on Bayesian statistics, AIC analysis, hierarchical modeling, time series analysis, data visualization, meta-analysis, and last – Structural Equation Modeling.

I was delighted to learn more and more of how statistical analysis had grown beyond what I had been taught. I drank deeply of it. I know, that’s pretty nerdy, but, there you have it.

The new techniques all shared a common core – they were engines of inference about biological processes. How I, as the analyst, made assumptions about how the world worked was up to me. Once I had a model of how my system worked in mind – sketched out, filled with notes on error distributions, interactions, and more – I could sit back and think about which inferential tools would give me the clearest answers I needed.

Instead of finding the one right recipe in a giant cookbook, I had moved to choosing the right tools out of a toolbox. And then using the tools of computer science – optimizing algorithms, thinking about efficient data storage, etc. – to let those tools bring data and biological models together.

It’s exciting. And that’s the core philosophy I’m trying to convey in this semester. (N.B. the spellchecker tried to change convey to convert – there’s something there).

Think about biology. Think about a line. Think about a probability distribution. Put them together, and find out what stories your data can tell you about the world.

Missing my Statsy Goodness? Check out #SciFund!

I know, I know, I have been kinda lame about posting here lately. But that’s because my posting muscle has been focused on the new analyses for what makes a successful #SciFund proposal. I’ve been posting them at the #SciFund blog under the Analysis tag – so check it out. There’s some fun stats, and you get to watch me be a social scientist for a minute. Viva la interdisciplinarity!

Running R2WinBUGS on a Mac Running OSX

I have long used JAGS to do all of my Bayesian work on my mac. Early on, I tried to figure out how to install WinBUGS and OpenBUGS and their accompanying R libraries on my mac, but, to no avail. I just had too hard of a time getting them running and gave up.

But, it would seem that some things have changed with Wine lately, and it is now possible to not only get WinBUGS itself running nicely on a mac, but to also get R2WinBUGS to run as well. Or at least, so I have discovered after an absolutely heroic (if I do say so myself) effort to get it all running (this was to help out some students I’m teaching who wanted to be able to do the same exercises as their windows colleagues). So, I present the steps that I’ve worked out. I do not promise this will work for everyone – and in fact, if it fails at some point, I want to know about it so that perhaps we can fix it so that more people can get WinBUGS up and running.

Or just run JAGS (step 1) install the latest version, step 2) install rjags in R. Modify your code slightly. Run it. Be happy.) A minimal sketch of the JAGS route is just below.
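For the curious, here’s roughly what “modify your code slightly” means with rjags. This is just a sketch – the model file, data list, and parameter names (myModel.bug, y, mu, tau) are hypothetical placeholders for your own.

library(rjags)

#a hypothetical model file and data list - swap in your own
mod <- jags.model("myModel.bug",
                  data = list(y = y, N = length(y)),
                  n.chains = 3)
update(mod, 1000)  #burn-in
samp <- coda.samples(mod, variable.names = c("mu", "tau"), n.iter = 5000)
summary(samp)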

So, this tutorial works to get the whole WinBUGS shebang running. Note that it hinges on installing the latest development version of Wine, not the stable version (at least as of 1/17/12). If you have previously installed wine using macports, good on you. Now uninstall it with “sudo port uninstall wine”. Otherwise, you will not be able to do this.

Away we go!

1) Have the free version of XCode Installed from http://developer.apple.com/xcode/. You may have to sign up for an apple developer account. Whee! You’re a developer now!

2) Have X11 Installed from your system install disc.

3) Install MacPorts from http://www.macports.org/install.php using the package installer. See also here for more information. Afterwards, open the terminal and type

echo export PATH=/opt/local/bin:/opt/local/sbin:\$PATH$'\n'export MANPATH=/opt/local/man:\$MANPATH | sudo tee -a /etc/profile

You will be asked for your password. Don’t worry that it doesn’t display anything as you type. Press enter when you’ve finished typing your password.

4) Open your terminal and type

sudo port install wine-devel

5) Go have a cup of coffee, check facebook, or whatever you do while the install chugs away.

6) Download WinBUGS 1.4.x from here. Also download the immortality key and the patch.

7) Open your terminal, and type

cd Downloads
wine WinBUGS14.exe

Note, if you have changed your download directory, you will need to type in the path to the directory where you download files now (e.g., Desktop).

8) Follow the instructions to install WinBUGS into c:\Program Files.

9) Run WinBUGS via the terminal as follows:

wine ~/.wine/drive_c/Program\ Files/WinBUGS14/WinBUGS14.exe

10) After first running WinBUGS, install the immortality key. Close WinBUGS. Open it again as above and install the patch. Close it. Open it again and WinBUGS away!

11) To now use R2WinBUGS, fire up R and install the R2WinBUGS library.

12) R2WinBUGS should now work normally, with one exception. When you use the bugs function, you will need to supply the following additional argument (a minimal example call is sketched at the end of this post):

bugs.directory='/Users/YOURUSERNAME/.wine/drive_c/Program Files/WinBUGS14'

filling in your username where indicated. If you don’t know it, in the terminal type

ls /Users

No, ~ will not work for those of you used to it. Don’t ask me why.
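To show where that argument goes, here’s a minimal sketch of an R2WinBUGS call with the Wine path plugged in. The model file, data list, and parameters (myModel.txt, y, mu, tau) are hypothetical placeholders – use your own.

library(R2WinBUGS)

#a hypothetical model, data list, and parameters - swap in your own
sim <- bugs(data = list(y = y, N = length(y)),
            inits = NULL,
            parameters.to.save = c("mu", "tau"),
            model.file = "myModel.txt",
            n.chains = 3, n.iter = 5000,
            bugs.directory = "/Users/YOURUSERNAME/.wine/drive_c/Program Files/WinBUGS14")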

Seeing Through the Measurement Error

I am part of an incredibly exciting project – the #SciFund Challenge. #SciFund is an attempt to have scientists link their work to the general public through crowdfunding. As I’m one of the organizers, I thought I should have some skin in the game. But what skin?

Well, people are pitching some incredibly sexy projects – tracking puffin migrations, coral reef conservation, snake-squirrel interactions (WITH ROBOSQUIRRELS!), mathematical modeling of movements like Occupy Wall Street, and many many more. It’s some super sexy science stuff.

So what is my project going to address? Measurement error.


But wait, before you roll your eyes at me, this is REALLY IMPORTANT. Seriously!

It can change everything we know about a system!

I’m working with a 30 year data set from the Channel Islands National Park – 30 years of divers going out and counting everything in those forests to see what’s there. They’ve witnessed some amazing change – El Niños, changes in fishing pressure, changes in urbanization on the coast, and more. It’s perhaps the best long-term, large-scale, full-community subtidal data set in existence (and if there are better, um, send ’em my way, because I want to work with them!)

But 30 years – that’s a lot of different divers working on this data set under a ton of different environmental conditions. Doing basic sampling on SCUBA is arduous, and given the sometimes crazy environmental conditions, there is probably some small amount of variation in the data due to processes other than biology. To demonstrate this to a general non-statistical audience, I created the following video. Enjoy watching me in my science film debut…oh dear.

OK, my little scientific audience. You might look at this and think, meh, 30 years of data, won’t any measurement error due to those kinds of conditions or differences in the crew going out to do these counts just average out? With so much data, it shouldn’t be important! Jarrett just wanted an excuse to make a silly science video!

And that’s where you may well be wrong (well, about the data part, anyway). I’ve been working with this data for a long time, and one of my foci has been to try and tease out the signals of community processes, like the relative importance of predation and grazing versus nutrients and habitat provision. Your basic top-down bottom-up kind of thing. While early models showed, yep, they’re both important, and here’s how and why, some rather strident reviewer comments came back and forced me to rethink the models, adding in a great deal more complexity even to the simplest one.

And this is where measurement error became important. Measurement error can obscure the signal of important processes in complex models. A process may be there, may be important in your data, but if you’re not properly controlling for measurement error it can hide real biological patterns.

For example, below is a slice of one model done with two different analyses. I’m looking at whether there are any relationships between predators, grazers, and kelp. On the left hand side, we have the results from the fit model without using calibration data to quantify measurement error. While it appears that there is a negative relationship between grazers and kelp, there is no detectable relationship between predators and grazers (hence the dashed line – it ain’t different from 0).

This is because there is so much extra variation in records of grazer abundances due to measurement error that we cannot see the predator -> grazer relationship.

Now let’s consider the model on the right. Here, I’ve assumed that 10% of the variation in the data is due to measurement error (i.e., an R2 of 0.9 between observed and actual grazer abundances). So, I have “calibration” data. This error rate is made up, just to show the consequences of folding the error in to our analysis.
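For the statistically inclined, here’s the gist of folding a known error rate into a model, sketched in lavaan-style SEM syntax. To be clear, this is not the actual Channel Islands model or data – the variable names (grazersObs, predators, kelp, kelpData) are hypothetical, the fixed 10% error variance is the made-up rate from above, and it assumes the observed grazer variable has been standardized to unit variance.

library(lavaan)

errorModel <- '
  #"true" grazer abundance as a latent with a single observed indicator
  grazersTrue =~ 1*grazersObs

  #fix the indicator error variance to 10% (assumes unit-variance data)
  grazersObs ~~ 0.1*grazersObs

  #structural paths among the true quantities
  grazersTrue ~ predators
  kelp ~ grazersTrue
'
#errorFit <- sem(errorModel, data=kelpData)  #kelpData is a hypothetical data frame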

Just folding in this very small amount of measurement error, we get a change in the results of the model. We can now see a negative relationship between predators and grazers.

I need this calibration data to ensure that the results I’m seeing in my analyses of this incredible 30 year kelp forest data set are real, and not due to spurious measurement error. So I’m hoping wonderful folk like you (or people you know – seriously, forward http://scifund.rockethub.com around to everyone you know! RIGHT NOW!) will see the video, read the project description, and pitch in to help support kelp forest research.

If we’re going to use a 30 year data set to understand kelp forests and environmental change, we want to do it right, so, let’s figure out how much variation in the data is real, and how much is mere measurement error. It’s not hard, and the benefits to marine research are huge.

#SciFund Preview….

Whew, it’s been a bit since I posted here. Rest assured, little sea squirts, there are some interesting new things in the works. Some things that are science-y (in which I try and use this blog as a sounding board/ lab notebook) and some things not so much.

In the not so much category, a ton of my time has lately been going to organizing the #SciFund challenge – a large initiative to try crowdfunding for science! If you haven’t been following it, check out our initial manifesto here.

I’m pretty stoked about the whole thing – it’s a real way of connecting science to the public via a funding mechanism. And with us launching on November 1st, it’s been an absolute pleasure to watch the creative and innovative videos that participating scientists have been putting together to solicit funds.

Videos, you say? Am I doing one?

Why yes! So to give you a hint of what’s to come, here’s a brief preview of my #SciFund video. I think you’ll all agree, it’s vintage me, attempting to sell one of the more arcane (to the public) pieces of my research in a way that might just connect. We shall see.

More to come in a week…

Ecological SEMs and Composite Variables: What, Why, and How

I’m a HUGE fan of Structural Equation Modeling. For those of you unfamiliar with the technique, it’s awesome for three main reasons.

  1. It’s a method of teasing apart direct and indirect interactions in your data.
  2. It allows you to assess the importance of underlying latent variables that you cannot measure, but for which you have measured indicators.
  3. As it’s formally presented, with path diagrams showing connections between variables, it’s SUPER easy to link conceptual models with your data. See Grace et al. 2010 for a handy guide to this.

Also, there is a quite simple and intuitive R package for fitting SEMs, lavaan (LAtent VAriable Analysis). Disclaimer: I just hopped on board as a lavaan developer (yay!). I’ve also recently started a small project to find cool examples of SEM in the ecological literature and then, using the provided information, post the models coded up in lavaan so that others can see how to put some of these models together.

As Ecologists, we often use latent variables to incorporate known measurement error of a quantity – i.e., a latent variable with a single indicator and fixed variance. We’re often not interested in the full power of latent variables – latents with multiple indicators. Typically, this is because we’ve actually measured everything we want to measure. We’re not like political scientists who have to quantify fuzzy things like Democracy, or Authoritarianism, or Gastronomicism. (note, I want to live in a political system driven by gastronomy – a gastronomocracy!)

However, we’re still fascinated by the idea of bundling different variables together into a single causal effect, and maybe evaluating the relative contribution of each of those variables within a model. In SEM, this is known as the creation of a Composite Variable. This composite is still an unmeasured quantity – like a latent variable – but with no error variance, and with “indicators” actually driving the variable, rather than having the unmeasured variable causing the expression of its indicators.

Let me give you an example. Let’s say we want to measure the effect of nutrients on diatom species richness in a stream. You’re particularly concerned about nitrogen. However, you can’t bring water samples back to the lab, so, you’re relying on some moderately accurate nitrogen color strips, the biomass of algae (more algae = more nitrogen!), and your lab tech, Stu, who claims he can taste nitrogen in water (and has been proved to be stunningly accurate in the past). In this case, you have a latent variable. The true nitrogen content of the water is causing the readings by these three different indicators.

A composite variable is a different beast. Let’s say we have the same scenario. But, now you have really good measurements of nitrogen. In fact, you have good measurements of both ammonium (NH4) and nitrate (NO3). You want to estimate a “nitrogen effect”, but, you know that both of these different forms of N will contribute to the effect in a slightly different way. You could just construct a model with effects going from both NO3 and NH4 to species richness. If you want to represent the total “Nitrogen Effect” in your model, however, and evaluate the effect of each form of nitrogen on its total effect, you would create a composite. The differences become clear when looking at the path diagram of each case.

Here, I’m adopting the custom of observed variables in squares, latent variables in ovals, and composite variables in hexagons. Note that, as indicators of nitrogen in the latent variable model, each observed indicator has some additional variation due to factors other than nitrogen – δi. There is no such error in the composite variable model. Also, I’m showing that the error in the Nitrogen Effect in the composite variable model is indeed set to 0. There are sometimes reasons where that shouldn’t be 0, but that’s another topic for another time.
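For the code-minded, here’s roughly what the latent-variable case above looks like in lavaan syntax. This is only a sketch with hypothetical variable names (colorStrip, algalBiomass, stuTaste, richness, streamData); the composite case is the trickier one, and coding it is what the rest of this post walks through.

library(lavaan)

latentModel <- '
  #true nitrogen content causes the readings of all three indicators
  nitrogen =~ colorStrip + algalBiomass + stuTaste

  #and the latent nitrogen drives diatom species richness
  richness ~ nitrogen
'
#latentFit <- sem(latentModel, data=streamData)  #streamData is a hypothetical data frame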

This may still seem abstract to you. So, let’s look at an example in practice. One way we often use composites is to bring together a linear and nonlinear effect of a single variable. For example, we know that often nutrient supply rates have a humped shape effect on species richness – i.e., the highest richness happens at intermediate supply rates. One nice example of that is in a paper by Cardinale et al. in 2009 looking at relationships between manipulated nutrient supply, species richness, and algal productivity. To capture that relationship with a composite variable, one would have a ‘nitrogen effect’ affected by N and N2. This nitrogen effect would then affect local species richness.

So, how would you code this model up in lavaan, and then evaluate it?

Well, hey, the data from this paper are freely available, so, let’s use this as an example. For a full replication of the model presented in the paper see here. However, Cardinale et al. didn’t use any composite variables, so, let’s create a model of our own capturing the Nitrogen-Richness relationship while also accounting for local species richness being influenced by regional species richness.

In the figure above, we have the relationship between resource supply rate and local species richness on an agar plate to the left. Separate lines are for separate streams. The black line is the average fit with the supplied equation. On the right, we have a path diagram representing this relationship, as well as the influence of regional species richness.

So, we have a path diagram. Now comes the tricky part. Coding. One thing about the current version of lavaan (0.4-8) is that it does not have a way to represent composite variables. This will change in the future (believe me), but, it may take a while, so, let me walk you through the tricks of incorporating composite variables now. Basically, there are five steps.

  1. Define the variable as a regression, where the composite is determined by its causal variables. Also, fix one of the effects to 1. This gives your composite variable a scale.
  2. Specify that the composite has an error variance of 0.
  3. Now treat the composite as a latent variable. Its indicators are its response variables. This may seem odd. However, it’s all just ways of specifying causal pathways – an indicator pathway and a regression pathway have the same meaning in terms of causality. The software just needs something specified so that it doesn’t go looking for our composite variable in our data. Hence, defining it as a latent variable whose indicators are endogenous responses. I actually find this helpful, as it also makes me think more carefully about what a composite variable is, and how too many responses may make my model not identified.
  4. Lastly, because we don’t want to fix the effect of our composite on its response to 1, we’ll have to include an argument in the fitting function that makes it not force the first latent variable loading to be set to 1. We’ll also have to specify that we then want the variance of the response to latent variables freely estimated. Yeah, I know. Note: this all can play havoc when you have both latent and composite variables, so be careful. See here for an example.
  5. Everything else – other regression relationships, showing that nonlinearities are derived quantities, etc.

OK, that’s a lot. How’s it work in practice? Below is the code to fit the model in the path diagram. I’ve labeled the steps in comments, and, included the regional ~ local richness relationship as well as the relationship showing that logN2 was derived from logN. Note, this is a centered squared variable. And, yes, all nitrogen values have been log transformed here.

library(lavaan)

#simple SA model with N and regional SR using a composite
#Variables: logN = log nutrient supply rate, logNcen2 = log supply rate squared
# SA = Species richness on a patch of Agar, SR = stream-wide richness
compositeModel <- '
	#1) define the composite, scale to logN
	Nitrogen ~ 1*logN + logNcen2 #loading on the significant path!

	#2) Specify 0 error variance
	Nitrogen ~~ 0*Nitrogen

  	#3) now, because we need to represent this as a latent variable
  	#show how species richness is an _indicator_ of nitrogen
  	Nitrogen =~ SA

	#4) BUT, make sure the variance of SA is estimated
  	SA ~~ SA

	#Regional Richness also has an effect
	SA ~ SR

	#And account for the derivation of the square term from the linear term
	logNcen2 ~ logN
'

# we specify std.lv=T so that the Nitrogen-SA relationship isn't fixed to 1
compositeFit <- sem(compositeModel, data=cards, std.lv=T)

Great! It should fit just fine. I'm not going to focus on the regional relationship, as it is predictable and positive. Moreover, when we look at the results, two things immediately pop out at us about the effect of nutrient supply rate.

                   Estimate  Std.err  Z-value  P(>|z|)
Latent variables:
  Nitrogen =~
    SA                0.362    0.438    0.827    0.408

  Nitrogen ~
    logN              1.000
    logNcen2         -1.311    1.373   -0.955    0.340

Wait, what? The Nitrogen effect was not detectably different from 0? Nor was there a nonlinear effect? What's going on here?

What's going on is that the scale of the composite is affecting our results – we've scaled it to logN by fixing that loading to 1. Whenever you are fixing scales, you should always check and see what would happen if you changed which path was set to 1. So, we can simply set the scale to the nonlinear variable, refit the model, and see if this makes a difference. If it doesn't, then that means there is no nitrogen effect at all!

So, change

Nitrogen ~ 1*logN + logNcen2

to

Nitrogen ~ logN + 1*logNcen2
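Having edited the model string, the refit is just the same sem call as before:

#refit with the composite now scaled to the nonlinear term
compositeFit2 <- sem(compositeModel, data=cards, std.lv=T)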

And, now let's see the results…..

                   Estimate  Std.err  Z-value  P(>|z|)
Latent variables:
  Nitrogen =~
    SA               -0.474    0.239   -1.989    0.047

  Nitrogen ~
    logN             -0.763    0.799   -0.955    0.340
    logNcen2          1.000

Ah HA! Not only is the linear effect not different from 0, but, now that the composite is scaled to the nonlinear term, the nutrient signal comes through.

But wait, you may say, that effect is negative? Well, remember that the scale of the nitrogen effect is the same as the nonlinear scale. And, a positive hump-shaped relationship will have a negative squared term. So, given how we've setup the model, yes, that should be negative.

*whew!* That was a lot. And this for a very simple model involving composites and nonlinearities. I thought I'd throw that out as it's a common use of composites, and interpreting nonlinearities in SEMs is always a little tricky and worth bending your brain around. Other uses of composites include summing up a lot of linear quantities, a composite for the application of treatments, and more. But, this should give you a good sense of what they are, how to code them in lavaan, and how to use them in the future.

For a more in depth treatment of this topic, and latent variables versus composites, I urge you to check out this excellent piece by Jim Grace and Ken Bollen. Happy model fitting!

Extra! Extra! Get Your gridExtra!

The more I use it, the deeper I fall in love with ggplot2. I know, some of you have heard me kvell about it ad nauseam (oh, Yiddish and Latin in one sentence – extra points!). But the graphs really look great, and once you wrap your head around a few concepts, it’s surprisingly easy to make it do most anything you want.

Except for one thing.

One thing I loved about the old R plotting functions was the ability to set up panels easily and fill them with totally different graphs. Ye olde par(mfrow=c(2,2)) for a 2 x 2 grid, for example.

Enter gridExtra. Let the games begin.

What exactly do I mean? Let’s say I’m working with the soil chemistry data in the vegan package. First, maybe I just want to eyeball the histograms of both the humus depth and bare soil columns.

To do this in ggplot2, and with a single command to put them in a single window, first you need to melt the data with reshape2 so that the column names are actually grouping variables, and then you can plot it. In the process, you create an additional data frame. And, you also have to do some extra specifying of scales, facets, etc. etc. Here’s the code and graphs.

library(ggplot2) #for plotting
library(reshape2) #for data reshaping

library(vegan) #for the data

#First, reshape the data so that Humus depth and Bare soil are your grouping variables
vMelt<-melt(varechem, measure.vars=c("Humdepth", "Baresoil"))

#Now plot it.  Use fill to color things differently, facet_wrap to split this into two panels,
#And don't forget that the x scales are different - otherwise things look odd
qplot(value, data=vMelt, fill=variable)+facet_wrap(facets=~variable, scales="free_x")

This produces a nice graph. But, man, I had to think about reshaping things, and all of those scales? What if I just wanted to make two histograms and slam 'em together? This is where gridExtra is really nice. Through its function grid.arrange, you can make a multi-paneled graph using ggplot2 plots, lattice plots, and more (although, not regular R plots...I think).

So, let's see the same example, but with gridExtra.


library(gridExtra) #for grid.arrange

#make two separate ggplot2 objects
humDist<-qplot(Humdepth, data=varechem, fill=I("red"))
bareDist<-qplot(Baresoil, data=varechem, fill=I("blue"))

#Now use grid.arrange to put them all into one figure.
#Note the use of ncol to specify two columns.  Things are nicely flexible here.
grid.arrange(humDist, bareDist, ncol=2)

"Oh, what a trivial problem," you may now be saying. But, if you want to, say, plot up 5 different correlations, or, say, the same scatterplot with 4 different model fits, this is a life-saver - if nothing else, in terms of readability of your code for later use.

This is all well and good, but, simple. Let's get into more fun multi-panel figures. Let's say we wanted a bivariate scatter-plot of Humus Depth and Bare Soil with a linear fit. But, we also wanted to plot the histograms of each variable in adjacent panels. Oh, and flip the histogram of whatever is on the y-axis. Sexy, no? This is pretty straightforward. We can use the ggplot2 objects we already have, flip the co-ordinates on one, create a bivariate plot with a fit, and fill in one final panel with something blank.

#First, the correlation.  I'm using size just to make bigger points.  And then I'll add a smoothed fit.
corPlot<-qplot(Humdepth, Baresoil, data=varechem, size=I(3))+stat_smooth(method="lm")

#OK, we'll need a blank panel.  gridExtra can make all sorts of shapes, so, let's make a white box
library(grid) #for grid.rect and gpar
blankPanel<-grid.rect(gp=gpar(col="white"), draw=FALSE)

#Now put it all together, but don't forget to flip the Baresoil histogram
grid.arrange(humDist, blankPanel, corPlot, bareDist+coord_flip(), ncol=2)

Nice. Note the use of the grid.rect. gridExtra is loaded with all sorts of interesting ways to place shapes and other objects into your plots - including my favorite - grid.table, for when you don't want to deal with text.

a<-anova(lm(Baresoil ~ Humdepth, data=varechem))
grid.table(round(a, digits=3))

Or, heck, if you want to make that part of the above plot, use tableGrob instead of grid.table, and then slot it in where the blank panel is. The possibilities are pretty endless!
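For example, something like this should do it – a sketch using the objects already created above:

#make the ANOVA table a grob rather than drawing it directly
aTable <- tableGrob(round(a, digits=3))

#and slot it in where the blank panel was
grid.arrange(humDist, aTable, corPlot, bareDist+coord_flip(), ncol=2)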

UPDATE: Be sure to see Karthik's comment below about alternatively using viewports. Quite flexible, and very nice, if a hair more complex.

More, Please!

Thanks to Jim, I’ve been using R in the shell more and more – in concert with vi. It’s been fun, and nice to integrate my workflows all on the server (although I haven’t had to do much graphing yet – I’m sure I’ll start kvetching then and return to a nice gui).

One thing that has frustrated me is that large dumps of output – say, a list composed of elements that are 100 lines each – just whip past me without an ability to scroll through more slowly. The page function helps somewhat, but it gets wonky when looking at S4 objects. I wanted something more efficient – something more…well, like more! So I peered into page, and whipped up a more function that some of you may find useful. Of course, I’m sure that there is a simpler way, but, when all else fails…write it yourself!

more<-function(x, pager=getOption("pager")){
	#put everything into a local file using sink
	file <- tempfile("Rpage.")
	sink(file)
	print(x)
	sink()

	#use file.show as you can use the default R pager
	file.show(file, title = "", delete.file = TRUE, pager = pager)
}
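Then, to actually use it, just wrap whatever was whipping past you (the object names here are hypothetical stand-ins):

more(summary(someBigModelFit))  #page through a long model summary
more(myGiantList)               #or that 100-element list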