ESA 2009 Day #3 (Tuesday) – “Big Models” Special Session
Posted 05 Aug 2009

During Tuesday evening of ESA’s meeting I attended a really great special session entitled “Big Models in Ecology: The Good, the Bad, the Ugly Are All Possible Outcomes”. Organized by Vince Gutschick, the session began with a series of overviews by Gutschick, Lou Gross, Lara Prihodko, and Matthew Potts, and then opened up into a series of discussions. Unlike many technical special sessions that I have attended at ESA, this one was very welcoming, striking a great balance between substantive discussion of important modeling issues and overall accessibility. Rather than assume that everyone present knew all the jargon being used, the presenters made sure to explain the concepts behind different terms and model categories, and I think this really helped to make the discussion more inclusive and effective.
Gutschick defined the many ways that models can get “big”, with primary sources of “bigness” being high computational time, a large number of parameters (and the corresponding data needed to bound those parameters), complexity of form, analytical intractability, and poor coding. He described the “holy grail” of modeling as prediction, but also mentioned something near and dear to my heart: the use of models to “prune” the number of experiments by indicating which experiments have the greatest power to distinguish between competing models or mechanisms. He also brought up the related issues of debugging, communicating, and sharing large models.
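The idea of using models to prune experiments can be made concrete with a small sketch: given two competing models, rank candidate experimental conditions by how strongly the models disagree there, relative to measurement noise. Nothing below comes from the session itself; the two toy models, their parameters, and the noise level are all invented purely for illustration.

```python
import numpy as np

# Two competing toy models of some ecological response as a function of a
# driver x. The session did not specify any particular models; these
# functional forms and parameter values are invented for illustration only.
def model_saturating(x, vmax=2.0, k=1.5):
    return vmax * x / (k + x)

def model_linear(x, slope=0.8):
    return slope * x

# Candidate experiments: levels of the driver we could afford to measure.
candidate_x = np.linspace(0.1, 5.0, 25)

# Assume a rough measurement error so the gap between predictions is
# expressed in units of "noise" (an assumption, not a measured value).
sigma = 0.2
separation = np.abs(model_saturating(candidate_x) - model_linear(candidate_x)) / sigma

best = candidate_x[np.argmax(separation)]
print(f"Most discriminating experiment: x = {best:.2f} "
      f"({separation.max():.1f} standard errors between model predictions)")
```

The experiment that maximizes this separation is the one most worth running first, since its outcome is most likely to rule one of the competing mechanisms out.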
Gross spoke about tradeoffs in modeling based on a framework by Levins that describes a tradeoff between generality, precision, and realism. This triangular depiction of tradeoffs reminded me of a lot of other “choose two” scenarios where all three criteria are impossible to address simultaneously. He also spoke about optimization methods, including optimal control, sensitivity analysis, uncertainty analysis, and scenario analysis. The latter, which uses alternative scenarios to generate outcomes that can be compared, is often the most valuable when the model is meant to be presented to policy- or decision-makers.
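Scenario analysis in this sense can be as simple as running one model under a few alternative assumptions and reporting an outcome the decision-maker actually cares about. The sketch below is a minimal, purely hypothetical example; the logistic harvest model, the scenario names, and every number are assumptions of mine, not anything presented in the session.

```python
# A deliberately simple logistic population model with a fixed annual
# harvest; a hypothetical stand-in for whatever a real decision-support
# model would contain.
def simulate(r, K, harvest, n0=50.0, years=30):
    n = n0
    for _ in range(years):
        n = max(n + r * n * (1 - n / K) - harvest, 0.0)
    return n

scenarios = {
    "status quo":        dict(r=0.30, K=500, harvest=10),
    "increased harvest": dict(r=0.30, K=500, harvest=25),
    "habitat loss":      dict(r=0.30, K=350, harvest=10),
}

# Report the outcome in plain terms: population size after 30 years.
for name, params in scenarios.items():
    print(f"{name:18s} -> population after 30 years: {simulate(**params):6.1f}")
```

The point is the framing: instead of handing over parameter distributions, the modeler compares a handful of named futures in units the end user already thinks in.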
Prihodko spoke about her work on a highly-parameterized biosphere model and described some of the issues that can arise with so many parameters. The cost of making the measurements necessary to properly bound the parameters, and the extra computational time, need to be weighed against the benefit of better fit. She introduced a term I was not familiar with (“equifinality”) for a problem I am familiar with: sometimes adding extra parameters makes the model completely insensitive to parameter changes, which means your model is really working a lot harder than it needs to. Determining which parameters lead to equifinality is not easy, as the non-linear nature of ecological models (which is really what makes them interesting) requires that all combinations of parameters be explored to determine the sensitivity of each parameter. Prihodko left us with an awesome quote: “Matching observations exactly means your model is wrong”. Hear, hear!
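To illustrate what exploring combinations of parameters might look like in practice, here is a minimal sketch of a crude global sensitivity screen: sample many random parameter combinations, run the model on each, and flag parameters whose variation barely moves the output. The toy model, parameter names, and ranges are all invented; a real biosphere model would call for more careful designs (e.g. Latin hypercube sampling or variance-based sensitivity indices).

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy nonlinear model with three parameters; "c" is deliberately given
# almost no influence so the screen has something to flag. This bears no
# relation to the biosphere model discussed in the session.
def toy_model(a, b, c):
    return a * np.exp(-b) + 0.001 * c

# Sample many *combinations* of parameters rather than varying one at a
# time, since nonlinearity means a parameter's influence can depend on
# the values of the others.
n = 5000
a = rng.uniform(0.5, 2.0, n)
b = rng.uniform(0.1, 3.0, n)
c = rng.uniform(0.0, 10.0, n)
output = toy_model(a, b, c)

# Crude screen: correlation between each parameter and the output across
# all sampled combinations. Near-zero influence (here "c") marks a
# parameter that could be fixed at a constant or dropped.
for name, values in [("a", a), ("b", b), ("c", c)]:
    r = np.corrcoef(values, output)[0, 1]
    print(f"parameter {name}: correlation with output = {r:+.2f}")
```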
The final informal comments were made by Potts, who talked about how to effectively build models for end users. He described a set of wants and needs for end users. End users want answers that are phrased in the language they speak, which sometimes means turning risk data into concrete probability scenarios or complex outcomes into monetary terms. They also want documentation with a hierarchy of detail, which allows them to understand the model at various depths. In the end, they often want modelers to boil down the message of their modeling efforts into a simple rule of thumb. End users need to be involved in the development of the model to ensure that the end product is in fact the tool they require, and during this process the modeler should strive to not only inform but also educate the user. Often a menu of models ranging from simple to complex is the best means of allowing the end user to find the appropriate model for their needs.
After the informal comments the group broke into discussion and other participants were able to make comments and ask questions. We talked a little bit about the history of biological modeling and the various historical periods the field has gone through. The International Biological Program was mentioned; I had never heard of this program, but from what I could gather it marked a period in which it was imagined that all components of biological systems could be modeled from the ground up. This approach was contrasted with that of Robert MacArthur, who took a more hypothesis-driven approach to modeling. What emerged clearly was that despite great advances in computational power, it is still quite possible to write a fairly simple model that overwhelms even the best computers. A participant described a problem that she had with her own model of genetic evolution, which took six months to run, and some of the panelists suggested that she look at refining the model so that it would run more quickly, perhaps by building it up stepwise. It was also mentioned that anyone can apply to have their work run on the TeraGrid supercomputers; kind of like the big supercolliders, these supercomputers perform research that is proposal-driven.
Expanding on what Potts had said earlier, I brought up the issue of properly communicating the structure of the model in the methods section of published papers. I drew an analogy with molecular biology, where new techniques for generating data are constantly being invented. While the innovators of such new techniques may keep them proprietary for long enough to build up new data, once that new data is published they also need to share the techniques used to generate it. The requirement that empirical science be replicable mandates that methods be properly shared. But in modeling, it is a little fuzzier what “replication of results” and “sharing of techniques” mean. An advantage of math-based modeling is that its methods are generally pretty easy to communicate (but the audience who speaks this language is limited).
Computer-based modeling is not so easy to communicate, and there’s a question about where the method ends and the work begins. For instance, lab procedures need to clearly lay out how to perform a new technique, but there’s still the lab work to be done (and it can be done wrong despite a good methods description). But with modeling, there’s a question about how much description is needed to properly replicate the model. Too little and the results are not replicable because the original model and replicated model are not actually doing the same thing. This makes it tempting to suggest that modelers ought to “surrender their code” for others to use. Although there are sometimes reasons to share code, I don’t think this really solves the problem of replication. Simply running someone else’s code just confirms that the data they described is actually the data they got. There’s no potential for finding bugs or fully understanding how the model was built. In keeping with the analogy to empirical science, sharing code is more like inviting someone to your lab to use your equipment. It’s a way of replicating the results, but in a weak manner; the more reliable replication involves a “ground up” approach driven by following the protocol listed in the methods section. If you can replicate the results of a model by building it yourself, you can be assured that the published results are reliable.
(As an aside: I spoke to Volker Grimm during another session and he told me about a really interesting research project which he is conducting to test how replicable published individual-based models are – I eagerly await this paper!).
There was some discussion of how to judge a model, and several people suggested that a key question to answer was: “Can I make observations that inform my parameters?”. A second key question is: “Will my model output actually answer my question?”. These may seem like pretty obvious things to ask yourself before you make a model, but they are worth emphasizing, because for a lot of models the honest answer to one or both of these questions is “no”.
Our last conversation centered on the relationship between modelers and end-users. An end-user looking for a model for their particular ecological problem faces the daunting task of identifying the right one. Many models may seem “right”, but how does one decide if a given model is reliable? Experienced participants indicated that strong models are vetted: they have been extensively cited and used by others, which should give potential end-users fairly strong confidence in the behavior of the model. Of course, just because a model is obscure doesn’t mean that it is useless. As modelers it is important that we provide appropriate metadata to accompany our models so that end-users can sort out whether or not a model meets their needs.
Thanks to the organizers of this session – it was very valuable!
I was able to attend this meeting thanks to funding from the Pratt Institute Mellon Fund for Faculty Travel.