Annals of Botany 90: 777-778, 2002
© 2002 Annals of Botany Company
Cause and correlation in Biology.
Shipley B. 2000.
Cambridge: Cambridge University Press.
£45.00 (hardback). 317 pp.
As indicated by the title, this book seeks to address the relationship between correlation and causation, with application to biological topics. The book begins with a statement of three objectives: (1) to persuade biologists that it is possible to infer causation with observational (non-experimental) data; (2) to describe certain methods that can assist in this process; and (3) to illustrate the methods presented using biological examples.
A philosophical discussion of causation is potentially confusing and can often evoke debate, which necessitates some carefully crafted effort at the beginning of the book to address the meaning of causation. Fortunately for all involved, the methods presented do not depend on any one particular definition of causality, but instead, on certain properties of causal systems, most of which are not subject to much disagreement. To some extent, the practical result of the introductory material is to get the reader accustomed to the word causation, which requires some effort because the science of statistics typically trains us to avoid its use.
In the first chapter of the book (Preliminaries), the author tackles the task of convincing the reader that there is a substantial need for methods of applying causal inference in the absence of randomized or controlled experiments. To a great extent this is a fairly easy task since virtually everyone, scientist and non-scientist alike, develops causal interpretations of system behaviour as part of the routine business of living. Nonetheless, because this book is about both the philosophy and practice of inferring causal relationships, there is value in pointing out the limits that exist to applying experimental approaches to a great variety of important questions.
The second chapter tackles a topic that, at least formally, is uncharted territory for most biologists: the inter-relationship between causality and statistical relationships. The language of directed graphs (a.k.a. path models) is dealt with first. Because of the recent emergence of graphical modelling as a distinct statistical methodology, it is unfortunate that the author did not provide a link to the rather extensive literature on this topic (cf. Whittaker, 1990). Most of the chapter deals with the concept of d-separation (short for directed separation), which formally defines the logical relationships that can exist among variables in a directed graph. This topic is necessarily dry and even tedious because it is an exhaustive exercise in the logic of all possible relationships. However, the author does an excellent job of trying to make this presentation as interesting and informative as possible, which is not an easy task.
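For readers unfamiliar with d-separation, the following minimal Python sketch may help convey what the concept asks of a directed graph. It is not taken from the book: the function names and the example graphs are hypothetical, and it uses the standard ancestral moral graph criterion rather than the author's own presentation. The graph is assumed to be acyclic and is given as a mapping from each variable to a list of its direct causes.

```python
from itertools import combinations


def ancestors(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen = set()
    stack = list(nodes)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def d_separated(parents, x, y, given=()):
    """True if x and y are d-separated by the set `given` in the DAG.

    `parents` maps each variable to its direct causes (hypothetical format).
    Assumes x and y are distinct and neither is in `given`.
    """
    given = set(given)
    # Step 1: keep only x, y, the conditioning set and their ancestors.
    keep = ancestors(parents, {x, y} | given)
    # Step 2: moralize, i.e. link every parent to its child and every
    # pair of parents sharing a child, then ignore edge directions.
    adjacent = {node: set() for node in keep}
    for child in keep:
        causes = list(parents.get(child, ()))
        for cause in causes:
            adjacent[child].add(cause)
            adjacent[cause].add(child)
        for a, b in combinations(causes, 2):
            adjacent[a].add(b)
            adjacent[b].add(a)
    # Step 3: drop the conditioning variables and look for any path x..y.
    stack, seen = [x], set()
    while stack:
        node = stack.pop()
        if node == y:
            return False  # still connected, so not d-separated
        if node not in seen:
            seen.add(node)
            stack.extend(adjacent[node] - given)
    return True


# Hypothetical causal chain A -> B -> C and collider A -> C <- B.
chain = {"B": ["A"], "C": ["B"]}
collider = {"C": ["A", "B"]}
print(d_separated(chain, "A", "C"))              # False
print(d_separated(chain, "A", "C", {"B"}))       # True
print(d_separated(collider, "A", "B"))           # True
print(d_separated(collider, "A", "B", {"C"}))    # False
```

In the chain, conditioning on the intermediate variable blocks the dependence between the two ends, whereas in the collider conditioning on the common effect creates a dependence between its otherwise independent causes.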
In the third chapter, the author first gives an interesting discussion of the early history of path models and the contrasting viewpoint of experimental statistics that has since prevailed in the biological sciences. The bulk of the chapter, however, is devoted to a presentation of the author's own statistical methodology for testing the validity of path models. Here the author makes a sincere effort to present in understandable terms an alternative method to the usual maximum likelihood approach for testing path models.
Chapters 4–7 present the fundamentals of the methodology of structural equation modelling. Starting with path models and maximum likelihood estimation, the author proceeds through a discussion of measurement error and latent variables, model fit assessment and multi-level models. These topics are covered for completeness and in order to demonstrate their applicability to biological examples. Currently, there are literally dozens of general treatments of structural equations. Yet, since the method is virtually unknown among biologists, there is justification in including this material in the book. In these chapters the author shows both the essence of structural equation modelling and the very broad range of questions it can address. In the final chapter the author tackles the somewhat specialized field of search strategies for exploring hypothesis space. Except for the author's own work, this method has not been applied before to biological systems and deserves further scrutiny.
To a significant extent, this book seeks to swing the historical pendulum of biological thought away from a suspicion of linking correlation and causation towards an embrace of causal interpretation of correlational data. This is not an easy task, and it can be said that the author has done a fairly good job of hitting the level of sophistication needed to make the case forcefully and yet coherently. The author's dedication to the topic of path model analysis is evidenced by his own development of d-sep test methods as an alternative to traditional maximum likelihood methods. Time will tell whether this new approach will be adopted by practitioners.
Overall, I found this book to be filled with useful presentations. It provides a good device for exposing biologists to many statistical methods that have gained widespread acceptance in other fields but that have escaped the attentions of biometricians. Like the author, I share an enthusiasm for the potential applications of these methods for exploring the natural world.
It is perhaps useful to the potential reader to point out that there are alternative ways of looking at the value of the methods discussed in this book. The author has chosen to emphasize the search for causality as a motivating force behind the application of path model methods. I, on the other hand, have used many of the same methods but for a different reason: the belief that multivariate analyses yield more insights about complex systems than do univariate analyses. Actually, multivariate hypotheses formulated as path models can be evaluated with experimental data (e.g. Gough and Grace, 1999) as well as with non-experimental data. Furthermore, the philosophical requirements for establishing causality are precisely the same for a simple bivariate regression as for a complex path model, namely directional dependence. What is different in the two cases (aside from the complexity of showing that data match the model) is the degree of insight gained about the workings of the system. Thus, it is possible to value path modelling methods strictly because they provide hypotheses that more closely match the complexity of natural systems rather than based on any belief that the statistical methods contribute any insights into the question of causal relationships.
Alternative perspectives aside, I highly recommend the book by Shipley for those interested in multivariate approaches to biology. The material is ultimately quite important, the presentation is unusually clear and the exposure to methods developed in other fields of science is valuable. I would also urge those with a serious interest in the subject to keep an eye out for the forthcoming book by Pugesek et al. (due February 2003) for an extended description of the concepts and applications of structural equation modelling to biological systems. Together, these books are likely to finally break down the barriers that have isolated biologists from path modelling methods for so long.
LITERATURE CITED
Gough L, Grace JB. 1999. Predicting effects of environmental change on plant species density: experimental evaluations in a coastal wetland. Ecology 80: 882–890.
Pugesek B, Tomer A, von Eye A, eds. 2003. Structural equation modeling. Cambridge: Cambridge University Press (in press).
Whittaker J. 1990. Graphical models in applied multivariate statistics. New York: Wiley.