
Annals of Botany 90: 776-777, 2002
© 2002 Annals of Botany Company

Grafen, A., Hails, R. Modern statistics for the life sciences

David Causton

Modern statistics for the life sciences.
Grafen A, Hails R. 2002.
Oxford: Oxford University Press.
£22.99 (softback). xvi + 351 pp.

Once upon a time it was said (correctly) that analysis of variance (ANOVA) in its many forms was simply a special case of multiple linear regression. However, for several decades now, statisticians have recognized that even multiple regression is just part of a much wider class of statistical models known as generalized linear models. Not only do such models include all the different types of ANOVA and linear regression methods (the latter including the regression analysis of certain curved relationships, such as polynomials), but they also include the analysis of covariance, orthogonal comparisons (contrasts), non-orthogonal (unbalanced) analyses of variance, many multivariate methods and several other statistical methods not usually covered in the more elementary textbooks.
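The claim that ANOVA is a special case of multiple linear regression can be checked numerically. The following is a minimal sketch, not taken from the book: the data, the treatment labels and all function names are hypothetical, and exact fractions are used only so that the two F ratios can be compared without rounding error. It computes the F ratio once from the classical between/within sums of squares and once from a least-squares regression on an intercept plus 0/1 dummy variables.

```python
from fractions import Fraction

# Hypothetical measurements for three treatments (illustrative values only).
groups = {"A": [4, 5, 6], "B": [7, 8, 9], "C": [10, 12, 14]}


def mean(xs):
    """Exact arithmetic mean."""
    return Fraction(sum(xs), len(xs))


def solve(a, b):
    """Solve the square system a x = b exactly by Gauss-Jordan elimination."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n] for row in m]


def anova_f(groups):
    """F ratio from the classical between/within sums of squares."""
    values = [v for g in groups.values() for v in g]
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    k, n = len(groups), len(values)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def regression_f(groups):
    """F ratio from least squares on an intercept plus dummy variables."""
    labels = list(groups)
    rows, y = [], []
    for label, g in groups.items():
        for v in g:
            # Intercept, then one 0/1 indicator per non-reference treatment.
            rows.append([1] + [int(label == other) for other in labels[1:]])
            y.append(v)
    p = len(rows[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in rows]
    grand = mean(y)
    ss_model = sum((f - grand) ** 2 for f in fitted)
    ss_resid = sum((yi - f) ** 2 for yi, f in zip(y, fitted))
    return (ss_model / (p - 1)) / (ss_resid / (len(y) - p))


print(anova_f(groups), regression_f(groups))
```

With these made-up numbers both routes give the same F ratio, because the fitted values of the dummy-variable regression are simply the treatment means.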

Several books have already appeared on the topic of generalized linear models, but these have been aimed at professional statisticians and users of statistical methods who have a more mathematical turn of mind. This is the first book, to my knowledge, that attempts to present the commoner statistical methods listed above within this general framework to biologists, and so is very welcome. Because the authors are dealing with the more elementary sub-set of the above topics, they prefer to present their subject as ‘general linear models’ as opposed to ‘generalized linear models’. The book is mid-way between a ‘cookbook’ and one that deals with the underlying statistical theory; where it is necessary to explain some of the theory it is done graphically, and mostly well done at that.

The book does not purport to cover the basics of statistical description and inference: students would need to approach this book with that knowledge obtained from elsewhere. So Chapter 1 launches straight into the workings of one-way ANOVA, and Chapter 2 deals with the basics of straight line regression. The material of these two chapters is presented as in the more conventional kind of statistics textbook. There are very helpful graphical explanations on the rationale of the methods, although I found the further sections involving multidimensional space (entitled respectively ‘The geometrical approach for an ANOVA’ and ‘The geometrical view of regression’) confusing and, in my view, they were unnecessary for understanding the rest of the book. The analysis of residuals is highlighted, as this features strongly throughout the book. The use of bivariate data examples in Chapter 2 is unfortunate as it leads to complications that could well be done without. Univariate examples (x being levels of some controllable experimental treatment, y being a biological measurement) would have been better. However, the differences between the two kinds of data do need to be explained, and the authors have done so.

Having set the scene in the first two chapters in terms of ANOVA and regression as separate entities, Chapter 3 introduces the main theme of the book—general linear models. Chapter 4 is entitled ‘Using more than one explanatory variable’ and covers, among other things, the all-important distinction between ‘sequential’ and ‘adjusted’ sums of squares. Chapter 5 introduces the subject of experimental design. Chapter 6 deals with combining continuous and categorical explanatory variables, which include what has long been known as the analysis of covariance, and Chapter 7 is a good introduction to factorial experiments and interactions.
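The distinction between ‘sequential’ and ‘adjusted’ sums of squares mentioned for Chapter 4 can be illustrated with a small calculation. This is a sketch under stated assumptions rather than the book's own example: the variables x1, x2 and y are invented, the helper names are hypothetical, and the two explanatory variables are deliberately correlated. The sequential sum of squares for x1 is the drop in residual sum of squares when x1 is the first term added; the adjusted sum of squares is the drop when x1 is added after x2 is already in the model.

```python
from fractions import Fraction

# Invented data: x1 and x2 are correlated explanatory variables.
x1 = [1, 2, 3, 4, 5]
x2 = [1, 1, 2, 3, 3]
y = [2, 3, 5, 8, 9]


def solve(a, b):
    """Solve the square system a x = b exactly by Gauss-Jordan elimination."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n] for row in m]


def residual_ss(columns, y):
    """Residual sum of squares after fitting y on an intercept plus columns."""
    rows = [[1] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


# Sequential SS for x1: x1 is the first term entered after the intercept.
sequential_x1 = residual_ss([], y) - residual_ss([x1], y)
# Adjusted SS for x1: x1 is entered last, after x2 is already fitted.
adjusted_x1 = residual_ss([x2], y) - residual_ss([x1, x2], y)

print("sequential SS for x1:", sequential_x1)
print("adjusted SS for x1:", adjusted_x1)
```

In this invented data set x1 accounts for almost all the variation when fitted first, but for very little once x2 has been fitted, which is why the order of terms matters whenever explanatory variables are not orthogonal.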

Chapters 8–11 are a major strength of the book. The first two thoroughly discuss the assumptions underlying general linear models, namely that the data items should be independent, that the different groups of data (treatments) should be of homogeneous variance, that the errors should be normally distributed and that the treatment effects in the experiment giving rise to the data should be additive in nature. The second pair of chapters gives very practical advice on the choice of a particular method of analysis, appropriately entitled ‘Model selection’. Chapter 12 contrasts the type of experiment in which the treatments are ‘fixed’ effects as opposed to ‘random’ effects, together with the ideas of different levels of replication and nested experimental designs. These concepts are very important, for without an awareness of such things an experimenter can very easily fall into the trap of ‘pseudoreplication’ when designing an experiment.

Chapter 13 is a curious inclusion in this book as it deals with the analysis of categorical data, involving χ² methods and the Poisson distribution, topics that do not come under the umbrella of generalized linear models. On the other hand, the chapter does highlight ways in which categorical data can be analysed within this framework. Finally, very briefly, Chapter 14 indicates other methods that fall within generalized linear models, including the problem of repeated measurements and also multivariate analysis. The chapter serves as a pointer to these topics: it is too short to do anything else.

It is always easy to be critical. There are generally places in a book like this where the explanation seems inadequate. In any case, what is confusing to one person may be quite clear to another. Fortunately, such instances are few and far between in what is overall a good presentation. However, there are two general points of criticism that I feel need to be made. The first concerns within-treatment comparisons. When a set of treatments (maybe levels of one factor where there is no interaction with other factors) are shown by an F-test to contain significant differences, a biologist usually wants to know which treatments are significantly different from which others. This book is unsatisfactory here. A broad, qualitative, approach is indicated using confidence intervals of the different treatment means which can only provide a vague indication of possible significant differences within a set of treatments. The more usual multiple range tests are not even mentioned. True, they are not a linear model in any sense, and are not ideal, but they are very much better than merely looking at confidence intervals. More surprising is the fact that the authors did not deal with the method of orthogonal comparisons—much better, although more restricted in their use—because this method does come within the umbrella of general linear models.
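For readers unfamiliar with orthogonal comparisons, the idea can be sketched as follows. This is an illustrative calculation only: the data, the treatment labels and the choice of contrasts are hypothetical, and a balanced design (equal replication) is assumed. In such a design, a set of k - 1 mutually orthogonal contrasts splits the between-treatment sum of squares into single-degree-of-freedom pieces, each answering one specific question.

```python
from fractions import Fraction

# Hypothetical balanced experiment: three treatments, three replicates each.
groups = {"A": [4, 5, 6], "B": [7, 8, 9], "C": [10, 12, 14]}

# Two mutually orthogonal contrasts chosen for illustration:
# A versus B, and the average of A and B versus C.
contrasts = {"A vs B": [1, -1, 0], "A+B vs C": [1, 1, -2]}


def mean(xs):
    """Exact arithmetic mean."""
    return Fraction(sum(xs), len(xs))


def treatment_ss(groups):
    """Between-treatment sum of squares from one-way ANOVA."""
    grand = mean([v for g in groups.values() for v in g])
    return sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())


def contrast_ss(groups, coeffs):
    """Single-degree-of-freedom sum of squares for one contrast (balanced design)."""
    n = len(next(iter(groups.values())))  # replicates per treatment
    estimate = sum(c * mean(g) for c, g in zip(coeffs, groups.values()))
    return n * estimate ** 2 / sum(c * c for c in coeffs)


for name, coeffs in contrasts.items():
    print(name, contrast_ss(groups, coeffs))
print("treatment SS", treatment_ss(groups))
```

Each contrast sum of squares can be tested against the residual mean square with one degree of freedom, and in this example the two pieces add up exactly to the treatment sum of squares.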

My second criticism relates to the statement (p. 85) that, if an experiment involves plants growing in a glasshouse, the pots need to be shuffled around randomly every day in order to smooth out small-scale differences in the environment along a glasshouse bench. This is a procedure that seems to have gained universal acceptance in the design of such experiments. To my mind this is not only a waste of time, but can also be misleading if a plant exposed to one particular treatment reacts in one way to a particular combination of environmental factors on one day in its life cycle, but would react differently the next day. That is a perfectly feasible assumption, since the physiology of a plant changes continuously during its life. If this is the case, moving plants around all the time is just adding to the confusion. It is much better to design an experiment carefully in which blocking (the placing of plants comprising different treatments on the bench) is skilfully done.

In the ‘blurb’ on the back cover, it is stated that the book ‘is aimed at undergraduate students in the life sciences, and will also be invaluable for many graduate students’. My view is that this statement should be the other way round. Most, if not all, undergraduate biology courses have insufficient time, and biology undergraduates have insufficient interest in statistics, to be able to work through the large amount of material presented. Don’t forget that this book is, in effect, a second course in statistics: the basics of statistical science are not covered except briefly in an appendix. For those working for research degrees, the situation is, of course, different. It is these students along with professional biologists that I envisage will benefit from this book.

This leads on to my assessment of how likely it is that the generalized linear model will become the norm for the statistical analysis of biological data. After 40 years as a professional scientist, I am pessimistic. Within my own area of experience in plant sciences, although there are many conspicuous exceptions, it seems to me that in general the quality of the approach to statistical analysis of data has actually declined in that time; it certainly does not seem to have improved, at least in so far as the publication of work is concerned. So often one finds, for example, that although a proper factorial experiment has been designed, the results are presented simply as means and standard errors; interactions between factors are ignored, and the overall structure of the experiment is forgotten. It is often not appreciated that a well-presented statistical analysis can help to summarize the main outcomes of an experiment as well as to root out the significant effects.

If the statistical analysis of data by generalized linear models catches on among life scientists, then this book will be a pioneer and may even become a classic; if, on the other hand, most biologists continue to avoid the experimental design and analysis side of their work as much as possible, the book, unfortunately, will become a dead end. I hope not.

