Miscellany

Two short notes on statistics

by Jackie Yip

With advances in computer technology, the ordination
technique non-metric multidimensional scaling (NMDS) has become popular among community ecologists. NMDS was
seldom used in the past because it took too much computation time (it
still does! Ask Alan Leung). Clarke
& Warwick (1994) recommended NMDS as one of the best ordination
techniques available, for it makes no assumption about the normality or
type of response, and allows greater flexibility in the definition and
conversion of dissimilarity measures. There are drawbacks, of course,
but I am not going to discuss them here.

A quick survey in the Department reveals that
the terrestrial ecologists use PC-ORD for NMDS, and the marine ecologists
prefer PRIMER. PC-ORD was developed by Bruce McCune, who is a bryologist,
and PRIMER by Clarke & Warwick, who are marine biologists. Both
packages offer NMDS, and each also provides its own multivariate
procedures that do more or less the same things in different ways (e.g.
MRPP vs ANOSIM, Correlation to second matrix vs BIO-ENV). This is fine,
but it puzzled me when I tried to compare the results obtained from NMDS
with both packages. I was doing this just out of curiosity, and it turned
out to be a nightmare. Using the same data in PC-ORD4 and PRIMER5,
I got very different ordination configurations and very different
stress values: 0.36 from the former and 0.13 from the latter. The dataset
I used was a matrix of 323 morphospecies of Coleoptera in 118 sites, which
had lots of zeros and ties (i.e. many pairs of sites with identical dissimilarity values). I
used the Sørensen measure on untransformed data and 20 runs/restarts to find 2-dimensional
solutions in both trials. Bruce McCune, answering my queries, said it
was difficult to compare between packages because stopping criteria or
measures of stress values may not be the same. From the manuals I found
that both packages measured stress values using Kruskal’s stress formula
1 (Kruskal, 1964). There
might be errors in the calculation, but I would never know it because
the calculation for NMDS is so complicated. So the mystery remains.

Another possible reason for the discrepancy
is the presence of a large number of ties, which creates randomness in
the ordination process. In fact, Legendre
& Legendre (1998, p.447) said that ‘computer programs may differ
in the way they handle ties. This may cause major discrepancies between
reported stress values corresponding to the final solutions, although
the final configurations of points are usually very similar from program
to program, except when different programs identify distinct final solutions
having very similar stress values’.

I posted this question on the list server ORDNEWS.
Chris Howden and Hugh Jones responded, suggesting that different starting
configurations and different measures of stress might be the reasons.
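To make the stress measure and the tie-handling issue concrete, here is a minimal Python sketch of Kruskal's stress formula 1 for a fixed configuration, computed under the two treatments of ties described by Kruskal (the "primary" approach, where tied dissimilarities may receive different fitted distances, and the "secondary" approach, where they must receive the same one). It is not the code used by PC-ORD or PRIMER; the function names `pava` and `stress1` and the six example values are made up for illustration.

```python
import math
from itertools import groupby


def pava(sums, counts):
    """Pool-adjacent-violators: weighted least-squares non-decreasing fit.

    Input block k has total sums[k] over counts[k] points.
    Returns one fitted value per input block.
    """
    blocks = []  # each entry: [sum, count, number of input blocks pooled]
    for s, c in zip(sums, counts):
        blocks.append([s, c, 1])
        # Pool backwards while the previous block mean exceeds the last one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s2, c2, n2 = blocks.pop()
            blocks[-1][0] += s2
            blocks[-1][1] += c2
            blocks[-1][2] += n2
    fitted = []
    for s, c, n in blocks:
        fitted.extend([s / c] * n)
    return fitted


def stress1(dissim, dist, ties="primary"):
    """Stress formula 1: sqrt(sum (d - d_hat)^2 / sum d^2).

    dissim: observed dissimilarities, one per pair of sites.
    dist:   distances between the same pairs in the ordination.
    ties:   "primary" or "secondary" treatment of tied dissimilarities.
    """
    order = sorted(range(len(dissim)), key=lambda k: (dissim[k], dist[k]))
    if ties == "primary":
        # Ties may be broken: each pair is its own block, ordered by distance.
        groups = [[k] for k in order]
    else:
        # Ties must stay tied: pairs with equal dissimilarity share one block.
        groups = [list(g) for _, g in groupby(order, key=lambda k: dissim[k])]
    sums = [sum(dist[k] for k in g) for g in groups]
    counts = [len(g) for g in groups]
    fitted = pava(sums, counts)
    disparity = [0.0] * len(dist)
    for g, f in zip(groups, fitted):
        for k in g:
            disparity[k] = f
    numerator = sum((d - h) ** 2 for d, h in zip(dist, disparity))
    denominator = sum(d ** 2 for d in dist)
    return math.sqrt(numerator / denominator)


# Hypothetical example: six site pairs with heavily tied dissimilarities.
dissim = [0.2, 0.5, 0.5, 1.0, 1.0, 1.0]
dist = [0.3, 0.9, 0.4, 1.0, 0.7, 1.2]
print("primary ties:  ", round(stress1(dissim, dist, "primary"), 3))
print("secondary ties:", round(stress1(dissim, dist, "secondary"), 3))
```

With the same formula and the same configuration, the secondary treatment gives a noticeably higher stress than the primary one in this toy example, which is the kind of discrepancy a dataset with many ties could produce between programs.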
I later received confirmation from Bob Clarke that PRIMER strictly follows
Kruskal’s stress formula 1 when calculating stress values.

This was intended to be a short note, I assure
you. But just as my casual exploration turned out to be a long wade through
muddy water, it has taken up one whole page without giving a satisfactory
result. I can make no conclusion here, other than to remind you again
that the two packages may give you different answers on running NMDS.

2. Avoiding the Pitfalls of Multiple Testing

We sometimes need to test several hypotheses
using data collected during an experiment or a survey. In testing any
single hypothesis, we normally specify an acceptable maximum probability
of rejecting the null hypothesis when it is true (i.e. Type I error),
but when many hypotheses are tested, the probability of
committing at least one Type I error increases. This may result in spurious ‘significant’ relationships that are explained
by chance only.

This inflation of the Type I error rate can be reduced by replacing
multiple tests with other procedures, such as multiple comparisons of
differences by Student-Newman-Keuls (SNK) tests instead of multiple t-tests.

In situations where multiple tests are not avoidable,
a Bonferroni correction is
usually applied to control the probability of committing Type I errors
across the experiment. You do not have to feel intimidated by the mathematics
that follows: it is not as complicated as it seems.

If a specific hypothesis $H_j$ is rejected when $P_j \le \alpha/n$, then the Bonferroni inequality,

$$\Pr\left(\bigcup_{j=1}^{n} \{P_j \le \alpha/n\}\right) \le \sum_{j=1}^{n} \Pr\left(P_j \le \alpha/n\right) \le \alpha,$$

ensures that the probability of rejecting at least one hypothesis when
all are true is no greater than $\alpha$,
the multiple level of significance (i.e. the experiment-wise probability of Type I error), with $n$ being the number of tests. The Bonferroni-corrected maximum error for
a single test is found by simply dividing the
$\alpha$ value by $n$.

A criticism of the classic Bonferroni test procedure is that it is too
conservative for highly correlated test statistics, hence resulting in
a high probability of Type II errors, i.e. failure to reject false null
hypotheses. Holm (1979)
improved the procedure by ranking the P-values in ascending order and rejecting the hypotheses one at a time, with the significance threshold
gradually relaxed from $\alpha/n$ for the smallest P-value towards $\alpha$ for the largest, stopping at the first hypothesis that cannot be rejected. Many
other methods have been proposed (e.g. Simes,
1986; Hommel, 1988) to improve the power of the Bonferroni test
procedure, but there is, as yet, no consensus on the best method (Shaffer, 1995).

As for the value of $\alpha$, Miller (1981) proposed a flexible value as a viable method of maintaining power in adjustments for multiple tests. Chandler (1995) suggested that values of 10-15% are appropriate, especially for large numbers of tests.

Ecologists do not seem to be as cautious about
the pitfalls of multiple tests as clinical and medical scientists,
who have been using Bonferroni corrections for decades. Laurance et al. (1999)
provide one of the few examples in the ecology literature. If you are
going to make multiple tests in your experiments, this is something to
watch out for.

Bibliography

Chandler, R.C. (1995). Practical considerations in the use of simultaneous inference for multiple tests. Animal Behaviour 49: 524-527.

Clarke, K.R. (1993). Non-parametric multivariate analyses of changes in community structure. Australian Journal of Ecology 18: 117-143.

Clarke, K.R. & Warwick, R.M. (1994). Changes in Marine Communities: An Approach to Statistical Analysis and Interpretation. Plymouth, Plymouth Marine Laboratory.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics 6: 65-70.

Hommel, G. (1988). A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika 75: 383-386.

Kruskal, J.B. (1964). Nonmetric multidimensional scaling: A numerical method. Psychometrika 29: 115-129.

Laurance, W.F., Fearnside, P.M., Laurance, S.G., Delamonica, P., Lovejoy, T.E., Rankin-de Merona, J.M., Chambers, J.Q. & Gascon, C. (1999). Relationship between soils and Amazon forest biomass: a landscape-scale study. Forest Ecology and Management 118: 127-138.

Legendre, P. & Legendre, L. (1998). Numerical Ecology. Amsterdam, Elsevier Science B.V.

Miller, R.G. (1981). Simultaneous Statistical Inference. New York, Springer.

Shaffer, J.P. (1995). Multiple hypothesis testing. Annual Review of Psychology 46: 561-584.

Simes, R.J. (1986). An improved Bonferroni procedure for multiple tests of significance. Biometrika 73: 751-754.
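
Worked example for note 2

As a rough illustration of the second note, the Python sketch below compares the classic Bonferroni correction with Holm's sequentially rejective procedure on a set of P-values. The function names and the five P-values are hypothetical, chosen only to show the mechanics. The helper `familywise_error` additionally assumes that the tests are independent, which the Bonferroni inequality itself does not require.

```python
def familywise_error(alpha, n):
    """Chance of at least one Type I error in n independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** n


def bonferroni(pvalues, alpha=0.05):
    """Reject H_j when P_j <= alpha / n (classic Bonferroni correction)."""
    n = len(pvalues)
    return [p <= alpha / n for p in pvalues]


def holm(pvalues, alpha=0.05):
    """Holm (1979): step through the P-values in ascending order.

    The smallest is compared with alpha/n, the next with alpha/(n-1), and
    so on; testing stops at the first hypothesis that is not rejected.
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda j: pvalues[j])
    reject = [False] * n
    for rank, j in enumerate(order):
        if pvalues[j] <= alpha / (n - rank):
            reject[j] = True
        else:
            break  # all remaining (larger) P-values are retained
    return reject


# Hypothetical P-values from five tests on the same dataset.
pvalues = [0.040, 0.004, 0.300, 0.015, 0.012]

print("uncorrected:", [p <= 0.05 for p in pvalues])
print("Bonferroni: ", bonferroni(pvalues))
print("Holm:       ", holm(pvalues))
print("P(at least one Type I error), 20 independent tests at 0.05:",
      round(familywise_error(0.05, 20), 3))
```

In this made-up case four of the five tests are 'significant' without correction, one survives the Bonferroni correction, and three survive Holm's procedure, which shows why Holm's method is regarded as less conservative at the same experiment-wise level.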