stepacross.Rd
Function stepacross tries to replace dissimilarities with shortest paths stepping across intermediate sites while regarding dissimilarities above a threshold as missing data (NA). With path = "shortest" this is the flexible shortest path (Williamson 1978, Bradfield & Kenkel 1987), and with path = "extended" an approximation known as extended dissimilarities (De'ath 1999). The use of stepacross should improve the ordination of data with high beta diversity, when there are many sites with no species in common.
stepacross(dis, path = "shortest", toolong = 1, trace = TRUE, ...)
Argument | Description |
---|---|
dis | Dissimilarity data inheriting from class |
path | The method of stepping across (partial match). Alternative |
toolong | Shortest dissimilarity regarded as |
trace | Trace the calculations. |
... | Other parameters (ignored). |
Williamson (1978) suggested using flexible shortest paths to estimate dissimilarities between sites which have nothing in common, or no shared species. With path = "shortest", function stepacross replaces dissimilarities that are toolong or longer with NA, and tries to find shortest paths between all sites using the remaining dissimilarities. Several dissimilarity indices are semi-metric, which means that they do not obey the triangle inequality \(d_{ij} \leq d_{ik} + d_{kj}\), and the shortest path algorithm can replace these dissimilarities as well, even when they are shorter than toolong.
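The thresholding and path search described above can be sketched in base R on a toy matrix. This is only an illustration under stated assumptions: the function name shortest_sketch and the toy data are hypothetical, and the sketch uses Floyd-Warshall relaxation for brevity instead of the Dijkstra search that stepacross itself uses.

```r
# Toy dissimilarities among four sites; 1 means no shared species.
d <- matrix(c(0.0, 0.4, 0.9, 1.0,
              0.4, 0.0, 0.3, 1.0,
              0.9, 0.3, 0.0, 0.5,
              1.0, 1.0, 0.5, 0.0),
            nrow = 4, byrow = TRUE)

# Hypothetical helper, not part of vegan.
shortest_sketch <- function(d, toolong = 1) {
  n <- nrow(d)
  # Dissimilarities that are toolong or longer are treated as missing.
  if (toolong > 0) d[!is.na(d) & d >= toolong] <- NA
  # Missing links cannot be used: give them infinite length.
  d[is.na(d)] <- Inf
  diag(d) <- 0
  # Floyd-Warshall relaxation: allow site k as an intermediate step.
  for (k in seq_len(n)) {
    via_k <- outer(d[, k], d[k, ], "+")
    d <- pmin(d, via_k)
  }
  # Pairs that are still unreachable are reported as NA.
  d[is.infinite(d)] <- NA
  d
}

sp <- shortest_sketch(d)
sp
```

In this toy case the too long pairs (1,4) and (2,4) get finite path lengths, and the pair (1,3) is also shortened from 0.9 to the path through site 2, although it was below toolong. That is the semi-metric effect mentioned in the text.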
De'ath (1999) suggested a simplified method known as extended dissimilarities, which are calculated with path = "extended". In this method, dissimilarities that are toolong or longer are first made NA, and then the function tries to replace these NA dissimilarities with a path through single stepping stone points. If not all NA could be replaced with one pass, the function will make new passes with updated dissimilarities until all NA are replaced with extended dissimilarities. This means that in the second and further passes, the remaining NA dissimilarities are allowed to have more than one stepping stone site, but previously replaced dissimilarities are not updated. Further, the function does not consider dissimilarities shorter than toolong, although some of these could be replaced with a shorter path in semi-metric indices, and used as a part of other paths. In optimal cases, the extended dissimilarities are equal to shortest paths, but they may be longer.
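The repeated passes can be illustrated with a small base R sketch. The function extended_sketch and the chain of four sites are hypothetical and written only to mirror the description above; they are not the code used in vegan or in De'ath (1999).

```r
# Four sites in a chain: only neighbours share species.
d <- matrix(1, 4, 4)
d[cbind(1:3, 2:4)] <- c(0.6, 0.5, 0.7)
d[cbind(2:4, 1:3)] <- c(0.6, 0.5, 0.7)
diag(d) <- 0

# Hypothetical helper, not part of vegan.
extended_sketch <- function(d, toolong = 1) {
  if (toolong > 0) d[!is.na(d) & d >= toolong] <- NA
  diag(d) <- 0
  repeat {
    missing <- which(is.na(d) & upper.tri(d), arr.ind = TRUE)
    if (nrow(missing) == 0) break
    new_d <- d
    replaced <- 0
    for (r in seq_len(nrow(missing))) {
      i <- missing[r, 1]
      j <- missing[r, 2]
      # Paths i -> k -> j using only values known before this pass.
      via <- d[i, ] + d[, j]
      if (any(!is.na(via))) {
        new_d[i, j] <- new_d[j, i] <- min(via, na.rm = TRUE)
        replaced <- replaced + 1
      }
    }
    # Nothing could be replaced: the data are disconnected, keep the NA.
    if (replaced == 0) break
    d <- new_d
  }
  d
}

ex <- extended_sketch(d)
ex
```

Here the first pass fills pairs (1,3) and (2,4) through one stepping stone, and the second pass fills pair (1,4) using a value replaced in the first pass, so that path has two stepping stones in terms of the original dissimilarities.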
As an alternative to defining too long dissimilarities with parameter toolong, the input dissimilarities can contain NAs. If toolong is zero or negative, the function does not make any dissimilarities into NA. If there are no NAs in the input and toolong = 0, path = "shortest" will find shorter paths for semi-metric indices, and path = "extended" will do nothing. Function no.shared can be used to set dissimilarities to NA.
If the data are disconnected or there is no path between all points, the result will contain NAs and a warning is issued. Several methods cannot handle NA dissimilarities, and this warning should be taken seriously. Function distconnected can be used to find connected groups and remove rare outlier observations or groups of observations.
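To make the idea of disconnected data concrete, the following base R sketch labels groups of sites that can reach each other through dissimilarities shorter than toolong. The function connected_groups is a hypothetical illustration and not the implementation of distconnected; it assumes a positive toolong and a full symmetric matrix as input.

```r
# Five sites: sites 1-3 share species, sites 4-5 share species,
# and nothing is shared between the two sets.
d <- matrix(1, 5, 5)
d[1:3, 1:3] <- 0.4
d[4:5, 4:5] <- 0.6
diag(d) <- 0

# Hypothetical helper, not part of vegan.
connected_groups <- function(d, toolong = 1) {
  # A usable link is a non-missing dissimilarity shorter than toolong.
  linked <- !is.na(d) & d < toolong
  n <- nrow(d)
  group <- integer(n)   # 0 means "not yet visited"
  current <- 0L
  for (start in seq_len(n)) {
    if (group[start] != 0L) next
    current <- current + 1L
    group[start] <- current
    queue <- start
    # Breadth-first search over usable links.
    while (length(queue) > 0) {
      i <- queue[1]
      queue <- queue[-1]
      nb <- which(linked[i, ] & group == 0L)
      group[nb] <- current
      queue <- c(queue, nb)
    }
  }
  group
}

grp <- connected_groups(d)
grp
```

Any pair of sites with different group labels has no path between them, so a step-across result for such data would keep NA for that pair.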
Alternative path = "shortest" uses Dijkstra's method for finding flexible shortest paths, implemented as priority-first search for dense graphs (Sedgewick 1990). Alternative path = "extended" follows De'ath (1999), but the implementation is simpler than in his code.
Function returns an object of class dist with extended dissimilarities (see functions vegdist and dist). The value of path is appended to the method attribute.
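A short way to inspect the returned object is sketched below. It assumes that the vegan package is installed and uses its dune data; the object names sp and ex are arbitrary, and no output is shown because it depends on the installed version.

```r
library(vegan)
data(dune)

dis <- vegdist(dune)   # default index of vegdist
sp <- stepacross(dis, path = "shortest", trace = FALSE)
ex <- stepacross(dis, path = "extended", trace = FALSE)

class(sp)              # the result inherits from class "dist"
attr(sp, "method")     # method attribute with the value of path appended
attr(ex, "method")

# Extended dissimilarities may equal the shortest paths or be longer.
summary(as.vector(ex) - as.vector(sp))
```

Setting trace = FALSE only suppresses the progress messages; the returned dissimilarities are the same.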
Bradfield, G.E. & Kenkel, N.C. (1987). Nonlinear ordination using flexible shortest path adjustment of ecological distances. Ecology 68, 750--753.
De'ath, G. (1999). Extended dissimilarity: a method of robust estimation of ecological distances from high beta diversity data. Plant Ecol. 144, 191--199.
Sedgewick, R. (1990). Algorithms in C. Addison Wesley.
Williamson, M.H. (1978). The ordination of incidence data. J. Ecol. 66, 911--920.
The function changes the original dissimilarities, and not everyone likes this. It may be best to use the function only when you really must: with extremely high beta diversity, where a large proportion of dissimilarities are at their upper limit (no species in common).
Semi-metric indices vary in their degree of violating the triangle inequality. Morisita and Horn--Morisita indices of vegdist may be very strongly semi-metric, and shortest paths can change these indices very much. The Mountford index violates basic rules of dissimilarities: non-identical sites have zero dissimilarity if the species composition of the poorer site is a subset of the richer. With the Mountford index, you can find three sites \(i, j, k\) so that \(d_{ik} = 0\) and \(d_{jk} = 0\), but \(d_{ij} > 0\). The results of stepacross on the Mountford index can be very weird. If stepacross is needed, it is best to try to use it with more metric indices only.
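The three-site situation described above can be checked directly. The sketch below is a hypothetical base R helper, not a vegan function, and the matrix is made up to match the case \(d_{ik} = 0\), \(d_{jk} = 0\), \(d_{ij} > 0\) with \(i = 1\), \(j = 2\), \(k = 3\).

```r
# Made-up dissimilarities: sites 1 and 2 differ from each other,
# but both have zero dissimilarity to site 3.
d <- matrix(c(0.0, 0.8, 0.0,
              0.8, 0.0, 0.0,
              0.0, 0.0, 0.0),
            nrow = 3, byrow = TRUE)

# Hypothetical helper: TRUE for each pair (i, j) where some site k
# gives d[i, j] > d[i, k] + d[k, j], a triangle inequality violation.
violates_triangle <- function(d, tol = 1e-8) {
  n <- nrow(d)
  bad <- matrix(FALSE, n, n)
  for (k in seq_len(n)) {
    bad <- bad | (d > outer(d[, k], d[k, ], "+") + tol)
  }
  bad
}

bad <- violates_triangle(d)
which(bad, arr.ind = TRUE)
```

Only the pair of sites 1 and 2 is flagged. A shortest path search with toolong = 0 would replace that dissimilarity with the zero-length path through site 3, which shows how strongly such an index can be altered.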
Function distconnected can find connected groups in disconnected data, and function no.shared can be used to set dissimilarities as NA. See swan for an alternative approach. Function stepacross is an essential component in isomap and cophenetic.spantree.
# There are no data sets with high beta diversity in vegan, but this
# should give an idea.
data(dune)
dis <- vegdist(dune)
edis <- stepacross(dis)
#> Too long or NA distances: 5 out of 190 (2.6%)
#> Stepping across 190 dissimilarities...
## Manhattan distances have no fixed upper limit.
dis <- vegdist(dune, "manhattan")
is.na(dis) <- no.shared(dune)
dis <- stepacross(dis, toolong = 0)
#> Too long or NA distances: 5 out of 190 (2.6%)
#> Stepping across 190 dissimilarities...