The function computes the dissimilarity matrix of a dataset multiple times using vegdist, randomly subsampling the dataset each time. The matrices from all subsampling iterations are then averaged (mean by default) into a single distance matrix. This emulates the behavior of the distance matrix calculator within the Mothur microbial ecology toolkit.

avgdist(x, sample, distfun = vegdist, meanfun = mean,
    transf = NULL, iterations = 100, dmethod = "bray", ...)

Arguments

x

Community data matrix.

sample

The subsampling depth to be used in each iteration. Samples that do not meet this threshold are removed from the analysis, and their identities are reported to the user in a warning.

distfun

The dissimilarity matrix function to be used. Default is the vegan function vegdist.

meanfun

The calculation to use for the average (mean or median).

transf

Option for transforming the count data before calculating the distance matrix. Any base transformation function can be used (e.g. sqrt).

iterations

The number of random iterations to perform before averaging. Default is 100 iterations.

dmethod

Dissimilarity index to be used with the specified dissimilarity matrix function. Default is Bray-Curtis.

...

Any additional arguments to pass to the specified distance function or mean/median function.
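As a hedged illustration of how the arguments above combine, the sketch below picks a subsampling depth from the data and switches the dissimilarity index. The choice of `min(rowSums(BCI))` as the depth, the Jaccard index, and the fixed seed are assumptions made for this example, not recommendations from the package authors.

```r
library(vegan)
data(BCI)

# Assumption for illustration: use the smallest sample total as the depth,
# so that no sampling unit falls below the threshold.
depth <- min(rowSums(BCI))

# Subsampling is random; fixing the seed makes the run repeatable.
set.seed(42)

# Square-root transform the subsampled counts, then average Jaccard
# dissimilarities over 10 random subsamples.
d <- avgdist(BCI, sample = depth, iterations = 10,
             dmethod = "jaccard", transf = sqrt)

# One row and one column per retained sampling unit
dim(as.matrix(d))
```

A larger depth retains more information per sample but may exclude sampling units whose totals fall below it.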

Note

The function builds on rrarefy and an additional distance matrix function (e.g. vegdist) to give a more meaningful representation of distances among randomly subsampled datasets by presenting the average of multiple random iterations. By default the function uses vegdist. This functionality has been used in the Mothur standalone microbial ecology toolkit.
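The idea described in this note can be sketched in a few lines. The helper `avgdist_sketch` below is hypothetical and is not the vegan implementation; it only assumes that `rrarefy` and `vegdist` behave as documented, and it omits the transformation step and the warning about removed samples.

```r
library(vegan)

# Hypothetical helper for illustration only; not part of vegan.
avgdist_sketch <- function(x, sample, iterations = 100, dmethod = "bray") {
  # Keep only sampling units with at least `sample` individuals
  x <- x[rowSums(x) >= sample, , drop = FALSE]

  # One randomly rarefied dissimilarity matrix per iteration
  mats <- lapply(seq_len(iterations), function(i) {
    as.matrix(vegdist(rrarefy(x, sample = sample), method = dmethod))
  })

  # Element-wise mean across all iterations
  avg <- Reduce(`+`, mats) / iterations
  as.dist(avg)
}

data(BCI)
set.seed(1)
d <- avgdist_sketch(BCI, sample = 50, iterations = 10)
summary(as.vector(d))
```

Averaging over iterations smooths out the noise that any single random subsample would introduce into the dissimilarities.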

See also

This function utilizes the vegdist and rrarefy functions.

Examples

# Import an example count dataset
data(BCI)
# Test the base functionality
mean.avg.dist <- avgdist(BCI, sample = 50, iterations = 10)
# Test the transformation function
mean.avg.dist.t <- avgdist(BCI, sample = 50, iterations = 10, transf = sqrt)
# Test the median functionality
median.avg.dist <- avgdist(BCI, sample = 50, iterations = 10, meanfun = median)
# Print the resulting tables
head(as.matrix(mean.avg.dist))
#>       1     2     3     4     5     6     7     8     9    10    11    12    13
#> 1 0.000 0.564 0.590 0.606 0.636 0.630 0.614 0.602 0.634 0.598 0.620 0.648 0.746
#> 2 0.564 0.000 0.576 0.570 0.652 0.590 0.554 0.564 0.586 0.594 0.582 0.568 0.672
#> 3 0.590 0.576 0.000 0.576 0.580 0.592 0.600 0.562 0.576 0.588 0.594 0.624 0.716
#> 4 0.606 0.570 0.576 0.000 0.614 0.624 0.620 0.572 0.594 0.550 0.602 0.588 0.694
#> 5 0.636 0.652 0.580 0.614 0.000 0.590 0.654 0.628 0.666 0.650 0.676 0.668 0.796
#> 6 0.630 0.590 0.592 0.624 0.590 0.000 0.572 0.596 0.644 0.660 0.576 0.590 0.738
#>      14    15    16    17    18    19    20    21    22    23    24    25    26
#> 1 0.636 0.638 0.636 0.666 0.738 0.644 0.626 0.632 0.682 0.732 0.652 0.638 0.666
#> 2 0.592 0.600 0.594 0.606 0.678 0.594 0.624 0.642 0.618 0.666 0.604 0.626 0.640
#> 3 0.634 0.608 0.584 0.632 0.738 0.634 0.632 0.626 0.636 0.696 0.604 0.602 0.668
#> 4 0.612 0.620 0.590 0.630 0.692 0.612 0.596 0.626 0.608 0.664 0.604 0.640 0.664
#> 5 0.660 0.654 0.656 0.736 0.788 0.708 0.642 0.654 0.734 0.726 0.658 0.686 0.666
#> 6 0.644 0.690 0.624 0.664 0.708 0.634 0.684 0.642 0.624 0.684 0.598 0.680 0.680
#>      27    28    29    30    31    32    33    34    35    36    37    38    39
#> 1 0.660 0.668 0.636 0.680 0.642 0.678 0.694 0.708 0.782 0.660 0.658 0.716 0.702
#> 2 0.648 0.618 0.598 0.618 0.636 0.638 0.624 0.654 0.744 0.648 0.608 0.634 0.654
#> 3 0.658 0.634 0.644 0.686 0.646 0.648 0.668 0.700 0.800 0.636 0.632 0.658 0.682
#> 4 0.654 0.642 0.578 0.618 0.634 0.624 0.644 0.654 0.770 0.672 0.598 0.638 0.642
#> 5 0.722 0.724 0.718 0.738 0.654 0.704 0.752 0.754 0.830 0.678 0.706 0.760 0.768
#> 6 0.674 0.666 0.654 0.706 0.656 0.676 0.682 0.688 0.812 0.694 0.662 0.686 0.712
#>      40    41    42    43    44    45    46    47    48    49    50
#> 1 0.700 0.674 0.642 0.674 0.680 0.704 0.704 0.634 0.676 0.670 0.658
#> 2 0.662 0.628 0.628 0.666 0.656 0.696 0.680 0.624 0.704 0.662 0.654
#> 3 0.702 0.680 0.618 0.660 0.674 0.668 0.742 0.716 0.740 0.710 0.686
#> 4 0.648 0.614 0.622 0.648 0.684 0.674 0.706 0.644 0.680 0.694 0.634
#> 5 0.802 0.702 0.660 0.664 0.694 0.660 0.798 0.744 0.724 0.704 0.682
#> 6 0.750 0.690 0.670 0.662 0.698 0.680 0.742 0.706 0.732 0.700 0.704
head(as.matrix(mean.avg.dist.t))
#>           1         2         3         4         5         6         7
#> 1 0.0000000 0.5028200 0.5219613 0.5764606 0.5828140 0.5550356 0.5335550
#> 2 0.5028200 0.0000000 0.5077907 0.5448071 0.5681954 0.5159470 0.4892767
#> 3 0.5219613 0.5077907 0.0000000 0.5486345 0.5508163 0.5430994 0.5688710
#> 4 0.5764606 0.5448071 0.5486345 0.0000000 0.5267004 0.5798476 0.5773457
#> 5 0.5828140 0.5681954 0.5508163 0.5267004 0.0000000 0.5450754 0.6023634
#> 6 0.5550356 0.5159470 0.5430994 0.5798476 0.5450754 0.0000000 0.5308521
#>           8         9        10        11        12        13        14
#> 1 0.5674039 0.5946548 0.5843245 0.5774246 0.5973418 0.6952174 0.6073351
#> 2 0.5363502 0.5730812 0.5303120 0.5319836 0.5302321 0.6777351 0.5506912
#> 3 0.5485972 0.5597924 0.5011687 0.5481324 0.5811618 0.7405942 0.6021113
#> 4 0.5727434 0.5973259 0.5698309 0.5722804 0.5809131 0.6769620 0.5877928
#> 5 0.5929461 0.5968705 0.5714737 0.5913502 0.6323974 0.7242649 0.5928847
#> 6 0.5645036 0.5795560 0.5958723 0.5551392 0.5292392 0.6626947 0.6147464
#>          15        16        17        18        19        20        21
#> 1 0.5935674 0.5945517 0.5916678 0.6982384 0.6227433 0.6190097 0.6113178
#> 2 0.5679615 0.5482858 0.5647833 0.6676820 0.5338429 0.5747653 0.5816851
#> 3 0.5528166 0.5551578 0.6311249 0.7328965 0.5973205 0.5813709 0.5898606
#> 4 0.5678623 0.5921439 0.6082597 0.7090662 0.6295767 0.6248633 0.6458734
#> 5 0.6051044 0.6033819 0.6321158 0.7239649 0.6124182 0.6089943 0.5900354
#> 6 0.6220166 0.5652719 0.6148738 0.6345261 0.5824451 0.6322334 0.6144530
#>          22        23        24        25        26        27        28
#> 1 0.6034984 0.6562424 0.6135930 0.5861718 0.5993815 0.6140342 0.6109649
#> 2 0.5336577 0.6207790 0.5761819 0.5660150 0.5922172 0.5773536 0.5696336
#> 3 0.6151818 0.6956229 0.6144037 0.6023405 0.6225773 0.6250093 0.6112027
#> 4 0.6172223 0.6417007 0.6350858 0.6076799 0.6598826 0.6255800 0.6064441
#> 5 0.6556525 0.6865921 0.6266726 0.6153717 0.6489225 0.6586618 0.6225078
#> 6 0.5707913 0.6646375 0.5820917 0.6201667 0.6244489 0.6207207 0.5720171
#>          29        30        31        32        33        34        35
#> 1 0.6242210 0.6262671 0.6099920 0.6616868 0.6053339 0.6547968 0.7178460
#> 2 0.5597087 0.6143788 0.5815442 0.6239547 0.5511059 0.6371140 0.6795926
#> 3 0.6016689 0.6236998 0.6041004 0.6329530 0.5938776 0.6679043 0.7496374
#> 4 0.6056577 0.6088220 0.6314919 0.6312576 0.6190221 0.6725577 0.7105919
#> 5 0.6635018 0.6374305 0.6280633 0.6649972 0.6322495 0.6783348 0.7473813
#> 6 0.5751149 0.6346228 0.6291031 0.6341624 0.6039148 0.6389972 0.7055583
#>          36        37        38        39        40        41        42
#> 1 0.6134713 0.6243661 0.6296093 0.6609022 0.7030172 0.6516861 0.6062715
#> 2 0.5629283 0.5565130 0.6093789 0.6151173 0.6509625 0.5895404 0.5746410
#> 3 0.6011882 0.6092780 0.6463414 0.6449418 0.6889053 0.6531873 0.5852579
#> 4 0.6383294 0.6021241 0.6241341 0.6218949 0.7055372 0.5963368 0.5901003
#> 5 0.6095522 0.6497220 0.6611704 0.6500144 0.6969759 0.6393569 0.5942837
#> 6 0.6243356 0.6431861 0.6075838 0.6120964 0.6913716 0.6533246 0.5923743
#>          43        44        45        46        47        48        49
#> 1 0.6234267 0.5996698 0.6362640 0.6533660 0.6099648 0.6458129 0.6493653
#> 2 0.6014506 0.5522526 0.6162492 0.6290647 0.5662433 0.6391675 0.6077363
#> 3 0.6086627 0.6165466 0.6360054 0.6820521 0.6296201 0.6511159 0.6556694
#> 4 0.6390554 0.6283732 0.6447393 0.7008656 0.6290541 0.6532379 0.6451546
#> 5 0.6375199 0.6210606 0.6252798 0.7172933 0.6503303 0.6625645 0.6752949
#> 6 0.6183009 0.6201730 0.6125642 0.6691028 0.6423965 0.6695414 0.6516146
#>          50
#> 1 0.6107505
#> 2 0.5796903
#> 3 0.6595210
#> 4 0.6421178
#> 5 0.6418095
#> 6 0.6288501
head(as.matrix(median.avg.dist))
#>      1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
#> 1 0.00 0.57 0.60 0.59 0.61 0.61 0.62 0.56 0.61 0.59 0.59 0.65 0.71 0.61 0.64
#> 2 0.57 0.00 0.57 0.59 0.64 0.56 0.57 0.55 0.62 0.58 0.58 0.59 0.68 0.62 0.60
#> 3 0.60 0.57 0.00 0.61 0.62 0.55 0.60 0.57 0.57 0.55 0.58 0.60 0.76 0.61 0.61
#> 4 0.59 0.59 0.61 0.00 0.60 0.62 0.63 0.58 0.66 0.61 0.58 0.61 0.68 0.60 0.59
#> 5 0.61 0.64 0.62 0.60 0.00 0.59 0.63 0.63 0.62 0.63 0.68 0.70 0.77 0.63 0.64
#> 6 0.61 0.56 0.55 0.62 0.59 0.00 0.62 0.61 0.64 0.64 0.59 0.58 0.72 0.69 0.65
#>     16   17   18   19   20   21   22   23   24   25   26   27   28   29   30
#> 1 0.60 0.64 0.71 0.64 0.61 0.63 0.63 0.72 0.60 0.62 0.62 0.65 0.66 0.63 0.66
#> 2 0.58 0.58 0.69 0.60 0.63 0.62 0.56 0.64 0.59 0.60 0.62 0.65 0.60 0.55 0.62
#> 3 0.58 0.67 0.74 0.65 0.64 0.61 0.64 0.69 0.58 0.64 0.65 0.64 0.63 0.63 0.65
#> 4 0.60 0.60 0.70 0.62 0.61 0.66 0.60 0.70 0.61 0.66 0.66 0.59 0.64 0.65 0.64
#> 5 0.64 0.72 0.76 0.66 0.63 0.64 0.70 0.72 0.64 0.67 0.66 0.67 0.73 0.70 0.67
#> 6 0.60 0.62 0.66 0.63 0.66 0.68 0.59 0.68 0.61 0.66 0.66 0.65 0.64 0.63 0.70
#>     31   32   33   34   35   36   37   38   39   40   41   42   43   44   45
#> 1 0.64 0.69 0.68 0.70 0.78 0.64 0.68 0.68 0.68 0.72 0.63 0.66 0.65 0.62 0.63
#> 2 0.64 0.63 0.60 0.63 0.73 0.63 0.60 0.57 0.60 0.61 0.61 0.64 0.63 0.61 0.68
#> 3 0.66 0.65 0.65 0.70 0.78 0.63 0.67 0.63 0.67 0.70 0.66 0.67 0.64 0.65 0.67
#> 4 0.63 0.60 0.60 0.68 0.74 0.62 0.61 0.58 0.64 0.70 0.60 0.63 0.64 0.66 0.69
#> 5 0.66 0.74 0.69 0.70 0.82 0.66 0.67 0.75 0.72 0.79 0.67 0.67 0.68 0.67 0.66
#> 6 0.68 0.68 0.67 0.68 0.78 0.70 0.69 0.65 0.70 0.77 0.73 0.68 0.66 0.68 0.72
#>     46   47   48   49   50
#> 1 0.67 0.67 0.65 0.64 0.65
#> 2 0.64 0.65 0.64 0.67 0.68
#> 3 0.74 0.69 0.73 0.69 0.67
#> 4 0.66 0.66 0.66 0.69 0.63
#> 5 0.77 0.73 0.70 0.71 0.66
#> 6 0.67 0.67 0.67 0.75 0.71
# Run example to illustrate low variance of mean, median, and stdev results
# Mean and median std dev are around 0.05
sdd <- avgdist(BCI, sample = 50, iterations = 100, meanfun = sd)
summary(mean.avg.dist)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#>  0.4900  0.6200  0.6520  0.6558  0.6900  0.8540
summary(median.avg.dist)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#>  0.4700  0.6000  0.6400  0.6445  0.6800  0.8500
summary(sdd)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#> 0.04492 0.05675 0.05985 0.05984 0.06306 0.07662
# Test for when subsampling depth excludes some samples
# Return samples that are removed for not meeting depth filter
depth.avg.dist <- avgdist(BCI, sample = 450, iterations = 10)
#> Warning: The following sampling units were removed because they were below sampling depth: 1, 2, 6, 7, 8, 9, 11, 12, 13, 14, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 31, 33, 34, 36, 37, 38, 39, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50
# Print the result
depth.avg.dist
#>            3         4         5        10        15        30        32
#> 3  0.0000000 0.3311111 0.3666667 0.2946667 0.3555556 0.4562222 0.4731111
#> 4  0.3311111 0.0000000 0.3793333 0.3235556 0.3500000 0.3973333 0.4346667
#> 5  0.3666667 0.3793333 0.0000000 0.3902222 0.4004444 0.4953333 0.5175556
#> 10 0.2946667 0.3235556 0.3902222 0.0000000 0.3146667 0.4277778 0.4171111
#> 15 0.3555556 0.3500000 0.4004444 0.3146667 0.0000000 0.4562222 0.4666667
#> 30 0.4562222 0.3973333 0.4953333 0.4277778 0.4562222 0.0000000 0.3811111
#> 32 0.4731111 0.4346667 0.5175556 0.4171111 0.4666667 0.3811111 0.0000000
#> 35 0.6575556 0.6315556 0.6902222 0.6795556 0.6564444 0.5171111 0.5917778
#> 40 0.5611111 0.5302222 0.6315556 0.5213333 0.5580000 0.4535556 0.4011111
#>           35        40
#> 3  0.6575556 0.5611111
#> 4  0.6315556 0.5302222
#> 5  0.6902222 0.6315556
#> 10 0.6795556 0.5213333
#> 15 0.6564444 0.5580000
#> 30 0.5171111 0.4535556
#> 32 0.5917778 0.4011111
#> 35 0.0000000 0.4264444
#> 40 0.4264444 0.0000000
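Before running avgdist at a given depth, it may help to check which sampling units would fall below it. The sketch below is an assumption about how such a check could be done with base R's `rowSums`; it does not call any internal vegan code, and the variable names are chosen for this example only.

```r
library(vegan)
data(BCI)

depth <- 450

# Total number of individuals counted in each sampling unit
totals <- rowSums(BCI)

# Units below the depth would be dropped; the rest would be retained
dropped <- names(totals)[totals < depth]
kept <- names(totals)[totals >= depth]

length(dropped)
kept
```

Comparing `kept` against the row names of the printed matrix above is a quick way to confirm that the filter behaved as expected.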