Sparse Cross Tabulation

Create a contingency table from cross-classifying factors, usually contained in a data frame, using a formula interface.

Usage

Xtab(formula = ~., data = parent.frame(), rdrop, cdrop,
subset, na.action, exclude = c(NA, NaN), drop.unused.levels = FALSE)

Arguments

formula: a formula object with the cross-classifying variables (separated by +) on the right hand side (or an object which can be coerced to a formula). Interactions are not allowed. On the left hand side, one may optionally give a vector or a matrix of counts; in the latter case, the columns are interpreted as corresponding to the levels of a variable. This is useful if the data have already been tabulated, see the examples below.
data: an optional matrix or data frame (or similar: see model.frame) containing the variables in formula. By default the variables are taken from environment(formula).
rdrop, cdrop: either a logical value (should rows/columns with zero marginal totals be removed after cross tabulation?), or a character or numeric vector identifying the rows/columns to remove.
subset: an optional vector specifying a subset of observations to be used.
na.action: a function which indicates what should happen when the data contain NAs.
exclude: a vector of values to be excluded when forming the set of levels of the classifying factors.
drop.unused.levels: a logical indicating whether to drop unused levels in the classifying factors. If this is FALSE and there are unused levels, the table will contain zero marginals, and a subsequent chi-squared test for independence of the factors will not work.

Details

The function creates two- or three-way cross tabulations; it works only with two or three factors.

If a left hand side is given in formula, its entries are simply summed over the cells corresponding to the right hand side; this also works if the left hand side does not give counts.
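
The summing behaviour described above can be illustrated with a left hand side that does not hold counts. This is a minimal sketch: the data frame d and its columns site, taxon and biomass are hypothetical, and Xtab is assumed to be available from the mefa4 package (with Matrix attached alongside it).

```r
library(mefa4)  # assumption: the package that provides Xtab

# Hypothetical data: 'biomass' is a continuous measurement, not a count
d <- data.frame(
    site = c("A", "A", "B", "B"),
    taxon = c("t1", "t1", "t1", "t2"),
    biomass = c(0.5, 1.25, 2, 0.75),
    stringsAsFactors = TRUE)

# Left hand side values are summed within each site x taxon cell
m <- Xtab(biomass ~ site + taxon, d)
m["A", "t1"]  # 0.5 + 1.25 = 1.75 (two rows fall in this cell)
m["B", "t2"]  # 0.75 (a single row)
m["A", "t2"]  # 0 (no rows fall in this cell)
```

Cells that receive no rows are stored as structural zeros, so they cannot be told apart from cells whose values sum to zero.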

Value

A sparse numeric matrix inheriting from sparseMatrix, specifically an object of S4 class dgCMatrix.

For three factors, a list of such sparse matrices, one for each level of the third factor.
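
The two kinds of return value can be inspected as follows. This is a sketch under stated assumptions: the data frame d and its columns are hypothetical, Xtab is assumed to come from the mefa4 package, and is() and as.matrix() are the usual methods and Matrix tools for S4 sparse matrices.

```r
library(mefa4)   # assumption: the package that provides Xtab
library(Matrix)  # sparse matrix classes and methods

# Hypothetical data with three classifying factors
d <- data.frame(
    site = c("A", "A", "B"),
    taxon = c("t1", "t2", "t1"),
    year = c("y1", "y2", "y1"),
    n = c(2, 3, 5),
    stringsAsFactors = TRUE)

# Two factors: a single sparse matrix
two <- Xtab(n ~ site + taxon, d)
is(two, "dgCMatrix")     # TRUE
is(two, "sparseMatrix")  # TRUE

# Three factors: a list of sparse matrices, one per level of 'year'
three <- Xtab(n ~ site + taxon + year, d)
names(three)

# Convert to an ordinary dense matrix when needed
dense <- as.matrix(two)
dense["B", "t1"]  # 5
```

Converting to a dense matrix gives up the memory savings of the sparse representation, so it is best reserved for small tables or for functions that do not accept sparse input.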

Author

This function is a slight modification of the xtabs function in the stats package.

Modified by Peter Solymos <solymos@ualberta.ca>

Examples

x <- data.frame(
    sample = paste("Sample", c(1,1,2,2,3,4), sep="."),
    species = c(paste("Species", c(1,1,1,2,3), sep="."),  "zero.pseudo"),
    count = c(1,2,10,3,4,0),
    stringsAsFactors = TRUE)
x
#>     sample     species count
#> 1 Sample.1   Species.1     1
#> 2 Sample.1   Species.1     2
#> 3 Sample.2   Species.1    10
#> 4 Sample.2   Species.2     3
#> 5 Sample.3   Species.3     4
#> 6 Sample.4 zero.pseudo     0
## Xtab class, counts by repetitions in RHS
(x0 <- Xtab(~ sample + species, x))
#> 4 x 4 sparse Matrix of class "dgCMatrix"
#>          Species.1 Species.2 Species.3 zero.pseudo
#> Sample.1         2         .         .           .
#> Sample.2         1         1         .           .
#> Sample.3         .         .         1           .
#> Sample.4         .         .         .           1
## counts by LHS and repetitions in RHS
(x1 <- Xtab(count ~ sample + species, x))
#> 4 x 4 sparse Matrix of class "dgCMatrix"
#>          Species.1 Species.2 Species.3 zero.pseudo
#> Sample.1         3         .         .           .
#> Sample.2        10         3         .           .
#> Sample.3         .         .         4           .
#> Sample.4         .         .         .           .
## drop all empty rows
(x2 <- Xtab(count ~ sample + species, x, cdrop=FALSE, rdrop=TRUE))
#> 3 x 4 sparse Matrix of class "dgCMatrix"
#>          Species.1 Species.2 Species.3 zero.pseudo
#> Sample.1         3         .         .           .
#> Sample.2        10         3         .           .
#> Sample.3         .         .         4           .
## drop all empty columns
Xtab(count ~ sample + species, x, cdrop=TRUE, rdrop=FALSE)
#> 4 x 3 sparse Matrix of class "dgCMatrix"
#>          Species.1 Species.2 Species.3
#> Sample.1         3         .         .
#> Sample.2        10         3         .
#> Sample.3         .         .         4
#> Sample.4         .         .         .
## drop a specific column by name (here the placeholder level)
Xtab(count ~ sample + species, x, cdrop="zero.pseudo")
#> 4 x 3 sparse Matrix of class "dgCMatrix"
#>          Species.1 Species.2 Species.3
#> Sample.1         3         .         .
#> Sample.2        10         3         .
#> Sample.3         .         .         4
#> Sample.4         .         .         .

## 2 and 3 way crosstabs
xx <- data.frame(
    sample = paste("Sample", c(1,1,2,2,3,4), sep="."),
    species = c(paste("Species", c(1,1,1,2,3), sep="."),  "zero.pseudo"),
    count = c(1,2,10,3,4,0),
    segment = letters[c(6,13,6,13,6,6)],
    stringsAsFactors = TRUE)
xx
#>     sample     species count segment
#> 1 Sample.1   Species.1     1       f
#> 2 Sample.1   Species.1     2       m
#> 3 Sample.2   Species.1    10       f
#> 4 Sample.2   Species.2     3       m
#> 5 Sample.3   Species.3     4       f
#> 6 Sample.4 zero.pseudo     0       f
Xtab(count ~ sample + species, xx)
#> 4 x 4 sparse Matrix of class "dgCMatrix"
#>          Species.1 Species.2 Species.3 zero.pseudo
#> Sample.1         3         .         .           .
#> Sample.2        10         3         .           .
#> Sample.3         .         .         4           .
#> Sample.4         .         .         .           .
Xtab(count ~ sample + species + segment, xx)
#> $f
#> 4 x 4 sparse Matrix of class "dgCMatrix"
#>          Species.1 Species.2 Species.3 zero.pseudo
#> Sample.1         1         .         .           .
#> Sample.2        10         .         .           .
#> Sample.3         .         .         4           .
#> Sample.4         .         .         .           .
#> 
#> $m
#> 4 x 4 sparse Matrix of class "dgCMatrix"
#>          Species.1 Species.2 Species.3 zero.pseudo
#> Sample.1         2         .         .           .
#> Sample.2         .         3         .           .
#> Sample.3         .         .         .           .
#> Sample.4         .         .         .           .
#>