Chapter 6 Dealing with Recordings

6.1 Introduction

Automated recording units (ARU) are increasingly being used for auditory surveys. There are numerous advantages for using ARUs, e.g. recordings can be stored in perpetuity to be transcribed later, ARUs can be programmed to record at select times and dates over long time periods that would be prohibitive using human observers.

Bird point counts have been traditionally done by human observers. Combining ARU data with traditional point counts thus require an understanding of how the ARU based counts relate to counts made by human observer in the field. The best way to approach this question is by simultaneously sampling by two approaches: (1) human observers doing traditional point count by registering time and distance interval an individual bird was first detected, and (2) record the same session at the same location by an ARU to be identified/transcribed later in laboratory settings.

6.2 Prerequisites

library(bSims)                # simulations
library(detect)               # multinomial models
library(mefa4)                # count manipulation
library(survival)             # survival models
library(paired)               # paired sampling data
load("_data/abmi/abmi.rda")   # ABMI ARU data set

6.3 Paired sampling

The expected value of the total count (single species) in a 10-minutes time interval using human observer (subscript \(H\)) based unlimited radius point count may be written as: \(E[Y_{H}] = D A_{H} p_{H}\) where \(Y_{H}\) is the count, \(D\) is population density, \(A\) is the area sampled, \(p_{H}\) is the probability that an average individual of the species is available for detection. The quantity \(p_{H}\) can be estimated based on removal sampling utilizing the multiple time intervals. \(A_{H}\) is often unknown, but can be estimated using the effective detection radius: \(A_{H}=\pi EDR_{H}^2\). Human observer based EDR is estimated from distance sampling.

The ARU based survey (subscript \(R\) for recorder) can distinguish individuals within distinct time intervals, but assigning these individuals is not yet possible using a single ARU. An ARU based count thus can be seen ans an unlimited radius point count where the effective area sampled is unknown. The expected value for an ARU based count for a given species may be written as: \(E[Y_{R}] = D A_{R} p_{R}\). \(p_{R}\) can be estimated based on removal sampling utilizing the multiple time intervals from the ARU based survey. The unknown sampling are can be written as \(A_{R}=\pi EDR_{R}^2\). The problem is that ARU based EDR cannot directly be estimated from the data because of the lack of multiple distance bands or individual based distance information.

The advantage of simultaneous sampling by human observers (H) and ARUs (A) is that population density (\(D=D_{H}=D_{R}\)) is identical by design. Possible mechanisms for differences in availability of bird individuals for detection (\(p_{H}\) vs. \(p_{R}\)) can include differences in how detections are made in the field vs. in laboratory (e.g. possibility of double checking).

Both \(p_{H}\) and \(p_{R}\) can be estimated from the data, and the equivalence \(p=p_{H}=p_{R}\) can be tested. So for the sake of simplicity, we assume that human observer and ARU based \(p\)’s are equal. Dividing the expected values of the counts may be written as:

\[\frac{E[Y_{R}]}{E[Y_{H}]} = \frac{D A_{R} p}{D A_{R} p} = \frac{\pi EDR_{R}^2}{\pi EDR_{H}^2} = \frac{EDR_{R}^2}{EDR_{H}^2}\]

By substituting \(EDR_{R}^2 = \Delta^2 EDR_{H}^2\) (and thus \(EDR_{R} = \Delta EDR_{H}\)) we get:

\[\frac{E[Y_{R}]}{E[Y_{H}]} = \frac{\Delta^2 EDR_{H}^2}{EDR_{H}^2} = \Delta^2\]

This means that dividing the mean counts from ARU and human observed counts would give an estimate of the squared scaling constant (\(\Delta^2\)) describing the relationship between the estimated \(EDR_{H}\) and the unknown \(EDR_{R}\).

6.4 Paired data

Human observer surveys:

  • 0-50, 50-100, >100 m distance bands,
  • 0-3, 3-5, 5-10 minutes time intervals.

ARU surveys:

  • unlimited distance,
  • 10 minutes survey in 1-minute time intervals.
paired$DISTANCE[paired$SurveyType == "ARU"] <- "ARU"
with(paired, ftable(SurveyType, Interval, DISTANCE))
##                     DISTANCE >100 m 0-49 m 50-100 m  ARU
## SurveyType Interval                                     
## ARU        0-3 min                0      0        0 3349
##            3-5 min                0      0        0  675
##            5-10 min               0      0        0 1296
##            UNK                    0      0        0    3
## HUM        0-3 min              877   1270     1387    0
##            3-5 min              244    252      393    0
##            5-10 min             481    429      645    0
##            UNK                    7     20       12    0

Select a subset of species that we’ll work with:

xt <- as.matrix(Xtab(Count ~ PKEY + SPECIES, 
  data=paired[paired$SurveyType == "HUM",]))

SPP <- colnames(xt)
## number of >0 counts
ndis <- colSums(xt > 0)
## max count
maxd <- apply(xt, 2, max)
nmin <- 15
SPP <- SPP[ndis >= nmin & maxd > 1]
SPP <- SPP[!(SPP %in%
  c("CANG","COLO","COGO","COME","FRGU","BCFR","UNKN","RESQ",
  "CORA","AMCR","WOSP","WWCR","PISI","EVGR", "RUGR", "SACR",
  "NOFL"))]
SPP
##  [1] "ALFL" "AMRE" "AMRO" "BBWA" "BCCH" "BLBW" "BLJA" "BRCR" "CAWA" "CCSP"
## [11] "CEDW" "CHSP" "CMWA" "CONW" "COYE" "CSWA" "DEJU" "FOSP" "GRAJ" "GRYE"
## [21] "HAWO" "HETH" "LEFL" "LISP" "MAWA" "MOWA" "MYWA" "NAWA" "OCWA" "OVEN"
## [31] "PAWA" "PHVI" "PIWO" "RBGR" "RBNU" "RCKI" "REVI" "SOSP" "SWTH" "TEWA"
## [41] "WISN" "WIWR" "WTSP" "YBSA" "YRWA"

6.5 Availability

We estimated availability for human observer and ARU based counts using the time interval information. ARU based intervals were collapsed to the 0-3-5-10 minutes intervals to match the human observer based design.

xtdurH <- Xtab(Count ~ PKEY + Interval + SPECIES, 
  paired[paired$SurveyType == "HUM",])
xtdurH <- xtdurH[SPP]
xtdurR <- Xtab(Count ~ PKEY + Interval + SPECIES, 
  paired[paired$SurveyType == "ARU",])
xtdurR <- xtdurR[SPP]

Ddur <- matrix(c(3, 5, 10), nrow(xtdurH[[1]]), 3, byrow=TRUE)
Ddur2 <- rbind(Ddur, Ddur)

xdur <- nonDuplicated(paired, PKEY, TRUE)
xx <- xdur[rownames(xtdurR[[1]]),]

We estimated availability for species with at least 15 detections in both subsets of the data (making sure that the total count for at least some locations exceeded 1). We analyzed the human observer and ARU based data in a single model using survey type as a dummy variable. We tested if the estimate corresponding to survey type differed significantly from 0 using 95% confidence intervals.

The following table lists singing rates (phi 1/minute), probability of singing in a 10-minutes interval (p10), number of detections (n), and whether or not the confidence limits for the survey type estimate (\(\beta_1\)) contained 0 (i.e. not significant survey effect).

mdurR <- list()
mdurH <- list()
mdurHR <- list()
mdurHR1 <- list()
for (spp in SPP) {
    yR <- as.matrix(xtdurR[[spp]])[,c("0-3 min","3-5 min","5-10 min")]
    yH <- as.matrix(xtdurH[[spp]])[,c("0-3 min","3-5 min","5-10 min")]
    yHR <- rbind(yH, yR)
    mdurR[[spp]] <- cmulti(yR | Ddur ~ 1, type = "rem")
    mdurH[[spp]] <- cmulti(yH | Ddur ~ 1, type = "rem")
    aru01 <- rep(0:1, each=nrow(yH))
    mdurHR[[spp]] <- cmulti(yHR | Ddur2 ~ 1, type = "rem")
    mdurHR1[[spp]] <- cmulti(yHR | Ddur2 ~ aru01, type = "rem")
}
cfR <- sapply(mdurR, coef)
cfH <- sapply(mdurH, coef)
cfHR <- sapply(mdurHR, coef)
cfHR1 <- t(sapply(mdurHR1, coef))
names(cfR) <- names(cfH) <- names(cfHR) <- names(cfHR1) <- SPP

phiR <- exp(cfR)
phiH <- exp(cfH)
phiHR <- exp(cfHR)

## confidence interval for survey type effect
ci <- t(sapply(mdurHR1, function(z) confint(z)[2,]))
## does CI contain 0?
table(0 %[]% ci)
## 
## FALSE  TRUE 
##     4    41
plot(phiR ~ phiH, 
  ylim=c(0, max(phiH, phiR)), xlim=c(0, max(phiH, phiR)),
  pch=c(21, 19)[(0 %[]% ci) + 1],
  xlab=expression(phi[H]), ylab=expression(phi[R]),
  cex=0.5+2*phiHR)
abline(0,1)

Exercise

Which \(\phi\) estimate should we use? Can we use phiHR? Isn’t that cheating to double the sample size? Think about what we are conditioning on when estimating \(\phi\), and what makes samples independent.

6.6 Distance sampling

We estimate EDR from human observer based counts:

## Data for EDR estimation
xtdis <- Xtab(Count ~ PKEY + DISTANCE + SPECIES, 
  data=paired[paired$SurveyType == "HUM",])
xtdis <- xtdis[SPP]
for (i in seq_len(length(xtdis)))
    xtdis[[i]] <- as.matrix(xtdis[[i]][,c("0-49 m", "50-100 m", ">100 m")])
head(xtdis$YRWA)
##             0-49 m 50-100 m >100 m
## 05-041-01_1      0        0      0
## 05-041-02_1      0        0      0
## 05-041-05_1      0        0      0
## 05-041-06_1      0        0      0
## 05-041-07_1      0        1      0
## 05-041-08_1      0        0      0
## distance radii
Ddis <- matrix(c(0.5, 1, Inf), nrow(xtdis[[1]]), 3, byrow=TRUE)
head(Ddis)
##      [,1] [,2] [,3]
## [1,]  0.5    1  Inf
## [2,]  0.5    1  Inf
## [3,]  0.5    1  Inf
## [4,]  0.5    1  Inf
## [5,]  0.5    1  Inf
## [6,]  0.5    1  Inf
## predictors
xdis <- nonDuplicated(paired, PKEY, TRUE)
xdis <- xdis[rownames(xtdis[[1]]),]

Fitting distance sampling models for each species:

mdis <- pblapply(xtdis, function(Y) {
  cmulti(Y | Ddis ~ 1, xdis, type = "dis")
})
tauH <- sapply(mdis, function(z) unname(exp(coef(z))))
edrH <- 100 * tauH
round(sort(edrH))
## CEDW CMWA BBWA BLBW BRCR NAWA LEFL AMRE HAWO PHVI YRWA MAWA CSWA BCCH OCWA 
##   34   36   39   41   42   45   46   51   52   53   53   54   55   56   59 
## CAWA COYE TEWA MOWA PAWA YBSA LISP MYWA CCSP DEJU CHSP CONW OVEN REVI GRAJ 
##   61   62   63   64   65   67   69   69   70   70   73   74   77   81   82 
## RBGR RCKI RBNU SWTH AMRO WTSP WIWR ALFL SOSP BLJA FOSP HETH GRYE WISN PIWO 
##   85   86   88   91   92   99  101  102  114  129  130  138  158  164  167
hist(edrH)

6.7 Scaling constant

Counts are often modeled in a log-linear Poisson GLM. We used GLM to estimate the unknown scaling constant from simultaneous (paired) surveys. The Poisson mean for a count made at site \(i\) by human observer is \(\lambda_{i,H} = D_{i} \pi EDR_H^2 p\). \(EDR_H\) and \(p\) are estimated using distance sampling and removal sampling, respectively. Those estimates are used to calculate a correction factor \(C = \pi EDR_H^2 p\) which is used as an offset on the log scale as \(log(\lambda_{i,H}) = log(D_{i}) + log(C) = \beta_0 + log(C)\), where \(\beta_0\) is the intercept in the GLM model.

Following the arguments above, the Poisson mean for an ARU based count made at site \(i\) is \(\lambda_{i,R} = D_{i} \pi \Delta^2 EDR_H^2 p = D_{i} \Delta^2 C\). On the log scale, this becomes \(log(\lambda_{i,R}) = log(D_{i}) + log(\Delta^2) + log(C) = \beta_0 + \beta_1 + log(C)\), where \(\beta_1\) is a contrast for ARU type surveys in the log-linear model.

We used survey type as a binary variable (\(x_i\)) with value 0 for human observers and value 1 for ARUs. So the Poisson model is generalized as: \(log(\lambda_{i}) = \beta_0 + x_i \beta_1 + log(C)\). \(\Delta\) can be calculated from \(\beta_1\) as \(\Delta = \sqrt{e^{\beta_i}}\).

We used the Poisson GLM model describe before to estimate the \(\beta_1\) coefficient corresponding to survey type as binary predictor variable, and an offset term incorporating human observer based effective area sampled and availability.

phi <- phiHR
tau <- tauH

Y <- as.matrix(Xtab(Count ~ PKEYm + SPECIES, paired))
X <- nonDuplicated(paired, PKEYm, TRUE)
X <- X[rownames(Y),]
X$distur <- ifelse(X$Disturbance != "Undisturbed", 1, 0)
X$SurveyType <- relevel(X$SurveyType, "HUM")

library(lme4)
mods <- list()
aictab <- list()
Delta <- matrix(NA, length(SPP), 3)
dimnames(Delta) <- list(SPP, c("est", "lcl", "ucl"))

#spp <- "ALFL"
for (spp in SPP) {
  y <- Y[,spp]
  C <- tau[spp]^2 * pi * (1-exp(-phi[spp]))
  off <- rep(log(C), nrow(X))

  mod0 <- glm(y ~ 1, X, offset=off, family=poisson)
  mod1 <- glm(y ~ SurveyType, X, offset=off, family=poisson)
  mod2 <- glm(y ~ SurveyType + distur, X, offset=off, family=poisson)

  aic <- AIC(mod0, mod1, mod2)
  aic$delta_AIC <- aic$AIC - min(aic$AIC)
  aictab[[spp]] <- aic

  Best <- get(rownames(aic)[aic$delta_AIC == 0])
  #summary(Best)

  mods[[spp]] <- Best
  
  ## this is Monte Carlo based CI, no need for Delta method
  bb <- MASS::mvrnorm(10^4, coef(mod1), vcov(mod1))
  Delta[spp,] <- c(sqrt(exp(coef(mod1)["SurveyTypeARU"])),
    quantile(sqrt(exp(bb[,"SurveyTypeARU"])), c(0.025, 0.975)))
}

aic_support <- t(sapply(aictab, function(z) z[,3]))
round(aic_support)
##      [,1] [,2] [,3]
## ALFL  247  244    0
## AMRE   65   66    0
## AMRO   26   21    0
## BBWA   22   23    0
## BCCH   28   29    0
## BLBW   16   18    0
## BLJA    5    7    0
## BRCR   27   29    0
## CAWA   18   20    0
## CCSP   68   68    0
## CEDW    9   11    0
## CHSP    6    5    0
## CMWA   23   25    0
## CONW   20   21    0
## COYE   14   16    0
## CSWA   22   23    0
## DEJU  102   99    0
## FOSP   97   98    0
## GRAJ    0    0    0
## GRYE    6    8    0
## HAWO    0    1    1
## HETH   19   17    0
## LEFL    0    2    4
## LISP   50   52    0
## MAWA    0    1    0
## MOWA    0    2    3
## MYWA   34   36    0
## NAWA    0    1    1
## OCWA  132  133    0
## OVEN  378  375    0
## PAWA  126  128    0
## PHVI    0    2    3
## PIWO    8   10    0
## RBGR    0    2    1
## RBNU   37   38    0
## RCKI   13    9    0
## REVI   23   21    0
## SOSP   65   66    0
## SWTH   37   39    0
## TEWA   22   15    0
## WISN    1    3    0
## WIWR   35   35    0
## WTSP  167  168    0
## YBSA   19   18    0
## YRWA   26   10    0

The following table show the estimate of \(\Delta\) for each species, and the corresponding estimates of effective detection radius (EDR) in meters and effective area sampled (\(A\)) in ha:

But wait, if we started from expected values, shouldn’t ratio of the mean counts give us \(\Delta^2\)? Let’s see if we can get a similar \(\Delta\) value from mean counts:

(gm <- groupMeans(Y[,SPP], 1, X$SurveyType))
##       ALFL   AMRE   AMRO   BBWA   BCCH   BLBW   BLJA   BRCR    CAWA
## ARU 0.2152 0.2200 0.1418 0.1809 0.0978 0.1149 0.1271 0.1663 0.06846
## HUM 0.2910 0.2029 0.2200 0.2054 0.1198 0.1271 0.1222 0.1516 0.06846
##        CCSP   CEDW   CHSP    CMWA    CONW    COYE   CSWA   DEJU    FOSP
## ARU 0.04401 0.1711 0.4890 0.07335 0.07824 0.05623 0.2127 0.2274 0.06601
## HUM 0.07090 0.1711 0.5844 0.06357 0.10024 0.06112 0.2421 0.3154 0.05379
##       GRAJ    GRYE    HAWO   HETH    LEFL    LISP   MAWA   MOWA   MYWA
## ARU 0.2249 0.08313 0.04401 0.5428 0.08802 0.07824 0.4132 0.2494 0.3252
## HUM 0.2714 0.09535 0.06357 0.6479 0.09780 0.07579 0.4597 0.2616 0.3276
##       NAWA  OCWA  OVEN   PAWA    PHVI   PIWO    RBGR   RBNU   RCKI   REVI
## ARU 0.1222 0.110 1.127 0.1345 0.07579 0.1516 0.10758 0.1540 0.2958 0.9291
## HUM 0.1002 0.132 1.306 0.1345 0.07579 0.1345 0.09535 0.1785 0.3961 1.0636
##        SOSP   SWTH   TEWA   WISN   WIWR  WTSP   YBSA    YRWA
## ARU 0.05623 0.5281 0.7433 0.1076 0.3081 1.560 0.1418 0.05623
## HUM 0.06846 0.5403 0.9389 0.1002 0.3667 1.643 0.1883 0.15159
Delta_summary$Delta.emp <- sqrt(gm["ARU",] / gm["HUM",])

plot(Delta.est ~ Delta.emp, Delta_summary,
  col=c(2,1)[(1 %[]% Delta[,-1]) + 1])
abline(0, 1)
abline(h=1, v=1, lty=2)

It looks like the fancy modeling was all for nothing, theory prevailed. But it is always nice when things work out as expected.

We can also see that \(\Delta\) (especially the significant ones) tended to be less than 1, indicating that overall EDR for ARUs is slightly smaller that for human point counts. \(\Delta\) was significantly different from 1 only for relatively few species.

Exercise

Can we pool all species’ data together to estimate an overall \(\Delta\) value? Would that characterize this particular ARU type well enough? What are some of the arguments against this pooling? What might be driving the variation across species?

6.8 Data integration

Now we will pretend that we have no paired design. See how well fixed effects can handle the data integration without calibration.

i <- sample(levels(X$PKEY), floor(nlevels(X$PKEY)/2))
ss <- c(which(X$PKEY %in% i), which(!(X$PKEY %in% i)))

mods2 <- list()
for (spp in SPP) {
  y <- Y[ss,spp]
  C <- tau[spp]^2 * pi * (1-exp(-phi[spp]))
  off <- rep(log(C), length(ss))

  mod <- glm(y ~ SurveyType, X[ss,], offset=off, family=poisson)

  mods2[[spp]] <- mod
}
Delta_summary$Delta.fix <- sapply(mods2, function(z) {
  sqrt(exp(coef(z)[2]))
})
plot(Delta.fix ~ Delta.emp, Delta_summary)
abline(0, 1)
abline(h=1, v=1, lty=2)

Exercise

Use the script below to push the fixed effects method to the limit and see where it fails. We will explore the following two situations: (1) sample size and number of detections is small, (2) sampling is biased with respect to habitat strata.

X$open <- ifelse(X$Class_Name %in% c("Open Herb/Grass",
    "Open coniferous","Open Mature Deciduous","Open Mixed",
    "Open Northern","Open Young Deciduous",
    "Open Young Mixed","Poorly Drained"), 1, 0)
## proportion of samples from ARUs (original)
prop_aru <- 0.5
## proportion of ARU samples coming from open habitats
prop_open <- 0.6

n_aru <- round(nrow(X) * prop_aru)
n_hum <- nrow(X) - n_aru
w_aru <- prop_open*X$open + (1-prop_open)*(1-X$open)
w_hum <- (1-prop_open)*X$open + prop_open*(1-X$open)

id_aru <- sample(which(X$SurveyType == "ARU"), n_aru, 
  replace=TRUE, prob=w_aru[X$SurveyType == "ARU"])
id_hum <- sample(which(X$SurveyType == "HUM"), n_hum, 
  replace=TRUE, prob=w_hum[X$SurveyType == "HUM"])

ss <- c(id_aru, id_hum)
addmargins(with(X[ss,], table(open, SurveyType)))

mods3 <- list()
for (spp in SPP) {
  y <- Y[ss,spp]
  C <- tau[spp]^2 * pi * (1-exp(-phi[spp]))
  off <- rep(log(C), length(ss))

  mod <- glm(y ~ SurveyType, X[ss,], offset=off, family=poisson)

  mods3[[spp]] <- mod
}


Est <- sapply(mods3, function(z) sqrt(exp(coef(z)[2])))

plot(Est ~ Delta.emp, Delta_summary)
abline(0, 1)
abline(h=1, v=1, lty=2)
abline(lm(Est ~ Delta.emp, Delta_summary), col=2)

6.9 ABMI data

ABMI ARU data is a subset of cca. 1000 recordings done at each survey station over the course of about 5 months (deployed in early March, retrieved in an of July).

Recordings are of 1 or 3 minutes in length. Early season and midnight recordings are 1-min long, breeding season morning surveys are 3-min long to better capture the dawn chorus.

## we'll use this for predictions
q <- c(0.025, 0.975)
qq <- quantile(abmi$ToY, q)
ToY <- seq(qq[1], qq[2], 1)
qq3 <- quantile(abmi$ToY[abmi$Duration == 3], q)
tpred <- 1 # duration to use for prediction

tmp <- nonDuplicated(abmi, visit)
plot(ToD ~ ToY, tmp, col=ifelse(tmp$Duration == 3, 2, 1),
  pch=19, cex=0.5)

We will compare results based on different subsets of the data. First, we’ll compile the 3-min surveys (breeding season, morning) and fit removal models. Make survey x interval tables for species based on 3-min samples: need to drop visits that are less than 3-mins:

y3 <- Xtab(~ visit + det1 + SpeciesID,
  data=abmi[abmi$Duration == 3 & !(abmi$SpeciesID %in% 
  c("NONE","SNI", "VNA", "DNC", "PNA")),],
  drop.unused.levels=TRUE)

D3 <- matrix(1:3, nrow(y3[[1]]), 3, byrow=TRUE)

x3 <- droplevels(nonDuplicated(abmi, visit, TRUE)[rownames(y3[[1]]),
  c("pkey", "visit", "ToY", "ToYc", "ToD", "ToDx", "ToDc")])

Pick a species and fit removal models:

spp <- "Ovenbird" # mirgratory
#spp <- "BorealChickadee" # winter resident

## use morning samples only
Y <- as.matrix(y3[[spp]])
Y <- Y[x3$ToDc == "Morning",]
D <- D3[x3$ToDc == "Morning",]

## fit a conventional model with polynomial ToY effect
z <- cmulti(Y | D ~ ToY + I(ToY^2) + I(ToY^3),
  x3[x3$ToDc == "Morning",], type="rem")
summary(z)
## 
## Call:
## cmulti(formula = Y | D ~ ToY + I(ToY^2) + I(ToY^3), data = x3[x3$ToDc == 
##     "Morning", ], type = "rem")
## 
## Removal Sampling (homogeneous singing rate)
## Conditional Maximum Likelihood estimates
## 
## Coefficients:
##                      Estimate Std. Error   z value Pr(>|z|)    
## log.phi_(Intercept)  4.78e-01   8.42e-51  5.68e+49   <2e-16 ***
## log.phi_ToY          3.20e-04   8.42e-51  3.80e+46   <2e-16 ***
## log.phi_I(ToY^2)     2.70e-05   8.42e-55  3.21e+49   <2e-16 ***
## log.phi_I(ToY^3)    -1.83e-07   8.42e-55 -2.17e+47   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Log-likelihood: -873 
## BIC = 1.77e+03
X <- model.matrix(~ ToY + I(ToY^2) + I(ToY^3),
  data.frame(ToY=ToY))
phi <- exp(X %*% coef(z))
prem <- 1-exp(-tpred*phi)

## fit a mixture model with polynomial ToY effect
z <- cmulti(Y | D ~ ToY + I(ToY^2) + I(ToY^3),
  x3[x3$ToDc == "Morning",], type="mix")
summary(z)
## 
## Call:
## cmulti(formula = Y | D ~ ToY + I(ToY^2) + I(ToY^3), data = x3[x3$ToDc == 
##     "Morning", ], type = "mix")
## 
## Removal Sampling (heterogeneous singing rate)
## Conditional Maximum Likelihood estimates
## 
## Coefficients:
##                      Estimate Std. Error   z value Pr(>|z|)    
## log.phi             -2.98e-01   8.42e-51 -3.54e+49   <2e-16 ***
## logit.c_(Intercept) -2.42e-01   8.42e-51 -2.87e+49   <2e-16 ***
## logit.c_ToY         -8.16e-03   8.42e-51 -9.69e+47   <2e-16 ***
## logit.c_I(ToY^2)     2.80e-06   8.42e-55  3.32e+48   <2e-16 ***
## logit.c_I(ToY^3)     3.07e-07   8.42e-55  3.65e+47   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Log-likelihood: -848 
## BIC = 1.73e+03
chat <- plogis(X %*% coef(z)[-1])
phi <- exp(coef(z)[1])
pmix <- 1-chat*exp(-tpred*phi)

Let’s see how does extrapolation look like:

plot(pmix ~ ToY, type="l", ylim=c(0,1))
lines(prem ~ ToY, col=4)
abline(v=qq3, lty=2)

Next, we make survey x species table based on 1st 1-min surveys by excluding counts after 1-min:

y1 <- Xtab(~ visit + SpeciesID,
  data=abmi[is.na(abmi$det1) | abmi$det1 == "0-1min",],
  cdrop=c("NONE","SNI", "VNA", "DNC", "PNA"))
x1 <- droplevels(nonDuplicated(abmi, visit, TRUE)[rownames(y1),
  c("pkey", "visit", "ToY", "ToYc", "ToD", "ToDx", "ToDc")])

We fit GAM model to detection-nondetection data, and we condition on known occurrences at the species over the breeding season (bracketed by qq3).

First, let’s explore species’ responses:

s <- colSums(y1[x1$ToY >= 150, ] > 0)
SPP <- colnames(y1)[s >= 100]
P <- matrix(0, length(ToY), length(SPP))
colnames(P) <- SPP

for (sppi in SPP) {
  x1$y <- ifelse(y1[,sppi] > 0, 1, 0)
  tmp <- with(x1[x1$ToY %[]% qq3, ], table(pkey, y))
  tmp <- tmp[tmp[,"1"] > 0,]
  x1$occ <- x1$pkey %in% rownames(tmp)
  m <- mgcv::gam(y ~ s(ToY), 
    x1[x1$ToDc == "Morning" & x1$occ,], family=poisson)
  P[,sppi] <- predict(m, newdata=data.frame(ToY=ToY), type="response")
}

matplot(ToY, P, type="l", ylim=c(0,1), lty=1, col="#00000044")

Single species models continued:

## species detections
x1$y <- ifelse(y1[,spp] > 0, 1, 0)
## finding locations with at least 1 detections
tmp <- with(x1[x1$ToY %[]% qq3, ], table(pkey, y))
tmp <- tmp[tmp[,"1"] > 0,]
head(tmp)
##               y
## pkey           0 1
##   1086_2017_SE 4 1
##   121_2016_SE  2 4
##   121_2016_SW  2 4
##   1210_2017_SW 5 1
##   1211_2017_NE 5 1
##   1211_2017_SE 4 2
x1$occ <- x1$pkey %in% rownames(tmp)
table(detection=x1$y, occupancy=x1$occ)
##          occupancy
## detection FALSE  TRUE
##         0 15235  3388
##         1    19  1090
## fit GAM to all the 1-min data
m <- mgcv::gam(y ~ s(ToY), 
  x1[x1$ToDc == "Morning" & x1$occ,], family=binomial)
pgam1 <- predict(m, newdata=data.frame(ToY=ToY), type="response")

## fit GAM to 1-min data in the breeding season
m <- mgcv::gam(y ~ s(ToY), x1[x1$ToDc == "Morning" & x1$occ &
    x1$visit %in% levels(x3$visit),], family=binomial)
pgam3 <- predict(m, newdata=data.frame(ToY=ToY), type="response")

Let’s use time-to-1st-detection data to fit survival model:

x1$y <- ifelse(y1[,spp] > 0, 1, 0)
tmp <- with(x1[x1$ToY %[]% qq3, ], table(pkey, y))
tmp <- tmp[tmp[,"1"] > 0,]
x1$occ <- x1$pkey %in% rownames(tmp)
## subset to known occurrence sites
x1 <- droplevels(x1[x1$occ,])
## we use times from raw data table
xx <- droplevels(abmi[abmi$SpeciesID == spp & !is.na(abmi$int1),])

Events need to be organized into a survival object: we treat nondetections as censored events at 60 sec.

x1$time <- xx$int1[match(x1$visit, xx$visit)]
x1$time[is.na(x1$time)] <- 60
#' time cannot be 0, so we use 1 sec instead
x1$time[x1$time == 0] <- 1
#' We give time in minutes, so we get rate as events/min.
x1$sv <- Surv(x1$time/60, x1$y)
head(x1$sv)
## [1] 1.0000+ 1.0000+ 1.0000+ 1.0000+ 0.7833  1.0000+
tail(x1$sv)
## [1] 1.0000+ 1.0000+ 0.2833  1.0000+ 1.0000+ 1.0000+

Fit a series of survival models:

mods <- list(m0 = survreg(sv ~ 1, x1, dist="exponential"),
    m1 = survreg(sv ~ ToY, x1, dist="exponential"),
    m2 = survreg(sv ~ ToY + I(ToY^2), x1, dist="exponential"),
    m3 = survreg(sv ~ ToY + I(ToY^2) + I(ToY^3), x1, dist="exponential"))

aic <- data.frame(df=sapply(mods, function(z) length(coef(z))), AIC=sapply(mods, AIC))
aic$delta_AIC <- aic$AIC - min(aic$AIC)
aic

The package survreg fits accelerated failure models, not proportional hazards models, so the coefficients are logarithms of ratios of survival times, and a positive coefficient means longer survival.

mb <- mods[[which.min(aic$AIC)]]
summary(mb)
## 
## Call:
## survreg(formula = sv ~ ToY + I(ToY^2) + I(ToY^3), data = x1, 
##     dist = "exponential")
##                 Value Std. Error     z       p
## (Intercept)  1.64e+02   2.54e+01  6.46 1.1e-10
## ToY         -2.75e+00   4.80e-01 -5.73 9.9e-09
## I(ToY^2)     1.51e-02   3.01e-03  5.01 5.4e-07
## I(ToY^3)    -2.67e-05   6.27e-06 -4.26 2.0e-05
## 
## Scale fixed at 1 
## 
## Exponential distribution
## Loglik(model)= -1906   Loglik(intercept only)= -2455
##  Chisq= 1098 on 3 degrees of freedom, p= 1e-237 
## Number of Newton-Raphson Iterations: 9 
## n= 4478
## survival times
summary(st <- predict(mb, data.frame(ToY=ToY)))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       1       2       5   56731     142 2189645
## event rate per unit (1 min) time
summary(phi <- 1 / st)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0072  0.2053  0.2754  0.5175  0.7249
## probability of at least 1 event per 10 time units (mins)
summary(psurv <- 1-exp(-tpred*phi))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0072  0.1856  0.2157  0.4040  0.5156
plot(prem ~ ToY, type="l", ylim=c(0,1), 
  main=spp, ylab="Relative availability")
abline(v=qq3, lty=2, col="grey")
lines(pmix ~ ToY, col=3)
lines(pgam1 ~ ToY, col=2)
lines(pgam3 ~ ToY, col=4)
lines(psurv ~ ToY, col=5)
rug(jitter(x1$ToY), side=1, col=2)
rug(jitter(x3$ToY), side=3, col=4)
legend(100, 0.7, bty="n", lty=1, col=c(1,3,2,4,5), cex=0.6,
  legend=c("Conv. removal", "Mix. removal", 
  "GAM 1min", "GAM 3mins", "Survival"))

Exercise

Discuss why the GAM/survival model estimates are different from removal models.

Compare these models and phenology curves for a different species, e.g. a winter resident.