Jekyll2024-01-10T00:51:38-07:00https://peter.solymos.org/https://peter.solymos.org/feed.xmlPéter SólymosClosing the gap between data and decision making<a href="https://peter.solymos.org">Péter Sólymos</a>CalgaryR & YEGRUG Meetup: Data Cloning - Hierarchical Models Made Easy2023-04-11T00:00:00-06:002023-04-11T00:00:00-06:00https://peter.solymos.org/https://peter.solymos.org/talks/2023/04/11/data-cloning-workshop-in-edmonton<p>I moved to Canada in 2008 to start a postdoctoral fellowship with Prof. Subhash Lele at the stats department of the University of Alberta. Subhash at the time just published a paper about a statistical technique called data cloning. Data cloning is a way to use Bayesian MCMC algorithms to do frequentist inference. Yes, you read that right.</p>
<p>I learnt about WinBUGS, OpenBUGS, then JAGS, and used data cloning for some projects. I started abstracting away the workflow, and in 2009 I submitted the <a href="https://CRAN.R-project.org/package=dclone">dclone</a> package to CRAN. The dclone package is still alive and well, and now even supports Stan.</p>
<p>Since then, Subhash has retired and I moved jobs. We thought recently that we should do another data cloning workshop like we did a few times over the years. And now it is coming up on April 13, 2023, as a joint meetup of the Edmonton and Calgary R User groups.</p>
<p>Mixed models, also known as hierarchical models and multilevel models, are a useful class of models for applied sciences. The goal of the workshop is to give an introduction to the logic, theory, and implementation of these models to solve practical problems. The workshop will include a seminar-style overview and hands-on exercises including common model classes and examples that participants can extend for their own needs.</p>
<p>Find more about the course on the GitHub site: <a href="https://github.com/datacloning/workshop-2023-edmonton">https://github.com/datacloning/workshop-2023-edmonton</a>.</p><a href="https://peter.solymos.org">Péter Sólymos</a>I moved to Canada in 2008 to start a postdoctoral fellowship with Prof. Subhash Lele at the stats department of the University of Alberta. Subhash at the time just published a paper about a statistical technique called data cloning. Data cloning is a way to use Bayesian MCMC algorithms to do frequentist inference. Yes, you read that right.How many birds are out there?2020-06-22T00:00:00-06:002020-06-22T00:00:00-06:00https://peter.solymos.org/https://peter.solymos.org/etc/2020/06/22/how-many-birds-are-out-there<p>In a recent paper entitled “<em>Lessons learned from comparing spatially explicit models and the Partners in Flight approach to estimate population sizes of boreal birds in Alberta, Canada</em>” we developed improved, spatially explicit models for 81 land bird species in northern Alberta, Canada. We then compared these estimates of bird abundance to a commonly-used but non-spatially explicit estimate by Partners in Flight (<a href="http://pif.birdconservancy.org/PopEstimates/">PIF v 3.0</a>) that’s based on the North American Breeding Bird Survey (<a href="https://www.pwrc.usgs.gov/bbs/">BBS</a>) data set. The publication is a result of years of collaboration between the <a href="http://abmi.ca">ABMI</a>, Boreal Avian Modelling (<a href="https://borealbirds.ualberta.ca/">BAM</a>) project, Canadian Wildlife Service (<a href="https://www.canada.ca/en/environment-climate-change.html">Environment and Climate Change Canada</a>), and <a href="https://www.usgs.gov/">United States Geological Survey</a>.</p>
<p><img src="https://peter.solymos.org/images/2020/06/22/popsize.png" class="img-responsive" alt="Population sizes" /></p>
<p>The paper represents a major step forwards in understanding the complexity of population size estimation. We paid special attention to framing the implications for conservation and management of species at risk and for improving future data collection. It was published in the Condor, and it is open access:</p>
<ul>
<li>publication: <a href="https://doi.org/10.1093/condor/duaa007">https://doi.org/10.1093/condor/duaa007</a></li>
<li>supporting information: <a href="http://doi.org/10.5281/zenodo.3563112">http://doi.org/10.5281/zenodo.3563112</a></li>
</ul>
<p>Here is the abstract:</p>
<p><em>Estimating the population abundance of landbirds is a challenging task complicated by the amount, type, and quality of available data. Avian conservationists have relied on population estimates from Partners in Flight (PIF), which primarily uses roadside data from the North American Breeding Bird Survey (BBS). However, the BBS was not designed to estimate population sizes. Therefore, we set out to compare the PIF approach with spatially explicit models incorporating roadside and off-road point-count surveys. We calculated population estimates for 81 landbird species in Bird Conservation Region 6 in Alberta, Canada, using land cover and climate as predictors. We also developed a framework to evaluate how the differences between the detection distance, time-of-day, roadside count, and habitat representation adjustments explain discrepancies between the 2 estimators. We showed that the key assumptions of the PIF population estimator were commonly violated in this region, and that the 2 approaches provided different population estimates for most species. The average differences between estimators were explained by differences in the detection-distance and time-of-day components, but these adjustments left much unexplained variation among species. Differences in the roadside count and habitat representation components explained most of the among-species variation. The variation caused by these factors was large enough to change the population ranking of the species. The roadside count bias needs serious attention when roadside surveys are used to extrapolate over off-road areas. Habitat representation bias is likely prevalent in regions sparsely and non-representatively sampled by roadside surveys, such as the boreal region of North America, and thus population estimates for these regions need to be treated with caution for certain species. 
Additional sampling and integrated modeling of available data sources can contribute towards more accurate population estimates for conservation in remote areas of North America.</em></p>
<p>I am not going to provide another perspective here, but rather, I list all the other sources out there that I wrote in connection with the paper, or commentaries that arose in news outlets.</p>
<p>Blog posts:</p>
<ul>
<li><a href="https://americanornithology.org/three-things-you-should-know-about-population-estimation/"><em>Three things you should know about population estimation</em></a>, the AOS blog</li>
<li><a href="http://blog.abmi.ca/2020/06/17/made-in-alberta-models-help-continental-bird-conservation/"><em>Made in Alberta models help continental bird conservation</em></a>, the ABMI blog</li>
</ul>
<p>In the news:</p>
<ul>
<li><a href="https://www.cbc.ca/news/canada/edmonton/survey-estimates-much-higher-alberta-bird-populations-than-thought-1.5621451">CBC News, Edmonton</a></li>
<li><a href="https://edmonton.ctvnews.ca/survey-estimates-much-higher-alberta-bird-populations-than-thought-1.4993454">CTV News, Edmonton</a></li>
</ul>
<p>Presentations:</p>
<ul>
<li><a href="https://speakerdeck.com/psolymos/comparing-the-pif-approach-to-a-pixel-based-approach-for-birds-in-alberta">this one walks you through the math nicely</a></li>
</ul>
<p>Social activity on <a href="https://www.altmetric.com/details/82339275">Altmetric</a>:</p>
<script type="text/javascript" src="https://d1bxh8uas1mnw7.cloudfront.net/assets/embed.js"></script>
<div class="altmetric-embed" data-badge-type="donut" data-altmetric-id="82339275"></div><a href="https://peter.solymos.org">Péter Sólymos</a>In a recent paper entitled “Lessons learned from comparing spatially explicit models and the Partners in Flight approach to estimate population sizes of boreal birds in Alberta, Canada” we developed improved, spatially explicit models for 81 land bird species in northern Alberta, Canada. We then compared these estimates of bird abundance to a commonly-used but non-spatially explicit estimate by Partners in Flight (PIF v 3.0) that’s based on the North American Breeding Bird Survey (BBS) data set. The publication is a result of years of collaboration between the ABMI, Boreal Avian Modelling (BAM) project, Canadian Wildlife Service (Environment and Climate Change Canada), and United States Geological Survey.Fitting removal models with the detect R package2018-08-30T00:00:00-06:002018-08-30T00:00:00-06:00https://peter.solymos.org/https://peter.solymos.org/code/2018/08/30/fitting-removal-models-with-the-detect-r-package<p>In a paper recently published in the <a href="http://www.americanornithologypubs.org/">Condor</a>, titled <em>Evaluating time-removal models for estimating availability of boreal birds during point-count surveys: sample size requirements and model complexity</em>, we assessed different ways of controlling for point-count duration in bird counts using data from the <a href="http://www.borealbirds.ca/">Boreal Avian Modelling Project</a>. As the title indicates, the paper describes a cost-benefit analysis to make recommendations about when to use different types of the removal model. The paper is open access, so feel free to read the <a href="https://dx.doi.org/10.1650/CONDOR-18-32.1">whole paper here</a>.</p>
<p><img src="https://peter.solymos.org/images/2018/08/30/example-species.png" class="img-responsive" alt="Example species" /></p>
<p>In summary, we evaluated a conventional removal model and a finite mixture removal model, with and without covariates, for 152 bird species. We found that the probabilities of predicted availability under conventional and finite mixture models were very similar with respect to the range of probability values and the shape of the response curves to predictor variables. However, finite mixture models were better supported for the large majority of species. We also found overwhelming support for time-varying models irrespective of the parametrization.</p>
<p>I have written a related post about the journey that led to this paper (<a href="https://americanornithologypubsblog.org/2018/08/29/author-blog-count-me-in-i-am-available-for-detection-at-6-am-on-may-26th/"><em>Count me in! I am available for detection at 6 AM on May 26th</em></a>). In this post, I describe the math behind removal modeling as implemented in the <a href="https://cran.r-project.org/package=detect"><strong>detect</strong></a> <a href="https://r-project.org/">R</a> package.</p>
<h2 id="continuous-time-removal-models">Continuous time-removal models</h2>
<p>It has long been recognized that nearly all avian field surveys underestimate abundances, unless the estimates are adjusted for the proportion of birds present but undetected at the times and locations surveyed. Detectability is the product of two probabilities: the probability that birds make themselves available for detection by emitting detectable cues (availability), and the probability that an available bird will be perceived by a bird surveyor (perceptibility).</p>
<p>The time-removal model, originally developed for estimating wildlife and fish abundances from mark-recapture studies, was later reformulated for avian surveys with the goal of improving estimates of bird abundance by accounting for the availability bias inherent in point-count data. The removal model applied to point-count surveys estimates the probability that a bird is available for detection as a function of the average number of detectable cues that an individual bird gives per minute (singing rate), and the known count duration.</p>
<p>Time-removal models are based on a removal experiment whereby animals are trapped and thereby removed from the closed population of animals being sampled. When applying a removal model to avian point-count surveys, the counts of singing birds (\(Y_{i1}, \ldots, Y_{iJ}\)) within a given point-count survey \(i\) (\(i = 1,\ldots, n\)) are tallied according to the time interval in which each bird is first detected. The intervals are consecutive, with the survey start time \(t_{i0} = 0\), the end times of the time intervals \(t_{ij}\) (\(j = 1, 2,\ldots, J\)), and the total count duration of the survey \(t_{iJ}\). We count each individual bird once, so individuals are ‘mentally removed’ from a closed population of undetected birds by the surveyor.</p>
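<p>As a minimal illustration of this tallying, the sketch below bins a handful of first-detection times into the intervals of a 10-minute count. The times are made up and the object names are hypothetical; nothing here comes from the paper or from any package. Each bird lands in the interval in which it was first detected, and a bird that first sings after the end of the survey is never counted.</p>

```r
# Hypothetical times (minutes) at which 6 birds first gave a detectable cue
first_detection <- c(0.4, 1.2, 2.9, 3.5, 7.1, 12.0)

# Interval end points for a 10-minute count split into 0-3, 3-5, 5-10 minutes
breaks <- c(0, 3, 5, 10)

# cut() returns NA for the bird first heard after 10 minutes,
# and table() drops NA values by default
bins <- cut(first_detection, breaks = breaks,
  labels = c("0-3 min", "3-5 min", "5-10 min"))
y <- table(bins)
y
```

<p>The resulting vector plays the role of one row of counts (\(Y_{i1}, \ldots, Y_{iJ}\)) for a single survey.</p>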
<h2 id="data-requirements">Data requirements</h2>
<p>We have just defined the kind of data we need for the removal models. In this post, I am going to use a data set from our paper about comparing human observer based counts to automated recording units, <a href="https://doi.org/10.5751/ACE-00975-120113"><em>Paired sampling standardizes point count data from humans and acoustic recorders</em></a>. The data set we used is wrapped up in an R package called <a href="https://github.com/borealbirds/paired"><strong>paired</strong></a> (thanks to Steve Van Wilgenburg for suggestions on this post and for agreeing to share this data set).</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">require</span><span class="p">(</span><span class="n">paired</span><span class="p">))</span><span class="w">
</span><span class="n">devtools</span><span class="o">::</span><span class="n">install_github</span><span class="p">(</span><span class="s2">"borealbirds/paired"</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">paired</span><span class="p">)</span><span class="w">
</span><span class="n">data</span><span class="p">(</span><span class="n">paired</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p>We will use the counts for Ovenbird, one of the most common species in the data set (abbreviated as <code class="language-plaintext highlighter-rouge">"OVEN"</code>). The data is in long format, so I am using the <a href="https://cran.r-project.org/package=mefa4"><strong>mefa4</strong></a> R package to make the sample by time interval cross-tabulation for this species. I then subset the data to retain samples obtained by human observers and drop samples with missing predictor data. For predictors, we will use a variable capturing date (<code class="language-plaintext highlighter-rouge">JDAY</code>; standardized ordinal day of the year) and another one capturing time of day (<code class="language-plaintext highlighter-rouge">TSSR</code>; time since local sunrise).</p>
<p>The data frame <code class="language-plaintext highlighter-rouge">X</code> contains the predictors. The matrix <code class="language-plaintext highlighter-rouge">Y</code> contains the counts of newly counted individuals binned into consecutive time intervals (0–3, 3–5, 5–10 minutes): cell values are the \(Y_{ij}\)’s. The <code class="language-plaintext highlighter-rouge">D</code> object is another matrix mirroring the structure of <code class="language-plaintext highlighter-rouge">Y</code>
but instead of counts, it contains the interval end times: cell values are
the \(t_{ij}\)’s.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">mefa4</span><span class="p">)</span><span class="w">
</span><span class="n">spp</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="s2">"OVEN"</span><span class="w">
</span><span class="n">xt</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">Xtab</span><span class="p">(</span><span class="n">Count</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">PKEY</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">Interval</span><span class="p">,</span><span class="w"> </span><span class="n">paired</span><span class="p">,</span><span class="w">
</span><span class="n">subset</span><span class="o">=</span><span class="n">paired</span><span class="o">$</span><span class="n">SurveyType</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s2">"HUM"</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="n">paired</span><span class="o">$</span><span class="n">SPECIES</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">spp</span><span class="p">)</span><span class="w">
</span><span class="n">Y</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">as.matrix</span><span class="p">(</span><span class="n">xt</span><span class="p">[,</span><span class="nf">c</span><span class="p">(</span><span class="s2">"0-3 min"</span><span class="p">,</span><span class="w"> </span><span class="s2">"3-5 min"</span><span class="p">,</span><span class="w"> </span><span class="s2">"5-10 min"</span><span class="p">)])</span><span class="w">
</span><span class="n">X</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">nonDuplicated</span><span class="p">(</span><span class="n">paired</span><span class="p">[</span><span class="n">paired</span><span class="o">$</span><span class="n">SurveyType</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s2">"HUM"</span><span class="p">,],</span><span class="w">
</span><span class="n">PKEY</span><span class="p">,</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)[</span><span class="n">rownames</span><span class="p">(</span><span class="n">Y</span><span class="p">),]</span><span class="w">
</span><span class="n">i</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="o">!</span><span class="nf">is.na</span><span class="p">(</span><span class="n">X</span><span class="o">$</span><span class="n">Latitude</span><span class="p">)</span><span class="w">
</span><span class="n">Y</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">Y</span><span class="p">[</span><span class="n">i</span><span class="p">,]</span><span class="w">
</span><span class="n">X</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">X</span><span class="p">[</span><span class="n">i</span><span class="p">,</span><span class="nf">c</span><span class="p">(</span><span class="s2">"JDAY"</span><span class="p">,</span><span class="w"> </span><span class="s2">"TSSR"</span><span class="p">)]</span><span class="w">
</span><span class="n">D</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">matrix</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="m">3</span><span class="p">,</span><span class="w"> </span><span class="m">5</span><span class="p">,</span><span class="w"> </span><span class="m">10</span><span class="p">),</span><span class="w"> </span><span class="n">nrow</span><span class="p">(</span><span class="n">Y</span><span class="p">),</span><span class="w"> </span><span class="m">3</span><span class="p">,</span><span class="w"> </span><span class="n">byrow</span><span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span><span class="nf">dimnames</span><span class="p">(</span><span class="n">D</span><span class="p">)</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">dimnames</span><span class="p">(</span><span class="n">Y</span><span class="p">)</span><span class="w">
</span><span class="n">tail</span><span class="p">(</span><span class="n">X</span><span class="p">)</span><span class="w">
</span><span class="c1">## JDAY TSSR</span><span class="w">
</span><span class="c1">## 96PA-C2-B_2 0.4383562 0.13468851</span><span class="w">
</span><span class="c1">## 96PA-C2-C_1 0.4438356 0.08526145</span><span class="w">
</span><span class="c1">## 96PA-C2-C_2 0.4383562 0.15273704</span><span class="w">
</span><span class="c1">## 96PA-C2-D_1 0.4438356 0.13597129</span><span class="w">
</span><span class="c1">## 96PA-C2-E_1 0.4438356 0.10122629</span><span class="w">
</span><span class="c1">## 96PA-C2-F_1 0.4438356 0.11999388</span><span class="w">
</span><span class="n">tail</span><span class="p">(</span><span class="n">Y</span><span class="p">)</span><span class="w">
</span><span class="c1">## 0-3 min 3-5 min 5-10 min</span><span class="w">
</span><span class="c1">## 96PA-C2-B_2 5 0 0</span><span class="w">
</span><span class="c1">## 96PA-C2-C_1 2 0 0</span><span class="w">
</span><span class="c1">## 96PA-C2-C_2 2 0 0</span><span class="w">
</span><span class="c1">## 96PA-C2-D_1 7 0 0</span><span class="w">
</span><span class="c1">## 96PA-C2-E_1 2 0 0</span><span class="w">
</span><span class="c1">## 96PA-C2-F_1 2 0 0</span><span class="w">
</span><span class="n">tail</span><span class="p">(</span><span class="n">D</span><span class="p">)</span><span class="w">
</span><span class="c1">## 0-3 min 3-5 min 5-10 min</span><span class="w">
</span><span class="c1">## 96PA-C2-B_2 3 5 10</span><span class="w">
</span><span class="c1">## 96PA-C2-C_1 3 5 10</span><span class="w">
</span><span class="c1">## 96PA-C2-C_2 3 5 10</span><span class="w">
</span><span class="c1">## 96PA-C2-D_1 3 5 10</span><span class="w">
</span><span class="c1">## 96PA-C2-E_1 3 5 10</span><span class="w">
</span><span class="c1">## 96PA-C2-F_1 3 5 10</span><span class="w">
</span></code></pre></div></div>
<h2 id="time-invariant-conventional-removal-model">Time-invariant conventional removal model</h2>
<p>In the simplest continuous time-removal model, singing events by individual birds are assumed to follow a Poisson process. We can use the rate parameter of the Poisson process (\(\phi\)) to estimate the singing rate of birds during a point count.</p>
<p>In the time-invariant conventional removal model (<code class="language-plaintext highlighter-rouge">Me0</code>), the individuals of a species at a given location and time are assumed to be homogeneous in their singing rates. The time to first detection follows the exponential distribution \(f(t_{ij}) = \phi \exp(-t_{ij} \phi)\), and the cumulative distribution function of times to first detection in time interval (0, \(t_{iJ}\)) gives us the probability that a bird sings at least once during the point count as \(p(t_{iJ}) = 1 - \exp(-t_{iJ} \phi)\).</p>
<p>We use the <code class="language-plaintext highlighter-rouge">cmulti</code> function from the <strong>detect</strong> R package to fit the removal models. The algorithm used in the function is based on conditional maximum likelihood, and is described in <a href="http://dx.doi.org/10.1111/2041-210X.12106">this paper</a> and its <a href="https://github.com/psolymos/QPAD/tree/master/inst/doc/v2">supporting material</a>.
We are using the <code class="language-plaintext highlighter-rouge">type = "rem"</code> for conventional removal models.</p>
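<p>To make the conditional likelihood idea more concrete, here is a minimal base R sketch on simulated data. It is not the code used inside <code class="language-plaintext highlighter-rouge">cmulti</code>. It assumes that, conditional on the total number of birds detected in a survey, the interval counts are multinomial with cell probabilities \([\exp(-t_{i,j-1} \phi) - \exp(-t_{ij} \phi)] / p(t_{iJ})\). All object names and simulation settings are hypothetical.</p>

```r
set.seed(1)
phi_true <- 0.4                    # hypothetical singing rate (per minute)
t_end <- c(3, 5, 10)               # interval end times in minutes
n_surveys <- 500
N <- rpois(n_surveys, lambda = 5)  # hypothetical number of birds present

# Simulate times to first detection and tally them by interval;
# birds first heard after 10 minutes fall outside the breaks and are dropped
Ysim <- t(sapply(N, function(k) {
  tt <- rexp(k, rate = phi_true)
  as.integer(table(cut(tt, breaks = c(0, t_end))))
}))

# Negative conditional multinomial log-likelihood (constant term omitted)
nll <- function(log_phi, Y, t_end) {
  surv <- exp(-exp(log_phi) * c(0, t_end))        # P(not yet detected by t)
  cell <- -diff(surv) / (1 - surv[length(surv)])  # conditional cell probabilities
  -sum(Y %*% log(cell))
}

# One-dimensional search over log(phi)
fit <- optimize(nll, interval = c(-4, 1), Y = Ysim, t_end = t_end)
c(true = log(phi_true), estimate = fit$minimum)
```

<p>Working on the log scale keeps \(\phi\) positive, which mirrors the <code class="language-plaintext highlighter-rouge">log.phi</code> naming in the model summaries in this post.</p>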
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">detect</span><span class="p">)</span><span class="w">
</span><span class="n">Me0</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">cmulti</span><span class="p">(</span><span class="n">Y</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">D</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">X</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="o">=</span><span class="s2">"rem"</span><span class="p">)</span><span class="w">
</span><span class="n">summary</span><span class="p">(</span><span class="n">Me0</span><span class="p">)</span><span class="w">
</span><span class="c1">## Call:</span><span class="w">
</span><span class="c1">## cmulti(formula = Y | D ~ 1, data = X, type = "rem")</span><span class="w">
</span><span class="c1">##</span><span class="w">
</span><span class="c1">## Removal Sampling (homogeneous singing rate)</span><span class="w">
</span><span class="c1">## Conditional Maximum Likelihood estimates</span><span class="w">
</span><span class="c1">##</span><span class="w">
</span><span class="c1">## Coefficients:</span><span class="w">
</span><span class="c1">## Estimate Std. Error z value Pr(>|z|)</span><span class="w">
</span><span class="c1">## log.phi_(Intercept) -0.91751 0.05826 -15.75 <2e-16 ***</span><span class="w">
</span><span class="c1">## ---</span><span class="w">
</span><span class="c1">## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1</span><span class="w">
</span><span class="c1">##</span><span class="w">
</span><span class="c1">## Log-likelihood: -272.1</span><span class="w">
</span><span class="c1">## BIC = 549.4</span><span class="w">
</span></code></pre></div></div>
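<p>The intercept is on the log scale, so it helps to back-transform it. The following sketch plugs the rounded estimate printed above into \(p(t) = 1 - \exp(-t \phi)\) for the three interval end times. The object names are hypothetical, and the value is copied by hand from the summary rather than extracted from the fitted object.</p>

```r
# Rounded intercept copied from the summary output above
log_phi <- -0.91751
phi <- exp(log_phi)  # estimated singing rate per minute

# Availability p(t) = 1 - exp(-t * phi) at the three interval end times
t_end <- c(3, 5, 10)
p <- 1 - exp(-t_end * phi)

round(phi, 3)
round(p, 3)
```

<p>This corresponds to a singing rate of roughly 0.4 cues per minute, which puts availability at about 0.7 after 3 minutes and about 0.98 for the full 10-minute count.</p>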
<h2 id="time-varying-conventional-removal-models">Time-varying conventional removal models</h2>
<p>Singing rates of birds vary with time of day, time of year, breeding status, and stage of the nesting cycle. Thus, removal model estimates of availability may be improved by accounting for variation in singing rates using covariates for day of year and time of day. In this case \(p(t_{iJ}) = 1 - \exp(-t_{iJ} \phi_{i})\) and \(\log(\phi_{i}) = \beta_{0} + \sum^{K}_{k=1} \beta_{k} x_{ik}\) is the linear predictor with \(K\) covariates and the corresponding unknown coefficients (\(\beta_{k}\), \(k = 0,\ldots, K\)).</p>
<p>We could fit all the possible multivariate and nonlinear models as we did in the paper, but
let’s just keep it simple for now and fit models with <code class="language-plaintext highlighter-rouge">JDAY</code> and <code class="language-plaintext highlighter-rouge">TSSR</code> as covariates
(models <code class="language-plaintext highlighter-rouge">Me1</code> and <code class="language-plaintext highlighter-rouge">Me2</code>).</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Me1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">cmulti</span><span class="p">(</span><span class="n">Y</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">D</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">JDAY</span><span class="p">,</span><span class="w"> </span><span class="n">X</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="o">=</span><span class="s2">"rem"</span><span class="p">)</span><span class="w">
</span><span class="n">Me2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">cmulti</span><span class="p">(</span><span class="n">Y</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">D</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">TSSR</span><span class="p">,</span><span class="w"> </span><span class="n">X</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="o">=</span><span class="s2">"rem"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p>Now compare the three conventional models based on AIC and inspect the summary for the best supported model, which includes the <code class="language-plaintext highlighter-rouge">JDAY</code> effect.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Me_AIC</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">AIC</span><span class="p">(</span><span class="n">Me0</span><span class="p">,</span><span class="w"> </span><span class="n">Me1</span><span class="p">,</span><span class="w"> </span><span class="n">Me2</span><span class="p">)</span><span class="w">
</span><span class="n">Me_AIC</span><span class="o">$</span><span class="n">dAIC</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">Me_AIC</span><span class="o">$</span><span class="n">AIC</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nf">min</span><span class="p">(</span><span class="n">Me_AIC</span><span class="o">$</span><span class="n">AIC</span><span class="p">)</span><span class="w">
</span><span class="n">MeBest</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">get</span><span class="p">(</span><span class="n">rownames</span><span class="p">(</span><span class="n">Me_AIC</span><span class="p">)[</span><span class="n">Me_AIC</span><span class="o">$</span><span class="n">dAIC</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">0</span><span class="p">])</span><span class="w">
</span><span class="n">Me_AIC</span><span class="w">
</span><span class="c1">## df AIC dAIC</span><span class="w">
</span><span class="c1">## Me0 1 546.1270 0.7187895</span><span class="w">
</span><span class="c1">## Me1 2 545.4082 0.0000000</span><span class="w">
</span><span class="c1">## Me2 2 546.4612 1.0529236</span><span class="w">
</span><span class="n">summary</span><span class="p">(</span><span class="n">MeBest</span><span class="p">)</span><span class="w">
</span><span class="c1">## Call:</span><span class="w">
</span><span class="c1">## cmulti(formula = Y | D ~ JDAY, data = X, type = "rem")</span><span class="w">
</span><span class="c1">##</span><span class="w">
</span><span class="c1">## Removal Sampling (homogeneous singing rate)</span><span class="w">
</span><span class="c1">## Conditional Maximum Likelihood estimates</span><span class="w">
</span><span class="c1">##</span><span class="w">
</span><span class="c1">## Coefficients:</span><span class="w">
</span><span class="c1">## Estimate Std. Error z value Pr(>|z|)</span><span class="w">
</span><span class="c1">## log.phi_(Intercept) 1.460 1.471 0.993 0.321</span><span class="w">
</span><span class="c1">## log.phi_JDAY -5.235 3.247 -1.612 0.107</span><span class="w">
</span><span class="c1">##</span><span class="w">
</span><span class="c1">## Log-likelihood: -270.7</span><span class="w">
</span><span class="c1">## BIC = 552</span><span class="w">
</span></code></pre></div></div>
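<p>As a quick numerical check on what these coefficients imply, the sketch below evaluates the linear predictor \(\log(\phi_{i}) = \beta_{0} + \beta_{1} x_{i}\) at a few <code class="language-plaintext highlighter-rouge">JDAY</code> values and converts the result to availability for a 10-minute count. The coefficients are the rounded values printed above, copied by hand. The <code class="language-plaintext highlighter-rouge">JDAY</code> values and object names are hypothetical, chosen to be near those shown by <code class="language-plaintext highlighter-rouge">tail(X)</code>.</p>

```r
# Rounded coefficients copied from the summary output above
b <- c(1.460, -5.235)

# Hypothetical standardized day-of-year values
jday <- c(0.40, 0.44, 0.48)

# Linear predictor on the log scale, back-transformed to singing rate
phi <- exp(b[1] + b[2] * jday)

# Availability for a 10-minute count
p10 <- 1 - exp(-10 * phi)

data.frame(JDAY = jday, phi = phi, p10 = p10)
```

<p>The negative <code class="language-plaintext highlighter-rouge">JDAY</code> coefficient means the predicted singing rate declines later in the season, although the standard error in the summary shows that this effect is estimated with considerable uncertainty.</p>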
<p>To visually capture the time-varying effects, we make some plots using base graphics, colors matching the time-varying predictor. This way we can not only assess how availability probability (given a fixed time interval) is changing with the values of the predictor, but also how the cumulative distribution changes with time.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">n</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">100</span><span class="w">
</span><span class="n">JDAY</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">seq</span><span class="p">(</span><span class="nf">min</span><span class="p">(</span><span class="n">X</span><span class="o">$</span><span class="n">JDAY</span><span class="p">),</span><span class="w"> </span><span class="nf">max</span><span class="p">(</span><span class="n">X</span><span class="o">$</span><span class="n">JDAY</span><span class="p">),</span><span class="w"> </span><span class="n">length.out</span><span class="o">=</span><span class="n">n</span><span class="m">+1</span><span class="p">)</span><span class="w">
</span><span class="n">TSSR</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">seq</span><span class="p">(</span><span class="nf">min</span><span class="p">(</span><span class="n">X</span><span class="o">$</span><span class="n">TSSR</span><span class="p">),</span><span class="w"> </span><span class="nf">max</span><span class="p">(</span><span class="n">X</span><span class="o">$</span><span class="n">TSSR</span><span class="p">),</span><span class="w"> </span><span class="n">length.out</span><span class="o">=</span><span class="n">n</span><span class="m">+1</span><span class="p">)</span><span class="w">
</span><span class="n">Duration</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">seq</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">10</span><span class="p">,</span><span class="w"> </span><span class="n">length.out</span><span class="o">=</span><span class="n">n</span><span class="p">)</span><span class="w">
</span><span class="n">col</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">viridis</span><span class="o">::</span><span class="n">viridis</span><span class="p">(</span><span class="n">n</span><span class="p">)</span><span class="w">
</span><span class="n">b</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">coef</span><span class="p">(</span><span class="n">MeBest</span><span class="p">)</span><span class="w">
</span><span class="n">op</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">par</span><span class="p">(</span><span class="n">las</span><span class="o">=</span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">mfrow</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="m">2</span><span class="p">,</span><span class="m">1</span><span class="p">),</span><span class="w"> </span><span class="n">mar</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="m">4</span><span class="p">,</span><span class="m">4</span><span class="p">,</span><span class="m">2</span><span class="p">,</span><span class="m">2</span><span class="p">))</span><span class="w">
</span><span class="n">p1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">1</span><span class="o">-</span><span class="nf">exp</span><span class="p">(</span><span class="m">-3</span><span class="o">*</span><span class="nf">exp</span><span class="p">(</span><span class="n">b</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="o">+</span><span class="n">b</span><span class="p">[</span><span class="m">2</span><span class="p">]</span><span class="o">*</span><span class="n">JDAY</span><span class="p">))</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="n">JDAY</span><span class="p">,</span><span class="w"> </span><span class="n">p1</span><span class="p">,</span><span class="w"> </span><span class="n">ylim</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="m">1</span><span class="p">),</span><span class="w"> </span><span class="n">type</span><span class="o">=</span><span class="s2">"n"</span><span class="p">,</span><span class="w">
</span><span class="n">main</span><span class="o">=</span><span class="n">paste</span><span class="p">(</span><span class="n">spp</span><span class="p">,</span><span class="w"> </span><span class="n">rownames</span><span class="p">(</span><span class="n">Me_AIC</span><span class="p">)[</span><span class="n">Me_AIC</span><span class="o">$</span><span class="n">dAIC</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">0</span><span class="p">]),</span><span class="w">
</span><span class="n">ylab</span><span class="o">=</span><span class="s2">"P(availability)"</span><span class="p">)</span><span class="w">
</span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nf">seq_len</span><span class="p">(</span><span class="n">n</span><span class="p">))</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">lines</span><span class="p">(</span><span class="n">JDAY</span><span class="p">[</span><span class="nf">c</span><span class="p">(</span><span class="n">i</span><span class="p">,</span><span class="n">i</span><span class="m">+1</span><span class="p">)],</span><span class="w"> </span><span class="n">p1</span><span class="p">[</span><span class="nf">c</span><span class="p">(</span><span class="n">i</span><span class="p">,</span><span class="n">i</span><span class="m">+1</span><span class="p">)],</span><span class="w"> </span><span class="n">col</span><span class="o">=</span><span class="n">col</span><span class="p">[</span><span class="n">i</span><span class="p">],</span><span class="w"> </span><span class="n">lwd</span><span class="o">=</span><span class="m">2</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="n">Duration</span><span class="p">,</span><span class="w"> </span><span class="n">Duration</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="o">=</span><span class="s2">"n"</span><span class="p">,</span><span class="w"> </span><span class="n">ylim</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="m">1</span><span class="p">),</span><span class="w">
</span><span class="n">ylab</span><span class="o">=</span><span class="s2">"P(availability)"</span><span class="p">)</span><span class="w">
</span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nf">seq_len</span><span class="p">(</span><span class="n">n</span><span class="p">))</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">p2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">1</span><span class="o">-</span><span class="nf">exp</span><span class="p">(</span><span class="o">-</span><span class="n">Duration</span><span class="o">*</span><span class="nf">exp</span><span class="p">(</span><span class="n">b</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="o">+</span><span class="n">b</span><span class="p">[</span><span class="m">2</span><span class="p">]</span><span class="o">*</span><span class="n">JDAY</span><span class="p">[</span><span class="n">i</span><span class="p">]))</span><span class="w">
</span><span class="n">lines</span><span class="p">(</span><span class="n">Duration</span><span class="p">,</span><span class="w"> </span><span class="n">p2</span><span class="p">,</span><span class="w"> </span><span class="n">col</span><span class="o">=</span><span class="n">col</span><span class="p">[</span><span class="n">i</span><span class="p">])</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="n">abline</span><span class="p">(</span><span class="n">v</span><span class="o">=</span><span class="m">3</span><span class="p">,</span><span class="w"> </span><span class="n">col</span><span class="o">=</span><span class="s2">"grey"</span><span class="p">)</span><span class="w">
</span><span class="n">par</span><span class="p">(</span><span class="n">op</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><img src="https://peter.solymos.org/images/2018/08/30/MePlot.png" class="img-responsive" alt="Me model predictions" /></p>
<h2 id="time-invariant-finite-mixture-removal-model">Time-invariant finite mixture removal model</h2>
<p>The removal model can also accommodate behavioral heterogeneity in singing by subdividing the sampled population for a species at a given point into a finite mixture of birds with low and high singing rates, which requires the additional estimation of the proportion of birds in the sampled population with low singing rates.</p>
<p>In the continuous-time formulation of the finite mixture (or two-point mixture) removal model, the cumulative distribution function during a point count is given by \(p(t_{iJ}) = (1 - c) \cdot 1 + c [1 - \exp(-t_{iJ} \phi)] = 1 - c \exp(-t_{iJ} \phi)\), where \(\phi\) is the singing rate for the group of infrequently singing birds, and \(c\) is the proportion of birds during the point count that are infrequent singers. The remaining proportion (\(1 - c\); the intercept of the cumulative distribution function) consists of frequent singers, which are assumed to be detected instantaneously at the start of the first time interval. In the simplest form of the finite mixture model, the proportion and singing rate of birds that sing infrequently are homogeneous across all times and locations (model <code class="language-plaintext highlighter-rouge">Mf0</code>). We use <code class="language-plaintext highlighter-rouge">type = "fmix"</code> to fit finite mixture removal models.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Mf0</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">cmulti</span><span class="p">(</span><span class="n">Y</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">D</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">X</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="o">=</span><span class="s2">"fmix"</span><span class="p">)</span><span class="w">
</span><span class="n">summary</span><span class="p">(</span><span class="n">Mf0</span><span class="p">)</span><span class="w">
</span><span class="c1">## Call:</span><span class="w">
</span><span class="c1">## cmulti(formula = Y | D ~ 1, data = X, type = "fmix")</span><span class="w">
</span><span class="c1">##</span><span class="w">
</span><span class="c1">## Removal Sampling (heterogeneous singing rate)</span><span class="w">
</span><span class="c1">## Conditional Maximum Likelihood estimates</span><span class="w">
</span><span class="c1">##</span><span class="w">
</span><span class="c1">## Coefficients:</span><span class="w">
</span><span class="c1">## Estimate Std. Error z value Pr(>|z|)</span><span class="w">
</span><span class="c1">## log.phi_(Intercept) -2.1902 0.4914 -4.457 8.32e-06 ***</span><span class="w">
</span><span class="c1">## logit.c 0.1182 0.1543 0.766 0.444</span><span class="w">
</span><span class="c1">## ---</span><span class="w">
</span><span class="c1">## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1</span><span class="w">
</span><span class="c1">##</span><span class="w">
</span><span class="c1">## Log-likelihood: -257.6</span><span class="w">
</span><span class="c1">## BIC = 525.8</span><span class="w">
</span></code></pre></div></div>
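<p>The estimates are reported on the log and logit scales, so it can help to back-transform them before interpreting the model. The following is a minimal sketch that plugs the rounded <code class="language-plaintext highlighter-rouge">Mf0</code> estimates into \(p(t) = 1 - c \exp(-t \phi)\). The helper name <code class="language-plaintext highlighter-rouge">avail_fmix</code> and the chosen durations are assumptions for illustration only.</p>

```r
## estimates copied (rounded) from the Mf0 summary above
log_phi <- -2.1902
logit_c <- 0.1182

phi <- exp(log_phi)       # singing rate of the infrequent singers
c_hat <- plogis(logit_c)  # proportion of infrequent singers

## hypothetical helper: cumulative availability after t minutes
avail_fmix <- function(t, phi, c) 1 - c * exp(-t * phi)

t_min <- c(0, 3, 5, 10)   # illustrative durations in minutes
p <- avail_fmix(t_min, phi, c_hat)
data.frame(minutes = t_min, availability = round(p, 3))
```

<p>At \(t = 0\) the curve starts at \(1 - c\), the share of frequent singers that the model treats as detected instantaneously, and it then rises towards 1 as the infrequent singers are detected.</p>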
<h2 id="time-varying-finite-mixture-removal-models">Time-varying finite mixture removal models</h2>
<p>Previously, researchers (see refs in the paper) have applied covariate effects on the parameter \(\phi_{i}\) of the finite mixture model, similarly to how we modeled these effects in conventional models. This model assumes that the parameter \(c\) is constant irrespective of time and location (i.e. only the infrequent singer group changes its singing behavior).</p>
<p>We can fit finite mixture models with <code class="language-plaintext highlighter-rouge">JDAY</code> and <code class="language-plaintext highlighter-rouge">TSSR</code> as covariates on \(\phi\)
(models <code class="language-plaintext highlighter-rouge">Mf1</code> and <code class="language-plaintext highlighter-rouge">Mf2</code>). In this case \(p(t_{iJ}) = 1 - c \exp(-t_{iJ} \phi_{i})\) and \(\log(\phi_{i}) = \beta_{0} + \sum^{K}_{k=1} \beta_{k} x_{ik}\) is the linear predictor with \(K\) covariates and the corresponding unknown coefficients (\(\beta_{k}\), \(k = 0,\ldots, K\)).</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Mf1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">cmulti</span><span class="p">(</span><span class="n">Y</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">D</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">JDAY</span><span class="p">,</span><span class="w"> </span><span class="n">X</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="o">=</span><span class="s2">"fmix"</span><span class="p">)</span><span class="w">
</span><span class="n">Mf2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">cmulti</span><span class="p">(</span><span class="n">Y</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">D</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">TSSR</span><span class="p">,</span><span class="w"> </span><span class="n">X</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="o">=</span><span class="s2">"fmix"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
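<p>We compare the three finite mixture models based on AIC and inspect the summary of the best supported model, which in this case includes the <code class="language-plaintext highlighter-rouge">TSSR</code> effect. Note that the null model <code class="language-plaintext highlighter-rouge">Mf0</code> is within about 0.1 AIC units of the best model, so the support for the <code class="language-plaintext highlighter-rouge">TSSR</code> effect is weak.</p>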
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Mf_AIC</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">AIC</span><span class="p">(</span><span class="n">Mf0</span><span class="p">,</span><span class="w"> </span><span class="n">Mf1</span><span class="p">,</span><span class="w"> </span><span class="n">Mf2</span><span class="p">)</span><span class="w">
</span><span class="n">Mf_AIC</span><span class="o">$</span><span class="n">dAIC</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">Mf_AIC</span><span class="o">$</span><span class="n">AIC</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nf">min</span><span class="p">(</span><span class="n">Mf_AIC</span><span class="o">$</span><span class="n">AIC</span><span class="p">)</span><span class="w">
</span><span class="n">MfBest</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">get</span><span class="p">(</span><span class="n">rownames</span><span class="p">(</span><span class="n">Mf_AIC</span><span class="p">)[</span><span class="n">Mf_AIC</span><span class="o">$</span><span class="n">dAIC</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">0</span><span class="p">])</span><span class="w">
</span><span class="n">Mf_AIC</span><span class="w">
</span><span class="c1">## df AIC dAIC</span><span class="w">
</span><span class="c1">## Mf0 2 519.2222 0.1053855</span><span class="w">
</span><span class="c1">## Mf1 3 520.4007 1.2838985</span><span class="w">
</span><span class="c1">## Mf2 3 519.1168 0.0000000</span><span class="w">
</span><span class="n">summary</span><span class="p">(</span><span class="n">MfBest</span><span class="p">)</span><span class="w">
</span><span class="c1">## Call:</span><span class="w">
</span><span class="c1">## cmulti(formula = Y | D ~ TSSR, data = X, type = "fmix")</span><span class="w">
</span><span class="c1">##</span><span class="w">
</span><span class="c1">## Removal Sampling (heterogeneous singing rate)</span><span class="w">
</span><span class="c1">## Conditional Maximum Likelihood estimates</span><span class="w">
</span><span class="c1">##</span><span class="w">
</span><span class="c1">## Coefficients:</span><span class="w">
</span><span class="c1">## Estimate Std. Error z value Pr(>|z|)</span><span class="w">
</span><span class="c1">## log.phi_(Intercept) -1.1939 0.4195 -2.846 0.00442 **</span><span class="w">
</span><span class="c1">## log.phi_TSSR -9.0089 4.7712 -1.888 0.05900 .</span><span class="w">
</span><span class="c1">## logit.c 0.2016 0.1702 1.184 0.23622</span><span class="w">
</span><span class="c1">## ---</span><span class="w">
</span><span class="c1">## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1</span><span class="w">
</span><span class="c1">##</span><span class="w">
</span><span class="c1">## Log-likelihood: -256.6</span><span class="w">
</span><span class="c1">## BIC = 529</span><span class="w">
</span></code></pre></div></div>
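<p>The AIC values in the table can be reproduced approximately from the printed log-likelihoods using the standard definition \(AIC = -2 \log L + 2 \, df\). The sketch below does this by hand for <code class="language-plaintext highlighter-rouge">Mf0</code> and <code class="language-plaintext highlighter-rouge">Mf2</code>. The inputs are the rounded values printed in the summaries, so the results should agree with the table only to about 0.1 units.</p>

```r
## log-likelihoods (rounded to one decimal) and parameter counts
## copied from the Mf0 and Mf2 output above
ll <- c(Mf0 = -257.6, Mf2 = -256.6)
df <- c(Mf0 = 2, Mf2 = 3)

## AIC = -2 * logLik + 2 * df
aic_hand <- -2 * ll + 2 * df

## AIC values as printed in the model comparison table
aic_table <- c(Mf0 = 519.2222, Mf2 = 519.1168)

## side-by-side comparison; differences reflect rounding of logLik
cbind(aic_hand, aic_table, diff = aic_hand - aic_table)
```

<p>The gain of about one log-likelihood unit from adding <code class="language-plaintext highlighter-rouge">TSSR</code> is almost exactly offset by the penalty of 2 AIC units for the extra parameter, which is why the two models are nearly tied.</p>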
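<p>We produce the same two-panel plot as before, now with <code class="language-plaintext highlighter-rouge">TSSR</code> as the predictor and with the mixture proportion \(c\) included in the formula.</p>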
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">b</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">coef</span><span class="p">(</span><span class="n">MfBest</span><span class="p">)</span><span class="w">
</span><span class="n">op</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">par</span><span class="p">(</span><span class="n">las</span><span class="o">=</span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">mfrow</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="m">2</span><span class="p">,</span><span class="m">1</span><span class="p">),</span><span class="w"> </span><span class="n">mar</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="m">4</span><span class="p">,</span><span class="m">4</span><span class="p">,</span><span class="m">2</span><span class="p">,</span><span class="m">2</span><span class="p">))</span><span class="w">
</span><span class="n">p1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">1</span><span class="o">-</span><span class="n">plogis</span><span class="p">(</span><span class="n">b</span><span class="p">[</span><span class="m">3</span><span class="p">])</span><span class="o">*</span><span class="nf">exp</span><span class="p">(</span><span class="m">-3</span><span class="o">*</span><span class="nf">exp</span><span class="p">(</span><span class="n">b</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="o">+</span><span class="n">b</span><span class="p">[</span><span class="m">2</span><span class="p">]</span><span class="o">*</span><span class="n">TSSR</span><span class="p">))</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="n">TSSR</span><span class="p">,</span><span class="w"> </span><span class="n">p1</span><span class="p">,</span><span class="w"> </span><span class="n">ylim</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="m">1</span><span class="p">),</span><span class="w"> </span><span class="n">type</span><span class="o">=</span><span class="s2">"n"</span><span class="p">,</span><span class="w">
</span><span class="n">main</span><span class="o">=</span><span class="n">paste</span><span class="p">(</span><span class="n">spp</span><span class="p">,</span><span class="w"> </span><span class="n">rownames</span><span class="p">(</span><span class="n">Mf_AIC</span><span class="p">)[</span><span class="n">Mf_AIC</span><span class="o">$</span><span class="n">dAIC</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">0</span><span class="p">]),</span><span class="w">
</span><span class="n">ylab</span><span class="o">=</span><span class="s2">"P(availability)"</span><span class="p">)</span><span class="w">
</span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nf">seq_len</span><span class="p">(</span><span class="n">n</span><span class="p">))</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">lines</span><span class="p">(</span><span class="n">TSSR</span><span class="p">[</span><span class="nf">c</span><span class="p">(</span><span class="n">i</span><span class="p">,</span><span class="n">i</span><span class="m">+1</span><span class="p">)],</span><span class="w"> </span><span class="n">p1</span><span class="p">[</span><span class="nf">c</span><span class="p">(</span><span class="n">i</span><span class="p">,</span><span class="n">i</span><span class="m">+1</span><span class="p">)],</span><span class="w"> </span><span class="n">col</span><span class="o">=</span><span class="n">col</span><span class="p">[</span><span class="n">i</span><span class="p">],</span><span class="w"> </span><span class="n">lwd</span><span class="o">=</span><span class="m">2</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="n">Duration</span><span class="p">,</span><span class="w"> </span><span class="n">Duration</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="o">=</span><span class="s2">"n"</span><span class="p">,</span><span class="w"> </span><span class="n">ylim</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="m">1</span><span class="p">),</span><span class="w">
</span><span class="n">ylab</span><span class="o">=</span><span class="s2">"P(availability)"</span><span class="p">)</span><span class="w">
</span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nf">seq_len</span><span class="p">(</span><span class="n">n</span><span class="p">))</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">p2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">1</span><span class="o">-</span><span class="n">plogis</span><span class="p">(</span><span class="n">b</span><span class="p">[</span><span class="m">3</span><span class="p">])</span><span class="o">*</span><span class="nf">exp</span><span class="p">(</span><span class="o">-</span><span class="n">Duration</span><span class="o">*</span><span class="nf">exp</span><span class="p">(</span><span class="n">b</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="o">+</span><span class="n">b</span><span class="p">[</span><span class="m">2</span><span class="p">]</span><span class="o">*</span><span class="n">TSSR</span><span class="p">[</span><span class="n">i</span><span class="p">]))</span><span class="w">
</span><span class="n">lines</span><span class="p">(</span><span class="n">Duration</span><span class="p">,</span><span class="w"> </span><span class="n">p2</span><span class="p">,</span><span class="w"> </span><span class="n">col</span><span class="o">=</span><span class="n">col</span><span class="p">[</span><span class="n">i</span><span class="p">])</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="n">abline</span><span class="p">(</span><span class="n">v</span><span class="o">=</span><span class="m">3</span><span class="p">,</span><span class="w"> </span><span class="n">col</span><span class="o">=</span><span class="s2">"grey"</span><span class="p">)</span><span class="w">
</span><span class="n">par</span><span class="p">(</span><span class="n">op</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><img src="https://peter.solymos.org/images/2018/08/30/MfPlot.png" class="img-responsive" alt="Mf model predictions" /></p>
<p>An alternative parametrization makes \(c_{i}\), rather than \(\phi\), the time-varying parameter, which allows individuals to switch between the frequent and infrequent groups depending on covariates. We can fit this class of finite mixture models with <code class="language-plaintext highlighter-rouge">JDAY</code> and <code class="language-plaintext highlighter-rouge">TSSR</code> as covariates on \(c\) (models <code class="language-plaintext highlighter-rouge">Mm1</code> and <code class="language-plaintext highlighter-rouge">Mm2</code>) using <code class="language-plaintext highlighter-rouge">type = "mix"</code> (instead of <code class="language-plaintext highlighter-rouge">"fmix"</code>). In this case \(p(t_{iJ}) = 1 - c_{i} \exp(-t_{iJ} \phi)\) and \(\mathrm{logit}(c_{i}) = \beta_{0} + \sum^{K}_{k=1} \beta_{k} x_{ik}\) is the linear predictor with \(K\) covariates and the corresponding unknown coefficients (\(\beta_{k}\), \(k = 0,\ldots, K\)). Because \(c_{i}\) is a proportion, we model it on the logit scale.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Mm1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">cmulti</span><span class="p">(</span><span class="n">Y</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">D</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">JDAY</span><span class="p">,</span><span class="w"> </span><span class="n">X</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="o">=</span><span class="s2">"mix"</span><span class="p">)</span><span class="w">
</span><span class="n">Mm2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">cmulti</span><span class="p">(</span><span class="n">Y</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">D</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">TSSR</span><span class="p">,</span><span class="w"> </span><span class="n">X</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="o">=</span><span class="s2">"mix"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p>We did not fit a separate null model for this parametrization because it is identical to the <code class="language-plaintext highlighter-rouge">Mf0</code> model. We therefore compare <code class="language-plaintext highlighter-rouge">Mf0</code>, <code class="language-plaintext highlighter-rouge">Mm1</code>, and <code class="language-plaintext highlighter-rouge">Mm2</code> based on AIC and inspect the summary of the best supported model, which in this case includes the <code class="language-plaintext highlighter-rouge">JDAY</code> effect.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Mm_AIC</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">AIC</span><span class="p">(</span><span class="n">Mf0</span><span class="p">,</span><span class="w"> </span><span class="n">Mm1</span><span class="p">,</span><span class="w"> </span><span class="n">Mm2</span><span class="p">)</span><span class="w">
</span><span class="n">Mm_AIC</span><span class="o">$</span><span class="n">dAIC</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">Mm_AIC</span><span class="o">$</span><span class="n">AIC</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nf">min</span><span class="p">(</span><span class="n">Mm_AIC</span><span class="o">$</span><span class="n">AIC</span><span class="p">)</span><span class="w">
</span><span class="n">MmBest</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">get</span><span class="p">(</span><span class="n">rownames</span><span class="p">(</span><span class="n">Mm_AIC</span><span class="p">)[</span><span class="n">Mm_AIC</span><span class="o">$</span><span class="n">dAIC</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">0</span><span class="p">])</span><span class="w">
</span><span class="n">Mm_AIC</span><span class="w">
</span><span class="c1">## df AIC dAIC</span><span class="w">
</span><span class="c1">## Mf0 2 519.2222 0.1949952</span><span class="w">
</span><span class="c1">## Mm1 3 519.0272 0.0000000</span><span class="w">
</span><span class="c1">## Mm2 3 520.8744 1.8471803</span><span class="w">
</span><span class="n">summary</span><span class="p">(</span><span class="n">MmBest</span><span class="p">)</span><span class="w">
</span><span class="c1">## Call:</span><span class="w">
</span><span class="c1">## cmulti(formula = Y | D ~ JDAY, data = X, type = "mix")</span><span class="w">
</span><span class="c1">##</span><span class="w">
</span><span class="c1">## Removal Sampling (heterogeneous singing rate)</span><span class="w">
</span><span class="c1">## Conditional Maximum Likelihood estimates</span><span class="w">
</span><span class="c1">##</span><span class="w">
</span><span class="c1">## Coefficients:</span><span class="w">
</span><span class="c1">## Estimate Std. Error z value Pr(>|z|)</span><span class="w">
</span><span class="c1">## log.phi -2.1910 0.4914 -4.459 8.24e-06 ***</span><span class="w">
</span><span class="c1">## logit.c_(Intercept) -4.7600 3.3828 -1.407 0.159</span><span class="w">
</span><span class="c1">## logit.c_JDAY 10.7368 7.4287 1.445 0.148</span><span class="w">
</span><span class="c1">## ---</span><span class="w">
</span><span class="c1">## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1</span><span class="w">
</span><span class="c1">##</span><span class="w">
</span><span class="c1">## Log-likelihood: -256.5</span><span class="w">
</span><span class="c1">## BIC = 528.9</span><span class="w">
</span></code></pre></div></div>
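<p>To see how the covariate acts through the mixture proportion in this parametrization, the sketch below back-transforms the rounded estimates from the summary above and evaluates \(c_{i}\) and the 3-minute availability \(1 - c_{i} \exp(-3 \phi)\) by hand. The two <code class="language-plaintext highlighter-rouge">JDAY</code> values are made up for illustration and are not taken from the data.</p>

```r
## estimates copied (rounded) from the summary above
log_phi <- -2.1910
b_c <- c("(Intercept)" = -4.7600, JDAY = 10.7368)

## illustrative (made-up) JDAY values, earlier vs. later in the season
jday <- c(early = 0.40, late = 0.50)

phi <- exp(log_phi)  # constant singing rate of the infrequent singers
## proportion of infrequent singers, modeled on the logit scale
c_i <- plogis(b_c[["(Intercept)"]] + b_c[["JDAY"]] * jday)
## availability within a 3-minute count
p3 <- 1 - c_i * exp(-3 * phi)

round(rbind(c_i = c_i, p3 = p3), 3)
```

<p>With a positive <code class="language-plaintext highlighter-rouge">JDAY</code> coefficient on \(c\), the share of infrequent singers grows with date in this sketch, so the 3-minute availability declines even though \(\phi\) stays constant.</p>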
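<p>We produce the same two-panel plot as before, this time with <code class="language-plaintext highlighter-rouge">JDAY</code> acting on the mixture proportion \(c\) through the logit link while \(\phi\) stays constant.</p>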
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">b</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">coef</span><span class="p">(</span><span class="n">MmBest</span><span class="p">)</span><span class="w">
</span><span class="n">op</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">par</span><span class="p">(</span><span class="n">las</span><span class="o">=</span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">mfrow</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="m">2</span><span class="p">,</span><span class="m">1</span><span class="p">),</span><span class="w"> </span><span class="n">mar</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="m">4</span><span class="p">,</span><span class="m">4</span><span class="p">,</span><span class="m">2</span><span class="p">,</span><span class="m">2</span><span class="p">))</span><span class="w">
</span><span class="n">p1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">1</span><span class="o">-</span><span class="n">plogis</span><span class="p">(</span><span class="n">b</span><span class="p">[</span><span class="m">2</span><span class="p">]</span><span class="o">+</span><span class="n">b</span><span class="p">[</span><span class="m">3</span><span class="p">]</span><span class="o">*</span><span class="n">JDAY</span><span class="p">)</span><span class="o">*</span><span class="nf">exp</span><span class="p">(</span><span class="m">-3</span><span class="o">*</span><span class="nf">exp</span><span class="p">(</span><span class="n">b</span><span class="p">[</span><span class="m">1</span><span class="p">]))</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="n">JDAY</span><span class="p">,</span><span class="w"> </span><span class="n">p1</span><span class="p">,</span><span class="w"> </span><span class="n">ylim</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="m">1</span><span class="p">),</span><span class="w"> </span><span class="n">type</span><span class="o">=</span><span class="s2">"n"</span><span class="p">,</span><span class="w">
</span><span class="n">main</span><span class="o">=</span><span class="n">paste</span><span class="p">(</span><span class="n">spp</span><span class="p">,</span><span class="w"> </span><span class="n">rownames</span><span class="p">(</span><span class="n">Mm_AIC</span><span class="p">)[</span><span class="n">Mm_AIC</span><span class="o">$</span><span class="n">dAIC</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">0</span><span class="p">]),</span><span class="w">
</span><span class="n">ylab</span><span class="o">=</span><span class="s2">"P(availability)"</span><span class="p">)</span><span class="w">
</span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nf">seq_len</span><span class="p">(</span><span class="n">n</span><span class="p">))</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">lines</span><span class="p">(</span><span class="n">JDAY</span><span class="p">[</span><span class="nf">c</span><span class="p">(</span><span class="n">i</span><span class="p">,</span><span class="n">i</span><span class="m">+1</span><span class="p">)],</span><span class="w"> </span><span class="n">p1</span><span class="p">[</span><span class="nf">c</span><span class="p">(</span><span class="n">i</span><span class="p">,</span><span class="n">i</span><span class="m">+1</span><span class="p">)],</span><span class="w"> </span><span class="n">col</span><span class="o">=</span><span class="n">col</span><span class="p">[</span><span class="n">i</span><span class="p">],</span><span class="w"> </span><span class="n">lwd</span><span class="o">=</span><span class="m">2</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="n">Duration</span><span class="p">,</span><span class="w"> </span><span class="n">Duration</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="o">=</span><span class="s2">"n"</span><span class="p">,</span><span class="w"> </span><span class="n">ylim</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="m">1</span><span class="p">),</span><span class="w">
</span><span class="n">ylab</span><span class="o">=</span><span class="s2">"P(availability)"</span><span class="p">)</span><span class="w">
</span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nf">seq_len</span><span class="p">(</span><span class="n">n</span><span class="p">))</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">p2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">1</span><span class="o">-</span><span class="n">plogis</span><span class="p">(</span><span class="n">b</span><span class="p">[</span><span class="m">2</span><span class="p">]</span><span class="o">+</span><span class="n">b</span><span class="p">[</span><span class="m">3</span><span class="p">]</span><span class="o">*</span><span class="n">JDAY</span><span class="p">[</span><span class="n">i</span><span class="p">])</span><span class="o">*</span><span class="nf">exp</span><span class="p">(</span><span class="o">-</span><span class="n">Duration</span><span class="o">*</span><span class="nf">exp</span><span class="p">(</span><span class="n">b</span><span class="p">[</span><span class="m">1</span><span class="p">]))</span><span class="w">
</span><span class="n">lines</span><span class="p">(</span><span class="n">Duration</span><span class="p">,</span><span class="w"> </span><span class="n">p2</span><span class="p">,</span><span class="w"> </span><span class="n">col</span><span class="o">=</span><span class="n">col</span><span class="p">[</span><span class="n">i</span><span class="p">])</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="n">abline</span><span class="p">(</span><span class="n">v</span><span class="o">=</span><span class="m">3</span><span class="p">,</span><span class="w"> </span><span class="n">col</span><span class="o">=</span><span class="s2">"grey"</span><span class="p">)</span><span class="w">
</span><span class="n">par</span><span class="p">(</span><span class="n">op</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><img src="https://peter.solymos.org/images/2018/08/30/MmPlot.png" class="img-responsive" alt="Mm model predictions" /></p>
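<p>To make the formula behind the two panels explicit, here is a minimal sketch of the availability probability used in the plotting code, p(t) = 1 - c * exp(-t * phi), where phi = <code class="language-plaintext highlighter-rouge">exp(b[1])</code> is the rate parameter and c = <code class="language-plaintext highlighter-rouge">plogis(b[2] + b[3] * JDAY)</code> is the proportion of infrequent singers. The function name <code class="language-plaintext highlighter-rouge">p_avail</code>, the coefficient values, and the date value of 0.4 are hypothetical and only serve as an illustration; they are not the fitted estimates.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>## Hypothetical coefficients (NOT the fitted values):
## log rate, then intercept and date slope of the logit mixture proportion
b <- c(-1, 2, -10)
p_avail <- function(t, jday, b) {
  phi <- exp(b[1])                  # rate parameter
  cc <- plogis(b[2] + b[3] * jday)  # proportion of infrequent singers
  1 - cc * exp(-t * phi)            # P(availability) within t minutes
}
## availability after 3, 5, and 10 minutes at an assumed date value of 0.4
p <- p_avail(c(3, 5, 10), jday = 0.4, b = b)
round(p, 3)
</code></pre></div></div>
<p>At t = 0 the function returns 1 - c, and it approaches 1 as the count duration increases, which is the shape of the curves in the bottom panel.</p>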
<h2 id="let-the-best-model-win">Let the best model win</h2>
<p>So which of the 3 parametrizations proved to be best for our Ovenbird example data? It was the finite mixture with time-varying proportion of infrequent singers, but only by a thin margin. The other finite mixture model came second (dAIC = 0.09), while the conventional model lagged far behind (dAIC = 26.4).</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">M_AIC</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">AIC</span><span class="p">(</span><span class="n">MeBest</span><span class="p">,</span><span class="w"> </span><span class="n">MfBest</span><span class="p">,</span><span class="w"> </span><span class="n">MmBest</span><span class="p">)</span><span class="w">
</span><span class="n">M_AIC</span><span class="o">$</span><span class="n">dAIC</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">M_AIC</span><span class="o">$</span><span class="n">AIC</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nf">min</span><span class="p">(</span><span class="n">M_AIC</span><span class="o">$</span><span class="n">AIC</span><span class="p">)</span><span class="w">
</span><span class="n">M_AIC</span><span class="w">
</span><span class="c1">## df AIC dAIC</span><span class="w">
</span><span class="c1">## MeBest 2 545.4082 26.38106209</span><span class="w">
</span><span class="c1">## MfBest 3 519.1168 0.08960974</span><span class="w">
</span><span class="c1">## MmBest 3 519.0272 0.00000000</span><span class="w">
</span></code></pre></div></div>
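<p>To put the thin margin into perspective, AIC differences can be converted to Akaike weights, that is exp(-dAIC / 2) rescaled to sum to 1. This is a minimal sketch that uses the AIC values printed above, typed in by hand as a stand-in for the fitted model objects:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>## AIC values copied from the table above
aic <- c(MeBest = 545.4082, MfBest = 519.1168, MmBest = 519.0272)
dAIC <- aic - min(aic)
## Akaike weights: relative support for each model within this set
w <- exp(-dAIC / 2) / sum(exp(-dAIC / 2))
round(w, 3)
</code></pre></div></div>
<p>The two finite mixture models receive nearly equal weights, while the weight of the conventional model is practically zero.</p>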
<h2 id="conclusions-and-applications">Conclusions and applications</h2>
<p>Finite mixture models provide some really nice insight into how singing behavior changes over time and, due to more parameters, they provide a better fit and thus minimize bias in population size estimates. But all this improvement comes with a price: sample size requirements (or more precisely, the number of detections required) are really high. To have all the benefits with reduced variance, one needs about 1000 non-zero observations to fit finite mixture models, 20 times more than needed to reliably fit conventional removal models. This is much higher than previously suggested minimum sample sizes.</p>
<p>Our findings also indicate that lengthening the count duration from 3 minutes to 5–10 minutes is an important consideration when designing field surveys to increase the accuracy and precision of population estimates. Well-informed survey design combined with various forms of removal sampling is useful in accounting for availability bias in point counts, thereby improving population estimates, and allowing for better integration of disparate studies at larger spatial scales.</p>
<p>To this end, we provide our removal model estimates as part of the <a href="https://github.com/psolymos/qpad"><strong>QPAD</strong></a> R package and the R functions required to fit all the above outlined removal models as part of the <a href="https://cran.r-project.org/package=detect"><strong>detect</strong></a> R package. We at the <a href="http://www.borealbirds.ca/">Boreal Avian Modelling Project</a> and our collaborators are already utilizing the removal model estimates to correct for availability bias in our continental and regional projects to inform better management and conservation of bird populations. Read more about these projects in our <a href="http://www.borealbirds.ca/library/index.php/technical_reports">reports</a>.</p>
<p>Please report any issues <a href="https://github.com/psolymos/detect/issues">here</a> and feel free to comment below!</p>
<p><strong>UPDATE</strong>: <a href="https://americanornithologypubsblog.org/2018/08/29/a-better-way-to-count-boreal-birds/">AOS press release</a>, <a href="https://www.eurekalert.org/pub_releases/2018-08/uoa-bbb083018.php">EurekAlert!</a>, <a href="http://blog.abmi.ca/2018/08/30/easy-as-1-2-3-but-1-2-3-4-5-might-be-better-how-long-should-a-point-count-take/">ABMI blog</a>.</p><a href="https://peter.solymos.org">Péter Sólymos</a>In a paper recently published in the Condor, titled Evaluating time-removal models for estimating availability of boreal birds during point-count surveys: sample size requirements and model complexity, we assessed different ways of controlling for point-count duration in bird counts using data from the Boreal Avian Modelling Project. As the title indicates, the paper describes a cost-benefit analysis to make recommendations about when to use different types of the removal model. The paper is open access, so feel free to read the whole paper here.Shiny slider examples with the intrval R package2018-03-08T00:00:00-07:002018-03-08T00:00:00-07:00https://peter.solymos.org/https://peter.solymos.org/code/2018/03/08/shiny-slider-examples-with-the-intrval-r-package<p>The <a href="https://github.com/psolymos/intrval#readme"><strong>intrval</strong></a> R package is lightweight (~11K), standalone (apart from importing from <strong>graphics</strong>, has exactly 0 non-<strong>base</strong> dependency), and it has a very narrow scope: it implements relational operators for intervals — very well aligned with the <a href="http://www.tinyverse.org/"><em>tiny manifesto</em></a>. In this post we will explore the use of the package in two <a href="https://shiny.rstudio.com/"><strong>shiny</strong></a> apps with <a href="https://shiny.rstudio.com/articles/sliders.html">sliders</a>.</p>
<p>The first example uses a regular slider that returns a single value. To make that an interval, we will use standard deviation (SD, <em>sigma</em>) in a quality control chart (<a href="https://en.wikipedia.org/wiki/Control_chart">QCC</a>). The code is based on the <code class="language-plaintext highlighter-rouge">pistonrings</code> data set from the <a href="https://CRAN.R-project.org/package=qcc"><strong>qcc</strong></a> package. The Shewhart chart sets a 3 <em>sigma</em> limit to indicate the state of control. The slider is used to adjust the <em>sigma</em> limit, and the GIF below plays it as an animation.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">shiny</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">intrval</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">qcc</span><span class="p">)</span><span class="w">
</span><span class="n">data</span><span class="p">(</span><span class="n">pistonrings</span><span class="p">)</span><span class="w">
</span><span class="n">mu</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">mean</span><span class="p">(</span><span class="n">pistonrings</span><span class="o">$</span><span class="n">diameter</span><span class="p">[</span><span class="n">pistonrings</span><span class="o">$</span><span class="n">trial</span><span class="p">])</span><span class="w">
</span><span class="n">SD</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">sd</span><span class="p">(</span><span class="n">pistonrings</span><span class="o">$</span><span class="n">diameter</span><span class="p">[</span><span class="n">pistonrings</span><span class="o">$</span><span class="n">trial</span><span class="p">])</span><span class="w">
</span><span class="n">x</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">pistonrings</span><span class="o">$</span><span class="n">diameter</span><span class="p">[</span><span class="o">!</span><span class="n">pistonrings</span><span class="o">$</span><span class="n">trial</span><span class="p">]</span><span class="w">
</span><span class="c1">## UI function</span><span class="w">
</span><span class="n">ui</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">fluidPage</span><span class="p">(</span><span class="w">
</span><span class="n">plotOutput</span><span class="p">(</span><span class="s2">"plot"</span><span class="p">),</span><span class="w">
</span><span class="n">sliderInput</span><span class="p">(</span><span class="s2">"x"</span><span class="p">,</span><span class="w"> </span><span class="s2">"x SD:"</span><span class="p">,</span><span class="w">
</span><span class="n">min</span><span class="o">=</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="n">max</span><span class="o">=</span><span class="m">5</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="o">=</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="n">step</span><span class="o">=</span><span class="m">0.1</span><span class="p">,</span><span class="w">
</span><span class="n">animate</span><span class="o">=</span><span class="n">animationOptions</span><span class="p">(</span><span class="m">100</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="c1"># Server logic</span><span class="w">
</span><span class="n">server</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">input</span><span class="p">,</span><span class="w"> </span><span class="n">output</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">output</span><span class="o">$</span><span class="n">plot</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">renderPlot</span><span class="p">({</span><span class="w">
</span><span class="n">Main</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">paste</span><span class="p">(</span><span class="s2">"Shewhart quality control chart"</span><span class="p">,</span><span class="w">
</span><span class="s2">"diameter of piston rings"</span><span class="p">,</span><span class="w"> </span><span class="n">sprintf</span><span class="p">(</span><span class="s2">"+/- %.1f SD"</span><span class="p">,</span><span class="w"> </span><span class="n">input</span><span class="o">$</span><span class="n">x</span><span class="p">),</span><span class="w">
</span><span class="n">sep</span><span class="o">=</span><span class="s2">"\n"</span><span class="p">)</span><span class="w">
</span><span class="n">iv</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">mu</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">input</span><span class="o">$</span><span class="n">x</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="o">-</span><span class="n">SD</span><span class="p">,</span><span class="w"> </span><span class="n">SD</span><span class="p">)</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">pch</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">19</span><span class="p">,</span><span class="w"> </span><span class="n">col</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">%)(%</span><span class="w"> </span><span class="n">iv</span><span class="w"> </span><span class="m">+1</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"b"</span><span class="p">,</span><span class="w">
</span><span class="n">ylim</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">mu</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="m">5</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="o">-</span><span class="n">SD</span><span class="p">,</span><span class="w"> </span><span class="n">SD</span><span class="p">),</span><span class="w"> </span><span class="n">main</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Main</span><span class="p">)</span><span class="w">
</span><span class="n">abline</span><span class="p">(</span><span class="n">h</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">mu</span><span class="p">)</span><span class="w">
</span><span class="n">abline</span><span class="p">(</span><span class="n">h</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">iv</span><span class="p">,</span><span class="w"> </span><span class="n">lty</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="p">)</span><span class="w">
</span><span class="p">})</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="c1">## Run shiny app</span><span class="w">
</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nf">interactive</span><span class="p">())</span><span class="w"> </span><span class="n">shinyApp</span><span class="p">(</span><span class="n">ui</span><span class="p">,</span><span class="w"> </span><span class="n">server</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><img src="https://github.com/psolymos/intrval/raw/master/extras/regular_slider.gif" class="img-responsive" alt="regular slider" /></p>
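<p>Stripped of the shiny parts, the coloring rule is a test for points falling outside of an interval. Here is a minimal base R sketch, assuming that <code class="language-plaintext highlighter-rouge">x %)(% c(a, b)</code> is equivalent to <code class="language-plaintext highlighter-rouge">x < a | x > b</code>, as described in the <strong>intrval</strong> documentation. The measurements, the mean, and the SD are made up for illustration and are not taken from <code class="language-plaintext highlighter-rouge">pistonrings</code>:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>x <- c(73.99, 74.01, 74.03, 73.96, 74.00)  # made-up measurements
mu <- 74                                   # made-up mean
SD <- 0.01                                 # made-up standard deviation
k <- 2                                     # plays the role of input$x
iv <- mu + k * c(-SD, SD)                  # control limits
out <- x < iv[1] | x > iv[2]               # base R stand-in for x %)(% iv
col <- out + 1                             # 1 = inside, 2 = outside
data.frame(x, out, col)
</code></pre></div></div>
<p>Adding 1 to the logical vector turns it into a color index, so points inside the limits are drawn with the first color of the palette and points outside with the second.</p>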
<p>The second example uses a range slider that returns two values, which is our interval. To spice things up a bit, we combine intervals on two axes to color some random points. A third range slider defines a distance interval and colors the random points inside the ring.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">shiny</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">intrval</span><span class="p">)</span><span class="w">
</span><span class="n">set.seed</span><span class="p">(</span><span class="m">1</span><span class="p">)</span><span class="w">
</span><span class="n">n</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">10</span><span class="o">^</span><span class="m">4</span><span class="w">
</span><span class="n">x</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">round</span><span class="p">(</span><span class="n">runif</span><span class="p">(</span><span class="n">n</span><span class="p">,</span><span class="w"> </span><span class="m">-2</span><span class="p">,</span><span class="w"> </span><span class="m">2</span><span class="p">),</span><span class="w"> </span><span class="m">2</span><span class="p">)</span><span class="w">
</span><span class="n">y</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">round</span><span class="p">(</span><span class="n">runif</span><span class="p">(</span><span class="n">n</span><span class="p">,</span><span class="w"> </span><span class="m">-2</span><span class="p">,</span><span class="w"> </span><span class="m">2</span><span class="p">),</span><span class="w"> </span><span class="m">2</span><span class="p">)</span><span class="w">
</span><span class="n">d</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">round</span><span class="p">(</span><span class="nf">sqrt</span><span class="p">(</span><span class="n">x</span><span class="o">^</span><span class="m">2</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">y</span><span class="o">^</span><span class="m">2</span><span class="p">),</span><span class="w"> </span><span class="m">2</span><span class="p">)</span><span class="w">
</span><span class="c1">## UI function</span><span class="w">
</span><span class="n">ui</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">fluidPage</span><span class="p">(</span><span class="w">
</span><span class="n">titlePanel</span><span class="p">(</span><span class="s2">"intrval example with shiny"</span><span class="p">),</span><span class="w">
</span><span class="n">sidebarLayout</span><span class="p">(</span><span class="w">
</span><span class="n">sidebarPanel</span><span class="p">(</span><span class="w">
</span><span class="n">sliderInput</span><span class="p">(</span><span class="s2">"bb_x"</span><span class="p">,</span><span class="w"> </span><span class="s2">"x value:"</span><span class="p">,</span><span class="w">
</span><span class="n">min</span><span class="o">=</span><span class="nf">min</span><span class="p">(</span><span class="n">x</span><span class="p">),</span><span class="w"> </span><span class="n">max</span><span class="o">=</span><span class="nf">max</span><span class="p">(</span><span class="n">x</span><span class="p">),</span><span class="w"> </span><span class="n">value</span><span class="o">=</span><span class="nf">range</span><span class="p">(</span><span class="n">x</span><span class="p">),</span><span class="w">
</span><span class="n">step</span><span class="o">=</span><span class="nf">round</span><span class="p">(</span><span class="n">diff</span><span class="p">(</span><span class="nf">range</span><span class="p">(</span><span class="n">x</span><span class="p">))</span><span class="o">/</span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">),</span><span class="w"> </span><span class="n">animate</span><span class="o">=</span><span class="kc">TRUE</span><span class="w">
</span><span class="p">),</span><span class="w">
</span><span class="n">sliderInput</span><span class="p">(</span><span class="s2">"bb_y"</span><span class="p">,</span><span class="w"> </span><span class="s2">"y value:"</span><span class="p">,</span><span class="w">
</span><span class="n">min</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">min</span><span class="p">(</span><span class="n">y</span><span class="p">),</span><span class="w"> </span><span class="n">max</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">max</span><span class="p">(</span><span class="n">y</span><span class="p">),</span><span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">range</span><span class="p">(</span><span class="n">y</span><span class="p">),</span><span class="w">
</span><span class="n">step</span><span class="o">=</span><span class="nf">round</span><span class="p">(</span><span class="n">diff</span><span class="p">(</span><span class="nf">range</span><span class="p">(</span><span class="n">y</span><span class="p">))</span><span class="o">/</span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">),</span><span class="w"> </span><span class="n">animate</span><span class="o">=</span><span class="kc">TRUE</span><span class="w">
</span><span class="p">),</span><span class="w">
</span><span class="n">sliderInput</span><span class="p">(</span><span class="s2">"bb_d"</span><span class="p">,</span><span class="w"> </span><span class="s2">"radial distance:"</span><span class="p">,</span><span class="w">
</span><span class="n">min</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="n">max</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">max</span><span class="p">(</span><span class="n">d</span><span class="p">),</span><span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="nf">max</span><span class="p">(</span><span class="n">d</span><span class="p">)</span><span class="o">/</span><span class="m">2</span><span class="p">),</span><span class="w">
</span><span class="n">step</span><span class="o">=</span><span class="nf">round</span><span class="p">(</span><span class="nf">max</span><span class="p">(</span><span class="n">d</span><span class="p">)</span><span class="o">/</span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">),</span><span class="w"> </span><span class="n">animate</span><span class="o">=</span><span class="kc">TRUE</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="p">),</span><span class="w">
</span><span class="n">mainPanel</span><span class="p">(</span><span class="w">
</span><span class="n">plotOutput</span><span class="p">(</span><span class="s2">"plot"</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="c1"># Server logic</span><span class="w">
</span><span class="n">server</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">input</span><span class="p">,</span><span class="w"> </span><span class="n">output</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">output</span><span class="o">$</span><span class="n">plot</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">renderPlot</span><span class="p">({</span><span class="w">
</span><span class="n">iv1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">%[]%</span><span class="w"> </span><span class="n">input</span><span class="o">$</span><span class="n">bb_x</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">%[]%</span><span class="w"> </span><span class="n">input</span><span class="o">$</span><span class="n">bb_y</span><span class="w">
</span><span class="n">iv2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">%[]%</span><span class="w"> </span><span class="n">input</span><span class="o">$</span><span class="n">bb_y</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">%[]%</span><span class="w"> </span><span class="n">input</span><span class="o">$</span><span class="n">bb_x</span><span class="w">
</span><span class="n">iv3</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">d</span><span class="w"> </span><span class="o">%()%</span><span class="w"> </span><span class="n">input</span><span class="o">$</span><span class="n">bb_d</span><span class="w">
</span><span class="n">op</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">par</span><span class="p">(</span><span class="n">mfrow</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">2</span><span class="p">))</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="p">,</span><span class="w"> </span><span class="n">pch</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">19</span><span class="p">,</span><span class="w"> </span><span class="n">cex</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.25</span><span class="p">,</span><span class="w"> </span><span class="n">col</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">iv1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">iv2</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="m">3</span><span class="p">,</span><span class="w">
</span><span class="n">main</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Intersecting bounding boxes"</span><span class="p">)</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="p">,</span><span class="w"> </span><span class="n">pch</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">19</span><span class="p">,</span><span class="w"> </span><span class="n">cex</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.25</span><span class="p">,</span><span class="w"> </span><span class="n">col</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">iv3</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w">
</span><span class="n">main</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Deck the halls:\ndistance range from center"</span><span class="p">)</span><span class="w">
</span><span class="n">par</span><span class="p">(</span><span class="n">op</span><span class="p">)</span><span class="w">
</span><span class="p">})</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="c1">## Run shiny app</span><span class="w">
</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nf">interactive</span><span class="p">())</span><span class="w"> </span><span class="n">shinyApp</span><span class="p">(</span><span class="n">ui</span><span class="p">,</span><span class="w"> </span><span class="n">server</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><img src="https://github.com/psolymos/intrval/raw/master/extras/range_slider.gif" class="img-responsive" alt="range slider" /></p>
<p>If you think there are other use cases for <strong>intrval</strong> in <strong>shiny</strong> applications, let me know in the comments section!</p>
<p><em>If you want to learn more about how to host Shiny apps, check out the <a href="https://hosting.analythium.io/">Hosting Data Apps</a> website!</em></p><a href="https://peter.solymos.org">Péter Sólymos</a>The intrval R package is lightweight (~11K), standalone (apart from importing from graphics, has exactly 0 non-base dependency), and it has a very narrow scope: it implements relational operators for intervals — very well aligned with the tiny manifesto. In this post we will explore the use of the package in two shiny apps with sliders.Phylogeny and species traits predict bird detectability2018-02-09T00:00:00-07:002018-02-09T00:00:00-07:00https://peter.solymos.org/https://peter.solymos.org/code/2018/02/09/phylogeny-and-species-traits-predict-bird-detectability<p>It all started with <a href="http://onlinelibrary.wiley.com/doi/10.1111/2041-210X.12106/abstract">this</a> paper in <em>Methods in Ecol. Evol.</em> where we looked at
detectability of many species. So we wanted to use life history
traits to validate our results. But we had to cut the manuscript,
and there was this leftover with some neat patterns, but without much focus.
It took a few years, and the <a href="https://twitter.com/psolymos/status/903634823906033664">most positive peer-review experience ever</a>,
and the paper is now early view in <a href="http://onlinelibrary.wiley.com/doi/10.1111/ecog.03415/abstract"><em>Ecography</em></a>. This post is a quick summary of the goodies stuffed inside the <a href="https://github.com/borealbirds/lhreg#readme"><strong>lhreg</strong></a> R package that makes the whole analysis reproducible, and provides some functions for similar PGLMM models.</p>
<p>The R package is hosted on <a href="https://github.com/borealbirds/lhreg">GitHub</a>
(no CRAN version yet),
please submit any issues <a href="https://github.com/borealbirds/lhreg/issues">here</a>.
The package is also archived on Zenodo with DOI <a href="http://doi.org/10.5281/zenodo.596410">10.5281/zenodo.596410</a>.
To install the package, use
<code class="language-plaintext highlighter-rouge">devtools::install_github("borealbirds/lhreg")</code>.</p>
<p>Here, I am going to skim the implementation based on the more
complete supporting information of the paper which has all the
reproducible code (try <code class="language-plaintext highlighter-rouge">vignette(topic = "lhreg", package = "lhreg")</code> after
installing and loading the package).
Here is the rendered <a href="https://borealbirds.github.io/lhreg/">html</a> version.</p>
<p>The most important function is <code class="language-plaintext highlighter-rouge">lhreg</code> which takes the following main arguments:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">Y</code>: response vector,</li>
<li><code class="language-plaintext highlighter-rouge">X</code>: model matrix for the mean,</li>
<li><code class="language-plaintext highlighter-rouge">SE</code>: standard error estimate (observation error) for the response,</li>
<li><code class="language-plaintext highlighter-rouge">V</code>: correlation matrix,</li>
</ul>
<p>and fits a Multivariate Normal model to the observed <code class="language-plaintext highlighter-rouge">Y</code> vector
with phylogenetically based (or any other known) correlations
and optionally with observation error (<code class="language-plaintext highlighter-rouge">SE</code>), and covariate effects (<code class="language-plaintext highlighter-rouge">X</code>).
The function is pretty bare-bones (i.e. no formula interface,
the design matrix <code class="language-plaintext highlighter-rouge">X</code> needs to be properly specified through
e.g. <code class="language-plaintext highlighter-rouge">model.matrix()</code>). The <code class="language-plaintext highlighter-rouge">lambda</code> argument
is a non-negative number modifying the strength of phylogenetic effects.
<code class="language-plaintext highlighter-rouge">lambda = 0</code> is equivalent to <code class="language-plaintext highlighter-rouge">lm</code> with
<code class="language-plaintext highlighter-rouge">weights = 1/(SE^2)</code>, <code class="language-plaintext highlighter-rouge">lambda = 1</code> implies Brownian motion evolution,
<code class="language-plaintext highlighter-rouge">lambda = NA</code> lets the function estimate it based on the data.</p>
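<p>To make the <code class="language-plaintext highlighter-rouge">lambda = 0</code> case concrete, here is a minimal base R sketch. It does not call <strong>lhreg</strong> at all: the simulated data and the names <code class="language-plaintext highlighter-rouge">trait</code>, <code class="language-plaintext highlighter-rouge">W</code>, <code class="language-plaintext highlighter-rouge">beta_gls</code>, and <code class="language-plaintext highlighter-rouge">beta_lm</code> are made up for illustration. Without phylogenetic correlation, the generalized least squares estimate with weights <code class="language-plaintext highlighter-rouge">1/(SE^2)</code> should match what <code class="language-plaintext highlighter-rouge">lm</code> returns with the same weights:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>## simulated toy data (hypothetical, not from the paper)
set.seed(1)
n <- 50
trait <- rnorm(n)
SE <- runif(n, 0.1, 0.5) # known observation error
Y <- 1 + 0.5 * trait + rnorm(n, sd = SE)
## design matrix, as one would build it for the X argument
X <- model.matrix(~ trait)
## GLS with a diagonal covariance matrix, i.e. no correlation among species
W <- diag(1 / SE^2)
beta_gls <- drop(solve(t(X) %*% W %*% X, t(X) %*% W %*% Y))
## weighted least squares through lm
beta_lm <- coef(lm(Y ~ trait, weights = 1 / SE^2))
## the two sets of coefficients agree
all.equal(unname(beta_gls), unname(beta_lm))
</code></pre></div></div>
<p>Once the off-diagonal elements of the correlation matrix come into play, the weight matrix is no longer diagonal and plain <code class="language-plaintext highlighter-rouge">lm</code> is not enough.</p>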
<p>In terms of optimization, besides the algorithms from <code class="language-plaintext highlighter-rouge">stats::optim</code>,
we also have a differential evolution algorithm based on the
<a href="https://cran.r-project.org/package=DEoptim"><strong>DEoptim</strong></a> package (a bit time consuming but very reliable).
The output object class has some methods defined (like <code class="language-plaintext highlighter-rouge">logLik</code> and <code class="language-plaintext highlighter-rouge">summary</code>)
and as a result AIC/BIC will work out of the box. The vignette also
describes a few techniques which are pretty nice to have in
a multivariate setting (i.e. profile likelihood, parametric bootstrap)
to support advanced hypothesis testing and model selection.</p>
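<p>Why does a <code class="language-plaintext highlighter-rouge">logLik</code> method make AIC/BIC work? Here is a small sketch with a made-up class called <code class="language-plaintext highlighter-rouge">toyfit</code> (a hypothetical stand-in, not the class used by <strong>lhreg</strong>). The default <code class="language-plaintext highlighter-rouge">AIC</code> and <code class="language-plaintext highlighter-rouge">BIC</code> methods in <strong>stats</strong> only need the log-likelihood value with its <code class="language-plaintext highlighter-rouge">df</code> and <code class="language-plaintext highlighter-rouge">nobs</code> attributes:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>## a toy fitted-model object (hypothetical class and numbers)
fit <- structure(list(loglik = -12.3, npar = 3, nobs = 40), class = "toyfit")
## the method returns the log-likelihood with 'df' and 'nobs' attributes
logLik.toyfit <- function(object, ...) {
    structure(object$loglik, df = object$npar, nobs = object$nobs,
        class = "logLik")
}
aic <- AIC(fit) # -2 * logLik + 2 * df
bic <- BIC(fit) # -2 * logLik + log(nobs) * df
</code></pre></div></div>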
<p>We used leave-one-out cross-validation to see how well we could predict the
values based on data from the other species, traits and phylogeny.
The conditional distribution we used for that is described in the paper which
made this exercise very straightforward.
Maybe it is just ignorance, but I couldn’t find another paper
that describes it in a nice and useful manner.
If you wish to make trait/phylogeny-based
predictions for detectability, this formula is going to be
very useful (look inside the <code class="language-plaintext highlighter-rouge">loo2</code> function for implementation).</p>
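<p>For intuition, here is a rough sketch of leave-one-out prediction using the textbook conditional mean of a Multivariate Normal distribution. This is an assumption on my part about the general idea: the formula in the paper (and in <code class="language-plaintext highlighter-rouge">loo2</code>) may carry extra terms, for example for observation error. The function <code class="language-plaintext highlighter-rouge">cond_mean</code> and the toy numbers are hypothetical:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>## conditional mean of element i given all the others under MVN(mu, Sigma)
cond_mean <- function(i, y, mu, Sigma) {
    S12 <- Sigma[i, -i, drop = FALSE]  # covariance of i with the rest
    S22 <- Sigma[-i, -i, drop = FALSE] # covariance among the rest
    drop(mu[i] + S12 %*% solve(S22, y[-i] - mu[-i]))
}
## toy example with 3 'species' (made-up values)
Sigma <- matrix(c(1, 0.5, 0.2,  0.5, 1, 0.3,  0.2, 0.3, 1), 3, 3)
mu <- c(0, 0, 0)
y <- c(0.8, 1.1, -0.4)
## predict each value from the other two
loo <- sapply(seq_along(y), cond_mean, y = y, mu = mu, Sigma = Sigma)
</code></pre></div></div>
<p>Here <code class="language-plaintext highlighter-rouge">mu</code> would come from the trait effects and <code class="language-plaintext highlighter-rouge">Sigma</code> from the phylogeny, so closely related species pull the prediction towards their own values.</p>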
<p>At the end of the vignette, there is a hack based on the <code class="language-plaintext highlighter-rouge">phytools::contMap</code>
function to produce <em>non-rainbow</em> colors.
(It was surprisingly <em>non-straightforward</em> to hack the code —
<a href="https://en.wikipedia.org/wiki/Unix_philosophy#Doug_McIlroy_on_Unix_programming">modular code</a> please!)
The following figure shows the two input data vectors mirrored side-by-side:</p>
<p><img src="https://github.com/borealbirds/lhreg/raw/master/tree.png" class="img-responsive" alt="lhreg inputs" /></p>
<p>I realize this is not a very detailed post, but the paper
and the vignette should satisfy your curiosity.
If you still have unanswered questions, feel free to ask them below!</p>
<p><strong>UPDATE</strong></p>
<ul>
<li>2018-09-26: post on <a href="http://www.ecography.org/blog/phylogeny-and-species-traits-predict-bird-detectability">Ecography blog</a> and <a href="https://vimeo.com/291323964">video abstract</a>.</li>
<li>2018-04-13: a related <a href="https://www.okologiablog.hu/node/460">blog post</a> in Hungarian.</li>
</ul><a href="https://peter.solymos.org">Péter Sólymos</a>It all started with this paper in Methods in Ecol. Evol. where we looked at detectability of many species. So we wanted to use life history traits to validate our results. But we had to cut the manuscript, and there was this leftover with some neat patterns, but without much focus. It took a few years, and the most positive peer-review experience ever, and the paper is now early view in Ecography. This post is a quick summary of the goodies stuffed inside the lhreg R package that makes the whole analysis reproducible, and provides some functions for similar PGLMM models.PVA: Publication Viability Analysis, round 32018-02-06T00:00:00-07:002018-02-06T00:00:00-07:00https://peter.solymos.org/https://peter.solymos.org/etc/2018/02/06/pva-publication-viability-analysis-round-3<p>A friend and colleague of mine, <a href="https://sites.google.com/site/pbatary/">Péter Batáry</a>
has circulated news from <a href="https://www.nature.com/articles/d41586-018-01374-x">Nature</a>
magazine about the EU freezing innovation funds to Bulgaria.
The article had a figure about publication trends for
Bulgaria, compared with Romania and Hungary.
As I have blogged about such trends in ecology before
(<a href="http://okologiablog.hu/node/219">here</a> and
<a href="http://peter.solymos.org/etc/2016/08/30/my-first-blog-post-was-a-guest-post.html">here</a>),
I felt the need to update my PVA models
with two years’ worth of data from <a href="https://webofknowledge.com/">WoS</a>.</p>
<p>After downloading the yearly publications numbers
using filters <code class="language-plaintext highlighter-rouge">ADDRESS=HUNGARY; CATEGORIES=ECOLOGY</code>,
I started where I left off a few years ago. I fit a Ricker growth model
to two time intervals of the data: 1978–1997, and 1998–2017.</p>
<p>The R code below uses the <a href="https://CRAN.R-project.org/package=PVAClone"><strong>PVAClone</strong></a> package
that I wrote with <a href="https://www.researchgate.net/profile/Khurram_Nadeem">Khurram Nadeem</a>,
and is based on fitting state-space models using
MCMC and <a href="http://datacloning.org/">data cloning</a> with <a href="http://mcmc-jags.sourceforge.net/">JAGS</a>.
The other <a href="https://CRAN.R-project.org/package=intrval"><strong>intrval</strong></a> package is a pretty new but handy little helper
(see related posts <a href="http://peter.solymos.org/tags.html#intrval">here</a>).</p>
<div class="language-R highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">PVAClone</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">intrval</span><span class="p">)</span><span class="w">
</span><span class="c1">## the data from WoS</span><span class="w">
</span><span class="n">x</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">structure</span><span class="p">(</span><span class="nf">list</span><span class="p">(</span><span class="n">years</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1973</span><span class="o">:</span><span class="m">2017</span><span class="p">,</span><span class="w"> </span><span class="n">records</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">4</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w">
</span><span class="m">6</span><span class="p">,</span><span class="w"> </span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="m">5</span><span class="p">,</span><span class="w"> </span><span class="m">4</span><span class="p">,</span><span class="w"> </span><span class="m">7</span><span class="p">,</span><span class="w"> </span><span class="m">5</span><span class="p">,</span><span class="w"> </span><span class="m">7</span><span class="p">,</span><span class="w"> </span><span class="m">3</span><span class="p">,</span><span class="w"> </span><span class="m">5</span><span class="p">,</span><span class="w"> </span><span class="m">9</span><span class="p">,</span><span class="w"> </span><span class="m">11</span><span class="p">,</span><span class="w"> </span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="m">8</span><span class="p">,</span><span class="w"> </span><span class="m">10</span><span class="p">,</span><span class="w"> </span><span class="m">15</span><span class="p">,</span><span class="w"> </span><span class="m">29</span><span class="p">,</span><span class="w"> </span><span class="m">24</span><span class="p">,</span><span class="w"> </span><span class="m">53</span><span class="p">,</span><span class="w">
</span><span class="m">12</span><span class="p">,</span><span class="w"> </span><span class="m">13</span><span class="p">,</span><span class="w"> </span><span class="m">30</span><span class="p">,</span><span class="w"> </span><span class="m">32</span><span class="p">,</span><span class="w"> </span><span class="m">36</span><span class="p">,</span><span class="w"> </span><span class="m">45</span><span class="p">,</span><span class="w"> </span><span class="m">39</span><span class="p">,</span><span class="w"> </span><span class="m">42</span><span class="p">,</span><span class="w"> </span><span class="m">43</span><span class="p">,</span><span class="w"> </span><span class="m">50</span><span class="p">,</span><span class="w"> </span><span class="m">62</span><span class="p">,</span><span class="w"> </span><span class="m">95</span><span class="p">,</span><span class="w"> </span><span class="m">106</span><span class="p">,</span><span class="w"> </span><span class="m">113</span><span class="p">,</span><span class="w"> </span><span class="m">83</span><span class="p">,</span><span class="w">
</span><span class="m">108</span><span class="p">,</span><span class="w"> </span><span class="m">99</span><span class="p">,</span><span class="w"> </span><span class="m">89</span><span class="p">,</span><span class="w"> </span><span class="m">117</span><span class="p">,</span><span class="w"> </span><span class="m">111</span><span class="p">,</span><span class="w"> </span><span class="m">134</span><span class="p">,</span><span class="w"> </span><span class="m">127</span><span class="p">)),</span><span class="w"> </span><span class="n">.Names</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"years"</span><span class="p">,</span><span class="w"> </span><span class="s2">"records"</span><span class="w">
</span><span class="p">),</span><span class="w"> </span><span class="n">row.names</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="kc">NA</span><span class="p">,</span><span class="w"> </span><span class="m">45L</span><span class="p">),</span><span class="w"> </span><span class="n">class</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"data.frame"</span><span class="p">)</span><span class="w">
</span><span class="c1">## fit the 2 models</span><span class="w">
</span><span class="n">ncl</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">10</span><span class="w"> </span><span class="c1"># number of clones</span><span class="w">
</span><span class="n">m1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">pva</span><span class="p">(</span><span class="n">x</span><span class="o">$</span><span class="n">records</span><span class="p">[</span><span class="n">x</span><span class="o">$</span><span class="n">years</span><span class="w"> </span><span class="o">%[]%</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">1978</span><span class="p">,</span><span class="w"> </span><span class="m">1997</span><span class="p">)],</span><span class="w"> </span><span class="n">ricker</span><span class="p">(</span><span class="s2">"none"</span><span class="p">),</span><span class="w"> </span><span class="n">ncl</span><span class="p">)</span><span class="w">
</span><span class="n">m2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">pva</span><span class="p">(</span><span class="n">x</span><span class="o">$</span><span class="n">records</span><span class="p">[</span><span class="n">x</span><span class="o">$</span><span class="n">years</span><span class="w"> </span><span class="o">%[]%</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">1998</span><span class="p">,</span><span class="w"> </span><span class="m">2017</span><span class="p">)],</span><span class="w"> </span><span class="n">ricker</span><span class="p">(</span><span class="s2">"none"</span><span class="p">),</span><span class="w"> </span><span class="n">ncl</span><span class="p">)</span><span class="w">
</span><span class="c1">## organize estimates</span><span class="w">
</span><span class="n">cf</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">data.frame</span><span class="p">(</span><span class="n">t</span><span class="p">(</span><span class="n">sapply</span><span class="p">(</span><span class="nf">list</span><span class="p">(</span><span class="n">early</span><span class="o">=</span><span class="n">m1</span><span class="p">,</span><span class="w"> </span><span class="n">late</span><span class="o">=</span><span class="n">m2</span><span class="p">),</span><span class="w"> </span><span class="n">coef</span><span class="p">)))</span><span class="w">
</span><span class="n">cf</span><span class="o">$</span><span class="n">K</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">with</span><span class="p">(</span><span class="n">cf</span><span class="p">,</span><span class="w"> </span><span class="o">-</span><span class="n">a</span><span class="o">/</span><span class="n">b</span><span class="p">)</span><span class="w">
</span><span class="c1">## growth curve: early period</span><span class="w">
</span><span class="n">yr1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">1978</span><span class="o">:</span><span class="m">1997</span><span class="w">
</span><span class="n">pr1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">numeric</span><span class="p">(</span><span class="nf">length</span><span class="p">(</span><span class="n">yr1</span><span class="p">))</span><span class="w">
</span><span class="n">pr1</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">log</span><span class="p">(</span><span class="n">x</span><span class="o">$</span><span class="n">records</span><span class="p">[</span><span class="n">x</span><span class="o">$</span><span class="n">years</span><span class="o">==</span><span class="m">1978</span><span class="p">])</span><span class="w">
</span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">2</span><span class="o">:</span><span class="nf">length</span><span class="p">(</span><span class="n">pr1</span><span class="p">))</span><span class="w">
</span><span class="n">pr1</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">pr1</span><span class="p">[</span><span class="n">i</span><span class="m">-1</span><span class="p">]</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">cf</span><span class="p">[</span><span class="s2">"early"</span><span class="p">,</span><span class="w"> </span><span class="s2">"a"</span><span class="p">]</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">cf</span><span class="p">[</span><span class="s2">"early"</span><span class="p">,</span><span class="w"> </span><span class="s2">"b"</span><span class="p">]</span><span class="o">*</span><span class="nf">exp</span><span class="p">(</span><span class="n">pr1</span><span class="p">[</span><span class="n">i</span><span class="m">-1</span><span class="p">])</span><span class="w">
</span><span class="n">pr1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">exp</span><span class="p">(</span><span class="n">pr1</span><span class="p">)</span><span class="w">
</span><span class="c1">## growth curve: late period</span><span class="w">
</span><span class="n">yr2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">1998</span><span class="o">:</span><span class="m">2017</span><span class="w">
</span><span class="n">pr2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">numeric</span><span class="p">(</span><span class="nf">length</span><span class="p">(</span><span class="n">yr2</span><span class="p">))</span><span class="w">
</span><span class="n">pr2</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">log</span><span class="p">(</span><span class="n">x</span><span class="o">$</span><span class="n">records</span><span class="p">[</span><span class="n">x</span><span class="o">$</span><span class="n">years</span><span class="o">==</span><span class="m">1998</span><span class="p">])</span><span class="w">
</span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">2</span><span class="o">:</span><span class="nf">length</span><span class="p">(</span><span class="n">pr2</span><span class="p">))</span><span class="w">
</span><span class="n">pr2</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">pr2</span><span class="p">[</span><span class="n">i</span><span class="m">-1</span><span class="p">]</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">cf</span><span class="p">[</span><span class="s2">"late"</span><span class="p">,</span><span class="w"> </span><span class="s2">"a"</span><span class="p">]</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">cf</span><span class="p">[</span><span class="s2">"late"</span><span class="p">,</span><span class="w"> </span><span class="s2">"b"</span><span class="p">]</span><span class="o">*</span><span class="nf">exp</span><span class="p">(</span><span class="n">pr2</span><span class="p">[</span><span class="n">i</span><span class="m">-1</span><span class="p">])</span><span class="w">
</span><span class="n">pr2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">exp</span><span class="p">(</span><span class="n">pr2</span><span class="p">)</span><span class="w">
</span><span class="c1">## and finally the figure using base graphics</span><span class="w">
</span><span class="n">op</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">par</span><span class="p">(</span><span class="n">las</span><span class="o">=</span><span class="m">2</span><span class="p">)</span><span class="w">
</span><span class="n">barplot</span><span class="p">(</span><span class="n">x</span><span class="o">$</span><span class="n">records</span><span class="p">,</span><span class="w"> </span><span class="n">names.arg</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="o">$</span><span class="n">years</span><span class="p">,</span><span class="w"> </span><span class="n">space</span><span class="o">=</span><span class="m">0</span><span class="p">,</span><span class="w">
</span><span class="n">ylab</span><span class="o">=</span><span class="s2">"# of publications"</span><span class="p">,</span><span class="w"> </span><span class="n">xlab</span><span class="o">=</span><span class="s2">"years"</span><span class="p">,</span><span class="w">
</span><span class="n">col</span><span class="o">=</span><span class="n">ifelse</span><span class="p">(</span><span class="n">x</span><span class="o">$</span><span class="n">years</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="m">1998</span><span class="p">,</span><span class="w"> </span><span class="s2">"grey"</span><span class="p">,</span><span class="w"> </span><span class="s2">"gold"</span><span class="p">))</span><span class="w">
</span><span class="n">lines</span><span class="p">(</span><span class="n">yr1</span><span class="o">-</span><span class="nf">min</span><span class="p">(</span><span class="n">x</span><span class="o">$</span><span class="n">years</span><span class="p">)</span><span class="m">+0.5</span><span class="p">,</span><span class="w"> </span><span class="n">pr1</span><span class="p">,</span><span class="w"> </span><span class="n">col</span><span class="o">=</span><span class="m">4</span><span class="p">)</span><span class="w">
</span><span class="n">abline</span><span class="p">(</span><span class="n">h</span><span class="o">=</span><span class="n">cf</span><span class="p">[</span><span class="s2">"early"</span><span class="p">,</span><span class="w"> </span><span class="s2">"K"</span><span class="p">],</span><span class="w"> </span><span class="n">col</span><span class="o">=</span><span class="m">4</span><span class="p">,</span><span class="w"> </span><span class="n">lty</span><span class="o">=</span><span class="m">3</span><span class="p">)</span><span class="w">
</span><span class="n">lines</span><span class="p">(</span><span class="n">yr2</span><span class="o">-</span><span class="nf">min</span><span class="p">(</span><span class="n">x</span><span class="o">$</span><span class="n">years</span><span class="p">)</span><span class="m">+0.5</span><span class="p">,</span><span class="w"> </span><span class="n">pr2</span><span class="p">,</span><span class="w"> </span><span class="n">col</span><span class="o">=</span><span class="m">2</span><span class="p">)</span><span class="w">
</span><span class="n">abline</span><span class="p">(</span><span class="n">h</span><span class="o">=</span><span class="n">cf</span><span class="p">[</span><span class="s2">"late"</span><span class="p">,</span><span class="w"> </span><span class="s2">"K"</span><span class="p">],</span><span class="w"> </span><span class="n">col</span><span class="o">=</span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="n">lty</span><span class="o">=</span><span class="m">3</span><span class="p">)</span><span class="w">
</span><span class="n">par</span><span class="p">(</span><span class="n">op</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><img src="https://peter.solymos.org/images/2018/02/06/pva-3.png" class="img-responsive" alt="PVA" /></p>
<p>Here are the model parameters for the two Ricker models:</p>
<table>
<thead>
<tr>
<th style="text-align: left"> </th>
<th style="text-align: right"><em>a</em></th>
<th style="text-align: right"><em>b</em></th>
<th style="text-align: right"><em>sigma</em></th>
<th style="text-align: right"><em>K</em></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">1978–1997</td>
<td style="text-align: right">0.38</td>
<td style="text-align: right">-0.03</td>
<td style="text-align: right">0.60</td>
<td style="text-align: right">13.85</td>
</tr>
<tr>
<td style="text-align: left">1998–2017</td>
<td style="text-align: right">0.21</td>
<td style="text-align: right">0.00</td>
<td style="text-align: right">0.16</td>
<td style="text-align: right">119.00</td>
</tr>
</tbody>
</table>
<p>The carrying capacity <em>K</em> used to be 100 based on
1998–2012 data, but now <em>K</em> = 119, which is
a significant improvement — heartfelt kudos to the ecologists in Hungary
(more papers please)!
The growth rate hasn’t changed (<em>a</em> = 0.21).
So we can conclude that if the rate remained constant
but carrying capacity increased, the change must be
related to resource availability
(i.e. increased funding, more jobs, improved infrastructure).</p>
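<p>To see how <em>a</em> and <em>K</em> play together, here is a small sketch of the deterministic part of the Ricker model, using the rounded late-period values from the table. The starting value of 30 papers and the 200 time steps are arbitrary choices of mine, and <em>b</em> is back-calculated from <em>K</em> = -<em>a</em>/<em>b</em> because the table shows it rounded to 0.00:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>a <- 0.21   # growth rate (rounded, from the table)
K <- 119    # carrying capacity (from the table)
b <- -a / K # implied by K = -a/b
## deterministic Ricker recursion on the log scale, no process noise
lx <- numeric(200)
lx[1] <- log(30) # arbitrary starting value
for (t in 2:length(lx))
    lx[t] <- lx[t - 1] + a + b * exp(lx[t - 1])
N <- exp(lx)
## the trajectory levels off at K
N[length(N)]
</code></pre></div></div>
<p>The rate <em>a</em> controls how fast the curve approaches the ceiling, while <em>K</em> sets where the ceiling is.</p>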
<p>This is good news to me! Let me know what you think by leaving a comment below!</p><a href="https://peter.solymos.org">Péter Sólymos</a>A friend and colleague of mine, Péter Batáry has circulated news from Nature magazine about the EU freezing innovation funds to Bulgaria. The article had a figure about publication trends for Bulgaria, compared with Romania and Hungary. As I have blogged about such trends in ecology before (here and here), I felt the need to update my PVA models with two years worth of data from WoS.The progress bar just got a lot cheaper2018-01-23T00:00:00-07:002018-01-23T00:00:00-07:00https://peter.solymos.org/https://peter.solymos.org/code/2018/01/23/the-progress-bar-just-got-a-lot-cheaper<p>The <a href="http://cran.r-project.org/package=pbapply"><strong>pbapply</strong></a> R package that adds a progress bar to vectorized functions has been known to accumulate overhead when calling <code class="language-plaintext highlighter-rouge">parallel::mclapply</code> with forking (see <a href="http://peter.solymos.org/code/2016/09/11/what-is-the-cost-of-a-progress-bar-in-r.html">this post</a> for more background on the issue). Strangely enough, a <a href="https://github.com/psolymos/pbapply/issues/30">GitHub issue</a> held the key to the solution that I am going to outline below. Long story short: forking is no longer expensive with <strong>pbapply</strong>, and as it turns out, it never was.</p>
<p>The issue mentioned <code class="language-plaintext highlighter-rouge">parallel::makeForkCluster</code> as the way to set up a Fork cluster, which, according to the help page, ‘<em>is merely a stub on Windows. On Unix-alike platforms it creates the worker process by forking</em>’.
So I looked at some timings starting with one of the examples on the <code class="language-plaintext highlighter-rouge">?pbapply</code> help page:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">pbapply</span><span class="p">)</span><span class="w">
</span><span class="n">set.seed</span><span class="p">(</span><span class="m">1234</span><span class="p">)</span><span class="w">
</span><span class="n">n</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">200</span><span class="w">
</span><span class="n">x</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rnorm</span><span class="p">(</span><span class="n">n</span><span class="p">)</span><span class="w">
</span><span class="n">y</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rnorm</span><span class="p">(</span><span class="n">n</span><span class="p">,</span><span class="w"> </span><span class="n">crossprod</span><span class="p">(</span><span class="n">t</span><span class="p">(</span><span class="n">model.matrix</span><span class="p">(</span><span class="o">~</span><span class="w"> </span><span class="n">x</span><span class="p">)),</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">)),</span><span class="w"> </span><span class="n">sd</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.5</span><span class="p">)</span><span class="w">
</span><span class="n">d</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">data.frame</span><span class="p">(</span><span class="n">y</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="p">)</span><span class="w">
</span><span class="n">mod</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">lm</span><span class="p">(</span><span class="n">y</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">d</span><span class="p">)</span><span class="w">
</span><span class="n">ndat</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">model.frame</span><span class="p">(</span><span class="n">mod</span><span class="p">)</span><span class="w">
</span><span class="n">B</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">100</span><span class="w">
</span><span class="n">bid</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">sapply</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="n">B</span><span class="p">,</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">i</span><span class="p">)</span><span class="w"> </span><span class="n">sample</span><span class="p">(</span><span class="n">nrow</span><span class="p">(</span><span class="n">ndat</span><span class="p">),</span><span class="w"> </span><span class="n">nrow</span><span class="p">(</span><span class="n">ndat</span><span class="p">),</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">))</span><span class="w">
</span><span class="n">fun</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">z</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nf">missing</span><span class="p">(</span><span class="n">z</span><span class="p">))</span><span class="w">
</span><span class="n">z</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">sample</span><span class="p">(</span><span class="n">nrow</span><span class="p">(</span><span class="n">ndat</span><span class="p">),</span><span class="w"> </span><span class="n">nrow</span><span class="p">(</span><span class="n">ndat</span><span class="p">),</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span><span class="n">coef</span><span class="p">(</span><span class="n">lm</span><span class="p">(</span><span class="n">mod</span><span class="o">$</span><span class="n">call</span><span class="o">$</span><span class="n">formula</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="o">=</span><span class="n">ndat</span><span class="p">[</span><span class="n">z</span><span class="p">,]))</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="c1">## forking with mclapply</span><span class="w">
</span><span class="n">system.time</span><span class="p">(</span><span class="n">res1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">pblapply</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="n">B</span><span class="p">,</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">i</span><span class="p">)</span><span class="w"> </span><span class="n">fun</span><span class="p">(</span><span class="n">bid</span><span class="p">[,</span><span class="n">i</span><span class="p">]),</span><span class="w"> </span><span class="n">cl</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2L</span><span class="p">))</span><span class="w">
</span><span class="c1">## |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 01s</span><span class="w">
</span><span class="c1">## user system elapsed </span><span class="w">
</span><span class="c1">## 0.587 0.919 0.845 </span><span class="w">
</span><span class="c1">## forking with parLapply</span><span class="w">
</span><span class="n">cl</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">makeForkCluster</span><span class="p">(</span><span class="m">2L</span><span class="p">)</span><span class="w">
</span><span class="n">system.time</span><span class="p">(</span><span class="n">res2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">pblapply</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="n">B</span><span class="p">,</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">i</span><span class="p">)</span><span class="w"> </span><span class="n">fun</span><span class="p">(</span><span class="n">bid</span><span class="p">[,</span><span class="n">i</span><span class="p">]),</span><span class="w"> </span><span class="n">cl</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cl</span><span class="p">))</span><span class="w">
</span><span class="c1">## |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 00s</span><span class="w">
</span><span class="c1">## user system elapsed </span><span class="w">
</span><span class="c1">## 0.058 0.009 0.215 </span><span class="w">
</span><span class="n">stopCluster</span><span class="p">(</span><span class="n">cl</span><span class="p">)</span><span class="w">
</span><span class="c1">## Socket cluster (need to pass objects to workers)</span><span class="w">
</span><span class="n">cl</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">makeCluster</span><span class="p">(</span><span class="m">2L</span><span class="p">)</span><span class="w">
</span><span class="n">clusterExport</span><span class="p">(</span><span class="n">cl</span><span class="p">,</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"fun"</span><span class="p">,</span><span class="w"> </span><span class="s2">"mod"</span><span class="p">,</span><span class="w"> </span><span class="s2">"ndat"</span><span class="p">,</span><span class="w"> </span><span class="s2">"bid"</span><span class="p">))</span><span class="w">
</span><span class="n">system.time</span><span class="p">(</span><span class="n">res3</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">pblapply</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="n">B</span><span class="p">,</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">i</span><span class="p">)</span><span class="w"> </span><span class="n">fun</span><span class="p">(</span><span class="n">bid</span><span class="p">[,</span><span class="n">i</span><span class="p">]),</span><span class="w"> </span><span class="n">cl</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cl</span><span class="p">))</span><span class="w">
</span><span class="c1">## |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 00s</span><span class="w">
</span><span class="c1">## user system elapsed </span><span class="w">
</span><span class="c1">## 0.053 0.008 0.169 </span><span class="w">
</span><span class="n">stopCluster</span><span class="p">(</span><span class="n">cl</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p>Forking with <code class="language-plaintext highlighter-rouge">mclapply</code> is still pricey. The almost equivalent <code class="language-plaintext highlighter-rouge">makeForkCluster</code> trick, however, comes pretty close to the ordinary Socket cluster option, and it does not require passing objects to the workers because the forked processes share memory with the parent.</p>
<p>What if I used this trick in the package? I would then create a Fork cluster
(<code class="language-plaintext highlighter-rouge">cl <- makeForkCluster(cl)</code>), run <code class="language-plaintext highlighter-rouge">parLapply(cl, ...)</code>, and destroy the cluster with <code class="language-plaintext highlighter-rouge">on.exit(stopCluster(cl), add = TRUE)</code>. So I created a <a href="https://github.com/psolymos/pbapply/tree/fork-cluster-speedup">branch</a> to do some tests:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">ncl</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">2</span><span class="w">
</span><span class="n">B</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">1000</span><span class="w">
</span><span class="n">fun</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">Sys.sleep</span><span class="p">(</span><span class="m">0.01</span><span class="p">)</span><span class="w">
</span><span class="n">x</span><span class="o">^</span><span class="m">2</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">pbmcapply</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="n">t1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">system.time</span><span class="p">(</span><span class="n">pbmclapply</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="n">B</span><span class="p">,</span><span class="w"> </span><span class="n">fun</span><span class="p">,</span><span class="w"> </span><span class="n">mc.cores</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ncl</span><span class="p">)))</span><span class="w">
</span><span class="c1">## |========================================================| 100%, ETA 00:00</span><span class="w">
</span><span class="c1">## user system elapsed </span><span class="w">
</span><span class="c1">## 0.242 0.114 5.461 </span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">pbapply</span><span class="p">)</span><span class="w"> </span><span class="c1"># 1.3-4 CRAN version</span><span class="w">
</span><span class="p">(</span><span class="n">t2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">system.time</span><span class="p">(</span><span class="n">pblapply</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="n">B</span><span class="p">,</span><span class="w"> </span><span class="n">fun</span><span class="p">,</span><span class="w"> </span><span class="n">cl</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ncl</span><span class="p">)))</span><span class="w">
</span><span class="c1">## |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 07s</span><span class="w">
</span><span class="c1">## user system elapsed </span><span class="w">
</span><span class="c1">## 0.667 1.390 6.547 </span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">pbapply</span><span class="p">)</span><span class="w"> </span><span class="c1"># 1.3-5 fork-cluster-speedup branch</span><span class="w">
</span><span class="p">(</span><span class="n">t3</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">system.time</span><span class="p">(</span><span class="n">pblapply</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="n">B</span><span class="p">,</span><span class="w"> </span><span class="n">fun</span><span class="p">,</span><span class="w"> </span><span class="n">cl</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ncl</span><span class="p">)))</span><span class="w">
</span><span class="c1">## |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 06s</span><span class="w">
</span><span class="c1">## user system elapsed </span><span class="w">
</span><span class="c1">## 0.225 0.100 5.710 </span><span class="w">
</span></code></pre></div></div>
<p>Really nice so far: <strong>pbapply</strong> caught up with the forking-based timings of <strong>pbmcapply</strong>. Let’s look at more extensive runs to see how the number of progress bar updates affects the timings. If things work as I hope,
there shouldn’t be an increase with the new forking idea:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">timer_fun</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="w"> </span><span class="n">FUN</span><span class="p">,</span><span class="w"> </span><span class="n">nout</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">100</span><span class="p">,</span><span class="w"> </span><span class="n">...</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">pbo</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">pboptions</span><span class="p">(</span><span class="n">nout</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">nout</span><span class="p">)</span><span class="w">
</span><span class="nf">on.exit</span><span class="p">(</span><span class="n">pboptions</span><span class="p">(</span><span class="n">pbo</span><span class="p">))</span><span class="w">
</span><span class="n">unname</span><span class="p">(</span><span class="n">system.time</span><span class="p">(</span><span class="n">pblapply</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="w"> </span><span class="n">FUN</span><span class="p">,</span><span class="w"> </span><span class="n">...</span><span class="p">))[</span><span class="m">3</span><span class="p">])</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="n">timer_NULL</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="w">
</span><span class="n">nout1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">timer_fun</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="n">B</span><span class="p">,</span><span class="w"> </span><span class="n">fun</span><span class="p">,</span><span class="w"> </span><span class="n">nout</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">cl</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">NULL</span><span class="p">),</span><span class="w">
</span><span class="n">nout10</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">timer_fun</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="n">B</span><span class="p">,</span><span class="w"> </span><span class="n">fun</span><span class="p">,</span><span class="w"> </span><span class="n">nout</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">10</span><span class="p">,</span><span class="w"> </span><span class="n">cl</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">NULL</span><span class="p">),</span><span class="w">
</span><span class="n">nout100</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">timer_fun</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="n">B</span><span class="p">,</span><span class="w"> </span><span class="n">fun</span><span class="p">,</span><span class="w"> </span><span class="n">nout</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">100</span><span class="p">,</span><span class="w"> </span><span class="n">cl</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">NULL</span><span class="p">),</span><span class="w">
</span><span class="n">nout1000</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">timer_fun</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="n">B</span><span class="p">,</span><span class="w"> </span><span class="n">fun</span><span class="p">,</span><span class="w"> </span><span class="n">nout</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1000</span><span class="p">,</span><span class="w"> </span><span class="n">cl</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">NULL</span><span class="p">))</span><span class="w">
</span><span class="n">unlist</span><span class="p">(</span><span class="n">timer_NULL</span><span class="p">)</span><span class="w">
</span><span class="c1">## nout1 nout10 nout100 nout1000 </span><span class="w">
</span><span class="c1">## 12.221 11.899 11.775 11.260 </span><span class="w">
</span><span class="n">cl</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">makeCluster</span><span class="p">(</span><span class="n">ncl</span><span class="p">)</span><span class="w">
</span><span class="n">timer_cl</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="w">
</span><span class="n">nout1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">timer_fun</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="n">B</span><span class="p">,</span><span class="w"> </span><span class="n">fun</span><span class="p">,</span><span class="w"> </span><span class="n">nout</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">cl</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cl</span><span class="p">),</span><span class="w">
</span><span class="n">nout10</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">timer_fun</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="n">B</span><span class="p">,</span><span class="w"> </span><span class="n">fun</span><span class="p">,</span><span class="w"> </span><span class="n">nout</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">10</span><span class="p">,</span><span class="w"> </span><span class="n">cl</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cl</span><span class="p">),</span><span class="w">
</span><span class="n">nout100</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">timer_fun</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="n">B</span><span class="p">,</span><span class="w"> </span><span class="n">fun</span><span class="p">,</span><span class="w"> </span><span class="n">nout</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">100</span><span class="p">,</span><span class="w"> </span><span class="n">cl</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cl</span><span class="p">),</span><span class="w">
</span><span class="n">nout1000</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">timer_fun</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="n">B</span><span class="p">,</span><span class="w"> </span><span class="n">fun</span><span class="p">,</span><span class="w"> </span><span class="n">nout</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1000</span><span class="p">,</span><span class="w"> </span><span class="n">cl</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cl</span><span class="p">))</span><span class="w">
</span><span class="n">stopCluster</span><span class="p">(</span><span class="n">cl</span><span class="p">)</span><span class="w">
</span><span class="n">unlist</span><span class="p">(</span><span class="n">timer_cl</span><span class="p">)</span><span class="w">
</span><span class="c1">## nout1 nout10 nout100 nout1000 </span><span class="w">
</span><span class="c1">## 6.033 6.091 6.011 6.273 </span><span class="w">
</span><span class="c1">## forking with 1.3-4 CRAN version</span><span class="w">
</span><span class="n">timer_mc</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="w">
</span><span class="n">nout1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">timer_fun</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="n">B</span><span class="p">,</span><span class="w"> </span><span class="n">fun</span><span class="p">,</span><span class="w"> </span><span class="n">nout</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">cl</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ncl</span><span class="p">),</span><span class="w">
</span><span class="n">nout10</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">timer_fun</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="n">B</span><span class="p">,</span><span class="w"> </span><span class="n">fun</span><span class="p">,</span><span class="w"> </span><span class="n">nout</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">10</span><span class="p">,</span><span class="w"> </span><span class="n">cl</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ncl</span><span class="p">),</span><span class="w">
</span><span class="n">nout100</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">timer_fun</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="n">B</span><span class="p">,</span><span class="w"> </span><span class="n">fun</span><span class="p">,</span><span class="w"> </span><span class="n">nout</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">100</span><span class="p">,</span><span class="w"> </span><span class="n">cl</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ncl</span><span class="p">),</span><span class="w">
</span><span class="n">nout1000</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">timer_fun</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="n">B</span><span class="p">,</span><span class="w"> </span><span class="n">fun</span><span class="p">,</span><span class="w"> </span><span class="n">nout</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1000</span><span class="p">,</span><span class="w"> </span><span class="n">cl</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ncl</span><span class="p">))</span><span class="w">
</span><span class="n">unlist</span><span class="p">(</span><span class="n">timer_mc</span><span class="p">)</span><span class="w">
</span><span class="c1">## nout1 nout10 nout100 nout1000 </span><span class="w">
</span><span class="c1">## 5.563 5.659 6.620 10.692 </span><span class="w">
</span><span class="c1">## forking with 1.3-5 fork-cluster-speedup branch</span><span class="w">
</span><span class="n">timer_new</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="w">
</span><span class="n">nout1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">timer_fun</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="n">B</span><span class="p">,</span><span class="w"> </span><span class="n">fun</span><span class="p">,</span><span class="w"> </span><span class="n">nout</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">cl</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ncl</span><span class="p">),</span><span class="w">
</span><span class="n">nout10</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">timer_fun</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="n">B</span><span class="p">,</span><span class="w"> </span><span class="n">fun</span><span class="p">,</span><span class="w"> </span><span class="n">nout</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">10</span><span class="p">,</span><span class="w"> </span><span class="n">cl</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ncl</span><span class="p">),</span><span class="w">
</span><span class="n">nout100</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">timer_fun</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="n">B</span><span class="p">,</span><span class="w"> </span><span class="n">fun</span><span class="p">,</span><span class="w"> </span><span class="n">nout</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">100</span><span class="p">,</span><span class="w"> </span><span class="n">cl</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ncl</span><span class="p">),</span><span class="w">
</span><span class="n">nout1000</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">timer_fun</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="n">B</span><span class="p">,</span><span class="w"> </span><span class="n">fun</span><span class="p">,</span><span class="w"> </span><span class="n">nout</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1000</span><span class="p">,</span><span class="w"> </span><span class="n">cl</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ncl</span><span class="p">))</span><span class="w">
</span><span class="n">unlist</span><span class="p">(</span><span class="n">timer_new</span><span class="p">)</span><span class="w">
</span><span class="c1">## nout1 nout10 nout100 nout1000 </span><span class="w">
</span><span class="c1">## 5.480 5.574 5.665 6.063 </span><span class="w">
</span></code></pre></div></div>
<p>The new implementation with the Fork cluster trick beats the old <code class="language-plaintext highlighter-rouge">mclapply</code>-based implementation hands down: with 1000 progress bar updates, the old version took 10.7 seconds against 6.1 seconds for the new one. I wonder what is causing the
wildly different timing results. Is it due to all the other
<code class="language-plaintext highlighter-rouge">mclapply</code> arguments that give control over pre-scheduling, cleanup, and RNG seeds?</p>
<p>The new branch can be installed as:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">devtools</span><span class="o">::</span><span class="n">install_github</span><span class="p">(</span><span class="s2">"psolymos/pbapply"</span><span class="p">,</span><span class="w"> </span><span class="n">ref</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"fork-cluster-speedup"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p>I am a bit reluctant to merge the new branch, for the following reasons:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">makeForkCluster</code> was already an option: the user could explicitly create a Fork cluster and pass it on;</li>
<li>by hiding the process of creating and destroying the cluster, user options are restricted (e.g. no control over RNGs, which can be a major drawback for simulations);</li>
<li><code class="language-plaintext highlighter-rouge">mclapply</code> wasn’t so bad to begin with, because the number of updates was capped by the <code class="language-plaintext highlighter-rouge">nout</code> option.</li>
</ul>
<p>I would recommend the following workflow that is based purely on the stable CRAN version:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">cl</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">makeForkCluster</span><span class="p">(</span><span class="m">2L</span><span class="p">)</span><span class="w">
</span><span class="n">output</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">pblapply</span><span class="p">(</span><span class="n">...</span><span class="p">,</span><span class="w"> </span><span class="n">cl</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cl</span><span class="p">)</span><span class="w">
</span><span class="n">stopCluster</span><span class="p">(</span><span class="n">cl</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p>As always, I am keen to hear what you think: either in the comments or on <a href="https://github.com/psolymos/pbapply/issues/31">GitHub</a>.</p><a href="https://peter.solymos.org">Péter Sólymos</a>The pbapply R package that adds a progress bar to vectorized functions has been known to accumulate overhead when calling parallel::mclapply with forking (see this post for more background on the issue). Strangely enough, a GitHub issue held the key to the solution that I am going to outline below. Long story short: forking is no longer expensive with pbapply, and as it turns out, it never was.What is new in the intrval R package?2017-01-26T00:00:00-07:002017-01-26T00:00:00-07:00https://peter.solymos.org/https://peter.solymos.org/code/2017/01/26/what-is-new-in-the-intrval-r-package<p>An update (v 0.1-1) of the <a href="https://github.com/psolymos/intrval"><strong>intrval</strong></a> package was recently published on CRAN. The package simplifies interval-related logical operations (read more about the motivation in <a href="http://peter.solymos.org/code/2016/12/02/relational-operators-for-intervals-with-the-intrval-r-package.html">this</a> post).
So what is new in this version? Some of the inconsistencies in the 1st CRAN release have been cleaned up, and I have been pushed hard (see GitHub <a href="https://github.com/psolymos/intrval/issues/6">issue</a>) to implement all the 16
interval-to-interval operators.
These operators define the open/closed nature of the lower/upper
limits of the intervals on the left and right hand side of the <code class="language-plaintext highlighter-rouge">o</code>
in the middle as in <code class="language-plaintext highlighter-rouge">c(a1, b1) %[]o[]% c(a2, b2)</code>.</p>
<table>
<thead>
<tr>
<th>Interval 1:</th>
<th>Interval 2: <code class="language-plaintext highlighter-rouge">[]</code></th>
<th><code class="language-plaintext highlighter-rouge">[)</code></th>
<th><code class="language-plaintext highlighter-rouge">(]</code></th>
<th><code class="language-plaintext highlighter-rouge">()</code></th>
</tr>
</thead>
<tbody>
<tr>
<td><code class="language-plaintext highlighter-rouge">[]</code></td>
<td><code class="language-plaintext highlighter-rouge">%[]o[]%</code></td>
<td><code class="language-plaintext highlighter-rouge">%[]o[)%</code></td>
<td><code class="language-plaintext highlighter-rouge">%[]o(]%</code></td>
<td><code class="language-plaintext highlighter-rouge">%[]o()%</code></td>
</tr>
<tr>
<td><code class="language-plaintext highlighter-rouge">[)</code></td>
<td><code class="language-plaintext highlighter-rouge">%[)o[]%</code></td>
<td><code class="language-plaintext highlighter-rouge">%[)o[)%</code></td>
<td><code class="language-plaintext highlighter-rouge">%[)o(]%</code></td>
<td><code class="language-plaintext highlighter-rouge">%[)o()%</code></td>
</tr>
<tr>
<td><code class="language-plaintext highlighter-rouge">(]</code></td>
<td><code class="language-plaintext highlighter-rouge">%(]o[]%</code></td>
<td><code class="language-plaintext highlighter-rouge">%(]o[)%</code></td>
<td><code class="language-plaintext highlighter-rouge">%(]o(]%</code></td>
<td><code class="language-plaintext highlighter-rouge">%(]o()%</code></td>
</tr>
<tr>
<td><code class="language-plaintext highlighter-rouge">()</code></td>
<td><code class="language-plaintext highlighter-rouge">%()o[]%</code></td>
<td><code class="language-plaintext highlighter-rouge">%()o[)%</code></td>
<td><code class="language-plaintext highlighter-rouge">%()o(]%</code></td>
<td><code class="language-plaintext highlighter-rouge">%()o()%</code></td>
</tr>
</tbody>
</table>
<p>The overlap of two closed intervals, [a1, b1] and [a2, b2],
is evaluated by the <code class="language-plaintext highlighter-rouge">%[]o[]%</code> (<code class="language-plaintext highlighter-rouge">%[o]%</code> is an alias)
operator (<code class="language-plaintext highlighter-rouge">a1 <= b1</code>, <code class="language-plaintext highlighter-rouge">a2 <= b2</code>).
Endpoints can be defined as a vector with two values
(<code class="language-plaintext highlighter-rouge">c(a1, b1)</code>) or can be stored in matrix-like objects or lists,
in which case comparisons are made element-wise.</p>
<p>If lengths do not match, shorter objects are recycled.
These interval-to-interval operators work for numeric (integer, real)
and ordered vectors, and object types which are measured at
least on an ordinal scale (e.g. dates).
Note that interval endpoints
are sorted internally, so ensuring the conditions
<code class="language-plaintext highlighter-rouge">a1 <= b1</code> and <code class="language-plaintext highlighter-rouge">a2 <= b2</code> is not necessary.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>c(2, 3) %[]o[]% c(0, 1)
list(0:4, 1:5) %[]o[]% c(2, 3)
cbind(0:4, 1:5) %[]o[]% c(2, 3)
data.frame(a=0:4, b=1:5) %[]o[]% c(2, 3)
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">%)o(%</code> is the negation of the overlap of two closed intervals (<code class="language-plaintext highlighter-rouge">%[o]%</code>);
directional evaluation is done via the operators
<code class="language-plaintext highlighter-rouge">%[<o]%</code> and <code class="language-plaintext highlighter-rouge">%[o>]%</code>.
The overlap of two open intervals
is evaluated by <code class="language-plaintext highlighter-rouge">%(o)%</code> (an alias for <code class="language-plaintext highlighter-rouge">%()o()%</code>).
<code class="language-plaintext highlighter-rouge">%]o[%</code> is the negation of the overlap of two open intervals;
directional evaluation is done via the operators
<code class="language-plaintext highlighter-rouge">%(<o)%</code> and <code class="language-plaintext highlighter-rouge">%(o>)%</code>.
Overlap operators with mixed endpoints do not have
negation or directional counterparts.</p>
<table>
<thead>
<tr>
<th>Equal</th>
<th>Not equal</th>
<th>Less than</th>
<th>Greater than</th>
</tr>
</thead>
<tbody>
<tr>
<td><code class="language-plaintext highlighter-rouge">%[o]%</code></td>
<td><code class="language-plaintext highlighter-rouge">%)o(%</code></td>
<td><code class="language-plaintext highlighter-rouge">%[<o]%</code></td>
<td><code class="language-plaintext highlighter-rouge">%[o>]%</code></td>
</tr>
<tr>
<td><code class="language-plaintext highlighter-rouge">%(o)%</code></td>
<td><code class="language-plaintext highlighter-rouge">%]o[%</code></td>
<td><code class="language-plaintext highlighter-rouge">%(<o)%</code></td>
<td><code class="language-plaintext highlighter-rouge">%(o>)%</code></td>
</tr>
</tbody>
</table>
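<p>The difference between the closed and the open versions only shows up when the two intervals touch at an endpoint. Here is a base R sketch of the two conditions. The helper names <code class="language-plaintext highlighter-rouge">overlap_closed</code> and <code class="language-plaintext highlighter-rouge">overlap_open</code> are made up for this illustration, the intervals are assumed to be non-degenerate, and this is not the package's implementation:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>## hypothetical helpers, each takes two intervals as two-value vectors
overlap_closed <- function(i1, i2) {
    i1 <- sort(i1) # endpoints can come in any order
    i2 <- sort(i2)
    i1[1] <= i2[2] & i2[1] <= i1[2]
}
overlap_open <- function(i1, i2) {
    i1 <- sort(i1)
    i2 <- sort(i2)
    i1[1] < i2[2] & i2[1] < i1[2]
}
overlap_closed(c(1, 2), c(2, 3)) # the intervals share the point 2
# [1] TRUE
overlap_open(c(1, 2), c(2, 3)) # the shared point is excluded
# [1] FALSE
overlap_open(c(3, 1), c(2, 5)) # flipped endpoints are fine
# [1] TRUE
</code></pre></div></div>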
<p>Thanks for all the feedback so far and please keep ’em coming:
leave a comment below or use the <a href="https://github.com/psolymos/intrval/issues">issue tracker</a>
to provide feedback or report a problem.</p><a href="https://peter.solymos.org">Péter Sólymos</a>An update (v 0.1-1) of the intrval package was recently published on CRAN. The package simplifies interval related logical operations (read more about the motivation in this post). So what is new in this version? Some of the inconsistencies in the 1st CRAN release have been cleaned up, and I have been pushed hard (see GitHub issue) to implement all the 16 interval-to-interval operators. These operators define the open/closed nature of the lower/upper limits of the intervals on the left and right hand side of the o in the middle as in c(a1, b1) %[]o[]% c(a2, b2).Relational operators for intervals with the intrval R package2016-12-02T00:00:00-07:002016-12-02T00:00:00-07:00https://peter.solymos.org/https://peter.solymos.org/code/2016/12/02/relational-operators-for-intervals-with-the-intrval-r-package<p>I recently posted a piece about <a href="http://peter.solymos.org/code/2016/11/26/how-to-write-and-document-special-functions-in-r.html">how to write and document special functions in R</a>. I meant that as a prelude for the topic I am writing about in this post. Let me start at the beginning. The other day Dirk Eddelbuettel tweeted about the new release of the <a href="https://cran.r-project.org/package=data.table"><strong>data.table</strong></a> package (v1.9.8).
There were <a href="https://cran.r-project.org/web/packages/data.table/news.html">new features announced</a> for joins based on <code class="language-plaintext highlighter-rouge">%inrange%</code> and <code class="language-plaintext highlighter-rouge">%between%</code>. That got me thinking: it would be really cool to generalize this idea for different intervals, for example as <code class="language-plaintext highlighter-rouge">x %[]% c(a, b)</code>.</p>
<h2 id="motivation">Motivation</h2>
<p>We want to evaluate if values of <code class="language-plaintext highlighter-rouge">x</code> satisfy the condition <code class="language-plaintext highlighter-rouge">x >= a & x <= b</code> given that <code class="language-plaintext highlighter-rouge">a <= b</code>. Typing <code class="language-plaintext highlighter-rouge">x %[]% c(a, b)</code> instead of the previous expression is not much shorter (14 vs. 15 characters, counting spaces). But once the <code class="language-plaintext highlighter-rouge">a <= b</code> condition has to be enforced as well, it becomes a real saving (<code class="language-plaintext highlighter-rouge">x >= min(a, b) & x <= max(a, b)</code> is 31 characters long). And sorting is really important, because by flipping <code class="language-plaintext highlighter-rouge">a</code> and <code class="language-plaintext highlighter-rouge">b</code>, we get quite different answers:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>x <- 5
x >= 1 & x <= 10
# [1] TRUE
x >= 10 & x <= 1
# [1] FALSE
</code></pre></div></div>
<p>Also, <code class="language-plaintext highlighter-rouge">min</code> and <code class="language-plaintext highlighter-rouge">max</code> will not be very useful when we want to vectorize the expression, because they collapse all their arguments into a single value. We need to use <code class="language-plaintext highlighter-rouge">pmin</code> and <code class="language-plaintext highlighter-rouge">pmax</code> to compare the endpoints element-wise:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>x >= min(1:10, 10:1) & x <= max(10:1, 1:10)
# [1] TRUE
x >= pmin(1:10, 10:1) & x <= pmax(10:1, 1:10)
# [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
</code></pre></div></div>
<p>Interval endpoints can also be open or closed, and allowing them to flip around makes the semantics of left/right closed/open interval definitions hard. We can thus all agree that there is a need for an expression, like <code class="language-plaintext highlighter-rouge">x %[]% c(a, b)</code>, that is <em>compact</em>, <em>flexible</em>, and <em>invariant</em> to endpoint sorting. This is exactly what the <a href="https://github.com/psolymos/intrval"><strong>intrval</strong></a> package is for!</p>
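<p>To make the idea concrete, here is a minimal base R sketch of such an operator. The name <code class="language-plaintext highlighter-rouge">%within%</code> is made up for this illustration and this is not how <strong>intrval</strong> is implemented; it only handles a single closed interval given as a two-value vector:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>## hypothetical operator, for illustration only
"%within%" <- function(x, interval) {
    stopifnot(length(interval) == 2)
    a <- min(interval) # lower endpoint, whatever the input order
    b <- max(interval) # upper endpoint
    x >= a & x <= b
}
x <- c(0, 1, 5, 10, 11)
x %within% c(1, 10)
# [1] FALSE  TRUE  TRUE  TRUE FALSE
x %within% c(10, 1) # flipped endpoints give the same answer
# [1] FALSE  TRUE  TRUE  TRUE FALSE
</code></pre></div></div>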
<h2 id="whats-in-the-package">What’s in the package</h2>
<p>Functions for evaluating if values of vectors are within
different open/closed intervals
(<code class="language-plaintext highlighter-rouge">x %[]% c(a, b)</code>), or if two closed
intervals overlap (<code class="language-plaintext highlighter-rouge">c(a1, b1) %[o]% c(a2, b2)</code>).
Operators for negation and directional relations are also implemented.</p>
<h3 id="value-to-interval-relations">Value-to-interval relations</h3>
<p>Values of <code class="language-plaintext highlighter-rouge">x</code> are compared to interval endpoints <code class="language-plaintext highlighter-rouge">a</code> and <code class="language-plaintext highlighter-rouge">b</code> (<code class="language-plaintext highlighter-rouge">a <= b</code>).
Endpoints can be defined as a vector with two values (<code class="language-plaintext highlighter-rouge">c(a, b)</code>): these values will be compared as a single interval with each value in <code class="language-plaintext highlighter-rouge">x</code>.
If endpoints are stored in a matrix-like object or a list,
comparisons are made element-wise.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>x <- rep(4, 5)
a <- 1:5
b <- 3:7
cbind(x=x, a=a, b=b)
x %[]% cbind(a, b) # matrix
x %[]% data.frame(a=a, b=b) # data.frame
x %[]% list(a, b) # list
</code></pre></div></div>
<p>If lengths do not match, shorter objects are recycled. Return values are logicals.
Note: interval endpoints are sorted internally, so ensuring the condition
<code class="language-plaintext highlighter-rouge">a <= b</code> is not necessary.</p>
<p>These value-to-interval operators work for numeric (integer, real) and ordered vectors, and object types which are measured at least on ordinal scale (e.g. dates).</p>
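<p>For reference, the element-wise comparison from the example above can be reproduced in base R. This sketch is only an equivalent of the closed-interval case under the rules described in this section (endpoints sorted pairwise, then compared element by element), not the package's code:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>x <- rep(4, 5)
a <- 1:5
b <- 3:7
lower <- pmin(a, b) # element-wise sorting of the endpoints
upper <- pmax(a, b)
x >= lower & x <= upper
# [1] FALSE  TRUE  TRUE  TRUE FALSE
</code></pre></div></div>
<p>The value 4 falls outside of the first interval [1, 3] and the last interval [5, 7], and inside of the three intervals in between.</p>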
<h4 id="closed-and-open-intervals">Closed and open intervals</h4>
<p>The following special operators are used to indicate closed (<code class="language-plaintext highlighter-rouge">[</code>, <code class="language-plaintext highlighter-rouge">]</code>) or open (<code class="language-plaintext highlighter-rouge">(</code>, <code class="language-plaintext highlighter-rouge">)</code>) interval endpoints:</p>
<table>
<thead>
<tr>
<th>Operator</th>
<th>Expression</th>
<th>Condition</th>
</tr>
</thead>
<tbody>
<tr>
<td><code class="language-plaintext highlighter-rouge">%[]%</code></td>
<td><code class="language-plaintext highlighter-rouge">x %[]% c(a, b)</code></td>
<td><code class="language-plaintext highlighter-rouge">x >= a & x <= b</code></td>
</tr>
<tr>
<td><code class="language-plaintext highlighter-rouge">%[)%</code></td>
<td><code class="language-plaintext highlighter-rouge">x %[)% c(a, b)</code></td>
<td><code class="language-plaintext highlighter-rouge">x >= a & x < b</code></td>
</tr>
<tr>
<td><code class="language-plaintext highlighter-rouge">%(]%</code></td>
<td><code class="language-plaintext highlighter-rouge">x %(]% c(a, b)</code></td>
<td><code class="language-plaintext highlighter-rouge">x > a & x <= b</code></td>
</tr>
<tr>
<td><code class="language-plaintext highlighter-rouge">%()%</code></td>
<td><code class="language-plaintext highlighter-rouge">x %()% c(a, b)</code></td>
<td><code class="language-plaintext highlighter-rouge">x > a & x < b</code></td>
</tr>
</tbody>
</table>
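<p>The four conditions only differ at the endpoints themselves. A small base R check, using the expressions from the Condition column with made-up values for <code class="language-plaintext highlighter-rouge">x</code>, <code class="language-plaintext highlighter-rouge">a</code>, and <code class="language-plaintext highlighter-rouge">b</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>x <- 1:5
a <- 2
b <- 4
x >= a & x <= b # closed on both ends: 2, 3, 4 are inside
# [1] FALSE  TRUE  TRUE  TRUE FALSE
x >= a & x < b # right end open: 4 is excluded
# [1] FALSE  TRUE  TRUE FALSE FALSE
x > a & x <= b # left end open: 2 is excluded
# [1] FALSE FALSE  TRUE  TRUE FALSE
x > a & x < b # open on both ends: only 3 is inside
# [1] FALSE FALSE  TRUE FALSE FALSE
</code></pre></div></div>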
<h4 id="negation-and-directional-relations">Negation and directional relations</h4>
<table>
<thead>
<tr>
<th>Equal</th>
<th>Not equal</th>
<th>Less than</th>
<th>Greater than</th>
</tr>
</thead>
<tbody>
<tr>
<td><code class="language-plaintext highlighter-rouge">%[]%</code></td>
<td><code class="language-plaintext highlighter-rouge">%)(%</code></td>
<td><code class="language-plaintext highlighter-rouge">%[<]%</code></td>
<td><code class="language-plaintext highlighter-rouge">%[>]%</code></td>
</tr>
<tr>
<td><code class="language-plaintext highlighter-rouge">%[)%</code></td>
<td><code class="language-plaintext highlighter-rouge">%)[%</code></td>
<td><code class="language-plaintext highlighter-rouge">%[<)%</code></td>
<td><code class="language-plaintext highlighter-rouge">%[>)%</code></td>
</tr>
<tr>
<td><code class="language-plaintext highlighter-rouge">%(]%</code></td>
<td><code class="language-plaintext highlighter-rouge">%](%</code></td>
<td><code class="language-plaintext highlighter-rouge">%(<]%</code></td>
<td><code class="language-plaintext highlighter-rouge">%(>]%</code></td>
</tr>
<tr>
<td><code class="language-plaintext highlighter-rouge">%()%</code></td>
<td><code class="language-plaintext highlighter-rouge">%][%</code></td>
<td><code class="language-plaintext highlighter-rouge">%(<)%</code></td>
<td><code class="language-plaintext highlighter-rouge">%(>)%</code></td>
</tr>
</tbody>
</table>
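<p>These relations split the values of <code class="language-plaintext highlighter-rouge">x</code> into three groups. Here is a base R sketch for the half-open interval [<code class="language-plaintext highlighter-rouge">a</code>, <code class="language-plaintext highlighter-rouge">b</code>), with made-up values and assuming the conditions listed in the manual excerpt quoted in the update at the end of this post (<code class="language-plaintext highlighter-rouge">%[<)%</code> evaluates <code class="language-plaintext highlighter-rouge">x < a</code> and <code class="language-plaintext highlighter-rouge">%[>)%</code> evaluates <code class="language-plaintext highlighter-rouge">x >= b</code>):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>x <- 0:6
a <- 2
b <- 4
below <- x < a # "less than", as in %[<)%
inside <- x >= a & x < b # "equal", as in %[)%
above <- x >= b # "greater than", as in %[>)%
outside <- !inside # "not equal", as in %)[%
## every value falls into exactly one of the three groups
all(below + inside + above == 1)
# [1] TRUE
identical(outside, below | above)
# [1] TRUE
</code></pre></div></div>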
<p>The helper function <code class="language-plaintext highlighter-rouge">intrval_types</code> can be used to
print/plot the following summary:</p>
<p><img src="https://github.com/psolymos/intrval/raw/master/extras/intrval.png" class="img-responsive" alt="Interval types" /></p>
<h3 id="interval-to-interval-relations">Interval-to-interval relations</h3>
<p>The overlap of two closed intervals, [<code class="language-plaintext highlighter-rouge">a1</code>, <code class="language-plaintext highlighter-rouge">b1</code>] and [<code class="language-plaintext highlighter-rouge">a2</code>, <code class="language-plaintext highlighter-rouge">b2</code>],
is evaluated by the <code class="language-plaintext highlighter-rouge">%[o]%</code> operator (<code class="language-plaintext highlighter-rouge">a1 <= b1</code>, <code class="language-plaintext highlighter-rouge">a2 <= b2</code>).
Endpoints can be defined as a vector with two values
(<code class="language-plaintext highlighter-rouge">c(a1, b1)</code>) or can be stored in matrix-like objects or lists,
in which case comparisons are made element-wise.
Note: interval endpoints
are sorted internally, so ensuring the conditions
<code class="language-plaintext highlighter-rouge">a1 <= b1</code> and <code class="language-plaintext highlighter-rouge">a2 <= b2</code> is not necessary.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>c(2:3) %[o]% c(0:1)
list(0:4, 1:5) %[o]% c(2:3)
cbind(0:4, 1:5) %[o]% c(2:3)
data.frame(a=0:4, b=1:5) %[o]% c(2:3)
</code></pre></div></div>
<p>If lengths do not match, shorter objects are recycled.
These interval-to-interval operators work for numeric (integer, real)
and ordered vectors, and object types which are measured at
least on an ordinal scale (e.g. dates).</p>
<p><code class="language-plaintext highlighter-rouge">%)o(%</code> is used for the negation;
directional evaluation is done via the operators <code class="language-plaintext highlighter-rouge">%[<o]%</code> and <code class="language-plaintext highlighter-rouge">%[o>]%</code>.</p>
<table>
<thead>
<tr>
<th>Equal</th>
<th>Not equal</th>
<th>Less than</th>
<th>Greater than</th>
</tr>
</thead>
<tbody>
<tr>
<td><code class="language-plaintext highlighter-rouge">%[o]%</code></td>
<td><code class="language-plaintext highlighter-rouge">%)o(%</code></td>
<td><code class="language-plaintext highlighter-rouge">%[<o]%</code></td>
<td><code class="language-plaintext highlighter-rouge">%[o>]%</code></td>
</tr>
</tbody>
</table>
<h3 id="operators-for-discrete-variables">Operators for discrete variables</h3>
<p>The previous operators will return <code class="language-plaintext highlighter-rouge">NA</code> for unordered factors.
Set overlap can be evaluated by the base <code class="language-plaintext highlighter-rouge">%in%</code> operator and its negation
<code class="language-plaintext highlighter-rouge">%nin%</code>. (This feature is really <a href="http://peter.solymos.org/code/2016/11/26/how-to-write-and-document-special-functions-in-r.html">redundant</a>, I know, but decided to include regardless…)</p>
<h2 id="install">Install</h2>
<p>Install development version from GitHub (not yet on CRAN):</p>
<div class="language-R highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">devtools</span><span class="p">)</span><span class="w">
</span><span class="n">install_github</span><span class="p">(</span><span class="s2">"psolymos/intrval"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p>The package is licensed under <a href="https://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html">GPL-2</a>.</p>
<h2 id="examples">Examples</h2>
<p><img src="https://github.com/psolymos/intrval/raw/master/extras/examples.png" class="img-responsive" alt="Interval examples" /></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>library(intrval)
## bounding box
set.seed(1)
n <- 10^4
x <- runif(n, -2, 2)
y <- runif(n, -2, 2)
d <- sqrt(x^2 + y^2)
iv1 <- x %[]% c(-0.25, 0.25) & y %[]% c(-1.5, 1.5)
iv2 <- x %[]% c(-1.5, 1.5) & y %[]% c(-0.25, 0.25)
iv3 <- d %()% c(1, 1.5)
plot(x, y, pch = 19, cex = 0.25, col = iv1 + iv2 + 1,
main = "Intersecting bounding boxes")
plot(x, y, pch = 19, cex = 0.25, col = iv3 + 1,
main = "Deck the halls:\ndistance range from center")
## time series filtering
x <- seq(0, 4*24*60*60, 60*60)
dt <- as.POSIXct(x, origin="2000-01-01 00:00:00")
f <- as.POSIXlt(dt)$hour %[]% c(0, 11)
plot(sin(x) ~ dt, type="l", col="grey",
main = "Filtering date/time objects")
points(sin(x) ~ dt, pch = 19, col = f + 1)
## QCC
library(qcc)
data(pistonrings)
mu <- mean(pistonrings$diameter[pistonrings$trial])
SD <- sd(pistonrings$diameter[pistonrings$trial])
x <- pistonrings$diameter[!pistonrings$trial]
iv <- mu + 3 * c(-SD, SD)
plot(x, pch = 19, col = x %)(% iv +1, type = "b", ylim = mu + 5 * c(-SD, SD),
main = "Shewhart quality control chart\ndiameter of piston rings")
abline(h = mu)
abline(h = iv, lty = 2)
## Annette Dobson (1990) "An Introduction to Generalized Linear Models".
## Page 9: Plant Weight Data.
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
weight <- c(ctl, trt)
lm.D9 <- lm(weight ~ group)
## compare 95% confidence intervals with 0
(CI.D9 <- confint(lm.D9))
# 2.5 % 97.5 %
# (Intercept) 4.56934 5.4946602
# groupTrt -1.02530 0.2833003
0 %[]% CI.D9
# (Intercept) groupTrt
# FALSE TRUE
lm.D90 <- lm(weight ~ group - 1) # omitting intercept
## compare 95% confidence intervals of the 2 groups to each other
(CI.D90 <- confint(lm.D90))
# 2.5 % 97.5 %
# groupCtl 4.56934 5.49466
# groupTrt 4.19834 5.12366
CI.D90[1,] %[o]% CI.D90[2,]
# 2.5 %
# TRUE
DATE <- as.Date(c("2000-01-01","2000-02-01", "2000-03-31"))
DATE %[<]% as.Date(c("2000-01-15", "2000-03-15"))
# [1] TRUE FALSE FALSE
DATE %[]% as.Date(c("2000-01-15", "2000-03-15"))
# [1] FALSE TRUE FALSE
DATE %[>]% as.Date(c("2000-01-15", "2000-03-15"))
# [1] FALSE FALSE TRUE
</code></pre></div></div>
<p>For more examples, see the <a href="https://github.com/psolymos/intrval/blob/master/tests/tests.R">unit-testing script</a>.</p>
<h2 id="feedback">Feedback</h2>
<p>Please check out the package and use the <a href="https://github.com/psolymos/intrval/issues">issue tracker</a>
to suggest a new feature or report a problem.</p>
<h4 id="update-2016-12-04">Update (2016-12-04)</h4>
<p>Sergey Kashin <a href="https://twitter.com/sergeykashin/status/805501566123966464/photo/1">pointed out</a> that some operators are redundant. It is now explained in the manual:</p>
<p>Note that some operators return identical results but
are syntactically different:
<code class="language-plaintext highlighter-rouge">%[<]%</code> and <code class="language-plaintext highlighter-rouge">%[<)%</code> both evaluate <code class="language-plaintext highlighter-rouge">x < a</code>;
<code class="language-plaintext highlighter-rouge">%[>]%</code> and <code class="language-plaintext highlighter-rouge">%(>]%</code> both evaluate <code class="language-plaintext highlighter-rouge">x > b</code>;
<code class="language-plaintext highlighter-rouge">%(<]%</code> and <code class="language-plaintext highlighter-rouge">%(<)%</code> both evaluate <code class="language-plaintext highlighter-rouge">x <= a</code>;
<code class="language-plaintext highlighter-rouge">%[>)%</code> and <code class="language-plaintext highlighter-rouge">%(>)%</code> both evaluate <code class="language-plaintext highlighter-rouge">x >= b</code>.
This is so because we evaluate only one end of the interval
while still conceptually referring to the relationship
defined by the right-hand-side interval object.
This implies 2 conditional logical evaluations
instead of treating it as a single 3-level ordered factor.</p>
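<p>To illustrate the last point, here is a sketch of what the 3-level ordered factor view could look like for a closed interval. The variable names and values are made up, and this is not something the package does:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>x <- 1:5
a <- 2
b <- 4
## position of each value relative to the closed interval [a, b]
pos <- ifelse(x < a, "below", ifelse(x > b, "above", "inside"))
pos <- factor(pos, levels = c("below", "inside", "above"), ordered = TRUE)
as.character(pos)
# [1] "below"  "inside" "inside" "inside" "above"
## the two logical evaluations recover the same information
identical(pos < "inside", x < a)
# [1] TRUE
identical(pos > "inside", x > b)
# [1] TRUE
</code></pre></div></div>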
<h4 id="update-2016-12-06">Update (2016-12-06)</h4>
<p><strong>intrval</strong> R package v0.1 is on CRAN: <a href="https://CRAN.R-project.org/package=intrval">https://CRAN.R-project.org/package=intrval</a></p><a href="https://peter.solymos.org">Péter Sólymos</a>I recently posted a piece about how to write and document special functions in R. I meant that as a prelude for the topic I am writing about in this post. Let me start at the beginning. The other day Dirk Eddelbuettel tweeted about the new release of the data.table package (v1.9.8). There were new features announced for joins based on %inrange% and %between%. That got me thinking: it would be really cool to generalize this idea for different intervals, for example as x %[]% c(a, b).How to write and document %special% functions in R2016-11-26T00:00:00-07:002016-11-26T00:00:00-07:00https://peter.solymos.org/https://peter.solymos.org/code/2016/11/26/how-to-write-and-document-special-functions-in-r<p>I spend a considerable portion of my working hours with data processing where I often use the <code class="language-plaintext highlighter-rouge">%in%</code> R function as <code class="language-plaintext highlighter-rouge">x %in% y</code>. Whenever I need the negation of that, I used to write <code class="language-plaintext highlighter-rouge">!(x %in% y)</code>. Not much of a hassle, but still, wouldn’t it be nicer to have <code class="language-plaintext highlighter-rouge">x %notin% y</code> instead? So I decided to code it for my <a href="https://CRAN.R-project.org/package=mefa4"><strong>mefa4</strong></a> package that I maintain primarily to make my data munging time shorter and more efficient. Coding a <code class="language-plaintext highlighter-rouge">%special%</code> function was no big deal. But I had to do quite a bit of research and trial-error until I figured out the proper documentation. So here it goes.</p>
<h2 id="the-function">The function</h2>
<p>The function name needs to be quoted in the assignment, and the function takes exactly two arguments: one for the left and one for the right-hand side of the operator in the middle:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"%notin%" <- function(x, table) !(match(x, table, nomatch = 0) > 0)
</code></pre></div></div>
<p>Let us see what it does:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1:4 %in% 3:5
## [1] FALSE FALSE TRUE TRUE
1:4 %notin% 3:5
## [1] TRUE TRUE FALSE FALSE
</code></pre></div></div>
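<p>Missing values are worth a quick check too. The sketch below repeats the definition from above so that it runs on its own, and uses made-up input to show that the result is the same as negating <code class="language-plaintext highlighter-rouge">%in%</code> and that it never contains <code class="language-plaintext highlighter-rouge">NA</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"%notin%" <- function(x, table) !(match(x, table, nomatch = 0) > 0)
x <- c(1, 2, NA)
y <- c(2, NA)
x %notin% y # NA in x is matched by the NA in y
# [1]  TRUE FALSE FALSE
identical(x %notin% y, !(x %in% y))
# [1] TRUE
</code></pre></div></div>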
<h2 id="the-namespace-entry">The NAMESPACE entry</h2>
<p>We need to export the function, so just add the following entry to the <code class="language-plaintext highlighter-rouge">NAMESPACE</code> file:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>export("%notin%")
</code></pre></div></div>
<h2 id="the-rd-file">The Rd file</h2>
<p>This is where things get a bit more interesting. The percent sign starts a comment in Rd files (just like in LaTeX), so it needs to be escaped (<code class="language-plaintext highlighter-rouge">\%</code>) throughout the whole documentation. Also pay close attention to the usage section (<code class="language-plaintext highlighter-rouge">x \%notin\% table</code>).</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>\name{\%notin\%}
\alias{\%notin\%}
\title{
Negated Value Matching
}
\description{
\code{\%notin\%} is the negation of \code{\link{\%in\%}},
which returns a logical vector indicating if there is a non-match or not
for its left operand.
}
\usage{
x \%notin\% table
}
\arguments{
\item{x}{
vector or \code{NULL}: the values to be matched.
}
\item{table}{
vector or \code{NULL}: the values to be matched against.
}
}
\value{
A logical vector, indicating if a non-match was located for each element of
\code{x}: thus the values are \code{TRUE} or \code{FALSE} and never \code{NA}.
}
\author{
Peter Solymos <solymos@ualberta.ca>
}
\seealso{
All the opposite of what is written for \code{\link{\%in\%}}.
}
\examples{
1:10 \%notin\% c(1,3,5,9)
sstr <- c("c","ab","B","bba","c",NA,"@","bla","a","Ba","\%")
sstr[sstr \%notin\% c(letters, LETTERS)]
}
\keyword{manip}
\keyword{logic}
</code></pre></div></div>
<p><strong>UPDATE</strong></p>
<p>Some updates from the comments:</p>
<ul>
<li>From Marcin: One can use <a href="https://cran.r-project.org/package=roxygen2"><strong>roxygen2</strong></a> for writing package documentation, see the <a href="https://cran.r-project.org/package=magrittr"><strong>magrittr</strong></a> package docs on the <a href="https://github.com/tidyverse/magrittr/blob/master/R/pipe.R"><code class="language-plaintext highlighter-rouge">%>%</code> (pipe)</a> operator.</li>
<li>From Andrey: The <a href="https://cran.r-project.org/package=Hmisc"><strong>Hmisc</strong></a> package also has a similar <code class="language-plaintext highlighter-rouge">%nin%</code> function (<code class="language-plaintext highlighter-rouge">{match(x, table, nomatch = 0) == 0}</code>). (Note that the unexported <code class="language-plaintext highlighter-rouge">Matrix:::"%nin%"</code> is defined as <code class="language-plaintext highlighter-rouge">{is.na(match(x, table))}</code>.)</li>
</ul><a href="https://peter.solymos.org">Péter Sólymos</a>I spend a considerable portion of my working hours with data processing where I often use the %in% R function as x %in% y. Whenever I need the negation of that, I used to write !(x %in% y). Not much of a hassle, but still, wouldn’t it be nicer to have x %notin% y instead? So I decided to code it for my mefa4 package that I maintain primarily to make my data munging time shorter and more efficient. Coding a %special% function was no big deal. But I had to do quite a bit of research and trial-error until I figured out the proper documentation. So here it goes.