December 02, 2016 Code R functions special intrval

I recently posted a piece about how to write and document special functions in R. I meant that as a prelude for the topic I am writing about in this post. Let me start at the beginning. The other day Dirk Eddelbuettel tweeted about the new release of the **data.table** package (v1.9.8).
There were new features announced for joins based on `%inrange%`

and `%between%`

. That got me thinking: it would be really cool to generalize this idea for different intervals, for example as `x %[]% c(a, b)`

.

We want to evaluate if values of `x`

satisfy the condition `x >= a & x <= b`

given that `a <= b`

. Typing `x %[]% c(a, b)`

instead of the previous expression is not much shorter (14 vs. 15 characters with counting spaces). But considering the `a <= b`

condition as well, it becomes a saving (`x >= min(a, b) & x <= mmax(a, b)`

is 31 characters long). And sorting is really important, because by flipping `a`

and `b`

, we get quite different answers:

```
x <- 5
x >= 1 & x <= 10
# [1] TRUE
x >= 10 & x <= 1
# [1] FALSE
```

Also, `min`

and `max`

will not be very useful when we want to vectorize the expression. We need to use `pmin`

and `pmax`

for obvious reasons:

```
x >= min(1:10, 10:1) & x <= max(10:1, 1:10)
# [1] TRUE
x >= pmin(1:10, 10:1) & x <= pmax(10:1, 1:10)
# [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
```

If interval endpoints can also be open or closed, and allowing them to flip around makes the semantics of left/right closed/open interval definitions hard. We can thus all agree that there is a need for an expression, like `x %[]% c(a, b)`

, that is *compact*, *flexible*, and *invariant* to endpoint sorting. This is exactly what the **intrval** package is for!

Functions for evaluating if values of vectors are within
different open/closed intervals
(`x %[]% c(a, b)`

), or if two closed
intervals overlap (`c(a1, b1) %[o]% c(a2, b2)`

).
Operators for negation and directional relations also implemented.

Values of `x`

are compared to interval endpoints `a`

and `b`

(`a <= b`

).
Endpoints can be defined as a vector with two values (`c(a, b)`

): these values will be compared as a single interval with each value in `x`

.
If endpoints are stored in a matrix-like object or a list,
comparisons are made element-wise.

```
x <- rep(4, 5)
a <- 1:5
b <- 3:7
cbind(x=x, a=a, b=b)
x %[]% cbind(a, b) # matrix
x %[]% data.frame(a=a, b=b) # data.frame
x %[]% list(a, b) # list
```

If lengths do not match, shorter objects are recycled. Return values are logicals.
Note: interval endpoints are sorted internally thus ensuring the condition
`a <= b`

is not necessary.

These value-to-interval operators work for numeric (integer, real) and ordered vectors, and object types which are measured at least on ordinal scale (e.g. dates).

The following special operators are used to indicate closed (`[`

, `]`

) or open (`(`

, `)`

) interval endpoints:

Operator | Expression | Condition |
---|---|---|

`%[]%` |
`x %[]% c(a, b)` |
`x >= a & x <= b` |

`%[)%` |
`x %[)% c(a, b)` |
`x >= a & x < b` |

`%(]%` |
`x %(]% c(a, b)` |
`x > a & x <= b` |

`%()%` |
`x %()% c(a, b)` |
`x > a & x < b` |

Equal | Not equal | Less than | Greater than |
---|---|---|---|

`%[]%` |
`%)(%` |
`%[<]%` |
`%[>]%` |

`%[)%` |
`%)[%` |
`%[<)%` |
`%[>)%` |

`%(]%` |
`%](%` |
`%(<]%` |
`%(>]%` |

`%()%` |
`%][%` |
`%(<)%` |
`%(>)%` |

The helper function `intrval_types`

can be used to
print/plot the following summary:

The overlap of two closed intervals, [`a1`

, `b1`

] and [`a2`

, `b2`

],
is evaluated by the `%[o]%`

operator (`a1 <= b1`

, `a2 <= b2`

).
Endpoints can be defined as a vector with two values
(`c(a1, b1)`

)or can be stored in matrix-like objects or a lists
in which case comparisons are made element-wise.
Note: interval endpoints
are sorted internally thus ensuring the conditions
`a1 <= b1`

and `a2 <= b2`

is not necessary.

```
c(2:3) %[o]% c(0:1)
list(0:4, 1:5) %[o]% c(2:3)
cbind(0:4, 1:5) %[o]% c(2:3)
data.frame(a=0:4, b=1:5) %[o]% c(2:3)
```

If lengths do not match, shorter objects are recycled. These value-to-interval operators work for numeric (integer, real) and ordered vectors, and object types which are measured at least on ordinal scale (e.g. dates).

`%)o(%`

is used for the negation,
directional evaluation is done via the operators `%[<o]%`

and `%[o>]%`

.

Equal | Not equal | Less than | Greater than |
---|---|---|---|

`%[o]%` |
`%)o(%` |
`%[<o]%` |
`%[o>]%` |

The previous operators will return `NA`

for unordered factors.
Set overlap can be evaluated by the base `%in%`

operator and its negation
`%nin%`

. (This feature is really redundant, I know, but decided to include regardlessâ€¦)

Install development version from GitHub (not yet on CRAN):

```
library(devtools)
install_github("psolymos/intrval")
```

The package is licensed under GPL-2.

```
library(intrval)
## bounding box
set.seed(1)
n <- 10^4
x <- runif(n, -2, 2)
y <- runif(n, -2, 2)
d <- sqrt(x^2 + y^2)
iv1 <- x %[]% c(-0.25, 0.25) & y %[]% c(-1.5, 1.5)
iv2 <- x %[]% c(-1.5, 1.5) & y %[]% c(-0.25, 0.25)
iv3 <- d %()% c(1, 1.5)
plot(x, y, pch = 19, cex = 0.25, col = iv1 + iv2 + 1,
main = "Intersecting bounding boxes")
plot(x, y, pch = 19, cex = 0.25, col = iv3 + 1,
main = "Deck the halls:\ndistance range from center")
## time series filtering
x <- seq(0, 4*24*60*60, 60*60)
dt <- as.POSIXct(x, origin="2000-01-01 00:00:00")
f <- as.POSIXlt(dt)$hour %[]% c(0, 11)
plot(sin(x) ~ dt, type="l", col="grey",
main = "Filtering date/time objects")
points(sin(x) ~ dt, pch = 19, col = f + 1)
## QCC
library(qcc)
data(pistonrings)
mu <- mean(pistonrings$diameter[pistonrings$trial])
SD <- sd(pistonrings$diameter[pistonrings$trial])
x <- pistonrings$diameter[!pistonrings$trial]
iv <- mu + 3 * c(-SD, SD)
plot(x, pch = 19, col = x %)(% iv +1, type = "b", ylim = mu + 5 * c(-SD, SD),
main = "Shewhart quality control chart\ndiameter of piston rings")
abline(h = mu)
abline(h = iv, lty = 2)
## Annette Dobson (1990) "An Introduction to Generalized Linear Models".
## Page 9: Plant Weight Data.
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
weight <- c(ctl, trt)
lm.D9 <- lm(weight ~ group)
## compare 95% confidence intervals with 0
(CI.D9 <- confint(lm.D9))
# 2.5 % 97.5 %
# (Intercept) 4.56934 5.4946602
# groupTrt -1.02530 0.2833003
0 %[]% CI.D9
# (Intercept) groupTrt
# FALSE TRUE
lm.D90 <- lm(weight ~ group - 1) # omitting intercept
## compare 95% confidence of the 2 groups to each other
(CI.D90 <- confint(lm.D90))
# 2.5 % 97.5 %
# groupCtl 4.56934 5.49466
# groupTrt 4.19834 5.12366
CI.D90[1,] %[o]% CI.D90[2,]
# 2.5 %
# TRUE
DATE <- as.Date(c("2000-01-01","2000-02-01", "2000-03-31"))
DATE %[<]% as.Date(c("2000-01-151", "2000-03-15"))
# [1] TRUE FALSE FALSE
DATE %[]% as.Date(c("2000-01-151", "2000-03-15"))
# [1] FALSE TRUE FALSE
DATE %[>]% as.Date(c("2000-01-151", "2000-03-15"))
# [1] FALSE FALSE TRUE
```

For more examples, see the unit-testing script.

Please check out the package and use the issue tracker to suggest a new feature or report a problem.

Sergey Kashin pointed out that some operators are redundant. It is now explained in the manual:

Note that some operators return identical results but
are syntactically different:
`%[<]%`

and `%[<)%`

both evaluate `x < a`

;
`%[>]%`

and `%(>]%`

both evaluate `x > b`

;
`%(<]%`

and `%(<)%`

evaluate `x <= a`

;
`%[>)%`

and `%(>)%`

both evaluate `x >= b`

.
This is so because we evaluate only one end of the interval
but still conceptually referring to the relationship
defined by the right-hand-side interval object.
This implies 2 conditional logical evaluations
instead of treating it as a single 3-level ordered factor.

**intrval** R package v0.1 is on CRAN: https://CRAN.R-project.org/package=intrval

An update (v 0.1-1) of the **intrval** package was recently published on CRAN. The package simplifies interval related logical operations (read more about the motivation in this post).
So what is new in this version? Some of the inconsistencies in the 1st CRAN release have been cleaned up, and I have been pushed hard (see GitHub issue to implement all the 16
interval-to-interval operators.
These operators define the open/closed nature of the lower/upper
limits of the intervals on the left and right hand side of the `o`

in the middle as in `c(a1, b1) %[]o[]% c(a2, b2)`

.

- What is new in the intrval R package?
- How to write and document %special% functions in R
- Progress bar overhead comparisons
- How to add pbapply to R packages
- What is the cost of a progress bar in R?

ABMI (6) ARU (1) C (1) CRAN (1) Hungary (1) JOSM (2) PVA (1) QPAD (1) R (15) R packages (1) bioacoustics (1) biodiversity (1) birds (2) course (2) data (1) data cloning (3) dclone (3) dependencies (1) detect (2) footprint (3) forecasting (1) functions (3) intrval (2) mefa4 (1) monitoring (2) pbapply (4) plyr (1) poster (2) processing time (1) progress bar (3) publications (1) report (1) sector effects (1) single visit (1) site (1) slides (2) special (3) species (1) trend (1) tutorials (2) video (4)