Data set with all the conceivable errors

June 14, 2016 Etc R data

As I was preparing for an R intro course, I came up with the idea of creating a fake data set stuffed full of every conceivable error. In case my imagination falls short, I’d appreciate suggestions in the comments so that I can incorporate more errors.

There is a Hungarian saying about the veterinarian’s horse, which describes a case that exhibits every condition a subject can suffer from. I would like to create a data set that shows every error a data set can exhibit. This data set would then be used in the aforementioned course to make participants’ lives miserable, or rather, their experience more diverse.

So far I have been able to come up with the following issues:

  • ill-formatted entries, usually from GIS output: "1,234,567.0058654" (the commas must be removed and the value converted to numeric; the trailing digits are irrelevant but eat up memory)
  • special characters (e.g. from MS Word) where UTF-8 or ASCII is expected
  • mixed-case typos: "W-123" vs. "w-123"
  • leading/trailing whitespace: "W-123" vs. "W-123 "
  • MS Excel turning values into dates (e.g. 0-3 works fine, but 3-5 becomes 05-Mar)
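
To make the list above concrete, here is a minimal base R sketch that builds a tiny fake data set with these problems and then cleans or flags them. The data frame `messy`, its column names, and all values are made up for illustration; rounding to two digits is an arbitrary assumption, and the date check only flags suspicious values without trying to recover what was originally typed.

```r
# A tiny fake data set exhibiting the issues listed above.
# All names and values here are hypothetical.
messy <- data.frame(
  area  = c("1,234,567.0058654", "987.25", "42"),
  id    = c("W-123", "w-123", "W-123 "),
  note  = c("caf\u00e9", "\u201cquoted\u201d", "plain"),
  range = c("0-3", "05-Mar", "6-10"),
  stringsAsFactors = FALSE
)

# 1. Ill-formatted numbers: drop the commas, convert to numeric,
#    and round away digits that carry no information
#    (two decimals is an assumption, pick what suits the data).
messy$area_num <- round(as.numeric(gsub(",", "", messy$area)), 2)

# 2. Mixed case and stray whitespace: trim first, then normalize case.
messy$id_clean <- toupper(trimws(messy$id))

# 3. Special characters: flag entries with any code point outside ASCII.
messy$non_ascii <- vapply(
  messy$note,
  function(s) any(utf8ToInt(s) > 127L),
  logical(1),
  USE.NAMES = FALSE
)

# 4. Excel date damage: flag values that look like "05-Mar".
month_pattern <- paste0(
  "^[0-9]{1,2}-(", paste(month.abb, collapse = "|"), ")$"
)
messy$excel_date <- grepl(month_pattern, messy$range)

messy
```

Flagging rather than repairing the last two cases is deliberate in this sketch: which replacement character is right, or what range a value like 05-Mar started out as, usually has to be decided by someone who knows the data.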

I don’t imagine that this list can ever be complete, but right now it is far from it. If you have struggled with a problem in the past and would like others to learn from it, please leave a comment and I will expand the list accordingly.


