Dr. Péter Sólymos is an ecologist and R programmer. He has worked with continental-scale data sets and developed statistical techniques for estimating population density from messy data sets. He is the author of numerous well-known R packages, including detect, dclone, vegan, and ResourceSelection. He currently works as a data scientist helping utility companies improve their outage and impact prevention practices, and is an adjunct professor at the University of Alberta in Edmonton, Canada.
This course is aimed at researchers analysing field observations, who often face data heterogeneities because field sampling protocols change from one project to another or over the lifespan of a project, or because legacy data sets are combined with new data collected by recording units.
Such heterogeneities can bias analyses when data sets are integrated inadequately, or can lead to information loss when data are filtered and reduced to a common standard. Accounting for these issues is important for better inference regarding the status and trend of species and communities.
Analysts of such “messy” data sets need to feel comfortable manipulating the data, need a full understanding of the mechanics of the models being used (i.e. be able to critically interpret the results and acknowledge assumptions and limitations), and should be able to make informed choices when faced with methodological challenges.
The course emphasizes critical thinking and active learning through hands-on programming exercises. We will use publicly available data sets to demonstrate data manipulation and analysis, and freely available, open-source R packages.
The expected outcome of the course is a solid foundation for further professional development through increased confidence in applying these methods to field observations.
By the end of the course, participants should:
Introductory lectures on the concepts and refreshers on R usage. Intermediate-level lectures interspersed with hands-on mini practicals and longer projects. Data sets for computer practicals will be provided by the instructors, but participants are welcome to bring their own data.
A basic understanding of statistical, mathematical and physical concepts, specifically generalised linear regression models (including mixed models) and basic calculus.
Familiarity with R, including the ability to import and export data, manipulate data frames, fit basic statistical models (up to GLM), and generate simple exploratory and diagnostic plots.
See publications listed in the papers folder.
© 2025 Péter Sólymos
This work is licensed under CC BY-SA 4.0