Data analysis pipelines with R

Transparency and reproducibility are key aspects of rigorous scientific work. But how do you make your analyses transparent and reproducible without too much hassle?

Learning goal

Subgoals

Background

The goal of this lesson is to teach novice programmers how to write modular code and to introduce best practices for using R for data analysis. R is widely used across scientific disciplines for statistical analysis and for its array of third-party packages. These materials emphasize practical advice and hands-on experience in managing data analysis in a reproducible, robust, and easily automated way.

Note that this workshop will not focus on teaching the fundamentals of the programming language R, nor will it teach statistical analysis. There are many great courses and resources available (some of our favorites are listed below) that do a far better job at that than we could. Instead, we focus on building the scaffolding that helps you save time by automating tedious and repetitive tasks.

Prerequisites

Understand that computers store data and instructions (programs, scripts etc.) in files. Files are organised in directories (folders). Know how to access files not in the working directory by specifying the path.
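
As a quick self-check for the last prerequisite, the following is a minimal sketch of reading a file that lives outside the working directory. The folder layout, the file name `example.csv`, and the variable names are hypothetical and chosen only for illustration; a temporary directory stands in for a real project folder so the snippet runs anywhere.

```r
# Hypothetical project folder with a "data" subdirectory.
# A temporary location is used so nothing is written to your own files.
project_dir <- tempfile("project_")
dir.create(file.path(project_dir, "data"), recursive = TRUE)

# file.path() joins path components with the separator R expects.
csv_path <- file.path(project_dir, "data", "example.csv")

# Write a small made-up table so there is something to read back.
write.csv(
  data.frame(id = 1:3, value = c(2.5, 3.1, 4.8)),
  csv_path,
  row.names = FALSE
)

# The file is not in the working directory (see getwd()),
# so it is accessed by specifying its path.
getwd()
measurements <- read.csv(csv_path)
nrow(measurements)
```

If you can follow what each path refers to here, you have the background this lesson assumes.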

Other resources:

Schedule

       Setup                               Download files required for the lesson
00:00  1. Introduction to R and RStudio    How to find your way around RStudio?
                                           How to interact with R?
                                           How to manage your environment?
                                           How to install packages?
00:55  2. Project Management With RStudio  How can I manage my projects in R?
01:25  3. Seeking Help                     How can I get help in R?
01:45  4. Reading and Writing CSV Files    How do I read data from a CSV file into R?
                                           How do I write data to a CSV file?
02:15  5. Writing Data                     How can I save plots and data created in R?
02:35  6. Producing Reports With knitr     How can I integrate software and reports?
03:50  7. Writing Good Software            How can I write software that other people can use?
04:05  Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.