Using GitHub to create customized reports

This tutorial shows you how to use GitHub to automatically download survey data and prepare customized reports for each case or participant.

Background

We just launched the journal Global Environmental Psychology. A key priority of this journal is to advance reproducible, transparent, and open science. To achieve this, we need to check and enforce certain policies, for example, ensuring that authors share their data or adhere to specific reporting standards.

The “traditional” approach would be for authors to fill in paper forms and for editors to check the completed forms. This is problematic for several reasons:

  • Such a form would include many questions that may not be relevant to a particular manuscript (e.g., questions about data for a conceptual manuscript without data).
  • Moreover, authors need to provide certain information in several places (e.g., manuscript, paper form, submission system), and mistakes can happen when completing or returning the forms (e.g., a PDF not saving correctly, or parts being deleted inadvertently in a text editor).
  • This approach would also be cumbersome for editors and publishers: they would have to scrutinize a long list of questions, could overlook omissions, and would waste a lot of time transferring information to other documents or databases.

To avoid these problems, we implemented the form as an online survey. This makes it possible to use filters, hide irrelevant questions, and reduce the workload for authors. The survey data are then used to render reports with R and R Markdown. This approach enables useful features such as automatically highlighting the parts of a submission that don’t comply with our guidelines (see this blog post for more information), which simplifies the review of manuscripts enormously.
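To give an idea of how such highlighting can work, here is a minimal sketch of a helper that a template could call inline. It assumes HTML output; the function name flag_if_missing, the survey field data_shared, and the yellow styling are hypothetical illustrations, not the journal's actual code.

```r
# Hypothetical helper for an R Markdown template rendered to HTML:
# returns the text unchanged if a requirement is met, otherwise wraps it
# in a yellow highlight so editors spot the problem at a glance.
flag_if_missing <- function(ok, text) {
  # Treat NA (question not answered) like a failed requirement
  if (isTRUE(ok)) {
    text
  } else {
    paste0('<span style="background-color: yellow;">', text, "</span>")
  }
}

# Example with a hypothetical survey answer
answers <- list(data_shared = FALSE)
flag_if_missing(answers$data_shared, "Data availability statement")
```

In the template itself, the helper could be called with an inline R chunk such as `r flag_if_missing(answers$data_shared, "Data availability statement")`; the raw HTML then passes through to the rendered report.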

We implemented this on GitHub because it ensures that the editors-in-chief can access the data and reports (which wouldn’t be the case if the reports were created on a local computer). Moreover, because new reports can be created on a regular schedule, the data and reports are always up to date.

In sum, to make this process efficient for authors and editors, we collect the relevant information via the online survey tool formr and use R and R Markdown to generate customized summary reports (see also this blog post). These reports are used in the review process and are published alongside accepted papers to document that open science requirements were met.

Overview

The remainder of this post explains how to set up a GitHub repository to enable the complete workflow (downloading the data, then creating and saving the customized reports). A copy of the GitHub repository with all relevant R code is available here.

If you want to implement something similar on your local machine (rather than on GitHub), the last part of this post should be useful.

Setting up a GitHub repository with GitHub Actions

Getting started

If you don’t have a GitHub account, go to https://github.com/ and create one; this is free for educational purposes. Then create a new repository (probably a private one, unless everyone should have access).

If you want to run online surveys with formr, then of course you also need a formr account. This free survey platform is versatile and really great for open and replicable research.

The R part

To create the customized reports automatically, you need the following files:

  • Code that downloads and prepares the survey data. In the present case, this is split into downloading the data (0_connect_git.R) and processing them (1_import.R).
  • Code that extracts the relevant data and passes them to the template (2_create_docs_git.R).
  • A template (“Transparency_Peer_Review.Rmd”) that defines the layout of the report, the parts that are identical across reports, and the placeholders that vary with the underlying data.
  • A list of the R packages to use. Developers often update their packages, and such updates can break existing code. As a countermeasure, one can pin the package versions to use. This is done in the file “renv.lock” (the lockfile of the renv package).
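Passing the data to the template can be sketched as a parameterised R Markdown report that is rendered once per row of the survey data. This is a minimal sketch under assumptions: the template declares a parameter called case in the params section of its YAML header, the data contain a column manuscript_id, and the helper names build_render_jobs and render_jobs are hypothetical. The actual 2_create_docs_git.R may pass the data differently.

```r
# Hypothetical sketch: one report per row of the survey data.
# Assumes the template declares a "case" parameter in its YAML params
# section and that the data have a "manuscript_id" column.
build_render_jobs <- function(survey,
                              template = "Transparency_Peer_Review.Rmd",
                              out_dir = "reports") {
  lapply(seq_len(nrow(survey)), function(i) {
    row <- survey[i, , drop = FALSE]
    list(
      input = template,
      output_file = paste0("report_", row$manuscript_id, ".html"),
      output_dir = out_dir,
      # The whole row is handed to the template as params$case
      params = list(case = as.list(row))
    )
  })
}

# Render every job; requires the rmarkdown package and pandoc
render_jobs <- function(jobs) {
  for (job in jobs) {
    rmarkdown::render(
      job$input,
      output_file = job$output_file,
      output_dir = job$output_dir,
      params = job$params,
      envir = new.env()  # fresh environment so reports don't share objects
    )
  }
}
```

Rendering each report in a fresh environment prevents objects created for one case from leaking into the next. Before running any of this, renv::restore() installs the package versions recorded in renv.lock.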

For a more detailed explanation of what these code files do, see this blog post.

Moreover, make sure that all folders referenced in the code exist in the right place. One oddity we encountered is that GitHub can’t handle empty folders: Git only tracks files, so an empty folder is never committed. To solve this problem, we added placeholder files (without any relevant content).
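Creating the folders and their placeholder files can be scripted, for example as follows. The folder names and the .gitkeep file name are assumptions; .gitkeep is merely a common convention, and Git attaches no special meaning to it.

```r
# Create each folder the scripts expect and put an empty placeholder file
# in it, so that Git has something to track and the folder gets committed.
ensure_folders <- function(folders, placeholder = ".gitkeep") {
  for (f in folders) {
    dir.create(f, recursive = TRUE, showWarnings = FALSE)
    keep <- file.path(f, placeholder)
    if (!file.exists(keep)) file.create(keep)
  }
  # Return the placeholder paths invisibly, e.g. for checking
  invisible(file.path(folders, placeholder))
}

# Usage with hypothetical folder names:
# ensure_folders(c("data", "reports"))
```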

Tell GitHub what to do

Once the required files and folders are in the repository, you need to tell GitHub what to do by setting up a workflow. Click on “Actions” and then on “set up a workflow yourself”. Then specify which virtual machine (runner) to start (e.g., Linux or Windows), which code to run, and where to save the output. There are two ways to trigger the workflow, which can also be combined: manually with a click (on: workflow_dispatch) or on a schedule (e.g., every 12 hours: on: schedule: - cron: '0 */12 * * *'; for more information on defining the schedule, see https://crontab.guru/).

These instructions can be saved by committing them (click “Start commit”, then “Commit new file”). Note that this may not work in all browsers.

The workflow file (a YAML file) is saved in the folder “.github/workflows”.

Setting up secrets

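Certain steps, such as accessing survey data on an external server, require usernames and passwords. For obvious reasons, these shouldn’t be stored openly in the repository. A safer place for such sensitive information is GitHub Actions secrets: they are stored encrypted in the repository settings, and the workflow can pass them to the R code as environment variables.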
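On the R side, the scripts can then read the credentials from environment variables rather than from the code. A minimal sketch, assuming the hypothetical variable names FORMR_EMAIL and FORMR_PASSWORD and a hypothetical helper called get_credentials:

```r
# Hypothetical helper: read credentials from environment variables and
# stop with an informative error if any of them is missing or empty.
get_credentials <- function(vars = c(email = "FORMR_EMAIL",
                                     password = "FORMR_PASSWORD")) {
  values <- Sys.getenv(vars, unset = "")
  absent <- vars[values == ""]
  if (length(absent) > 0) {
    stop("Missing environment variable(s): ", paste(absent, collapse = ", "))
  }
  # Return a named list, e.g. creds$email and creds$password
  as.list(setNames(values, names(vars)))
}
```

In the workflow file, a step would expose a secret to R through an env entry such as FORMR_EMAIL: ${{ secrets.FORMR_EMAIL }}. Failing early with a clear message makes a missing secret easy to diagnose in the Actions log.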

Creating reports on a local computer

If you want to create the customized reports on your local computer rather than on GitHub, download the code and run the files locally. To make sure R finds the paths to the data and folders, I’d recommend opening the project in RStudio via “reports.Rproj”, which sets the working directory to the project folder. Then use “0_connect_offline.R” instead of “0_connect_git.R” and run the remaining R files in their numbered order. Remember to save your credentials locally and tell R where to find them. More information about what else to consider when running the code locally can be found in this blog post.
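Running the scripts in their numbered order, with the appropriate connect script, can be automated. The sketch below assumes that only the 0_ connect script differs between the two settings and that all scripts are named with a numeric prefix followed by an underscore. It relies on GitHub Actions setting the environment variable GITHUB_ACTIONS to "true"; the helper names plan_scripts and run_scripts are hypothetical.

```r
# Hypothetical helpers: plan_scripts() lists the numbered scripts in the
# order they should run; run_scripts() sources them one after another.
plan_scripts <- function(dir = ".",
                         on_github = Sys.getenv("GITHUB_ACTIONS") == "true") {
  # Files such as "0_connect_git.R", "1_import.R", "2_create_docs_git.R"
  scripts <- list.files(dir, pattern = "^[0-9]+_.*\\.R$")
  # Pick the connect script that matches the environment
  connect <- if (on_github) "0_connect_git.R" else "0_connect_offline.R"
  # All other scripts (prefix 1 and above), ordered by their numeric prefix
  rest <- scripts[!grepl("^0_", scripts)]
  rest <- rest[order(as.integer(sub("_.*$", "", rest)))]
  file.path(dir, c(connect, rest))
}

run_scripts <- function(paths) {
  for (p in paths) {
    message("Running ", p)
    # Source into the global environment so later scripts can use
    # objects (e.g., the imported data) created by earlier ones
    source(p, local = FALSE)
  }
}

# Usage, e.g. from the project root:
# run_scripts(plan_scripts())
```

Locally, credentials could be stored as environment variables in a .Renviron file in the project folder (excluded from version control via .gitignore), which R reads at startup.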
