Quickly create descriptive tables in R for MS Word and Excel

Presentation of R packages to generate descriptive tables and save them as Word files

Overview

Most reports of empirical work require a table that describes the study sample (e.g., people, animals, organizations). Gathering and aggregating the relevant information can be tedious. To illustrate, different types of variables require different statistics: Ratio and interval scale variables (e.g., age, test score) are best summarized by their mean and standard deviation, while nominal and ordinal data (e.g., gender, level of education) are better summarized by frequencies and percentages. Preparing statistics for different subgroups (e.g., experimental conditions or administrative regions) may entail additional work (see here and here for some tips on how to prepare descriptive tables manually; useful functions to prepare multiple summary statistics in one step include rstatix::get_summary_stats, psych::describe, Hmisc::describe, and DescrTab2::descr for ratio and interval data, and janitor::tabyl for nominal and ordinal data).

Fortunately, there are R packages that facilitate creating such tables. The packages differ in terms of portability:
- some only show the results in the console
- others are intended for creating PDF files
- still others can save tables directly as Word or Excel files

Producing tables as Word or Excel files is very convenient when collaborating with others. Another advantage is that the process of creating the tables is automated, transparent, and replicable.

In what follows, I present a couple of packages that require only little coding, automatically prepare the appropriate statistics, and offer great portability (i.e., save tables as Word or Excel files while keeping as much of the formatting as possible). To see how the Word files look, just click on the package’s name:
- crosstable
- gtsummary
- arsenal
- table1
- tableone
- furniture

Preparation

Make sure all required packages are available and loaded:

if (!require("pacman")) install.packages("pacman")
pacman::p_load(
  "httr", # for downloading data
  "sjlabelled",  # for working with item labels (retrieving their content) 
  "labelled", # for with labelled variables
  "crosstable", # create  table
  "gtsummary", # create  table
  "arsenal", # create  table
  "table1", # create  table
  "tableone", # create  table
  "furniture", # create  table
  "kableExtra", # for formatting tables
  "openxlsx", # to save table as Excel file
  "flextable", # for formatting and exporting tables to word
  "officer", # for exporting tables to word
  "tidyverse") # data handling & cleaning

To be able to replicate the code and create the table, you can download the sample data from a previous study directly from OSF:

url <- 'https://osf.io/62ya5//?action=download'
filename <- 'df_study1.RData'
httr::GET(url, httr::write_disk(filename, overwrite = TRUE))
load("df_study1.RData")

# keep only relevant variables
df <- df %>%
  # keep:
  select("age","gender","student","condition")

Most of the packages introduced below automatically check the scale level of variables and then compute the appropriate statistic (e.g., frequencies for ordinal data). For this to work properly, it is essential that the “type” of the variables is correctly defined. In this dataset, some of the categorical variables need updating from type numeric to factor.

df <- df %>% 
  # convert to type factor
  dplyr::mutate(df, across(c(condition,gender,student), as.factor)) 

Packages that save tables as Word / Excel files

The following packages can save formatted tables as Word files. One package (gtsummary) can also directly save the table as an Excel file (there may be indirect ways for the other packages, too).

crosstable

Let’s start with crosstable, the package that requires the least amount of code and has good default settings. In its simplest form, the code for this package couldn’t be shorter:

crosstable(df) 
## # A tibble: 13 × 4
##    .id       label     variable   value           
##    <chr>     <chr>     <chr>      <chr>           
##  1 age       age       Min / Max  13.0 / 71.0     
##  2 age       age       Med [IQR]  26.0 [24.0;29.0]
##  3 age       age       Mean (std) 28.8 (9.3)      
##  4 age       age       N (NA)     377 (1)         
##  5 gender    gender    1          269 (71.16%)    
##  6 gender    gender    2          109 (28.84%)    
##  7 student   student   1          229 (61.07%)    
##  8 student   student   2          146 (38.93%)    
##  9 student   student   NA         3               
## 10 condition condition 1          94 (24.87%)     
## 11 condition condition 2          91 (24.07%)     
## 12 condition condition 3          101 (26.72%)    
## 13 condition condition 4          92 (24.34%)

By default, crosstable displays the table in the console. A nicer output (e.g., for websites or R Studio’s viewer) can be obtained via knitr::kable(). Grouping the data by a column (here the experimental condition) is also easy.

crosstable(df, by = "condition") %>% 
  select(-".id") %>% # remove first column
  # optimize formatting for web display (table looks better without next line)
  knitr::kable()
label variable 1 2 3 4
age Min / Max 17.0 / 62.0 17.0 / 54.0 13.0 / 61.0 19.0 / 71.0
age Med [IQR] 26.0 [24.0;29.0] 26.0 [24.0;30.5] 26.0 [24.0;30.0] 25.0 [23.0;28.2]
age Mean (std) 29.1 (9.3) 28.8 (8.3) 29.0 (10.0) 28.2 (9.6)
age N (NA) 94 (0) 90 (1) 101 (0) 92 (0)
gender 1 63 (23.42%) 68 (25.28%) 72 (26.77%) 66 (24.54%)
gender 2 31 (28.44%) 23 (21.10%) 29 (26.61%) 26 (23.85%)
student 1 57 (24.89%) 57 (24.89%) 59 (25.76%) 56 (24.45%)
student 2 36 (24.66%) 34 (23.29%) 40 (27.40%) 36 (24.66%)
student NA 1 0 2 0

With the following code, it is also possible to save the table as a Word file (see here and here).

# Specify properties for the target document
library(officer)
sect_properties <- prop_section(
  page_size(width = 21, height = 29.7, orient = "portrait"),
  type = "continuous",
  page_margins = page_mar())

# create table
df %>% 
  crosstable::crosstable(by = "condition") %>%
  # needs to be a flex_table 
  crosstable::as_flextable() %>% 
  # use flextable package to save table as word
  flextable::save_as_docx(path = "descriptives_crosstable.docx", pr_section = sect_properties)

# for more details about last step, see https://davidgohel.github.io/flextable/reference/save_as_docx.html

gtsummary

There is also a very quick way to get a descriptive table with the package gtsummary:

df %>% 
  gtsummary::tbl_summary() %>% 
  # for nicer display on website
  gtsummary::as_kable()
Characteristic N = 378
age 26 (24, 29)
Unknown 1
gender
1 269 (71%)
2 109 (29%)
student
1 229 (61%)
2 146 (39%)
Unknown 3
condition
1 94 (25%)
2 91 (24%)
3 101 (27%)
4 92 (24%)

The information for the categorical / ordinal variables looks fine (frequencies and percentages). However, for interval scale variables such as age I’d rather have the mean and standard deviation than the median and the interquartile range. The “by” command can be used to summarize the data by group.

df %>% 
  gtsummary::tbl_summary( 
    by = condition,
    statistic = list(age = "{mean} ({sd})")) %>%
  # for nicer display on website
  gtsummary::as_kable()
Characteristic 1, N = 94 2, N = 91 3, N = 101 4, N = 92
age 29 (9) 29 (8) 29 (10) 28 (10)
Unknown 0 1 0 0
gender
1 63 (67%) 68 (75%) 72 (71%) 66 (72%)
2 31 (33%) 23 (25%) 29 (29%) 26 (28%)
student
1 57 (61%) 57 (63%) 59 (60%) 56 (61%)
2 36 (39%) 34 (37%) 40 (40%) 36 (39%)
Unknown 1 0 2 0

This is how gtsummary saves tables as Word files:

# set documents properties
library(officer)
sect_properties <- prop_section(
  page_size(width = 21, height = 29.7, orient = "portrait"),
  type = "continuous",
  page_margins = page_mar()
)

df %>% 
  gtsummary::tbl_summary( 
    by = condition,
    statistic = list(age = "{mean} ({sd})")) %>% 
  # this changes the table to a different format that can be saved as Word
  as_flex_table() %>% 
  # use flextable package to save table as word
  flextable::save_as_docx(path = "descriptives_gtsummary.docx", pr_section = sect_properties)
# for more details about last step, see https://davidgohel.github.io/flextable/reference/save_as_docx.html

To the best of my knowledge, gtsummary is the only package that can export tables as Excel files:

df %>% 
  gtsummary::tbl_summary( 
    by = condition,
    statistic = list(age = "{mean} ({sd})"),
    digits = all_continuous() ~ 2) %>% 
  # export to Excel
  as_hux_xlsx("descriptives_gtsummary.xlsx") 

arsenal

arsenal also exports tables as Word files but requires a bit more coding. For instance, it is necessary to specify all the variables to include in the table. Note that by default arsenal shows the total and performs statistical tests when comparing groups (this can be turned off, though).

# create table
arsenal::tableby(condition~ age + gender + student, 
                 data = df, 
                 total = FALSE, # don't show column with total numbers
                 test = FALSE) %>% # don't perform test for group comparisons
  # display table  
  summary()
1 (N=94) 2 (N=91) 3 (N=101) 4 (N=92)
age
   N-Miss 0 1 0 0
   Mean (SD) 29.096 (9.275) 28.756 (8.305) 28.980 (9.971) 28.250 (9.587)
   Range 17.000 - 62.000 17.000 - 54.000 13.000 - 61.000 19.000 - 71.000
gender
   1 63 (67.0%) 68 (74.7%) 72 (71.3%) 66 (71.7%)
   2 31 (33.0%) 23 (25.3%) 29 (28.7%) 26 (28.3%)
student
   N-Miss 1 0 2 0
   1 57 (61.3%) 57 (62.6%) 59 (59.6%) 56 (60.9%)
   2 36 (38.7%) 34 (37.4%) 40 (40.4%) 36 (39.1%)

Saving the table as word file should be simple. However, for reasons unknown to me this doesn’t always work on my computer.

# save table as Word document  
descriptives <- arsenal::tableby(condition~ age + gender + student, data = df) 
arsenal::write2word(descriptives, "descriptives_arsenal.docx", title = "My table",
                    quiet = TRUE) # to suppress the command line output

table1

table1 is another package that requires only little code to create a descriptive table. Group-comparisons are easy (specify grouping variable with “|”). Unfortunately, I couldn’t figure out how to render the table on the website.

# not working on server
table1::table1(~age + gender + student | condition, data = df) 

table1 has its own function t1flex to convert the table into a flextable object, which can then be conveniently be saved as a Word file.

descriptive_table <- table1::table1(~age + gender + student | condition, data = df) 
# convert to flextable and save
table1::t1flex(descriptive_table) %>% 
  flextable::save_as_docx(path="descriptives_table1.docx",pr_section = sect_properties) # see previous sections for how to define "sect_properties"

tableone

tableone also requires only little code to create a descriptive table. strata can be used to group the data by a column. By default, tableone omits one level of categorical variables. If you want to see all levels, you need to change this manually (showAllLevels = TRUE). Note that when comparing groups, the default is to also conduct a statistical test (this can be turned off, though).

tableone::CreateTableOne(vars = c("age", "gender", "student"), 
                         data = df, strata = "condition",
                         test = FALSE) %>% # don't perform test for group comparison
  print(showAllLevels = TRUE) %>% 
  # use kableone for nicer display
  tableone::kableone()
level 1 2 3 4
n 94 91 101 92
age (mean (SD)) 29.10 (9.27) 28.76 (8.31) 28.98 (9.97) 28.25 (9.59)
gender (%) 1 63 (67.0) 68 (74.7) 72 (71.3) 66 (71.7)
2 31 (33.0) 23 (25.3) 29 (28.7) 26 (28.3)
student (%) 1 57 (61.3) 57 (62.6) 59 (59.6) 56 (60.9)
2 36 (38.7) 34 (37.4) 40 (40.4) 36 (39.1)

tableone doesn’t offer a direct way to export tables as Word files. However, with some additional code (partially taken from here) the table can be converted into a flextable object and then be saved as a Word file.

(Note that tableone can also prepare the table in a way that makes it easy to copy-and-paste to Excel (look for “quote = TRUEhere and here.)

# save as word by using the print function and by converting the table to a flextable 
descriptives_tableone <- tableone::CreateTableOne(data = df, strata = "condition")
descriptives_tableone <- print(descriptives_tableone,printToggle = FALSE)
descriptives_tableone %>% 
  as.data.frame() %>% 
  rownames_to_column("Variables") %>%
  flextable() %>%
  flextable::save_as_docx(path = "descriptives_tableone.docx", pr_section = sect_properties) # see previous sections for how to define "sect_properties"

furniture

Another package that quickly generates descriptive tables is furniture.

furniture::table1(df, age, gender, student, splitby = ~condition) %>% 
  # for some reason this only works if the object is converted 
  kableExtra::kable()
. 1 2 3 4
n = 93 n = 90 n = 99 n = 92
age
29.1 (9.3) 28.8 (8.3) 28.9 (9.9) 28.2 (9.6)
gender
1 62 (66.7%) 68 (75.6%) 71 (71.7%) 66 (71.7%)
2 31 (33.3%) 22 (24.4%) 28 (28.3%) 26 (28.3%)
student
1 57 (61.3%) 56 (62.2%) 59 (59.6%) 56 (60.9%)
2 36 (38.7%) 34 (37.8%) 40 (40.4%) 36 (39.1%)

When passed on to flextable, tables can be saved as Word files.

furniture::table1(df, age, gender, student, splitby = ~condition) %>% 
  as.data.frame() %>% 
  flextable() %>%
  flextable::save_as_docx(path="descriptives_furniture.docx",pr_section = sect_properties) # see previous sections for how to define "sect_properties"

Other interesting packages

There are a number of other packages that don’t offer the possibility to save tables as Word or Excel files but nonetheless generate nice, descriptive tables (see Table below). Most of them offer more manual ways to import tables into Word and Excel such as copy-and-pasting html tables, which is cool because it keeps the formatting (see here, here, or here). Another possibility is to save tables as text files (see e.g., here or here).

Package Version Output Pipe1 Comments
DescrTab2 2.1.16 html, R Markdown, Word (if run as .Rmd file) Yes Many options to customize table
summarytools 1.0.1 R Markdown, R Studio Viewer, html Yes Gives nice, detailed overview; may need installing addiational software
tab::tabmulti 5.1.1 kable No Should work well with R Markdown and knitr documents; further processing and formatting should be possible (not tested)
Gmisc 3.0.0 html and LaTeX Yes Can be passed on to htmlTable
modelsummary::datasummary_balance 1.1.0 HTML, LaTeX, MS Word & Powerpoint, Text/Markdown, PDF, RTF, Image files. No No user-friendly way to quickly combine and nicely display stats for categorical and numeric variables
vtable::sumtable 1.4.1 html, LaTeX Yes No user-friendly way to quickly combine and nicely display stats for categorical and numeric variables
qwraps2 0.5.2 R Markdown, LaTeX ? Highly customizable (e.g., ad-hoc merging of levels in categorical variables), requires more coding

1 Does the package work with pipes (%>%)?

Final words

There are really nice R packages to create descriptive tables with little effort. Which one you find most useful may depend on criteria not covered in this post (e.g., how missing values are computed and displayed, how well the default statistics and formatting style corresponds to your disciplinary conventions).

If you find any errors, omissions, or believe additional packages should be included in this post, please let me know.