<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>data preparation | Adrian Gadient-Brügger</title>
    <link>/tag/data-preparation/</link>
      <atom:link href="/tag/data-preparation/index.xml" rel="self" type="application/rss+xml" />
    <description>data preparation</description>
    <generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><lastBuildDate>Sat, 27 Feb 2021 00:00:00 +0000</lastBuildDate>
    <image>
      <url>/images/icon_huaf89efe5379de1a05284391d3347ab6a_20288_512x512_fill_lanczos_center_2.png</url>
      <title>data preparation</title>
      <link>/tag/data-preparation/</link>
    </image>
    
    <item>
      <title>Generate a simple codebook in R</title>
      <link>/post/simple-codebook/</link>
      <pubDate>Sat, 27 Feb 2021 00:00:00 +0000</pubDate>
      <guid>/post/simple-codebook/</guid>
      <description>
&lt;script src=&#34;/rmarkdown-libs/header-attrs/header-attrs.js&#34;&gt;&lt;/script&gt;


&lt;div id=&#34;overview&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Overview&lt;/h2&gt;
&lt;p&gt;A key task in empirical projects is to document the structure and content of datasets. This is often done in the form of a &lt;strong&gt;codebook&lt;/strong&gt;, a file that lists at least the names of the variables (items) and their content. Sometimes it also describes the variables’ level of measurement (e.g., interval, ordinal, nominal) and answer options (e.g., 1 = strongly disagree, 5 = strongly agree; not covered here).&lt;/p&gt;
&lt;p&gt;Most researchers probably have some form of a codebook when they start designing their study, so it might not be necessary to create one. However, the initial items are sometimes only a sketch, and their content is revised while the study is being built or when feedback from pretesters is implemented. Researchers can either track such changes and transfer them to the provisional codebook, or they can create a new codebook from the final dataset. This post demonstrates how to do the latter in R: to create a simple codebook containing item names, labels, and some descriptive statistics. A strength of this approach is that it is less error-prone than updating an existing codebook by hand (assuming the necessary information is saved in the data frame).&lt;/p&gt;
&lt;p&gt;But before you go through the trouble of reading this post, note that there’s also an automatic way to generate codebooks that doesn’t require any programming: &lt;a href=&#34;https://codebook.formr.org&#34; target=&#34;_blank&#34;&gt;https://codebook.formr.org&lt;/a&gt;. This should work for various file types (.sav (SPSS), .dta (Stata), .rds (R), .rdata (R), .por, .xpt, .csv, .tsv, .csv2).&lt;/p&gt;
&lt;div id=&#34;load-the-data-and-examine-item-labels&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Load the data and examine item labels&lt;/h3&gt;
&lt;p&gt;Let’s first load and examine a sample dataset:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(haven) # package to read files from popular statistical software packages such as SPSS, SAS, Stata
data &amp;lt;- read_sav(&amp;quot;https://mmi.psycho.unibas.ch/r-toolbox/data/Cars.sav&amp;quot;) # import data&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(sjlabelled) # package to read and write item labels and values
get_label(data) # show content of variables (what the variable measures)

# which returns for example:
# MPG               
# &amp;quot;Miles per Gallon&amp;quot;

get_labels(data) # show value labels (what the different answer options mean)
# note: the value labels are not used for this very simple codebook.

# which returns for example:
# $CYLINDER
# [1] &amp;quot;3 Cylinders&amp;quot; &amp;quot;4 Cylinders&amp;quot; &amp;quot;5 Cylinders&amp;quot; &amp;quot;6 Cylinders&amp;quot; &amp;quot;8 Cylinders&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that the functions &lt;code&gt;get_label&lt;/code&gt; and &lt;code&gt;get_labels&lt;/code&gt; will only return something if the data are labelled. If your dataset doesn’t contain any item / value labels, you can add them manually using the package &lt;code&gt;sjlabelled&lt;/code&gt; (see &lt;a href=&#34;https://cran.r-project.org/web/packages/sjlabelled/index.html&#34; target=&#34;_blank&#34;&gt;here&lt;/a&gt;). See also &lt;a href=&#34;https://www.adrianbruegger.com/post/import-and-tidy-spss-from_qualtrics/&#34; target=&#34;_blank&#34;&gt;my post&lt;/a&gt; about cleaning labelled data from Qualtrics.&lt;/p&gt;
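&lt;p&gt;As a minimal sketch of adding item labels by hand, the following uses &lt;code&gt;var_labels&lt;/code&gt; from &lt;code&gt;sjlabelled&lt;/code&gt;. The data frame &lt;code&gt;toy&lt;/code&gt; and its values are made up for illustration and are not part of the sample dataset:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(sjlabelled)

# hypothetical unlabelled data (illustration only)
toy &amp;lt;- data.frame(mpg = c(18, 15, 36), cylinder = c(8, 8, 4))

# attach an item label to each variable
toy &amp;lt;- var_labels(toy, mpg = &amp;quot;Miles per Gallon&amp;quot;, cylinder = &amp;quot;Number of Cylinders&amp;quot;)

# the labels can now be read back as before
get_label(toy)&lt;/code&gt;&lt;/pre&gt;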
&lt;/div&gt;
&lt;div id=&#34;extract-item-labels&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Extract item labels&lt;/h3&gt;
&lt;p&gt;The next step is to extract the item names and item labels and save them in a new data frame (here a tibble, but other formats work too):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# extract labels from dataframe and store as new object
library(tibble)
simple_codebook &amp;lt;- enframe(get_label(data))

# use more informative column names
colnames(simple_codebook) &amp;lt;- c(&amp;quot;variable_id&amp;quot;, &amp;quot;item_text&amp;quot;)

# Show the new data frame
simple_codebook&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;simple_codebook_1.png&#34; alt=&#34;&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;Figure 1. Simple codebook with name and content of variables.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;add-some-statistics&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Add some statistics&lt;/h3&gt;
&lt;p&gt;It is also very helpful to include the observed range of each variable (minimum and maximum). These and other useful statistics can quickly be computed using the function &lt;code&gt;describe&lt;/code&gt; from the package &lt;code&gt;psych&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# get descriptive statistics and select those of interest
library(psych)
library(dplyr)
descriptives &amp;lt;- data %&amp;gt;%
  describe() %&amp;gt;%
  as_tibble() %&amp;gt;%
  select(&amp;quot;n&amp;quot;, &amp;quot;min&amp;quot;, &amp;quot;max&amp;quot;, &amp;quot;mean&amp;quot;)

# add stats to codebook (describe() returns one row per variable, in the same order as the columns of data)
simple_codebook &amp;lt;- cbind(simple_codebook, descriptives)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Et voilà, the simple codebook:
&lt;img src=&#34;simple_codebook_2.png&#34; alt=&#34;Figure 2. Simple codebook with item names, item content, and descriptive statistics.&#34; /&gt;&lt;/p&gt;
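&lt;p&gt;The &lt;code&gt;cbind&lt;/code&gt; step relies on both objects listing the variables in the same order. If that order might differ (for example, after rows of the codebook have been sorted or filtered), one option is to keep the variable names as a column and join on them. A minimal sketch, with made-up toy data and hypothetical object names:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(psych)
library(tibble)
library(dplyr)

# hypothetical data and codebook (illustration only); note the different variable order
toy &amp;lt;- data.frame(mpg = c(18, 15, 36), cylinder = c(8, 8, 4))
toy_codebook &amp;lt;- tibble(
  variable_id = c(&amp;quot;cylinder&amp;quot;, &amp;quot;mpg&amp;quot;),
  item_text   = c(&amp;quot;Number of Cylinders&amp;quot;, &amp;quot;Miles per Gallon&amp;quot;)
)

# describe() stores the variable names as row names; turn them into a column
toy_stats &amp;lt;- describe(toy) %&amp;gt;%
  as.data.frame() %&amp;gt;%
  rownames_to_column(&amp;quot;variable_id&amp;quot;) %&amp;gt;%
  select(variable_id, n, min, max, mean)

# match statistics to codebook rows by variable name instead of by position
toy_codebook &amp;lt;- left_join(toy_codebook, toy_stats, by = &amp;quot;variable_id&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Joining by name costs a few extra lines, but a mismatch then shows up as missing values instead of statistics silently attached to the wrong item.&lt;/p&gt;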
&lt;/div&gt;
&lt;div id=&#34;save-codebook-as-.csv-or-.xlsx&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Save codebook as .csv or .xlsx&lt;/h3&gt;
&lt;p&gt;Finally, it is helpful to save the codebook in a file that can easily be shared and accessed.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# write to csv and Excel
write.csv(simple_codebook,file=&amp;quot;simple_codebook.csv&amp;quot;, na=&amp;quot;&amp;quot;, row.names=FALSE) 

library(openxlsx)
write.xlsx(simple_codebook,file=&amp;quot;simple_codebook.xlsx&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;more-sophisticated-alternative&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;More sophisticated alternative&lt;/h3&gt;
&lt;p&gt;If you’re interested in a more sophisticated codebook that includes more information, there’s a great package called &lt;code&gt;codebook&lt;/code&gt; (see &lt;a href=&#34;https://cran.r-project.org/web/packages/codebook/index.html&#34; target=&#34;_blank&#34;&gt;here&lt;/a&gt; and &lt;a href=&#34;https://journals.sagepub.com/doi/full/10.1177/2515245919838783&#34; target=&#34;_blank&#34;&gt;here&lt;/a&gt;). It works particularly well when the data have been collected with &lt;code&gt;formr&lt;/code&gt; (see &lt;a href=&#34;https://formr.org/&#34; target=&#34;_blank&#34;&gt;here&lt;/a&gt;). However, there are some caveats. First, to the best of my knowledge, you can only use this package if you have installed &lt;code&gt;R Markdown&lt;/code&gt; and the additional software needed to create PDF documents from within &lt;code&gt;R Markdown&lt;/code&gt;. Second, in my own experience, some variables often cause problems when generating a codebook with &lt;code&gt;codebook&lt;/code&gt;, so you need to spend some time identifying the problematic variables and finding a way to deal with them. Third, it isn’t possible to programmatically create .csv or .xlsx files. To achieve this, you need to click your way through a menu.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
</description>
    </item>
    
  </channel>
</rss>
