Longitudinal Data Analysis

Data Structure

suppressPackageStartupMessages(library(tidyr))
suppressPackageStartupMessages(library(data.table))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(haven))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(GGally))
suppressPackageStartupMessages(library(kableExtra))

Every statistical analysis begins with loading the data into R. Data can come in many formats; two popular ones are csv (comma-separated values) and dta (the Stata format). Below we show different methods for loading data into R from RStudio.

Loading CSV files

Using base R

The following method is the standard way of loading csv files into R. It requires no external packages, since read.csv is a base R function. Specify the location of your data file and pass it to the function as its first argument.

TLC <- read.csv("Data/TLC.csv")

We can then inspect the data with the head function, which returns the first n rows.

head(TLC, n = 10)
   id lead0 lead1 lead4 lead6     group
1   1  30.8  26.9  25.8  23.8   Placebo
2   2  26.5  14.8  19.5  21.0 Treatment
3   3  25.8  23.0  19.1  23.2 Treatment
4   4  24.7  24.5  22.0  22.5   Placebo
5   5  20.4   2.8   3.2   9.4 Treatment
6   6  20.4   5.4   4.5  11.9 Treatment
7   7  28.6  20.8  19.2  18.4   Placebo
8   8  33.7  31.6  28.5  25.1   Placebo
9   9  19.7  14.9  15.3  14.7   Placebo
10 10  31.1  31.2  29.2  30.1   Placebo
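Beyond head, it is often useful to check how each column was typed on import. The following is a minimal sketch rather than part of the original analysis: it writes a small temporary csv file (two rows copied from the output above, three columns only) so that it runs without Data/TLC.csv, and the object name demo is purely illustrative.

```r
# Write a tiny csv to a temporary file so the example is self-contained.
# The two rows are copied from the first two rows of the TLC output above.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,lead0,group",
             "1,30.8,Placebo",
             "2,26.5,Treatment"), tmp)

# stringsAsFactors = TRUE converts character columns to factors on import;
# the default (since R 4.0.0) is FALSE, which keeps them as character.
demo <- read.csv(tmp, stringsAsFactors = TRUE)

str(demo)  # type of each column
dim(demo)  # number of rows and columns
```

Whether group should be a factor or a character column depends on the later analysis; the default of leaving it as character is usually a safe starting point.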

Using the readr package

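The next method uses the read_csv function from the readr package. It is called in the same way as read.csv, but it is typically faster on large files and returns a tibble rather than a plain data frame, which is why the printed output below also shows the column types.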

library(readr)
TLC <- read_csv("Data/TLC.csv")

We can again look at the data using the head function. Here we print the first 10 rows.

head(TLC, n = 10)
# A tibble: 10 × 6
      id lead0 lead1 lead4 lead6 group    
   <dbl> <dbl> <dbl> <dbl> <dbl> <chr>    
 1     1  30.8 26.9   25.8 23.8  Placebo  
 2     2  26.5 14.8   19.5 21    Treatment
 3     3  25.8 23     19.1 23.2  Treatment
 4     4  24.7 24.5   22   22.5  Placebo  
 5     5  20.4  2.8    3.2  9.40 Treatment
 6     6  20.4  5.40   4.5 11.9  Treatment
 7     7  28.6 20.8   19.2 18.4  Placebo  
 8     8  33.7 31.6   28.5 25.1  Placebo  
 9     9  19.7 14.9   15.3 14.7  Placebo  
10    10  31.1 31.2   29.2 30.1  Placebo  
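By default read_csv guesses each column type, which is why id appears as <dbl> above. If specific types are wanted, they can be declared through the col_types argument. The sketch below is an illustration under assumptions: it uses a small temporary file with three of the TLC columns instead of Data/TLC.csv, and the choice of integer and factor types is ours, not something the original analysis requires.

```r
library(readr)

# Small stand-in file (two rows copied from the output above).
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,lead0,group",
             "1,30.8,Placebo",
             "2,26.5,Treatment"), tmp)

# Declare the column types explicitly instead of letting readr guess them.
demo <- read_csv(
  tmp,
  col_types = cols(
    id    = col_integer(),
    lead0 = col_double(),
    group = col_factor(levels = c("Placebo", "Treatment"))
  )
)

spec(demo)  # shows the column specification that was used
```

Supplying col_types also silences the column-specification message that read_csv otherwise prints when it guesses types.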

Using the data.table package

For large datasets, the fread function in the data.table package usually reads files faster than either of the methods above. Here we read the data and print the first 5 rows.

library(data.table)
TLC <- fread("Data/TLC.csv")
head(TLC, n = 5)
      id lead0 lead1 lead4 lead6     group
   <int> <num> <num> <num> <num>    <char>
1:     1  30.8  26.9  25.8  23.8   Placebo
2:     2  26.5  14.8  19.5  21.0 Treatment
3:     3  25.8  23.0  19.1  23.2 Treatment
4:     4  24.7  24.5  22.0  22.5   Placebo
5:     5  20.4   2.8   3.2   9.4 Treatment
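When a file is large, it can also help to read only part of it. As a rough sketch, fread accepts a select argument to keep chosen columns and an nrows argument to limit the number of rows read. The temporary file below stands in for Data/TLC.csv and contains three rows copied from the output above; the name demo is illustrative.

```r
library(data.table)

# Small stand-in file (three rows copied from the output above).
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,lead0,group",
             "1,30.8,Placebo",
             "2,26.5,Treatment",
             "3,25.8,Treatment"), tmp)

# Keep only the id and group columns, and read only the first 2 data rows.
demo <- fread(tmp, select = c("id", "group"), nrows = 2)

demo
```

Reading a few rows first is a cheap way to confirm the column names and types before committing to a full import.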

Loading dta files

We can also read files saved by other statistical software (Stata, SPSS, SAS, etc.). Here we explore reading dta files, the format used by Stata. To load these into R we use the haven package, whose read_dta() function plays the same role as read.csv(), read_csv() and fread().

TLCdta <- read_dta("Data/TLC.dta")
head(TLCdta, n = 15)
# A tibble: 15 × 6
      id lead0 lead1 lead4 lead6 group    
   <dbl> <dbl> <dbl> <dbl> <dbl> <chr>    
 1     1  30.8 26.9  25.8  23.8  Placebo  
 2     2  26.5 14.8  19.5  21    Treatment
 3     3  25.8 23    19.1  23.2  Treatment
 4     4  24.7 24.5  22    22.5  Placebo  
 5     5  20.4  2.80  3.20  9.40 Treatment
 6     6  20.4  5.40  4.5  11.9  Treatment
 7     7  28.6 20.8  19.2  18.4  Placebo  
 8     8  33.7 31.6  28.5  25.1  Placebo  
 9     9  19.7 14.9  15.3  14.7  Placebo  
10    10  31.1 31.2  29.2  30.1  Placebo  
11    11  19.8 17.5  20.5  27.5  Placebo  
12    12  24.8 23.1  24.6  30.9  Treatment
13    13  21.4 26.3  19.5  19    Placebo  
14    14  27.9  6.30 18.5  16.3  Treatment
15    15  21.1 20.3  18.4  20.8  Placebo
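In the file above, group is stored as a plain string, so it arrives as a character column. Stata files often store categorical variables as numeric codes with value labels instead, and haven keeps those as labelled vectors. The sketch below illustrates that case under assumptions: the data frame and the 0/1 coding are invented for the example, and the file is written to a temporary location with write_dta so that it runs without Data/TLC.dta.

```r
library(haven)

# Hypothetical data: group coded 0/1 with Stata-style value labels.
demo <- data.frame(id = c(1, 2), lead0 = c(30.8, 26.5))
demo$group <- labelled(c(0, 1), labels = c(Placebo = 0, Treatment = 1))

# Round trip through a temporary dta file.
tmp <- tempfile(fileext = ".dta")
write_dta(demo, tmp)
back <- read_dta(tmp)

# as_factor() turns the labelled codes into an ordinary R factor.
grp <- as_factor(back$group)
levels(grp)
```

Converting with as_factor is usually worthwhile before modelling or plotting, since most R functions do not use the value labels on a labelled vector.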