suppressPackageStartupMessages(library(tidyr))
suppressPackageStartupMessages(library(data.table))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(haven))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(GGally))
suppressPackageStartupMessages(library(kableExtra))
Longitudinal Data Analysis
Data Structure
Every statistical analysis begins with loading the data into RStudio. Data can come in many formats; two popular ones are csv (comma-separated values) and dta (the Stata format). Below we show several ways to load data into RStudio.
Loading CSV files
Using base R
The following is the standard way of loading csv files into R. It requires no external packages, since read.csv is a base R function. Specify the location of your data file and pass it to the function as its input.
TLC <- read.csv("Data/TLC.csv")
We can then look at the data with the head function, which shows the first n rows.
head(TLC, n = 10)
id lead0 lead1 lead4 lead6 group
1 1 30.8 26.9 25.8 23.8 Placebo
2 2 26.5 14.8 19.5 21.0 Treatment
3 3 25.8 23.0 19.1 23.2 Treatment
4 4 24.7 24.5 22.0 22.5 Placebo
5 5 20.4 2.8 3.2 9.4 Treatment
6 6 20.4 5.4 4.5 11.9 Treatment
7 7 28.6 20.8 19.2 18.4 Placebo
8 8 33.7 31.6 28.5 25.1 Placebo
9 9 19.7 14.9 15.3 14.7 Placebo
10 10 31.1 31.2 29.2 30.1 Placebo
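After loading, it is worth confirming that the data frame has the shape and column types we expect. The sketch below is self-contained: rather than assuming Data/TLC.csv is available, it writes the first three rows shown above to a temporary file and reads them back with read.csv. The names tmp and TLC_demo are illustrative only and are not part of the original analysis.

```r
# Write a small stand-in for Data/TLC.csv to a temporary file
# (the first three rows of the output above).
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "id,lead0,lead1,lead4,lead6,group",
  "1,30.8,26.9,25.8,23.8,Placebo",
  "2,26.5,14.8,19.5,21.0,Treatment",
  "3,25.8,23.0,19.1,23.2,Treatment"
), tmp)

# read.csv returns a base R data.frame
TLC_demo <- read.csv(tmp)

dim(TLC_demo)            # number of rows and columns
str(TLC_demo)            # column names, types, and first values
sapply(TLC_demo, class)  # class of each column
```

Checking dim and str right after loading is a cheap way to catch a wrong separator or a column read with an unexpected type.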
Using the readr package
The next method uses the read_csv function from the readr package. It is called in the same way as read.csv, but it is typically faster and returns a tibble rather than a base R data frame.
library(readr)
TLC <- read_csv("Data/TLC.csv")
We can again use the head function to look at the first few rows; here we print the first 10 rows of the data.
head(TLC, n = 10)
# A tibble: 10 × 6
id lead0 lead1 lead4 lead6 group
<dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 1 30.8 26.9 25.8 23.8 Placebo
2 2 26.5 14.8 19.5 21 Treatment
3 3 25.8 23 19.1 23.2 Treatment
4 4 24.7 24.5 22 22.5 Placebo
5 5 20.4 2.8 3.2 9.40 Treatment
6 6 20.4 5.40 4.5 11.9 Treatment
7 7 28.6 20.8 19.2 18.4 Placebo
8 8 33.7 31.6 28.5 25.1 Placebo
9 9 19.7 14.9 15.3 14.7 Placebo
10 10 31.1 31.2 29.2 30.1 Placebo
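By default read_csv guesses each column's type, which is why id appears as <dbl> above. If we prefer to state the types ourselves, we can pass a column specification through the col_types argument. The following is a minimal sketch that uses a temporary file with the first three rows of the data in place of Data/TLC.csv; tmp and TLC_demo are illustrative names.

```r
library(readr)

# Small stand-in for Data/TLC.csv (first three rows of the data).
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "id,lead0,lead1,lead4,lead6,group",
  "1,30.8,26.9,25.8,23.8,Placebo",
  "2,26.5,14.8,19.5,21.0,Treatment",
  "3,25.8,23.0,19.1,23.2,Treatment"
), tmp)

# State the column types explicitly instead of letting readr guess:
# id as integer, group as character, every other column as double.
TLC_demo <- read_csv(
  tmp,
  col_types = cols(
    id       = col_integer(),
    group    = col_character(),
    .default = col_double()
  )
)

class(TLC_demo)  # the result is a tibble
head(TLC_demo, n = 3)
```

Supplying col_types also suppresses the column specification message that read_csv otherwise prints.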
Using the data.table package
For large datasets, the fread function in the data.table package usually reads the data faster than either of the methods above. Here we print the first 5 rows of the data.
library(data.table)
TLC <- fread("Data/TLC.csv")
head(TLC, n = 5)
id lead0 lead1 lead4 lead6 group
<int> <num> <num> <num> <num> <char>
1: 1 30.8 26.9 25.8 23.8 Placebo
2: 2 26.5 14.8 19.5 21.0 Treatment
3: 3 25.8 23.0 19.1 23.2 Treatment
4: 4 24.7 24.5 22.0 22.5 Placebo
5: 5 20.4 2.8 3.2 9.4 Treatment
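With large files we often do not need every column or every row. fread can limit what it reads through its select and nrows arguments. The sketch below shows this on a temporary file that stands in for Data/TLC.csv; tmp and TLC_demo are illustrative names, and the choice of columns is arbitrary.

```r
library(data.table)

# Small stand-in for Data/TLC.csv (first three rows of the data).
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "id,lead0,lead1,lead4,lead6,group",
  "1,30.8,26.9,25.8,23.8,Placebo",
  "2,26.5,14.8,19.5,21.0,Treatment",
  "3,25.8,23.0,19.1,23.2,Treatment"
), tmp)

# Read only three of the columns, and only the first two data rows.
TLC_demo <- fread(tmp, select = c("id", "lead0", "group"), nrows = 2)

class(TLC_demo)  # a data.table, which is also a data.frame
TLC_demo
```

Reading only a few rows first is a quick way to inspect the layout of a very large file before loading it in full.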
Loading dta files
We can also read files saved by other statistical software (Stata, SPSS, SAS, etc.). Here we explore reading dta files, the format used by Stata. To load these into RStudio we use the haven package, whose read_dta() function serves a similar purpose to read.csv(), read_csv() and fread(). Below we print the first 15 rows of the data.
TLCdta <- read_dta("Data/TLC.dta")
head(TLCdta, n = 15)
# A tibble: 15 × 6
id lead0 lead1 lead4 lead6 group
<dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 1 30.8 26.9 25.8 23.8 Placebo
2 2 26.5 14.8 19.5 21 Treatment
3 3 25.8 23 19.1 23.2 Treatment
4 4 24.7 24.5 22 22.5 Placebo
5 5 20.4 2.80 3.20 9.40 Treatment
6 6 20.4 5.40 4.5 11.9 Treatment
7 7 28.6 20.8 19.2 18.4 Placebo
8 8 33.7 31.6 28.5 25.1 Placebo
9 9 19.7 14.9 15.3 14.7 Placebo
10 10 31.1 31.2 29.2 30.1 Placebo
11 11 19.8 17.5 20.5 27.5 Placebo
12 12 24.8 23.1 24.6 30.9 Treatment
13 13 21.4 26.3 19.5 19 Placebo
14 14 27.9 6.30 18.5 16.3 Treatment
15 15 21.1 20.3 18.4 20.8 Placebo
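To try read_dta() without a Stata file at hand, we can create one with haven's write_dta() and read it back. This is only a sketch: the data frame demo holds three rows and three of the columns from the output above, and the names demo, tmp and TLC_demo are illustrative.

```r
library(haven)

# A small data frame with a few values from the TLC data.
demo <- data.frame(
  id    = c(1, 2, 3),
  lead0 = c(30.8, 26.5, 25.8),
  group = c("Placebo", "Treatment", "Treatment"),
  stringsAsFactors = FALSE
)

# Save it in Stata's dta format, then read it back.
tmp <- tempfile(fileext = ".dta")
write_dta(demo, tmp)
TLC_demo <- read_dta(tmp)

class(TLC_demo)  # read_dta returns a tibble
head(TLC_demo, n = 3)
```

Columns read by haven may carry Stata metadata, such as display formats or variable labels, as attributes; the values themselves behave like ordinary numeric and character vectors.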