A glimpse of the tidyverse
Introduction
Goal for Today
Introduce you to R and Rstudio.
What Are R and Rstudio?
What Is R?
R is an open source programming language with origins in C and FORTRAN. Advantages:
- Flexibility
- It’s free (and open source)!
- Ease of handling advanced computational models
- Ease of handling multiple data sets in one session
- Higher demand in industries.
But more importantly, it’s free.
Some disadvantages:
- “Bleeding” edge? (Even then…)
- Higher learning curve
- A “programming language” and not a “program.”
Rstudio will help with the learning curve component.
Getting Started in R and Rstudio
Let’s get started in Rstudio first. Select “Tools” in the menu.
- Scroll to “Global Options” (should be at the bottom)
- On the pop-up, select “pane layout.”
- Rearrange so that “Source” is top left, “Console” is top right", and the files/plots/packages/etc. is the bottom right.
- Save
Getting Started in R and Rstudio
Hit Ctrl-Shift-N (Cmd-Shift-N if you’re on a Mac) to open up a new script.
- Minimize the “Environment/History/Connections/Git” pane in the bottom left.
- Adjust the console output to your liking.
This should maximize your Rstudio experience, esp. as you’ll eventually start writing documents in Rstudio.
- That should maximize your Rstudio experience, esp. as you begin to write documents in Rstudio as well.
A Few Commands to Get Started
getwd()
will spit out your current working directory.
getwd()
#> [1] "D:/MY THESIS/Git/academic-website/content/post/2023-09-09-a-glimpse-of-the-tidyverse"
By default, assuming your username is “Bongani”:
- Windows:
"D:/"
(notice the forward slashes!)
Creating Objects
R is an “object-oriented” programming language.
- i.e. inputs create outputs that may be assigned to objects in the workspace.
For example:
a <- 3
b <- 4
this_is_a_long_object_name_and_you_should_not_do_this <- 5
d <- pi # notice there are a few built-in functions/objects
Sometimes it’s useful to see all the mess you’ve created in your workspace
ls()
#> [1] "a"
#> [2] "b"
#> [3] "d"
#> [4] "insert_html_math"
#> [5] "insert_inc_bullet"
#> [6] "insert_pause"
#> [7] "insert_slide_break"
#> [8] "make_latex_decorator"
#> [9] "this_is_a_long_object_name_and_you_should_not_do_this"
Install Packages
R depends on user-created libraries to do much of its functionality. We’re going to start with a few for the sake of this exercise.
# This will take a while, mostly for tidyverse
install.packages(c("tidyverse","devtools"))
# Once it's installed:
library(tidyverse)
library(devtools)
Load Data
You can load data from your hard drive, or even the internet. Some commands:
haven::read_dta()
for Stata .dta fileshaven::read_spss()
for SPSS filesread_csv()
for CSV filesreadxl::read_excel()
for MS Excel spreadsheetsread_tsv()
for tab-separated values.
Just make sure to apply it to an object.
\scriptsize
# Note: hypothetical data
Apply <- haven::read_dta("D:\MY THESIS\BONGANIfinalologit.dta")
Cunemp <- read_tsv("D:\MY THESIS\BONGANIfinalla.data.64.County")
\normalsize
Load Data
Some R packages, like pharmacoSmoking
package, has built-in data. For example:
pwt_sample= pharmacoSmoking |>
as.tibble()
names(pwt_sample)
#> [1] "id" "ttr" "relapse" "grp"
#> [5] "age" "gender" "race" "employment"
#> [9] "yearsSmoking" "levelSmoking" "ageGroup2" "ageGroup4"
#> [13] "priorAttempts" "longestNoSmoke"
Tidyverse
Tidyverse
The tidyverse is a suite of functions/packages that totally rethink base R. Some functions we’ll discuss:
%>%
(the pipe)glimpse()
andsummary()
select()
group_by()
summarize()
mutate()
filter()
I cannot fully discuss everything from the tidyverse. That’s why there’s Google/Stackexchange. :P
%>%
The pipe (%>%
) allows you to chain together a series of tidyverse functions.
- This is especially useful when you’re recoding data and you want to make sure you got everything right before saving the data.
You can chain together a host of tidyverse commands within it.
glimpse()
and summary()
glimpse()
and summary()
will get you some basic descriptions of your data. For example:
pwt_sample %>% glimpse() # notice the pipe
#> Rows: 125
#> Columns: 14
#> $ id <int> 21, 113, 39, 80, 87, 29, 16, 35, 54, 70, 84, 85, 25, 47~
#> $ ttr <int> 182, 14, 5, 16, 0, 182, 14, 77, 2, 0, 12, 182, 21, 3, 1~
#> $ relapse <int> 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1~
#> $ grp <fct> patchOnly, patchOnly, combination, combination, combina~
#> $ age <int> 36, 41, 25, 54, 45, 43, 66, 78, 40, 38, 64, 51, 37, 65,~
#> $ gender <fct> Male, Male, Female, Male, Male, Male, Male, Female, Fem~
#> $ race <fct> white, white, white, white, white, hispanic, black, bla~
#> $ employment <fct> ft, other, other, ft, other, ft, pt, other, ft, ft, oth~
#> $ yearsSmoking <int> 26, 27, 12, 39, 30, 30, 54, 56, 25, 23, 30, 35, 23, 50,~
#> $ levelSmoking <fct> heavy, heavy, heavy, heavy, heavy, heavy, heavy, light,~
#> $ ageGroup2 <fct> 21-49, 21-49, 21-49, 50+, 21-49, 21-49, 50+, 50+, 21-49~
#> $ ageGroup4 <fct> 35-49, 35-49, 21-34, 50-64, 35-49, 35-49, 65+, 65+, 35-~
#> $ priorAttempts <int> 0, 3, 3, 0, 0, 2, 0, 10, 4, 10, 12, 1, 5, 6, 5, 2, 1, 1~
#> $ longestNoSmoke <int> 0, 90, 21, 0, 0, 1825, 0, 15, 7, 90, 365, 7, 1095, 180,~
glimpse()
and summary()
summary()
is technically not a tidyverse function, but it works within the pipe.
pwt_sample= pharmacoSmoking
pwt_sample %>% summary()
#> id ttr relapse grp
#> Min. : 1.00 Min. : 0.00 Min. :0.000 combination:61
#> 1st Qu.: 33.00 1st Qu.: 8.00 1st Qu.:0.000 patchOnly :64
#> Median : 67.00 Median : 49.00 Median :1.000
#> Mean : 66.15 Mean : 77.44 Mean :0.712
#> 3rd Qu.: 99.00 3rd Qu.:182.00 3rd Qu.:1.000
#> Max. :130.00 Max. :182.00 Max. :1.000
#> age gender race employment yearsSmoking
#> Min. :22.00 Female:81 black :38 ft :72 Min. : 9.00
#> 1st Qu.:41.00 Male :44 hispanic: 8 other:39 1st Qu.:22.00
#> Median :49.00 other : 2 pt :14 Median :30.00
#> Mean :48.84 white :77 Mean :30.88
#> 3rd Qu.:56.00 3rd Qu.:39.00
#> Max. :86.00 Max. :56.00
#> levelSmoking ageGroup2 ageGroup4 priorAttempts longestNoSmoke
#> heavy:89 21-49:66 21-34:16 Min. : 0.00 Min. : 0.0
#> light:36 50+ :59 35-49:50 1st Qu.: 1.00 1st Qu.: 7.0
#> 50-64:48 Median : 2.00 Median : 90.0
#> 65+ :11 Mean : 12.68 Mean : 539.7
#> 3rd Qu.: 5.00 3rd Qu.: 365.0
#> Max. :1000.00 Max. :6205.0
select()
select()
will grab (or omit) columns from the data.
# grab everything
pwt_sample %>% select(everything()) |> head()
#> id ttr relapse grp age gender race employment yearsSmoking
#> 1 21 182 0 patchOnly 36 Male white ft 26
#> 2 113 14 1 patchOnly 41 Male white other 27
#> 3 39 5 1 combination 25 Female white other 12
#> 4 80 16 1 combination 54 Male white ft 39
#> 5 87 0 1 combination 45 Male white other 30
#> 6 29 182 0 combination 43 Male hispanic ft 30
#> levelSmoking ageGroup2 ageGroup4 priorAttempts longestNoSmoke
#> 1 heavy 21-49 35-49 0 0
#> 2 heavy 21-49 35-49 3 90
#> 3 heavy 21-49 21-34 3 21
#> 4 heavy 50+ 50-64 0 0
#> 5 heavy 21-49 35-49 0 0
#> 6 heavy 21-49 35-49 2 1825
select()
# grab everything, but drop the id variable.
pwt_sample %>% select(-id)
#> ttr relapse grp age gender race employment yearsSmoking
#> 1 182 0 patchOnly 36 Male white ft 26
#> 2 14 1 patchOnly 41 Male white other 27
#> 3 5 1 combination 25 Female white other 12
#> 4 16 1 combination 54 Male white ft 39
#> 5 0 1 combination 45 Male white other 30
#> 6 182 0 combination 43 Male hispanic ft 30
#> 7 14 1 patchOnly 66 Male black pt 54
#> 8 77 1 patchOnly 78 Female black other 56
#> 9 2 1 patchOnly 40 Female black ft 25
#> 10 0 1 patchOnly 38 Male black ft 23
#> 11 12 1 patchOnly 64 Female black other 30
#> 12 182 0 combination 51 Male black ft 35
#> 13 21 1 patchOnly 37 Female white pt 23
#> 14 3 1 patchOnly 65 Male white other 50
#> 15 170 1 patchOnly 42 Female white ft 30
#> 16 25 1 patchOnly 40 Male white other 22
#> 17 4 1 patchOnly 65 Female white other 50
#> 18 182 0 combination 52 Female white other 19
#> 19 140 1 combination 43 Male white ft 27
#> 20 63 1 combination 34 Female white ft 18
#> 21 15 1 combination 46 Female white other 26
#> 22 140 1 combination 60 Male white ft 42
#> 23 110 1 combination 49 Female white other 35
#> 24 182 0 combination 58 Female white ft 38
#> 25 0 1 patchOnly 48 Male hispanic other 33
#> 26 182 0 combination 54 Female hispanic other 38
#> 27 15 1 patchOnly 49 Female black pt 35
#> 28 182 0 patchOnly 55 Male black ft 39
#> 29 4 1 patchOnly 33 Male black ft 12
#> 30 56 1 combination 49 Female black ft 37
#> 31 2 1 combination 46 Male black ft 25
#> 32 80 1 patchOnly 34 Female other ft 18
#> 33 182 0 combination 46 Female white pt 25
#> 34 56 1 patchOnly 52 Male black other 25
#> 35 0 1 patchOnly 52 Female black ft 30
#> 36 14 1 patchOnly 48 Female white ft 34
#> 37 14 1 patchOnly 48 Female white ft 39
#> 38 28 1 patchOnly 56 Female white ft 36
#> 39 182 0 patchOnly 58 Male white other 41
#> 40 6 1 patchOnly 60 Male white ft 50
#> 41 182 0 patchOnly 55 Male white ft 40
#> 42 14 1 patchOnly 43 Female white ft 28
#> 43 15 1 patchOnly 55 Female white ft 42
#> 44 182 0 combination 70 Female white other 52
#> 45 75 1 combination 62 Female white pt 46
#> 46 30 1 combination 86 Male white other 40
#> 47 4 1 combination 52 Male white ft 33
#> 48 56 1 combination 27 Female white ft 13
#> 49 182 0 combination 52 Male white ft 35
#> 50 182 0 combination 62 Male white ft 35
#> 51 5 1 combination 57 Female white ft 47
#> 52 8 1 combination 40 Male white ft 27
#> 53 140 1 combination 49 Male white other 14
#> 54 20 1 combination 68 Female white other 53
#> 55 63 1 combination 47 Female white pt 15
#> 56 30 1 combination 46 Female white ft 34
#> 57 8 1 combination 55 Male white pt 40
#> 58 50 1 combination 29 Male white ft 12
#> 59 14 1 combination 64 Female white pt 45
#> 60 0 1 combination 52 Male white ft 38
#> 61 84 1 patchOnly 38 Female hispanic ft 10
#> 62 0 1 patchOnly 35 Male hispanic other 20
#> 63 105 1 patchOnly 50 Female hispanic other 33
#> 64 182 0 combination 63 Female hispanic other 45
#> 65 182 0 patchOnly 58 Male black ft 40
#> 66 182 0 patchOnly 56 Male black ft 16
#> 67 7 1 patchOnly 44 Female black other 31
#> 68 182 0 patchOnly 34 Female black ft 20
#> 69 0 1 patchOnly 49 Female black ft 36
#> 70 8 1 patchOnly 43 Female black other 24
#> 71 1 1 patchOnly 39 Female black ft 27
#> 72 182 0 combination 41 Female black ft 20
#> 73 12 1 combination 46 Female black other 30
#> 74 182 0 combination 46 Female black ft 21
#> 75 49 1 patchOnly 53 Female white pt 18
#> 76 182 0 patchOnly 58 Male white ft 46
#> 77 182 0 patchOnly 40 Female white ft 25
#> 78 2 1 patchOnly 62 Female white other 49
#> 79 182 0 patchOnly 53 Male white ft 38
#> 80 56 1 patchOnly 44 Female white ft 35
#> 81 182 0 patchOnly 64 Female white other 47
#> 82 0 1 patchOnly 50 Female white ft 30
#> 83 28 1 patchOnly 47 Male white ft 22
#> 84 155 1 patchOnly 49 Female white pt 35
#> 85 2 1 patchOnly 51 Male white other 36
#> 86 0 1 patchOnly 41 Female white other 26
#> 87 0 1 patchOnly 22 Female white pt 10
#> 88 1 1 patchOnly 22 Female white pt 9
#> 89 140 1 patchOnly 34 Female white ft 18
#> 90 1 1 patchOnly 48 Female white ft 30
#> 91 28 1 patchOnly 30 Female white ft 12
#> 92 1 1 patchOnly 31 Male white ft 19
#> 93 182 0 patchOnly 44 Female white ft 30
#> 94 77 1 patchOnly 56 Female white ft 44
#> 95 56 1 patchOnly 29 Female white ft 10
#> 96 182 0 combination 69 Male white other 50
#> 97 182 0 combination 41 Male white ft 29
#> 98 182 0 combination 52 Female white other 33
#> 99 182 0 combination 52 Female white ft 30
#> 100 182 0 combination 53 Male white ft 30
#> 101 21 1 combination 31 Female white ft 15
#> 102 60 1 combination 70 Female white other 54
#> 103 0 1 combination 43 Male white other 29
#> 104 182 0 combination 58 Female white ft 36
#> 105 65 1 combination 48 Female white ft 34
#> 106 182 0 combination 72 Male white other 55
#> 107 182 0 combination 61 Female white other 45
#> 108 182 0 combination 53 Female white ft 41
#> 109 182 0 combination 63 Female other other 52
#> 110 2 1 combination 40 Female hispanic pt 23
#> 111 40 1 patchOnly 39 Female black ft 20
#> 112 100 1 patchOnly 60 Female black ft 20
#> 113 1 1 patchOnly 54 Female black other 34
#> 114 45 1 patchOnly 68 Female black ft 25
#> 115 14 1 patchOnly 54 Female black ft 22
#> 116 30 1 patchOnly 51 Male black other 30
#> 117 42 1 combination 39 Female black ft 23
#> 118 2 1 combination 47 Female black ft 33
#> 119 182 0 combination 33 Female black ft 10
#> 120 60 1 combination 27 Female black ft 11
#> 121 10 1 combination 45 Female black ft 32
#> 122 0 1 combination 36 Female black ft 20
#> 123 170 1 combination 39 Male black ft 20
#> 124 15 1 combination 56 Female black other 39
#> 125 182 0 combination 50 Female black pt 30
#> levelSmoking ageGroup2 ageGroup4 priorAttempts longestNoSmoke
#> 1 heavy 21-49 35-49 0 0
#> 2 heavy 21-49 35-49 3 90
#> 3 heavy 21-49 21-34 3 21
#> 4 heavy 50+ 50-64 0 0
#> 5 heavy 21-49 35-49 0 0
#> 6 heavy 21-49 35-49 2 1825
#> 7 heavy 50+ 65+ 0 0
#> 8 light 50+ 65+ 10 15
#> 9 heavy 21-49 35-49 4 7
#> 10 light 21-49 35-49 10 90
#> 11 heavy 50+ 50-64 12 365
#> 12 heavy 50+ 50-64 1 7
#> 13 light 21-49 35-49 5 1095
#> 14 heavy 50+ 65+ 6 180
#> 15 heavy 21-49 35-49 5 240
#> 16 heavy 21-49 35-49 2 2
#> 17 light 50+ 65+ 1 2
#> 18 light 50+ 50-64 1 7
#> 19 heavy 21-49 35-49 5 120
#> 20 heavy 21-49 21-34 8 90
#> 21 heavy 21-49 35-49 2 14
#> 22 light 50+ 50-64 6 2920
#> 23 heavy 21-49 35-49 10 60
#> 24 heavy 50+ 50-64 1 0
#> 25 heavy 21-49 35-49 4 120
#> 26 light 50+ 50-64 30 3650
#> 27 light 21-49 35-49 4 540
#> 28 heavy 50+ 50-64 1 28
#> 29 light 21-49 21-34 1 730
#> 30 heavy 21-49 35-49 3 2920
#> 31 light 21-49 35-49 2 1095
#> 32 heavy 21-49 21-34 1 120
#> 33 heavy 21-49 35-49 3 365
#> 34 light 50+ 50-64 10 7
#> 35 heavy 50+ 50-64 2 42
#> 36 heavy 21-49 35-49 6 2555
#> 37 heavy 21-49 35-49 1 2
#> 38 heavy 50+ 50-64 5 180
#> 39 light 50+ 50-64 100 180
#> 40 heavy 50+ 50-64 6 30
#> 41 heavy 50+ 50-64 1 180
#> 42 heavy 21-49 35-49 4 365
#> 43 heavy 50+ 50-64 2 60
#> 44 heavy 50+ 65+ 3 1
#> 45 heavy 50+ 50-64 8 1095
#> 46 light 50+ 65+ 4 2190
#> 47 heavy 50+ 50-64 2 2
#> 48 light 21-49 21-34 1 7
#> 49 heavy 50+ 50-64 3 730
#> 50 heavy 50+ 50-64 1 2555
#> 51 heavy 50+ 50-64 0 0
#> 52 heavy 21-49 35-49 2 90
#> 53 heavy 21-49 35-49 0 0
#> 54 heavy 50+ 65+ 0 0
#> 55 heavy 21-49 35-49 1 90
#> 56 light 21-49 35-49 20 90
#> 57 heavy 50+ 50-64 6 913
#> 58 heavy 21-49 21-34 0 0
#> 59 heavy 50+ 50-64 4 60
#> 60 heavy 50+ 50-64 4 7
#> 61 heavy 21-49 35-49 2 3
#> 62 light 21-49 35-49 2 7
#> 63 heavy 50+ 50-64 2 30
#> 64 heavy 50+ 50-64 1 28
#> 65 heavy 50+ 50-64 0 0
#> 66 light 50+ 50-64 3 700
#> 67 heavy 21-49 35-49 2 2
#> 68 heavy 21-49 21-34 5 90
#> 69 heavy 21-49 35-49 1 6205
#> 70 light 21-49 35-49 2 3
#> 71 heavy 21-49 35-49 1 1
#> 72 light 21-49 35-49 6 270
#> 73 light 21-49 35-49 2 55
#> 74 light 21-49 35-49 1 1095
#> 75 heavy 50+ 50-64 6 3650
#> 76 heavy 50+ 50-64 10 14
#> 77 heavy 21-49 35-49 1 2920
#> 78 heavy 50+ 50-64 1 8
#> 79 heavy 50+ 50-64 5 1095
#> 80 heavy 21-49 35-49 8 180
#> 81 heavy 50+ 50-64 4 365
#> 82 heavy 50+ 50-64 1 90
#> 83 heavy 21-49 35-49 3 4
#> 84 heavy 21-49 35-49 1 1095
#> 85 heavy 50+ 50-64 1 5
#> 86 heavy 21-49 35-49 1 7
#> 87 heavy 21-49 21-34 3 2
#> 88 heavy 21-49 21-34 2 3
#> 89 light 21-49 21-34 2 2190
#> 90 light 21-49 35-49 10 1095
#> 91 light 21-49 21-34 0 0
#> 92 heavy 21-49 21-34 10 120
#> 93 heavy 21-49 35-49 1 3
#> 94 heavy 50+ 50-64 4 1095
#> 95 light 21-49 21-34 8 240
#> 96 heavy 50+ 65+ 6 5475
#> 97 heavy 21-49 35-49 20 180
#> 98 heavy 50+ 50-64 5 270
#> 99 heavy 50+ 50-64 3 1095
#> 100 heavy 50+ 50-64 3 3285
#> 101 heavy 21-49 21-34 4 90
#> 102 heavy 50+ 65+ 1 90
#> 103 heavy 21-49 35-49 12 6
#> 104 light 50+ 50-64 2 90
#> 105 heavy 21-49 35-49 1000 548
#> 106 light 50+ 65+ 30 30
#> 107 heavy 50+ 50-64 1 60
#> 108 heavy 50+ 50-64 3 60
#> 109 heavy 50+ 50-64 2 180
#> 110 heavy 21-49 35-49 2 3
#> 111 light 21-49 35-49 3 210
#> 112 light 50+ 50-64 5 1825
#> 113 light 50+ 50-64 2 365
#> 114 light 50+ 65+ 2 7
#> 115 heavy 50+ 50-64 5 10
#> 116 heavy 50+ 50-64 2 30
#> 117 light 21-49 35-49 2 1825
#> 118 light 21-49 35-49 4 365
#> 119 light 21-49 21-34 2 1
#> 120 light 21-49 21-34 2 14
#> 121 heavy 21-49 35-49 1 75
#> 122 heavy 21-49 35-49 1 270
#> 123 light 21-49 35-49 3 180
#> 124 heavy 50+ 50-64 3 7
#> 125 heavy 50+ 50-64 0 0
select()
# grab just these three columns.
pwt_sample %>% select(ttr, grp, gender)
#> ttr grp gender
#> 1 182 patchOnly Male
#> 2 14 patchOnly Male
#> 3 5 combination Female
#> 4 16 combination Male
#> 5 0 combination Male
#> 6 182 combination Male
#> 7 14 patchOnly Male
#> 8 77 patchOnly Female
#> 9 2 patchOnly Female
#> 10 0 patchOnly Male
#> 11 12 patchOnly Female
#> 12 182 combination Male
#> 13 21 patchOnly Female
#> 14 3 patchOnly Male
#> 15 170 patchOnly Female
#> 16 25 patchOnly Male
#> 17 4 patchOnly Female
#> 18 182 combination Female
#> 19 140 combination Male
#> 20 63 combination Female
#> 21 15 combination Female
#> 22 140 combination Male
#> 23 110 combination Female
#> 24 182 combination Female
#> 25 0 patchOnly Male
#> 26 182 combination Female
#> 27 15 patchOnly Female
#> 28 182 patchOnly Male
#> 29 4 patchOnly Male
#> 30 56 combination Female
#> 31 2 combination Male
#> 32 80 patchOnly Female
#> 33 182 combination Female
#> 34 56 patchOnly Male
#> 35 0 patchOnly Female
#> 36 14 patchOnly Female
#> 37 14 patchOnly Female
#> 38 28 patchOnly Female
#> 39 182 patchOnly Male
#> 40 6 patchOnly Male
#> 41 182 patchOnly Male
#> 42 14 patchOnly Female
#> 43 15 patchOnly Female
#> 44 182 combination Female
#> 45 75 combination Female
#> 46 30 combination Male
#> 47 4 combination Male
#> 48 56 combination Female
#> 49 182 combination Male
#> 50 182 combination Male
#> 51 5 combination Female
#> 52 8 combination Male
#> 53 140 combination Male
#> 54 20 combination Female
#> 55 63 combination Female
#> 56 30 combination Female
#> 57 8 combination Male
#> 58 50 combination Male
#> 59 14 combination Female
#> 60 0 combination Male
#> 61 84 patchOnly Female
#> 62 0 patchOnly Male
#> 63 105 patchOnly Female
#> 64 182 combination Female
#> 65 182 patchOnly Male
#> 66 182 patchOnly Male
#> 67 7 patchOnly Female
#> 68 182 patchOnly Female
#> 69 0 patchOnly Female
#> 70 8 patchOnly Female
#> 71 1 patchOnly Female
#> 72 182 combination Female
#> 73 12 combination Female
#> 74 182 combination Female
#> 75 49 patchOnly Female
#> 76 182 patchOnly Male
#> 77 182 patchOnly Female
#> 78 2 patchOnly Female
#> 79 182 patchOnly Male
#> 80 56 patchOnly Female
#> 81 182 patchOnly Female
#> 82 0 patchOnly Female
#> 83 28 patchOnly Male
#> 84 155 patchOnly Female
#> 85 2 patchOnly Male
#> 86 0 patchOnly Female
#> 87 0 patchOnly Female
#> 88 1 patchOnly Female
#> 89 140 patchOnly Female
#> 90 1 patchOnly Female
#> 91 28 patchOnly Female
#> 92 1 patchOnly Male
#> 93 182 patchOnly Female
#> 94 77 patchOnly Female
#> 95 56 patchOnly Female
#> 96 182 combination Male
#> 97 182 combination Male
#> 98 182 combination Female
#> 99 182 combination Female
#> 100 182 combination Male
#> 101 21 combination Female
#> 102 60 combination Female
#> 103 0 combination Male
#> 104 182 combination Female
#> 105 65 combination Female
#> 106 182 combination Male
#> 107 182 combination Female
#> 108 182 combination Female
#> 109 182 combination Female
#> 110 2 combination Female
#> 111 40 patchOnly Female
#> 112 100 patchOnly Female
#> 113 1 patchOnly Female
#> 114 45 patchOnly Female
#> 115 14 patchOnly Female
#> 116 30 patchOnly Male
#> 117 42 combination Female
#> 118 2 combination Female
#> 119 182 combination Female
#> 120 60 combination Female
#> 121 10 combination Female
#> 122 0 combination Female
#> 123 170 combination Male
#> 124 15 combination Female
#> 125 182 combination Female
group_by()
group_by()
might be the most powerful function in tidyverse.
- tl;dr: it allows you to perform functions within specific subsets (groups) of the data.
# Notice we can chain some pipes together
pwt_sample %>%
# group by gender
group_by(gender) %>%
# Get me the first observation, by group.
slice(1)
#> # A tibble: 2 x 14
#> # Groups: gender [2]
#> id ttr relapse grp age gender race employment yearsSmoking
#> <int> <int> <int> <fct> <int> <fct> <fct> <fct> <int>
#> 1 39 5 1 combination 25 Female white other 12
#> 2 21 182 0 patchOnly 36 Male white ft 26
#> # i 5 more variables: levelSmoking <fct>, ageGroup2 <fct>, ageGroup4 <fct>,
#> # priorAttempts <int>, longestNoSmoke <int>
group_by()
Notice what would happen in the absence of group_by()
pwt_sample %>%
# Get me the first observation for each gender
slice(1) #Forgot to group_by()
#> id ttr relapse grp age gender race employment yearsSmoking
#> 1 21 182 0 patchOnly 36 Male white ft 26
#> levelSmoking ageGroup2 ageGroup4 priorAttempts longestNoSmoke
#> 1 heavy 21-49 35-49 0 0
Caveat: if you’re applying a group-specific function (that you need once), it’s generally advisable to “ungroup” (i.e. ungroup()
) the data when you’re done.
summarize()
summarize()
creates condensed summaries of the data, for whatever it is you want.
pwt_sample %>%
# How many observations are in the data?
dplyr::summarize(n = n())
#> n
#> 1 125
summarize()
# Note: works *wonderfully* with group_by()
pwt_sample %>%
group_by(gender) %>%
# Give me the max time to relapse observed in the data.
dplyr::summarize(maxttr = max(ttr, na.rm=T))
#> # A tibble: 2 x 2
#> gender maxttr
#> <fct> <int>
#> 1 Female 182
#> 2 Male 182
mutate()
mutate()
creates new columns while retaining original dimensions of the data (unlike summarize()
).
pwt_sample %>%
# Convert rgdpna from real GDP in millions to real GDP in billions
mutate(ttrmonths = ttr/30)
#> id ttr relapse grp age gender race employment yearsSmoking
#> 1 21 182 0 patchOnly 36 Male white ft 26
#> 2 113 14 1 patchOnly 41 Male white other 27
#> 3 39 5 1 combination 25 Female white other 12
#> 4 80 16 1 combination 54 Male white ft 39
#> 5 87 0 1 combination 45 Male white other 30
#> 6 29 182 0 combination 43 Male hispanic ft 30
#> 7 16 14 1 patchOnly 66 Male black pt 54
#> 8 35 77 1 patchOnly 78 Female black other 56
#> 9 54 2 1 patchOnly 40 Female black ft 25
#> 10 70 0 1 patchOnly 38 Male black ft 23
#> 11 84 12 1 patchOnly 64 Female black other 30
#> 12 85 182 0 combination 51 Male black ft 35
#> 13 25 21 1 patchOnly 37 Female white pt 23
#> 14 47 3 1 patchOnly 65 Male white other 50
#> 15 59 170 1 patchOnly 42 Female white ft 30
#> 16 63 25 1 patchOnly 40 Male white other 22
#> 17 102 4 1 patchOnly 65 Female white other 50
#> 18 3 182 0 combination 52 Female white other 19
#> 19 15 140 1 combination 43 Male white ft 27
#> 20 32 63 1 combination 34 Female white ft 18
#> 21 79 15 1 combination 46 Female white other 26
#> 22 90 140 1 combination 60 Male white ft 42
#> 23 110 110 1 combination 49 Female white other 35
#> 24 127 182 0 combination 58 Female white ft 38
#> 25 119 0 1 patchOnly 48 Male hispanic other 33
#> 26 33 182 0 combination 54 Female hispanic other 38
#> 27 62 15 1 patchOnly 49 Female black pt 35
#> 28 67 182 0 patchOnly 55 Male black ft 39
#> 29 112 4 1 patchOnly 33 Male black ft 12
#> 30 60 56 1 combination 49 Female black ft 37
#> 31 93 2 1 combination 46 Male black ft 25
#> 32 122 80 1 patchOnly 34 Female other ft 18
#> 33 130 182 0 combination 46 Female white pt 25
#> 34 19 56 1 patchOnly 52 Male black other 25
#> 35 65 0 1 patchOnly 52 Female black ft 30
#> 36 4 14 1 patchOnly 48 Female white ft 34
#> 37 20 14 1 patchOnly 48 Female white ft 39
#> 38 22 28 1 patchOnly 56 Female white ft 36
#> 39 26 182 0 patchOnly 58 Male white other 41
#> 40 43 6 1 patchOnly 60 Male white ft 50
#> 41 107 182 0 patchOnly 55 Male white ft 40
#> 42 111 14 1 patchOnly 43 Female white ft 28
#> 43 117 15 1 patchOnly 55 Female white ft 42
#> 44 8 182 0 combination 70 Female white other 52
#> 45 12 75 1 combination 62 Female white pt 46
#> 46 13 30 1 combination 86 Male white other 40
#> 47 23 4 1 combination 52 Male white ft 33
#> 48 30 56 1 combination 27 Female white ft 13
#> 49 34 182 0 combination 52 Male white ft 35
#> 50 36 182 0 combination 62 Male white ft 35
#> 51 38 5 1 combination 57 Female white ft 47
#> 52 44 8 1 combination 40 Male white ft 27
#> 53 61 140 1 combination 49 Male white other 14
#> 54 68 20 1 combination 68 Female white other 53
#> 55 69 63 1 combination 47 Female white pt 15
#> 56 82 30 1 combination 46 Female white ft 34
#> 57 97 8 1 combination 55 Male white pt 40
#> 58 106 50 1 combination 29 Male white ft 12
#> 59 114 14 1 combination 64 Female white pt 45
#> 60 120 0 1 combination 52 Male white ft 38
#> 61 40 84 1 patchOnly 38 Female hispanic ft 10
#> 62 49 0 1 patchOnly 35 Male hispanic other 20
#> 63 125 105 1 patchOnly 50 Female hispanic other 33
#> 64 123 182 0 combination 63 Female hispanic other 45
#> 65 7 182 0 patchOnly 58 Male black ft 40
#> 66 9 182 0 patchOnly 56 Male black ft 16
#> 67 37 7 1 patchOnly 44 Female black other 31
#> 68 52 182 0 patchOnly 34 Female black ft 20
#> 69 86 0 1 patchOnly 49 Female black ft 36
#> 70 94 8 1 patchOnly 43 Female black other 24
#> 71 104 1 1 patchOnly 39 Female black ft 27
#> 72 42 182 0 combination 41 Female black ft 20
#> 73 75 12 1 combination 46 Female black other 30
#> 74 100 182 0 combination 46 Female black ft 21
#> 75 1 49 1 patchOnly 53 Female white pt 18
#> 76 6 182 0 patchOnly 58 Male white ft 46
#> 77 11 182 0 patchOnly 40 Female white ft 25
#> 78 24 2 1 patchOnly 62 Female white other 49
#> 79 27 182 0 patchOnly 53 Male white ft 38
#> 80 31 56 1 patchOnly 44 Female white ft 35
#> 81 56 182 0 patchOnly 64 Female white other 47
#> 82 72 0 1 patchOnly 50 Female white ft 30
#> 83 78 28 1 patchOnly 47 Male white ft 22
#> 84 81 155 1 patchOnly 49 Female white pt 35
#> 85 83 2 1 patchOnly 51 Male white other 36
#> 86 88 0 1 patchOnly 41 Female white other 26
#> 87 91 0 1 patchOnly 22 Female white pt 10
#> 88 95 1 1 patchOnly 22 Female white pt 9
#> 89 99 140 1 patchOnly 34 Female white ft 18
#> 90 101 1 1 patchOnly 48 Female white ft 30
#> 91 115 28 1 patchOnly 30 Female white ft 12
#> 92 116 1 1 patchOnly 31 Male white ft 19
#> 93 124 182 0 patchOnly 44 Female white ft 30
#> 94 126 77 1 patchOnly 56 Female white ft 44
#> 95 129 56 1 patchOnly 29 Female white ft 10
#> 96 2 182 0 combination 69 Male white other 50
#> 97 5 182 0 combination 41 Male white ft 29
#> 98 14 182 0 combination 52 Female white other 33
#> 99 18 182 0 combination 52 Female white ft 30
#> 100 51 182 0 combination 53 Male white ft 30
#> 101 57 21 1 combination 31 Female white ft 15
#> 102 64 60 1 combination 70 Female white other 54
#> 103 76 0 1 combination 43 Male white other 29
#> 104 92 182 0 combination 58 Female white ft 36
#> 105 98 65 1 combination 48 Female white ft 34
#> 106 103 182 0 combination 72 Male white other 55
#> 107 105 182 0 combination 61 Female white other 45
#> 108 121 182 0 combination 53 Female white ft 41
#> 109 96 182 0 combination 63 Female other other 52
#> 110 46 2 1 combination 40 Female hispanic pt 23
#> 111 28 40 1 patchOnly 39 Female black ft 20
#> 112 53 100 1 patchOnly 60 Female black ft 20
#> 113 55 1 1 patchOnly 54 Female black other 34
#> 114 73 45 1 patchOnly 68 Female black ft 25
#> 115 77 14 1 patchOnly 54 Female black ft 22
#> 116 109 30 1 patchOnly 51 Male black other 30
#> 117 17 42 1 combination 39 Female black ft 23
#> 118 45 2 1 combination 47 Female black ft 33
#> 119 48 182 0 combination 33 Female black ft 10
#> 120 50 60 1 combination 27 Female black ft 11
#> 121 74 10 1 combination 45 Female black ft 32
#> 122 89 0 1 combination 36 Female black ft 20
#> 123 108 170 1 combination 39 Male black ft 20
#> 124 118 15 1 combination 56 Female black other 39
#> 125 128 182 0 combination 50 Female black pt 30
#> levelSmoking ageGroup2 ageGroup4 priorAttempts longestNoSmoke ttrmonths
#> 1 heavy 21-49 35-49 0 0 6.06666667
#> 2 heavy 21-49 35-49 3 90 0.46666667
#> 3 heavy 21-49 21-34 3 21 0.16666667
#> 4 heavy 50+ 50-64 0 0 0.53333333
#> 5 heavy 21-49 35-49 0 0 0.00000000
#> 6 heavy 21-49 35-49 2 1825 6.06666667
#> 7 heavy 50+ 65+ 0 0 0.46666667
#> 8 light 50+ 65+ 10 15 2.56666667
#> 9 heavy 21-49 35-49 4 7 0.06666667
#> 10 light 21-49 35-49 10 90 0.00000000
#> 11 heavy 50+ 50-64 12 365 0.40000000
#> 12 heavy 50+ 50-64 1 7 6.06666667
#> 13 light 21-49 35-49 5 1095 0.70000000
#> 14 heavy 50+ 65+ 6 180 0.10000000
#> 15 heavy 21-49 35-49 5 240 5.66666667
#> 16 heavy 21-49 35-49 2 2 0.83333333
#> 17 light 50+ 65+ 1 2 0.13333333
#> 18 light 50+ 50-64 1 7 6.06666667
#> 19 heavy 21-49 35-49 5 120 4.66666667
#> 20 heavy 21-49 21-34 8 90 2.10000000
#> 21 heavy 21-49 35-49 2 14 0.50000000
#> 22 light 50+ 50-64 6 2920 4.66666667
#> 23 heavy 21-49 35-49 10 60 3.66666667
#> 24 heavy 50+ 50-64 1 0 6.06666667
#> 25 heavy 21-49 35-49 4 120 0.00000000
#> 26 light 50+ 50-64 30 3650 6.06666667
#> 27 light 21-49 35-49 4 540 0.50000000
#> 28 heavy 50+ 50-64 1 28 6.06666667
#> 29 light 21-49 21-34 1 730 0.13333333
#> 30 heavy 21-49 35-49 3 2920 1.86666667
#> 31 light 21-49 35-49 2 1095 0.06666667
#> 32 heavy 21-49 21-34 1 120 2.66666667
#> 33 heavy 21-49 35-49 3 365 6.06666667
#> 34 light 50+ 50-64 10 7 1.86666667
#> 35 heavy 50+ 50-64 2 42 0.00000000
#> 36 heavy 21-49 35-49 6 2555 0.46666667
#> 37 heavy 21-49 35-49 1 2 0.46666667
#> 38 heavy 50+ 50-64 5 180 0.93333333
#> 39 light 50+ 50-64 100 180 6.06666667
#> 40 heavy 50+ 50-64 6 30 0.20000000
#> 41 heavy 50+ 50-64 1 180 6.06666667
#> 42 heavy 21-49 35-49 4 365 0.46666667
#> 43 heavy 50+ 50-64 2 60 0.50000000
#> 44 heavy 50+ 65+ 3 1 6.06666667
#> 45 heavy 50+ 50-64 8 1095 2.50000000
#> 46 light 50+ 65+ 4 2190 1.00000000
#> 47 heavy 50+ 50-64 2 2 0.13333333
#> 48 light 21-49 21-34 1 7 1.86666667
#> 49 heavy 50+ 50-64 3 730 6.06666667
#> 50 heavy 50+ 50-64 1 2555 6.06666667
#> 51 heavy 50+ 50-64 0 0 0.16666667
#> 52 heavy 21-49 35-49 2 90 0.26666667
#> 53 heavy 21-49 35-49 0 0 4.66666667
#> 54 heavy 50+ 65+ 0 0 0.66666667
#> 55 heavy 21-49 35-49 1 90 2.10000000
#> 56 light 21-49 35-49 20 90 1.00000000
#> 57 heavy 50+ 50-64 6 913 0.26666667
#> 58 heavy 21-49 21-34 0 0 1.66666667
#> 59 heavy 50+ 50-64 4 60 0.46666667
#> 60 heavy 50+ 50-64 4 7 0.00000000
#> 61 heavy 21-49 35-49 2 3 2.80000000
#> 62 light 21-49 35-49 2 7 0.00000000
#> 63 heavy 50+ 50-64 2 30 3.50000000
#> 64 heavy 50+ 50-64 1 28 6.06666667
#> 65 heavy 50+ 50-64 0 0 6.06666667
#> 66 light 50+ 50-64 3 700 6.06666667
#> 67 heavy 21-49 35-49 2 2 0.23333333
#> 68 heavy 21-49 21-34 5 90 6.06666667
#> 69 heavy 21-49 35-49 1 6205 0.00000000
#> 70 light 21-49 35-49 2 3 0.26666667
#> 71 heavy 21-49 35-49 1 1 0.03333333
#> 72 light 21-49 35-49 6 270 6.06666667
#> 73 light 21-49 35-49 2 55 0.40000000
#> 74 light 21-49 35-49 1 1095 6.06666667
#> 75 heavy 50+ 50-64 6 3650 1.63333333
#> 76 heavy 50+ 50-64 10 14 6.06666667
#> 77 heavy 21-49 35-49 1 2920 6.06666667
#> 78 heavy 50+ 50-64 1 8 0.06666667
#> 79 heavy 50+ 50-64 5 1095 6.06666667
#> 80 heavy 21-49 35-49 8 180 1.86666667
#> 81 heavy 50+ 50-64 4 365 6.06666667
#> 82 heavy 50+ 50-64 1 90 0.00000000
#> 83 heavy 21-49 35-49 3 4 0.93333333
#> 84 heavy 21-49 35-49 1 1095 5.16666667
#> 85 heavy 50+ 50-64 1 5 0.06666667
#> 86 heavy 21-49 35-49 1 7 0.00000000
#> 87 heavy 21-49 21-34 3 2 0.00000000
#> 88 heavy 21-49 21-34 2 3 0.03333333
#> 89 light 21-49 21-34 2 2190 4.66666667
#> 90 light 21-49 35-49 10 1095 0.03333333
#> 91 light 21-49 21-34 0 0 0.93333333
#> 92 heavy 21-49 21-34 10 120 0.03333333
#> 93 heavy 21-49 35-49 1 3 6.06666667
#> 94 heavy 50+ 50-64 4 1095 2.56666667
#> 95 light 21-49 21-34 8 240 1.86666667
#> 96 heavy 50+ 65+ 6 5475 6.06666667
#> 97 heavy 21-49 35-49 20 180 6.06666667
#> 98 heavy 50+ 50-64 5 270 6.06666667
#> 99 heavy 50+ 50-64 3 1095 6.06666667
#> 100 heavy 50+ 50-64 3 3285 6.06666667
#> 101 heavy 21-49 21-34 4 90 0.70000000
#> 102 heavy 50+ 65+ 1 90 2.00000000
#> 103 heavy 21-49 35-49 12 6 0.00000000
#> 104 light 50+ 50-64 2 90 6.06666667
#> 105 heavy 21-49 35-49 1000 548 2.16666667
#> 106 light 50+ 65+ 30 30 6.06666667
#> 107 heavy 50+ 50-64 1 60 6.06666667
#> 108 heavy 50+ 50-64 3 60 6.06666667
#> 109 heavy 50+ 50-64 2 180 6.06666667
#> 110 heavy 21-49 35-49 2 3 0.06666667
#> 111 light 21-49 35-49 3 210 1.33333333
#> 112 light 50+ 50-64 5 1825 3.33333333
#> 113 light 50+ 50-64 2 365 0.03333333
#> 114 light 50+ 65+ 2 7 1.50000000
#> 115 heavy 50+ 50-64 5 10 0.46666667
#> 116 heavy 50+ 50-64 2 30 1.00000000
#> 117 light 21-49 35-49 2 1825 1.40000000
#> 118 light 21-49 35-49 4 365 0.06666667
#> 119 light 21-49 21-34 2 1 6.06666667
#> 120 light 21-49 21-34 2 14 2.00000000
#> 121 heavy 21-49 35-49 1 75 0.33333333
#> 122 heavy 21-49 35-49 1 270 0.00000000
#> 123 light 21-49 35-49 3 180 5.66666667
#> 124 heavy 50+ 50-64 3 7 0.50000000
#> 125 heavy 50+ 50-64 0 0 6.06666667
mutate()
Note: this also works well with group_by()
pwt_sample %>%
group_by(gender) %>%
# divide ttr over the gender's max, for some reason.
mutate(ttrprop = ttr/max(ttr, na.rm=T)) |>
select(ttrprop) |>
head()
#> # A tibble: 6 x 2
#> # Groups: gender [2]
#> gender ttrprop
#> <fct> <dbl>
#> 1 Male 1
#> 2 Male 0.0769
#> 3 Female 0.0275
#> 4 Male 0.0879
#> 5 Male 0
#> 6 Male 1
filter()
filter()
is a great diagnostic tool for subsetting your data to look at specific observations.
- Notice the use of double-equal signs (
==
) for thefilter()
functions.
pwt_sample %>%
filter(race== "black") |>
head()
#> id ttr relapse grp age gender race employment yearsSmoking
#> 1 16 14 1 patchOnly 66 Male black pt 54
#> 2 35 77 1 patchOnly 78 Female black other 56
#> 3 54 2 1 patchOnly 40 Female black ft 25
#> 4 70 0 1 patchOnly 38 Male black ft 23
#> 5 84 12 1 patchOnly 64 Female black other 30
#> 6 85 182 0 combination 51 Male black ft 35
#> levelSmoking ageGroup2 ageGroup4 priorAttempts longestNoSmoke
#> 1 heavy 50+ 65+ 0 0
#> 2 light 50+ 65+ 10 15
#> 3 heavy 21-49 35-49 4 7
#> 4 light 21-49 35-49 10 90
#> 5 heavy 50+ 50-64 12 365
#> 6 heavy 50+ 50-64 1 7
Don’t Forget to Assign
When you’re done, don’t forget to assign what you’ve done to an object.
pwt_sample %>%
group_by(gender) %>%
# divide ttr over the gender's max, for some reason.
mutate(ttrprop = ttr/max(ttr, na.rm=T)) |>
select(ttrprop) |>
head() -> NewObjectName
tidyverse’s greatest feature is the ability to see what you’re coding in real time before commiting/overwrting data.