Programming in R

class: center, middle, inverse, title-slide
<style>
 pre {
 background-color: lightyellow;
 white-space: pre-wrap;
 line-height: 100%;
 }
</style>

## Programming in R: a brief introduction
#### Jeremy Mack
#### Lehigh University - Digital Scholarship Team
<img src="./images/notes.png" alt="RStudio" height=242/>

---
class: center, middle, inverse, title-slide

## Programming in R: a brief introduction
#### Jeremy Mack
#### Lehigh University - Digital Scholarship Team
<img src="./images/rlang.png" alt="RStudio" height=150/> &nbsp; <img src="./images/rstudio5.png" alt="RStudio" height=150/> &nbsp; <img src="./images/tidyverse5.png" alt="RStudio" height=150/>

---
### About this presentation

* This course is a **brief introduction** to R.

--
 
 * It is aimed at people who have little to no programming
   experience.

--
   
 * It could be useful for people who learned R some time ago and
   forgot it, or who are not familiar with modern R programming
   (`tidyverse`).

* It focuses on a bit of history, an introduction to the R 
   environment, and some hands-on experience with **data wrangling**.

* Slides are available on [Lehigh's Research Computing site](https://confluence.cc.lehigh.edu/display/hpc/Seminars) and GitHub ([slides](https://jeremymack-lu.github.io/rprog/) and [raw code](https://github.com/jeremymack-LU/rprog)).
   
---

### Structure of the presentation

The presentation is split into seven topics:

* [**Topic 1:**](<https://jeremymack-lu.github.io/rprog/#10>) What is R? Why use it?
 
 * [**Topic 2:**](<https://jeremymack-lu.github.io/rprog/#22>) What is RStudio? Why use it?
 
 * [**Topic 3:**](<https://jeremymack-lu.github.io/rprog/#32>) Getting started with R and RStudio
 
 * [**Topic 4:**](<https://jeremymack-lu.github.io/rprog/#58>) Objects in R
 
 * [**Topic 5:**](<https://jeremymack-lu.github.io/rprog/#82>) Functions in R
 
 * [**Topic 6:**](<https://jeremymack-lu.github.io/rprog/#100>) Data and Data Wrangling
 
 * [**Topic 7:**](<https://jeremymack-lu.github.io/rprog/#163>) Extras - RStudio projects, Other things to do in R, and Resources

---
#### Programming in R

[<center><img src="./images/r_rollercoaster.png" alt="RStudio" height=350/></center>](https://github.com/allisonhorst/stats-illustrations)

---
class: center, middle, inverse

#### Topic 1: What is R? Why use it?
 
 
 
 
 
 
---


#### Topic 1: What is R? Why use it?

.pull-left[
<center><img src="./images/R_logo.png" alt="R logo" height=230/></center>
]

.pull-right[
- R is a **programming language** ([one of many](<https://www.tiobe.com/tiobe-index/>)) and an **environment** for statistical computing.
 
- Developed by Ross Ihaka and Robert Gentleman in the early 1990s; now maintained by a core team supported by the R Foundation.

- Dialect of the S language (S-Plus)
]

---

#### Topic 1: What is R? Why use it?

.pull-left[
<center><img src="./images/R_logo.png" alt="R logo" height=230/></center>
]

.pull-right[
- **Free!**

- Rich data analysis and visualization options

- Available on most platforms/OS

- **Very active development community**
    + CRAN: The Comprehensive R Archive Network
    + User-contributed packages (> 18,000)

- **Reproducibility**
]

---
#### Topic 1: What is R? Why use it?

<img src="./images/reproducibility.png" alt="R logo" height=265/> <img src="./images/reproducibility2.png" alt="R logo" height=265/>
##### - 2019 report by The National Academies of Sciences, Engineering, and Medicine

---
#### Topic 1: What is R? Why use it?

<img src="./images/reproducibility3.png" alt="R logo" height=265/> &nbsp; &nbsp; &nbsp; <img src="./images/reproducibility4b.png" alt="R logo" height=265/>
##### - 2018 series featured in Nature

######  - "There's nothing more reproducible than code. And unfortunately, there are few things LESS reproducible than pointing and clicking, then trying to tell someone how you did it."

---
class: center, middle, inverse

#### Topic 2: What is RStudio? Why use it?
 
 
 
 
 
 
---


#### Topic 2: What is RStudio? Why use it?

.pull-left[
![RStudio logo](https://rstudio.com/wp-content/uploads/2018/10/RStudio-Logo.png)
]

.pull-right[
* RStudio is a **company** that develops **free and open tools** for R, and enterprise-ready professional products.

* RStudio is also the name of its **Integrated Development Environment** (IDE), a front end
for running R.
]

---

#### Topic 2: What is RStudio? Why use it?

.pull-left[
![RStudio logo](https://rstudio.com/wp-content/uploads/2018/10/RStudio-Logo.png)
]

.pull-right[
* Like R, it's **free**!

* It can flatten R's learning curve by keeping your work **organized**.

* Integrates nicely with other R features/applications:
    + Projects
    + Version control
    + R Markdown
    + Shiny apps
]

---

#### Topic 2: What is RStudio? Why use it?

How do R and RStudio work together? Consider a car analogy.

.pull-left[
**RStudio - the body**
 - RStudio provides a frame that keeps things organized and finishes that make it visually appealing.

**R - the engine**
 - R runs things under the hood - it's the engine that allows the car to drive.
]

.pull-right[
<img src="./images/car_engine.jpg" alt="R logo" width=400/>
]

---
class: inverse

#### Review - R and RStudio:

* **R** is a programming language built for statistical computing ("Engine").
    + It's open source and it's free.

* **RStudio** is an integrated development environment that makes working with R easier ("Body").
    + It's developed by a company, but it's also free.

---
class: center, middle, inverse

#### Topic 3: Getting started with R and RStudio
 
 
 
 
 
 
---

#### Topic 3: Getting started with R and RStudio

How do I get R and RStudio?

* Download and local install:
    + You can download R on its own through the [R Project website](https://www.r-project.org).
    + You can download RStudio at the [RStudio website](https://rstudio.com/products/rstudio/download/); R itself must be installed separately.
   
--

* R and RStudio at Lehigh:
    + Both R and RStudio are available on [LUapps](https://luapps.lehigh.edu).
    + LUapps can be accessed both on and off campus (over VPN).
 
<center><img src="./images/luapps.png" alt="LUApps website" height=200/></center>

---

#### Topic 3: Getting started with R and RStudio

First, let's explore RStudio.

[<img src="./images/rstudio.png" align="center" alt="RStudio" height=500/>](https://luapps.lehigh.edu)

---

#### Topic 3: Getting started with R and RStudio

Next, let's explore R - the engine under the hood.

In true computer science fashion, let's first try typing:

.tiny[

```r
print("Hello world!")
```
]

What happened?

.tiny[

```
[1] "Hello world!"
```
]

---

#### Topic 3: Getting started with R and RStudio

Two things to note:

- We didn't just get "Hello world!"; we also got `[1]`. R starts each line of
printed output with the index of the first value on that line, so `[1]` marks the first (and here only) element.

- We didn't need to put anything at the end of the line; we just hit return.

---

#### Topic 3: Getting started with R and RStudio

Now, let's try three things...

Try capitalizing `Print(...)`:
 
.tiny[

```r
Print("Hello world!")
```
]

Try putting a space between `print` and `("Hello world!")`:

.tiny[

```r
print ("Hello world!")
```
]

Try just entering `"Hello world!"`:

.tiny[

```r
"Hello world!"
```
]

What happened?

--
 
.tiny[

```
Error in Print("Hello world!"): could not find function "Print"
```

```
[1] "Hello world!"
```

```
[1] "Hello world!"
```
]

---

#### Topic 3: Getting started with R and RStudio

Three things you just learned:

- R is **case-sensitive**.
 
 - R mostly ignores **whitespace**.
 
 - R will **print** results by default.
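
---

#### Topic 3: Getting started with R and RStudio

A quick way to check these three points yourself. This is a minimal sketch that assumes a fresh session in which nothing named `Print` has been defined; the object name `x` is only for illustration. It also shows the one place where spacing does matter: inside an operator such as `<-`.

.tiny[

```r
# Case sensitivity: `print` exists, `Print` does not (in a fresh session)
exists("print")    # TRUE
exists("Print")    # FALSE, assuming no package or script has defined it

# Whitespace around names and operators is ignored...
x<-5
x <- 5             # same assignment, easier to read

# ...but not inside an operator: this compares x with -3
x < -3             # FALSE

# Auto-printing: typing a name at the console prints its value
x
```
]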

---

#### Topic 3: Getting started with R and RStudio

You can also use R as a calculator. Let's try the following:

.tiny[

```r
2 + 2
```
]
--
.tiny[

```
[1] 4
```
]
--
.tiny[

```r
4 * 2
```
]
--
.tiny[

```
[1] 8
```
]
--
.tiny[

```r
8 / 3
```
]
--
.tiny[

```
[1] 2.666667
```
]
--
.tiny[

```r
exp(log(8)-log(3))
```
]
--
.tiny[

```
[1] 2.666667
```
]

---
#### Topic 3: Getting started with R and RStudio
#### Assignments

We often want to save the results of our calculations, rather than print them to the screen. To do so, 
we'll use the **assignment operator**, `<-`

Here's an example:

.tiny[

```r
x <- log(8)
y <- log(3)
```
]

--
Now we can redo our last calculation using the assignments:

.tiny[

```r
exp(x-y)
```

```
[1] 2.666667
```
]

*Shortcut in RStudio for `<-`: Option + - (macOS), Alt + - (Windows)
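
---
#### Topic 3: Getting started with R and RStudio
#### Assignments

One detail worth seeing in action: an assignment stores a value, not a formula. The sketch below reuses `x` and `y` from the example and adds a hypothetical object called `ratio` to show that changing `x` afterwards does not change results that were already saved.

.tiny[

```r
x <- log(8)
y <- log(3)
ratio <- exp(x - y)   # store the result instead of printing it

x <- 0                # reassigning x does not update ratio
ratio                 # still 2.666667

ls()                  # list the objects currently defined in the workspace
```
]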

---
#### Topic 3: Getting started with R and RStudio
#### Concatenation

We will often want to work on sequences of values, rather than specific values.

To do so, we'll use the **combine function**, `c(...)`, which concatenates values into a vector.

Here's an example:

.tiny[

```r
n <- c(2, 3, 5, 8, 13, 21, 34, 55)
```
]

We can now apply operations across the entire vector.

For example:

.tiny[

```r
n * 2
```

```
[1]   4   6  10  16  26  42  68 110
```
]
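
---
#### Topic 3: Getting started with R and RStudio
#### Concatenation

A few more operations on the same vector, as a small sketch. Each call works on the whole vector at once; the square brackets pick out elements by position.

.tiny[

```r
n <- c(2, 3, 5, 8, 13, 21, 34, 55)

n + 1          # add 1 to every element
n[1:3]         # first three elements: 2 3 5
length(n)      # number of elements: 8
sum(n)         # total of all elements: 141
n / sum(n)     # each element as a share of the total
```
]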

---
#### Topic 3: Getting started with R and RStudio
#### Logicals

It can be useful to know whether our values meet certain conditions.

In addition to **character values** (which we saw when we called `print("Hello world!")`), R also allows **logical values**, or `TRUE` and `FALSE`.

For example, we can check which numbers in our "n" vector are double digit:

.tiny[

```r
n
```

```
[1]  2  3  5  8 13 21 34 55
```

```r
is_double_digit <- n > 9
is_double_digit
```

```
[1] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
```
]
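
---
#### Topic 3: Getting started with R and RStudio
#### Logicals

Logical vectors become useful once they are used to filter or count. The sketch below continues with `n` and `is_double_digit` from the example above.

.tiny[

```r
n <- c(2, 3, 5, 8, 13, 21, 34, 55)
is_double_digit <- n > 9

n[is_double_digit]       # keep only the double-digit values: 13 21 34 55
sum(is_double_digit)     # TRUE counts as 1, so this counts them: 4
mean(is_double_digit)    # proportion that are double digit: 0.5
n[n > 9 & n < 50]        # combine conditions with & (and) or | (or): 13 21 34
```
]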

---
class: inverse

#### Review - Getting started with R and RStudio:
* RStudio has four main windows to keep things **organized**.

* R is **case-sensitive**.

* R is a calculator.

* R can be used to store objects (**assignments**) that can be used with other functions, or compared to other objects.

---
class: center, middle, inverse

#### Topic 4: Objects in R

---
#### Topic 4: Objects in R

R largely revolves around two things: **objects** and **functions**.

<center>Define objects > Apply functions > Repeat!</center>

For example, we can define a simple object called "n":

.tiny[

```r
n <- c(2, 3, 5, 8, 13, 21, 34, 55)
```
]

We can then apply a function to our object. Let's say we're interested in the average, so we'll apply the mean function:

.tiny[

```r
avg.n <- mean(n)
avg.n
```

```
[1] 17.625
```
]

---
#### Topic 4: Objects in R

* Objects come in many different shapes and sizes - like a number, a dataset, or the results of a statistical test.

* Objects are essentially *data* that have a particular **type** and **structure**.

* There are six basic **types** (classes) of data in R:
 1. Logical
 2. Double
 3. Integer
 4. Complex
 5. Character
 6. Factors - special case of Integer with Character labels

---
#### Topic 4: Objects in R

* Six basic **types** (classes) of data in R:

.panelset[

.panel[.panel-name[Logical]

Objects often created via comparison(s).

.tiny2[

```r
x <- 1; y <- 2 # Create sample values x and y
z <- x > y # Is x larger than y? 
z # Print the logical value 
```

```
[1] FALSE
```
]

.tiny2[

```r
typeof(z)               # Print the data type of z
```

```
[1] "logical"
```
]
]

.panel[.panel-name[Double*]

Numbers, often approximated; default data type in R*.

.tiny2[

```r
x <- 10.5 # Define object x
x # Print x
```

```
[1] 10.5
```
]

.tiny2[

```r
typeof(x)               # Print the data type of x
```

```
[1] "double"
```
]
]

.panel[.panel-name[Integer]

Whole numbers; a number that is not a fraction.

.tiny2[

```r
x <- 10 # Define object x
x # Print x
```

```
[1] 10
```
]

.tiny2[

```r
typeof(x)                               # Print the data type of x
```

```
[1] "double"
```
]

.tiny2[

```r
y <- as.integer(10) # Declare as integer
z <- 10L # Declare as integer by appending with "L"
paste(typeof(x),typeof(y),typeof(z)) # Print the data types of x, y, and z
```

```
[1] "double integer integer"
```
]
]

.panel[.panel-name[Complex]

Any number that can be written as a + bi, where *i* is the imaginary unit and a and b
are real numbers.

.tiny2[

```r
x <- 1 + 2i # Create a complex number x
x # Print the value of x 
```

```
[1] 1+2i
```
]

.tiny2[

```r
typeof(x)               # Print the data type of x
```

```
[1] "complex"
```
]
]

.panel[.panel-name[Character]

Used to represent string values in R.

.tiny2[

```r
name <- "Jeremy Mack" # Assign character string
name # Print the character string 
```

```
[1] "Jeremy Mack"
```
]

.tiny2[

```r
x <- as.character(3.14) # Declare number as character string
y <- "3.14" # Declare as character with " "
c(x,y) # Print the character strings
```

```
[1] "3.14" "3.14"
```
]

.tiny2[

```r
c(typeof(x),typeof(y))        # Print the data types of x and y
```

```
[1] "character" "character"
```
]
]

.panel[.panel-name[Factor]

Fixed set of possible values (categorical variables); displayed as characters stored as integers.

.tiny[

```r
x <- c("A","B","C","D") # Create a vector of factor levels
x <- as.factor(x) # Declare as factor
x # Print the value of x 
```

```
[1] A B C D
Levels: A B C D
```
]

.tiny[

```r
typeof(x)                     # Print the data type of x
```

```
[1] "integer"
```
]

.tiny[

```r
str(x)                        # Print the structure of x
```

```
 Factor w/ 4 levels "A","B","C","D": 1 2 3 4
```
]
]

]
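
---
#### Topic 4: Objects in R

The factor panel hints at the difference between how a value is stored (`typeof`) and how R treats it (`class`). The sketch below makes that explicit and adds two common conversions; the object name `f` and its levels are made up for illustration.

.tiny6[

```r
f <- factor(c("low", "high", "medium", "low"),
            levels = c("low", "medium", "high"))

typeof(f)        # "integer": how the values are stored
class(f)         # "factor": how R treats the object
levels(f)        # "low" "medium" "high"
as.integer(f)    # 1 3 2 1: the underlying integer codes

# Converting between types
as.numeric("3.14") + 1   # 4.14
as.integer(3.9)          # 3: truncates rather than rounds
```
]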

---
#### Topic 4: Objects in R

* Objects come in many different shapes and sizes - like a number, a dataset, or the results of a statistical test.

* Objects are essentially *data* that have a particular **type** and **structure**.

* There are six basic **types** (classes) of data in R:
 1. Logical
 2. Double
 3. Integer
 4. Complex
 5. Character
 6. Factors - special case of Integer with Character labels

---
#### Topic 4: Objects in R

* Objects come in many different shapes and sizes - like a number, a dataset, or the results of a statistical test.

* Objects are essentially *data* that have a particular **type** and **structure**.

* There are four basic **structures** of data in R:
 1. Scalar
 2. Vector
 3. Matrix
 4. Data frames (and Tibbles)
 
---
#### Topic 4: Objects in R

* Four basic **structures** of data in R:

.pull-left[
![R objects](./images/objects.png)
]

.pull-right[
* Scalar
   
* Vector

* Matrix

* Data frames (and Tibbles)
]

---
#### Topic 4: Objects in R

* Four basic **structures** of data in R:

.pull-left[
![R objects](./images/objects2.png)
]

.pull-right[
**Scalar objects:**
1. Hold only one value at a time.
 
2. Can be used to build more complex objects.

]

---
#### Topic 4: Objects in R

* Four basic **structures** of data in R:

.pull-left[
![R objects](./images/objects2.png)
]

.tiny6.pull-right[
**Scalar objects:**

```r
x <- 10.5 
x
```

```
[1] 10.5
```

```r
str(x)
```

```
 num 10.5
```

]

---
#### Topic 4: Objects in R

* Four basic **structures** of data in R:

.pull-left[
![R objects](./images/objects3.png)
]

.pull-right[
**Vector objects:**
1. Hold several values stored as a single object.
 
2. Hold a single data type, such as numeric or character (not both!).

]

---
#### Topic 4: Objects in R

* Four basic **structures** of data in R:

.pull-left[
![R objects](./images/objects3.png)
]

.tiny6.pull-right[
**Vector objects:**

```r
n <- c(2,3,5,8,13,21,34,55) 
n
```

```
[1]  2  3  5  8 13 21 34 55
```

```r
str(n)
```

```
 num [1:8] 2 3 5 8 13 21 34 55
```

]

---
#### Topic 4: Objects in R

* Four basic **structures** of data in R:

.pull-left[
![R objects](./images/objects3.png)
]

.tiny6.pull-right[
**Vector objects:**

```r
n <- c(2,3,5,8,13,21,34,"55") 
n
```

```
[1] "2"  "3"  "5"  "8"  "13" "21" "34" "55"
```

```r
str(n)
```

```
 chr [1:8] "2" "3" "5" "8" "13" "21" "34" "55"
```

]
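
---
#### Topic 4: Objects in R

When a single quoted value turns a whole vector into character data, it can be converted back. A minimal sketch, with `n_num` as an illustrative name:

.tiny6[

```r
n <- c(2, 3, 5, 8, 13, 21, 34, "55")   # one string makes every element a string
class(n)                                # "character"

n_num <- as.numeric(n)                  # convert back to numbers
class(n_num)                            # "numeric"
mean(n_num)                             # 17.625

# Values that cannot be parsed become NA (normally with a warning)
suppressWarnings(as.numeric(c("1", "two", "3")))   # 1 NA 3
```
]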

---
#### Topic 4: Objects in R

* Four basic **structures** of data in R:

.pull-left[
![R objects](./images/objects4.png)
]

.pull-right[
**Matrix objects:**
1. Large data structure.
 
2. Has two dimensions, representing its height (rows) and width (columns).
 
3. Holds a single data type, such as numeric or character (not both!).

] 
---
#### Topic 4: Objects in R

* Four basic **structures** of data in R:

.pull-left[
![R objects](./images/objects4.png)
]

.tiny6.pull-right[
**Matrix objects:**

```r
x <- 1:5
y <- 6:10
z <- 11:15
m <- cbind(x,y,z)
class(m)
```

```
[1] "matrix" "array" 
```

```r
str(m)
```

```
 int [1:5, 1:3] 1 2 3 4 5 6 7 8 9 10 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:3] "x" "y" "z"
```

] 
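
---
#### Topic 4: Objects in R

Once a matrix exists, its rows and columns can be inspected and indexed. The sketch below rebuilds `m` from the previous slide and tries a few base R calls on it.

.tiny6[

```r
x <- 1:5
y <- 6:10
z <- 11:15
m <- cbind(x, y, z)

dim(m)          # 5 3: five rows, three columns
m[2, 3]         # row 2, column 3: 12
m[, "y"]        # the whole "y" column: 6 7 8 9 10
colSums(m)      # column totals: x = 15, y = 40, z = 65
t(m)            # transpose: 3 rows, 5 columns
```
]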
---
#### Topic 4: Objects in R

* Four basic **structures** of data in R:

.pull-left[
![R objects](./images/objects4.png)
]

.pull-right[
**Data frame objects:**
1. Large data structure.
 
2. Has two dimensions, representing its height (rows) and width (columns).
 
3. Can be a mix of data types.

] 
---
#### Topic 4: Objects in R

* Four basic **structures** of data in R:

.pull-left[
![R objects](./images/objects4.png)
]

.tiny6.pull-right[
**Data frame objects:**

```r
survey <- data.frame(
"id" = c(1,2,3,4,5),
"sex" = c("m","m","m","f","f"),
"age" = c(99,46,23,54,23))
class(survey)
```

```
[1] "data.frame"
```

```r
str(survey)
```

```
'data.frame':	5 obs. of  3 variables:
 $ id : num  1 2 3 4 5
 $ sex: chr  "m" "m" "m" "f" ...
 $ age: num  99 46 23 54 23
```

]
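
---
#### Topic 4: Objects in R

A data frame's columns behave like vectors, so everything from the earlier slides still applies. A short sketch using the same `survey` data; the new column name `age_months` is only an example.

.tiny6[

```r
survey <- data.frame(
  id  = c(1, 2, 3, 4, 5),
  sex = c("m", "m", "m", "f", "f"),
  age = c(99, 46, 23, 54, 23))

survey$age                             # one column as a vector
mean(survey$age)                       # 49
survey[survey$sex == "f", ]            # rows where sex is "f"
survey$age_months <- survey$age * 12   # add a new column
names(survey)                          # "id" "sex" "age" "age_months"
```
]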

---
#### Topic 4: Objects in R

* Four basic **structures** of data in R:

.pull-left[
![R objects](./images/objects4.png)
]

.pull-right[
**Tibble objects:**
1. Large data structures.

2. Have two dimensions, representing their height (rows) and width (columns).

3. Can be a mix of data types.

4. "Lazy data frames"
    * Do less and complain more.
]
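
---
#### Topic 4: Objects in R

What "do less and complain more" can look like in practice: a rough comparison, assuming the `tibble` package is installed (the tibble part is skipped otherwise). The column name `age` and the misspelling `ag` are illustrative.

.tiny6[

```r
df <- data.frame(age = c(99, 46, 23))
df$ag            # partial name matching quietly returns the age column
class(df[, 1])   # "numeric": a single column is simplified to a vector

if (requireNamespace("tibble", quietly = TRUE)) {
  tb <- tibble::tibble(age = c(99, 46, 23))
  print(tb$ag)            # NULL, with a warning about an unknown column
  print(class(tb[, 1]))   # still a tibble: "tbl_df" "tbl" "data.frame"
}
```
]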

---
#### Topic 4: Objects in R

* Four basic **structures** of data in R:

.pull-left[
![R objects](./images/objects4.png)
]

.tiny6.pull-right[
**Tibble objects:**

```r
pacman::p_load(tibble)
survey <- tibble(
"id" = c(1,2,3,4,5),
"sex" = c("m","m","m","f","f"),
"age" = c(99,46,23,54,23))
class(survey)
```

```
[1] "tbl_df"     "tbl"        "data.frame"
```

```r
str(survey)
```

```
tibble [5 × 3] (S3: tbl_df/tbl/data.frame)
 $ id : num [1:5] 1 2 3 4 5
 $ sex: chr [1:5] "m" "m" "m" "f" ...
 $ age: num [1:5] 99 46 23 54 23
```

]

---
class: inverse

#### Review - Objects in R:
* Define objects > Apply functions > Repeat!

* Objects are data that have a type and structure.

* There are six basic types of data:
   1. Logical
   2. Double - default data type in R
   3. Integer
   4. Complex
   5. Character
   6. Factors - special case of Integer with Character labels

* There are four basic structures:
   1. Scalar
   2. Vector
   3. Matrix
   4. Data frames (and Tibbles)

---
class: center, middle, inverse

#### Topic 5: Functions in R

---
#### Topic 5: Functions in R

R largely revolves around two things: **objects** and **functions**.

<center>Define objects > Apply functions > Repeat!</center>

For example, we can define a simple object called "n":

.tiny[

```r
n <- c(2, 3, 5, 8, 13, 21, 34, 55)
```
]

We can then apply a function to our object. Let's say we're interested in the average, so we'll apply the mean function:

.tiny[

```r
avg.n <- mean(n)
avg.n
```

```
[1] 17.625
```
]

---
#### Topic 5: Functions in R
* Functions are procedures that typically take one or more objects as **arguments** (i.e., inputs), do something with them, then return a new object (i.e., result).

* Cooking analogy:

* Functions (recipe) + Arguments (ingredients) = Result (meal)

* There are two basic types of functions in R:
 1. User-defined functions

.tiny6[

```r
ages <- c(99,46,23,54,23)

age_mean <- function(x) {
 summation <- sum(x)
 summation / length(x)
}
age_mean(ages)
```

```
[1] 49
```
]
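
---
#### Topic 5: Functions in R

Functions can take more than one argument, and arguments can have default values. The sketch below extends `age_mean` with a `na.rm` argument, modeled on the argument of the same name in base R's `mean()`; this extension is an illustration, not part of the original example.

.tiny6[

```r
age_mean <- function(x, na.rm = FALSE) {
  if (na.rm) {
    x <- x[!is.na(x)]   # drop missing values when asked to
  }
  sum(x) / length(x)    # the last expression is the return value
}

ages <- c(99, 46, 23, 54, 23, NA)
age_mean(ages)                 # NA: missing values propagate
age_mean(ages, na.rm = TRUE)   # 49
```
]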

---
#### Topic 5: Functions in R

* Functions are procedures that typically take one or more objects as **arguments** (i.e., inputs), do something with them, then return a new object (i.e., result).

* Cooking analogy:

* Functions (recipe) + Arguments (ingredients) = Result (meal)

* There are two basic types of functions in R:
 1. User-defined functions
 
 2. Built-in functions that are loaded via **Packages** (Community development!)

---
#### Topic 5: Functions (and Packages) in R
.pull-right5.footnote[
<img src="./images/Rpackages.png" width=200 alt="ggplot" align="right"/>
]

Packages in R:

* R packages are used to install built-in functions into the R Environment.

* In addition to functions, packages also include data sets, help documentation, and how-to examples (i.e., vignettes).

* When you install R for the first time, you are installing [base R](<https://stat.ethz.ch/R-manual/R-devel/library/base/html/00Index.html>), which includes functions written by the original authors of the R language.

* Additional packages are developed by the [R community](<https://cran.case.edu>).

* How do we get packages loaded into R?

---
#### Topic 5: Functions (and Packages) in R

How do we get packages loaded into R?

* Two step process:
 1. Install the package - do this once
 
 2. Load the package into R - do this every time

.panelset[

.panel[.panel-name[Programmatically]

.tiny6[

```r
# Traditional methods
install.packages("ggplot2")    # Install - quotes are necessary
library("ggplot2")             # Load into R environment - quotes are optional
library(ggplot2)

# More efficient way
install.packages("pacman")     # Install pacman package
library(pacman)                # Load into R environment
p_load(ggplot2, dplyr)         # Use p_load function to install and load multiple packages
```
]
]

.panel[.panel-name[RStudio IDE]

]
]
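
---
#### Topic 5: Functions (and Packages) in R

Because installing is a one-time step, scripts often check whether a package is already present before installing it. The helper below, `ensure_package`, is a hypothetical name used for this sketch; it is demonstrated on `stats`, which ships with R, so nothing is downloaded.

.tiny6[

```r
# Hypothetical helper: install a package only if it is missing
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}

ok <- ensure_package("stats")   # "stats" ships with R, so nothing is installed
ok                              # TRUE

# Call a function without attaching its package, using package::function
stats::median(c(1, 5, 9))       # 5
```
]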

---
#### Topic 5: Functions (and Packages) in R

* Tidyverse - collection of R packages for data science
 
 * Underlying design philosophy, grammar, and data structures
 
 * [Supported by RStudio](<https://www.tidyverse.org>)

.right[
<img src="./images/tidyverse.png" alt="Tidyverse" height=300 /> &nbsp; <img src="./images/tidyverse2.png" alt="Tidyverse" height=300 />
]

---
#### Topic 5: Functions (and Packages) in R

* Tidyverse packages can be installed and loaded individually:

.tiny6[

```r
# Install and load dplyr package (classic, two-step way)
install.packages("dplyr"); library(dplyr)
```
]

--
 
 * Or, in bulk, with the **tidyverse** package:

.tiny6[

```r
# Install and load tidyverse packages (classic, two-step way)
install.packages("tidyverse"); library(tidyverse)

# List tidyverse packages
tidyverse_packages(include_self=FALSE)
```

```
 [1] "broom"         "cli"           "crayon"        "dbplyr"       
 [5] "dplyr"         "dtplyr"        "forcats"       "googledrive"  
 [9] "googlesheets4" "ggplot2"       "haven"         "hms"          
[13] "httr"          "jsonlite"      "lubridate"     "magrittr"     
[17] "modelr"        "pillar"        "purrr"         "readr"        
[21] "readxl"        "reprex"        "rlang"         "rstudioapi"   
[25] "rvest"         "stringr"       "tibble"        "tidyr"        
[29] "xml2"         
```
]

---
class: inverse

#### Review - Functions (and Packages) in R:
* Define objects > Apply functions > Repeat!

* Functions are used to work with objects in R and are loaded via packages.

* Standard functions (**base R**) load automatically when R is opened.

* There is also a **large community** of users that develop packages for R (18,900+ and counting!).

* Tidyverse - collection of R packages for data science, supported by RStudio.

* Packages can be loaded both programmatically and with the RStudio IDE.

* Two step process:
  1. Install package
  
  2. Load package into the R session

---
class: center, middle, inverse

#### Topic 6: Data and Data Wrangling

---
#### Topic 6: Data and Data Wrangling

Basic steps to working with data in R:

* Check and/or set a working directory.

* Load data.
 
 * Wrangle data (Explore, Summarize, and Analyze)!

---
#### Topic 6: Data and Data Wrangling

Basic steps to working with data in R:

* Check and/or set a working directory.

.panelset[

.panel[.panel-name[Programmatically]

.tiny6[

```r
getwd()              # Prints current working directory
```

```
[1] "h:/"
```
]

.tiny6[

```r
setwd("h:/Lehigh")   # Sets path to working directory
getwd()              # Prints current working directory
```

```
[1] "h:/Lehigh"
```
]
]

.panel[.panel-name[RStudio IDE]

]
]
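
---
#### Topic 6: Data and Data Wrangling

A sketch of the same idea that runs on any machine, since the `h:/` drive above is specific to one setup. It uses a temporary folder; the folder name `rprog-demo` is made up for illustration.

.tiny6[

```r
old_wd <- getwd()                                # remember where we started

data_dir <- file.path(tempdir(), "rprog-demo")   # illustrative folder name
dir.create(data_dir, showWarnings = FALSE)       # create it if needed

setwd(data_dir)                                  # move into the new folder
basename(getwd())                                # "rprog-demo"
file.path("data", "mpg.csv")                     # "data/mpg.csv": builds paths portably

setwd(old_wd)                                    # go back to the original directory
```
]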

---
#### Topic 6: Data and Data Wrangling

Basic steps to working with data in R:

* Check and/or set a working directory.

* Load data.

---
#### Topic 6: Data and Data Wrangling

Basic steps to working with data in R:

* Check and/or set a working directory.

* Load data.

.panelset[

.panel[.panel-name[Programmatically]

.tiny6[

```r
# base package options (reads to a data frame)
read.table()   # Reads tabular data
read.csv()     # Reads comma separated files
read.delim()   # Reads tab separated files

# readr package options (reads to a tibble)
read_table()   # Reads tabular data
read_csv()     # Reads comma separated files
read_delim()   # Reads tab separated files

# readxl package options (reads to a tibble)
read_xlsx()    # Reads .xlsx files
read_xls()     # Reads .xls files
```
]
]

.panel[.panel-name[RStudio IDE]

]
]
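
---
#### Topic 6: Data and Data Wrangling

The read functions all need a file path as their first argument. The sketch below writes a tiny CSV to a temporary file so there is something to read, then loads it with base R; the file contents and the name `csv_path` are invented for the example.

.tiny6[

```r
# Write a small example file so there is something to read
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, sex = c("m", "f", "f")),
          csv_path, row.names = FALSE)

survey <- read.csv(csv_path)   # base R: returns a data frame
class(survey)                  # "data.frame"
str(survey)                    # 3 obs. of 2 variables

# With readr installed, readr::read_csv(csv_path) would read the same
# file and return a tibble instead.
```
]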

---
#### Topic 6: Data and Data Wrangling

Basic steps to working with data in R:

* Check and/or set a working directory.

* Load data.
 
 * Wrangle data (Explore, Summarize, and Analyze)!

---
#### Topic 6: Data and Data Wrangling

Exploring data in R

* Let's use a dataset from the web to practice saving and loading data:
 
.tiny6[

```r
# Set url link for data download
url <- "https://raw.githubusercontent.com/jeremymack-LU/rprog/master/mpg.csv"

# Create a data folder in the working directory (if needed), then download the file into it
dir.create("data", showWarnings = FALSE)
download.file(url, "data/mpg.csv")

# Read data into R
mpg <- read.csv("data/mpg.csv")
```

]

---
#### Topic 6: Data and Data Wrangling

Exploring data in R
 
 * Helpful functions for exploring data in R:
 
.tiny6[

```r
View(x)      # View the dataset in a spreadsheet

str(x)       # Print the structure of the data frame
  
head(x)      # Print the first few rows
  
tail(x)      # Print the last few rows
  
nrow(x)      # Print the number of rows
  
ncol(x)      # Print the number of columns
  
dim(x)       # Print the dimensions (rows x columns)
  
rownames(x)  # Print row names
  
colnames(x)  # Print column names
```
]
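
---
#### Topic 6: Data and Data Wrangling

To try these functions before loading your own data, you can use `mtcars`, a small dataset bundled with R. This is a stand-in for `x` above, not the dataset used in the rest of this topic.

.tiny6[

```r
data(mtcars)            # small example dataset bundled with R

dim(mtcars)             # 32 11: rows, then columns
nrow(mtcars)            # 32
ncol(mtcars)            # 11
head(mtcars, 3)         # first three rows
colnames(mtcars)[1:3]   # "mpg" "cyl" "disp"
str(mtcars)             # every column is numeric
```
]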

---
#### Topic 6: Data and Data Wrangling

Exploring data in R

.tiny6[  * Let's check to make sure our data loaded correctly:

```r
head(mpg, 10)
```

```
   manufacturer      model displ year cyl      trans drv cty hwy fl   class
1          audi         a4   1.8 1999   4   auto(l5)   f  18  29  p compact
2          audi         a4   1.8 1999   4 manual(m5)   f  21  29  p compact
3          audi         a4   2.0 2008   4 manual(m6)   f  20  31  p compact
4          audi         a4   2.0 2008   4   auto(av)   f  21  30  p compact
5          audi         a4   2.8 1999   6   auto(l5)   f  16  26  p compact
6          audi         a4   2.8 1999   6 manual(m5)   f  18  26  p compact
7          audi         a4   3.1 2008   6   auto(av)   f  18  27  p compact
8          audi a4 quattro   1.8 1999   4 manual(m5)   4  18  26  p compact
9          audi a4 quattro   1.8 1999   4   auto(l5)   4  16  25  p compact
10         audi a4 quattro   2.0 2008   4 manual(m6)   4  20  28  p compact
```
]

---
#### Topic 6: Data and Data Wrangling

Exploring data in R

.tiny6[ * Next, we'll check our data structure:

```r
str(mpg)
```

```
'data.frame':	234 obs. of  11 variables:
 $ manufacturer: chr  "audi" "audi" "audi" "audi" ...
 $ model       : chr  "a4" "a4" "a4" "a4" ...
 $ displ       : num  1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
 $ year        : int  1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
 $ cyl         : int  4 4 4 4 6 6 6 4 4 4 ...
 $ trans       : chr  "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
 $ drv         : chr  "f" "f" "f" "f" ...
 $ cty         : int  18 21 20 21 16 18 18 18 16 20 ...
 $ hwy         : int  29 29 31 30 26 26 27 26 25 28 ...
 $ fl          : chr  "p" "p" "p" "p" ...
 $ class       : chr  "compact" "compact" "compact" "compact" ...
```
]

---
#### Topic 6: Data and Data Wrangling

Exploring data in R

.tiny6[  * Selecting specific data - R reads data as row x column, using brackets [r,c]:

```r
mpg[1,1]  # Print value in row 1, column 1
```

```
[1] "audi"
```

```r
mpg[6,10] # Print value in row 6, column 10
```

```
[1] "p"
```
]
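
---
#### Topic 6: Data and Data Wrangling

Exploring data in R

The same `[r,c]` pattern accepts names, negative numbers, and logical conditions. A self-contained sketch on the small `survey` data frame from Topic 4, so it runs without the downloaded file:

.tiny6[

```r
survey <- data.frame(
  id  = c(1, 2, 3, 4, 5),
  sex = c("m", "m", "m", "f", "f"),
  age = c(99, 46, 23, 54, 23))

survey[2, 3]                              # row 2, column 3: 46
survey[2, "age"]                          # same cell, column chosen by name
survey[-1, ]                              # everything except the first row
survey[survey$age > 50, ]                 # rows where age is over 50 (ids 1 and 4)
survey[survey$age > 50, c("id", "age")]   # same rows, two columns only
```
]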

---
#### Topic 6: Data and Data Wrangling

Exploring data in R

.tiny6[
* Selecting specific data - R reads data as row x column, using brackets [r,c]:

```r
mpg[1,]  # Print the first row 
```

```
  manufacturer model displ year cyl    trans drv cty hwy fl   class
1         audi    a4   1.8 1999   4 auto(l5)   f  18  29  p compact
```
]

.tiny6[

```r
mpg[c(1,5),]  # Print the first and fifth row
```

```
  manufacturer model displ year cyl    trans drv cty hwy fl   class
1         audi    a4   1.8 1999   4 auto(l5)   f  18  29  p compact
5         audi    a4   2.8 1999   6 auto(l5)   f  16  26  p compact
```
]

.tiny6[

```r
mpg[1:5,]  # Print the first through fifth row
```
]

---
#### Topic 6: Data and Data Wrangling

Exploring data in R

.tiny6[
* Selecting specific data - R reads data as row x column, using brackets [r,c]:

```r
mpg[,1]  # Print column 1 by number (as a vector)
```
]
.tiny6[

```
  [1] "audi"      "audi"      "audi"      "audi"      "audi"      "audi"     
  [7] "audi"      "audi"      "audi"      "audi"      "audi"      "audi"     
 [13] "audi"      "audi"      "audi"      "audi"      "audi"      "audi"     
 [19] "chevrolet" "chevrolet" "chevrolet" "chevrolet" "chevrolet" "chevrolet"
 [25] "chevrolet" "chevrolet" "chevrolet" "chevrolet" "chevrolet" "chevrolet"
 [31] "chevrolet" "chevrolet" "chevrolet" "chevrolet" "chevrolet" "chevrolet"
 [37] "chevrolet" "dodge"     "dodge"     "dodge"     "dodge"     "dodge"    
 [43] "dodge"     "dodge"     "dodge"     "dodge"     "dodge"     "dodge"    
 [49] "dodge"     "dodge"     "dodge"     "dodge"     "dodge"     "dodge"    
 [55] "dodge"     "dodge"     "dodge"     "dodge"     "dodge"     "dodge"    
 [61] "dodge"     "dodge"     "dodge"     "dodge"     "dodge"     "dodge"    
 [67] "dodge"     "dodge"     "dodge"     "dodge"     "dodge"     "dodge"    
 [73] "dodge"     "dodge"     "ford"      "ford"      "ford"      "ford"     
 [79] "ford"      "ford"      "ford"      "ford"      "ford"      "ford"     
 [85] "ford"      "ford"      "ford"      "ford"      "ford"      "ford"     
 [91] "ford"      "ford"      "ford"      "ford"      "ford"      "ford"     
 [97] "ford"      "ford"      "ford"      "honda"     "honda"     "honda"    
[103] "honda"     "honda"     "honda"    
```
]

---
#### Topic 6: Data and Data Wrangling

Exploring data in R

.tiny6[
* Selecting specific data - R reads data as row x column, using brackets [r,c]:

```r
mpg[,'manufacturer']  # Print column 1 by name (as a vector)
```
]


---
#### Topic 6: Data and Data Wrangling

Exploring data in R

.tiny6[
* Selecting specific data - R reads data as row x column, using brackets [r,c]:

```r
mpg$manufacturer  # Print column 1 by name (as a vector)
```
]


---
#### Topic 6: Data and Data Wrangling

Exploring data in R

.tiny6[
* Selecting specific data - R reads data as row x column, using brackets [r,c]:

```r
mpg[1]  # Print column 1 by number (as a data frame)
```
]

.tiny6[

```
   manufacturer
1          audi
2          audi
3          audi
4          audi
5          audi
6          audi
7          audi
8          audi
9          audi
10         audi
11         audi
12         audi
13         audi
14         audi
15         audi
16         audi
17         audi
```
]

---
#### Topic 6: Data and Data Wrangling

Exploring data in R

.tiny6[
* Selecting specific data - R reads data as row x column, using brackets [r,c]:

```r
mpg['manufacturer']  # Print column 1 by name (as a data frame)
```
]
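
---
#### Topic 6: Data and Data Wrangling

Exploring data in R

.tiny6[
* The selection styles shown so far return two different kinds of object. A minimal sketch of the difference, using the built-in `mtcars` data frame as a stand-in (an assumption, so the code runs without downloading `mpg.csv`):

```r
# mtcars ships with R; it stands in for mpg here (assumption)
cars <- mtcars

by_number <- cars[, 1]      # column 1 by number, returned as a vector
by_name   <- cars[, "mpg"]  # same column by name, returned as a vector
by_dollar <- cars$mpg       # same column with $, returned as a vector
as_frame  <- cars[1]        # no comma: returned as a one-column data frame

identical(by_number, by_name)    # TRUE
identical(by_name, by_dollar)    # TRUE
is.data.frame(as_frame)          # TRUE
is.data.frame(by_number)         # FALSE
```

* These results hold for a plain data frame, such as one created by `read.csv()`. A tibble created by `read_csv()` keeps `[ , 1]` as a one-column tibble.
]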


---
#### Topic 6: Data and Data Wrangling

Basic steps to working with data in R:

* Check and/or set a working directory.

* Load data.
 
 * Wrangle data (~~Explore,~~ Summarize, and Analyze)!

---
#### Topic 6: Data and Data Wrangling

Summarizing data in R
 
 * Helpful functions for summarizing data in R:
 
.tiny6[

```r
mean(x)        # Calculate and return the average of the input values

max(x)         # Return the maximum value of the input values
  
min(x)         # Return the minimum value of the input values
  
sd(x)          # Calculate and return the standard deviation of the input values
  
length(x)      # Return the number of input values
```
]

---
#### Topic 6: Data and Data Wrangling

Summarizing data in R

* What if we wanted to summarize the highway mpg data in our dataset?

.tiny6[

```r
hwy.avg <- mean(mpg$hwy) # Average value for hwy
hwy.max <- max(mpg$hwy) # Maximum value for hwy
hwy.min <- min(mpg$hwy) # Minimum value for hwy
hwy.sd <- sd(mpg$hwy) # Standard deviation for hwy
data.frame(hwy.avg, hwy.max, hwy.min, hwy.sd) # Combine objects to a data frame
```

```
   hwy.avg hwy.max hwy.min   hwy.sd
1 23.44017      44      12 5.954643
```

]

--
 
.tiny6[

```r
summary(mpg$hwy)                              # Quick summary for hwy
```

```
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  12.00   18.00   24.00   23.44   27.00   44.00 
```
]

---
#### Topic 6: Data and Data Wrangling

Summarizing data in R

* What happens if there are missing data?

.tiny6[

```r
mpg2 <- mpg
mpg2[1,"hwy"] <- NA
mpg2[1,]
```

```
  manufacturer model displ year cyl    trans drv cty hwy fl   class
1         audi    a4   1.8 1999   4 auto(l5)   f  18  NA  p compact
```
]
--
.tiny6[

```r
mean(mpg2$hwy)  # Average value for hwy
```

```
[1] NA
```
]
--
.tiny6[
* Pay attention to additional arguments: `na.rm=TRUE` drops missing values before the calculation

```r
mean(mpg2$hwy, na.rm=TRUE)
```

```
[1] 23.41631
```
]
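
---
#### Topic 6: Data and Data Wrangling

Summarizing data in R

.tiny6[
* The same `na.rm` argument is accepted by most of the summary functions listed earlier. A small sketch with a hypothetical vector `x` (made-up values, not taken from `mpg`):

```r
# Hypothetical values with one missing entry
x <- c(12, 18, NA, 24, 44)

n_missing <- sum(is.na(x))          # count the missing values: 1
avg_raw   <- mean(x)                # NA, because one value is missing
avg_clean <- mean(x, na.rm = TRUE)  # 24.5, the mean of the four known values
max_clean <- max(x, na.rm = TRUE)   # 44
n_total   <- length(x)              # 5, length() still counts the NA
```

* Counting the missing values first with `sum(is.na(x))` shows how much data `na.rm=TRUE` would drop.
]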

---
#### Topic 6: Data and Data Wrangling

Summarizing data in R

* What if we wanted to add a new variable (i.e., column)?

.tiny6[

```r
# Let's add a new column with the average of city and highway mpg
mpg$avg <- (mpg$cty+mpg$hwy)/2
mpg[1:5,]  # Print the first five rows to check the new column
```

```
  manufacturer model displ year cyl      trans drv cty hwy fl   class  avg
1         audi    a4   1.8 1999   4   auto(l5)   f  18  29  p compact 23.5
2         audi    a4   1.8 1999   4 manual(m5)   f  21  29  p compact 25.0
3         audi    a4   2.0 2008   4 manual(m6)   f  20  31  p compact 25.5
4         audi    a4   2.0 2008   4   auto(av)   f  21  30  p compact 25.5
5         audi    a4   2.8 1999   6   auto(l5)   f  16  26  p compact 21.0
```
]

---
#### Topic 6: Data and Data Wrangling

Basic steps to working with data in R:

* Check and/or set a working directory.

* Load data.
 
 * Wrangle data (~~Explore, Summarize,~~ and Analyze)!

---
#### Topic 6: Data and Data Wrangling

Analyzing data in R
 
 * Helpful functions for analyzing data in R:
 
.tiny6[

```r
lm(x)           # Fit a linear model

glm(x)          # Fit a generalized linear model
  
t.test(x)       # Perform a t-test for difference between means
  
aov(x)          # Analysis of variance test
  
prop.test(x)    # Test for a difference between proportions
```
]

---
#### Topic 6: Data and Data Wrangling

Analyzing data in R

* How does highway mileage change with engine displacement?
 
.tiny6[

```r
# Quick plot of the data
with(mpg, plot(x=displ, y=hwy))
```
]
<center><img src="./images/Rplot1.jpeg" alt="RStudio" height=300/></center>

---
#### Topic 6: Data and Data Wrangling

Analyzing data in R

* How does highway mileage change with engine displacement?
 
.tiny6[

```r
# Apply a simple linear model
mod <- lm(hwy~displ, data=mpg)
summary(mod)
```
]

.tiny6[

```

Call:
lm(formula = hwy ~ displ, data = mpg)

Residuals:
    Min      1Q  Median      3Q     Max 
-7.1039 -2.1646 -0.2242  2.0589 15.0105

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  35.6977     0.7204   49.55   <2e-16 ***
displ        -3.5306     0.1945  -18.15   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.836 on 232 degrees of freedom
Multiple R-squared: 0.5868,	Adjusted R-squared: 0.585 
F-statistic: 329.5 on 1 and 232 DF, p-value: < 2.2e-16
```
]
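
---
#### Topic 6: Data and Data Wrangling

Analyzing data in R

.tiny6[
* Reading the coefficients: the fitted line is hwy = 35.6977 - 3.5306 x displ, so each additional liter of displacement is associated with about 3.5 fewer highway mpg. A small sketch of that arithmetic, with the coefficients copied by hand from the model summary; `predict_hwy` is a hypothetical helper, not part of the original analysis:

```r
# Coefficients copied from the summary(mod) output
intercept <- 35.6977
slope     <- -3.5306

# Hypothetical helper: predicted highway mpg for a given displacement (liters)
predict_hwy <- function(displ) {
  intercept + slope * displ
}

predict_hwy(2.0)                     # 28.6365
predict_hwy(3.0) - predict_hwy(2.0)  # -3.5306, the slope
```

* With the fitted object available, `predict(mod, newdata = data.frame(displ = 2))` should give the same answer, up to rounding of the printed coefficients.
]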

---
#### Topic 6: Data and Data Wrangling

Analyzing data in R

* Does highway mileage significantly differ between car classes?
 
.tiny6[

```r
# Quick plot of the data (class converted to a factor so plot() draws box plots)
with(mpg, plot(x=factor(class), y=hwy))
```
]
<center><img src="./images/Rplot2.jpeg" alt="RStudio" height=300/></center>

---
#### Topic 6: Data and Data Wrangling

Analyzing data in R

* Does highway mileage significantly differ between car classes?
 
.tiny6[

```r
# Apply a simple linear model with an ANOVA test
mod2 <- lm(hwy~class, data=mpg)
aov2 <- aov(mod2)
summary(aov2)
```
]

.tiny6[

```
             Df Sum Sq Mean Sq F value Pr(>F)
class         6   5683   947.2   83.39 <2e-16 ***
Residuals   227   2578    11.4
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
]
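
---
#### Topic 6: Data and Data Wrangling

Analyzing data in R

.tiny6[
* How the columns of the ANOVA table relate to each other: each mean square is a sum of squares divided by its degrees of freedom, and the F value is the ratio of the two mean squares. A sketch of that arithmetic, using the rounded values copied from the table (so the result matches only up to rounding):

```r
# Values copied from the ANOVA table (already rounded in the printout)
ss_class <- 5683; df_class <- 6
ss_resid <- 2578; df_resid <- 227

ms_class <- ss_class / df_class  # mean square for class, about 947.2
ms_resid <- ss_resid / df_resid  # mean square for residuals, about 11.4
f_value  <- ms_class / ms_resid  # about 83.4, matching the table up to rounding

# p-value from the F distribution; far below 2e-16
p_value <- pf(f_value, df_class, df_resid, lower.tail = FALSE)
```
]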

---
class: inverse

#### Exercise - Summarizing Data:
 * The *USArrests* dataset provides data on the number of arrests per 100,000 residents for violent crimes (assault, murder, and rape) in each of the 50 US states in 1973.

* First assign that data to an object called "crime"
.tiny[

```r
crime <- USArrests
```
]

* Using that data, try the following:
 1. Calculate the average number of arrests for assault.
 
 2. Identify the maximum number of arrests for assault.
 
 3. Print the statistics for Pennsylvania.
 
 4. Was there a linear relationship between murders and assaults?
 
<div class="countdown" id="timer_6220cd2c" style="right:0;bottom:0.5;font-size:48px;" data-warnwhen="0">
<code class="countdown-time">05:00</code>
</div>

---
class: inverse

* First assign that data to an object called "crime"
.tiny[

```r
crime <- USArrests
```
]

* How did you do?

---
#### Exercise - Summarizing Data:

* Calculate the average number of arrests for assault.

.tiny[

```r
mean(crime$Assault)
```

```
[1] 170.76
```
]

.tiny[

```r
mean(crime[,2])
```

```
[1] 170.76
```
]

.tiny[

```r
sum(crime$Assault)/length(crime$Assault)
```

```
[1] 170.76
```
]

---
#### Exercise - Summarizing Data:

* Identify the maximum number of arrests for assault.

--
 
.tiny[

```r
max(crime$Assault)
```

```
[1] 337
```
]

.tiny[

```r
x <- sort(crime$Assault, decreasing=TRUE)
x[1]
```

```
[1] 337
```
]

.tiny[

```r
x <- sort(crime$Assault)
x[50]
```

```
[1] 337
```
]
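
---
#### Exercise - Summarizing Data:

* A possible follow-up, not part of the original exercise: `max()` returns the value but not the state it belongs to. A short sketch using `which.max()`, which returns the position of the largest value:

.tiny[

```r
crime <- USArrests

top_row   <- which.max(crime$Assault)  # row position of the largest value
top_state <- rownames(crime)[top_row]  # state names are stored as row names
top_state                              # "North Carolina"
crime[top_row, ]                       # all statistics for that state
```
]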

---
#### Exercise - Summarizing Data:

* Print the statistics for Pennsylvania.

.tiny[

```r
crime["Pennsylvania",]
```

```
             Murder Assault UrbanPop Rape
Pennsylvania    6.3     106       72 14.9
```
]

.tiny[

```r
crime[38,]
```

```
             Murder Assault UrbanPop Rape
Pennsylvania    6.3     106       72 14.9
```
]

---
#### Exercise - Summarizing Data:

* Was there a linear relationship between murders and assaults?

.tiny[

```r
# Quick plot of the data
with(crime, plot(x=Assault, y=Murder))
```
]
<center><img src="./images/Rplot3.jpeg" alt="RStudio" height=300/></center>

---
#### Exercise - Summarizing Data:

* Was there a linear relationship between murders and assaults?

.tiny[

```r
# Apply a simple linear model
mod <- lm(Murder~Assault, data=crime)
summary(mod)
```
]

.tiny2[

```

Call:
lm(formula = Murder ~ Assault, data = crime)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.8528 -1.7456 -0.3979  1.3044  7.9256

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.631683   0.854776   0.739    0.464    
Assault     0.041909   0.004507   9.298  2.6e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.629 on 48 degrees of freedom
Multiple R-squared:  0.643,	Adjusted R-squared:  0.6356 
F-statistic: 86.45 on 1 and 48 DF,  p-value: 2.596e-12
```
]
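
---
#### Exercise - Summarizing Data:

* A quick cross-check on the model output, offered as a sketch rather than part of the original solution: for a linear model with a single predictor, the Multiple R-squared equals the squared Pearson correlation between the two variables.

.tiny[

```r
crime <- USArrests

r   <- cor(crime$Murder, crime$Assault)    # Pearson correlation
mod <- lm(Murder ~ Assault, data = crime)  # same model as the exercise
r2  <- summary(mod)$r.squared              # Multiple R-squared from the model

all.equal(r^2, r2)  # TRUE: with one predictor, R-squared equals r squared
```
]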

---
class: center, middle, inverse, title-slide
#### Topic 6: Data and Data Wrangling
#### Tidyverse

---
#### Topic 6: Data and Data Wrangling - Tidyverse

* Core packages - dplyr, forcats, ggplot2, purrr, readr, tibble, tidyr, stringr

.pull-left4[
 * **dplyr** package
 
  + Introduces a consistent set of functions (verbs)
  
  + Applied across all Tidyverse packages
     ]

.pull-right4[
 
<center><img src="./images/dplyr.png" alt="RStudio" height=150/></center>
]

---
#### Topic 6: Data and Data Wrangling - Tidyverse

* Helpful functions - **dplyr**:
 
.tiny6.pull-left2[

```r
filter(x)      # picks cases based on their values

select(x)      # picks columns based on their names

slice(x)       # picks rows by position
  
arrange(x)     # changes the ordering of rows
  
group_by(x)    # allows operations by groups
  
mutate(x)      # adds new variables to a dataset
  
summarise(x)   # reduces multiple values down to a single summary

count(x)       # counts number of rows in a group
  
add_row(x)     # adds a row of data to a data frame (from the tibble package)
```
]

.pull-right2[
<center><img src="./images/dplyr_filter.jpg" alt="RStudio" width=400/></center>

<center><img src="./images/dplyr_mutate.png" alt="RStudio" width=250/></center>
]
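
---
#### Topic 6: Data and Data Wrangling - Tidyverse

.tiny6[
* Each verb takes a data frame as its first argument and returns a new data frame. A minimal sketch of several verbs in use, assuming the **dplyr** package is installed and using the built-in `mtcars` data as a stand-in for `mpg`; the object names are illustrative only:

```r
library(dplyr)

four_cyl    <- filter(mtcars, cyl == 4)              # rows where cyl equals 4
two_cols    <- select(mtcars, mpg, cyl)              # keep only the mpg and cyl columns
first_three <- slice(mtcars, 1:3)                    # rows 1 through 3 by position
by_mpg      <- arrange(mtcars, desc(mpg))            # sort rows from highest to lowest mpg
with_ratio  <- mutate(mtcars, hp_per_cyl = hp / cyl) # add a new column
n_by_cyl    <- count(mtcars, cyl)                    # number of rows for each value of cyl
avg_by_cyl  <- summarise(group_by(mtcars, cyl),      # one summary row per group
                         avg_mpg = mean(mpg))
```
]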

---
#### Topic 6: Data and Data Wrangling - Tidyverse

Exploring data in R

* Let's check to make sure our data loaded correctly:

.panelset[

.panel[.panel-name[Tidyverse]

.tiny6[

```r
# Read in data and check first ten rows w/ Tidyverse functions
url <- "https://raw.githubusercontent.com/jeremymack-LU/rprog/master/mpg.csv"
mpg <- read_csv(url)
slice_head(mpg, n=10)
```

```
# A tibble: 10 × 11
   manufacturer model      displ  year   cyl trans drv     cty   hwy fl    class
   <chr>        <chr>      <dbl> <dbl> <dbl> <chr> <chr> <dbl> <dbl> <chr> <chr>
 1 audi         a4           1.8  1999     4 auto… f        18    29 p     comp…
 2 audi         a4           1.8  1999     4 manu… f        21    29 p     comp…
 3 audi         a4           2    2008     4 manu… f        20    31 p     comp…
 4 audi         a4           2    2008     4 auto… f        21    30 p     comp…
 5 audi         a4           2.8  1999     6 auto… f        16    26 p     comp…
 6 audi         a4           2.8  1999     6 manu… f        18    26 p     comp…
 7 audi         a4           3.1  2008     6 auto… f        18    27 p     comp…
 8 audi         a4 quattro   1.8  1999     4 manu… 4        18    26 p     comp…
 9 audi         a4 quattro   1.8  1999     4 auto… 4        16    25 p     comp…
10 audi         a4 quattro   2    2008     4 manu… 4        20    28 p     comp…
```
]
]

.panel[.panel-name[base R]

.tiny6[

```r
# Read in data and check first ten rows w/ base R functions
url <- "https://raw.githubusercontent.com/jeremymack-LU/rprog/master/mpg.csv"
mpg <- read.csv(url)
head(mpg, n=10)
```
]
]
]

---
#### Topic 6: Data and Data Wrangling - Tidyverse

Summarizing data in R

* What if we wanted to summarize the highway mpg data in our dataset?

.panelset[

.panel[.panel-name[Tidyverse]

.tiny6[

```r
summarize(mpg,                 # Data
          hwy.avg=mean(hwy),   # Average value for hwy
          hwy.max=max(hwy),    # Maximum value for hwy
          hwy.min=min(hwy),    # Minimum value for hwy
          hwy.sd=sd(hwy))      # Standard deviation for hwy
```

```
# A tibble: 1 × 4
  hwy.avg hwy.max hwy.min hwy.sd
    <dbl>   <dbl>   <dbl>  <dbl>
1    23.4      44      12   5.95
```
]
]

.panel[.panel-name[base R]

.tiny6[

```
   hwy.avg hwy.max hwy.min   hwy.sd
1 23.44017      44      12 5.954643
```
]
]
]

---
#### Topic 6: Data and Data Wrangling - Tidyverse

Summarizing data in R

* What if we wanted to add a new variable (i.e., column)?

.panelset[

.panel[.panel-name[Tidyverse]

.tiny6[

```r
# Let's add a new column with the average of city and highway mpg
mpg <- mutate(mpg,
              avg=(cty+hwy)/2)
slice_head(mpg, n=5)  # Print the first five rows to check the new column
```

```
# A tibble: 5 × 12
  manufacturer model displ  year   cyl trans drv     cty   hwy fl    class   avg
  <chr>        <chr> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <dbl> <chr> <chr> <dbl>
1 audi         a4      1.8  1999     4 auto… f        18    29 p     comp…  23.5
2 audi         a4      1.8  1999     4 manu… f        21    29 p     comp…  25
3 audi         a4      2    2008     4 manu… f        20    31 p     comp…  25.5
4 audi         a4      2    2008     4 auto… f        21    30 p     comp…  25.5
5 audi         a4      2.8  1999     6 auto… f        16    26 p     comp…  21
```
]
]

.panel[.panel-name[base R]

.tiny6[

```r
# Let's add a new column with the average of city and highway mpg
mpg$avg <- (mpg$cty+mpg$hwy)/2
```
]
]
]

---
#### Topic 6: Data and Data Wrangling - Tidyverse

* Core packages - dplyr, forcats, ggplot2, purrr, readr, tibble, tidyr, stringr

.pull-left4[
 * **dplyr** package
 
 + Introduces a consistent set of functions (verbs)
 
 + Applied across all Tidyverse packages
 
 
 
 
 + Imports the pipe operator (`%>%`) from the **magrittr** package
 
 + Forwards an object into a function as its first argument
]

.pull-right4[
 
<center><img src="./images/dplyr.png" alt="RStudio" height=150/></center>
 
<center><img src="./images/magrittr.jpg" alt="RStudio" height=170/></center>
]
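
---
#### Topic 6: Data and Data Wrangling - Tidyverse

.tiny6[
* In other words, `x %>% f(y)` is evaluated as `f(x, y)`. A minimal sketch of that equivalence, assuming the **magrittr** package is installed and using a made-up vector `x`:

```r
library(magrittr)

x <- c(4, 9, 16)  # hypothetical values

# Nested calls: read from the inside out
nested <- round(mean(sqrt(x)), 1)

# Piped calls: read from left to right, same steps in the order they happen
piped <- x %>% sqrt() %>% mean() %>% round(1)

identical(nested, piped)  # TRUE
```

* Loading **dplyr** makes `%>%` available as well. R 4.1.0 and later also include a native pipe, `|>`, which behaves similarly for simple chains like this one.
]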

---
#### Topic 6: Data and Data Wrangling - Tidyverse

Summarizing data in R

* What if we wanted to summarize the data frame we created earlier?

.tiny6.pull-left[

```r
# Summarise the hwy variable
summarize(mpg,               # Data
          hwy.avg=mean(hwy), # Average hwy
          hwy.max=max(hwy),  # Maximum hwy
          hwy.min=min(hwy),  # Minimum hwy
          hwy.sd=sd(hwy))    # Std. deviation
```

```
# A tibble: 1 × 4
  hwy.avg hwy.max hwy.min hwy.sd
    <dbl>   <dbl>   <dbl>  <dbl>
1    23.4      44      12   5.95
```

]

.tiny6.pull-right[

```r
# Summarise the hwy variable
# Pipe mpg object into the summarize function
mpg %>% summarize(hwy.avg=mean(hwy),
                  hwy.max=max(hwy),
                  hwy.min=min(hwy),
                  hwy.sd=sd(hwy))
```

```
# A tibble: 1 × 4
  hwy.avg hwy.max hwy.min hwy.sd
    <dbl>   <dbl>   <dbl>  <dbl>
1    23.4      44      12   5.95
```
]

---
#### Topic 6: Data and Data Wrangling - Tidyverse

* For example, using our *mpg* dataset, let's say we want the average highway mpg for 2008 cars, grouped by number of cylinders.
 
  + Multiple objects approach:
  
.tiny6.pull-left[

```r
a <- filter(mpg, year==2008)
b <- group_by(a, cyl)
c <- summarize(b,
 Avg=mean(hwy))
d <- arrange(c, desc(Avg))
print(d)
```
]

.tiny6.pull-right[

```
# A tibble: 4 × 2
    cyl   Avg
  <dbl> <dbl>
1     4  29.3
2     5  28.8
3     6  23.5
4     8  18
```
]

---
#### Topic 6: Data and Data Wrangling - Tidyverse

* For example, using our *mpg* dataset, let's say we want the average highway mpg for 2008 cars, grouped by number of cylinders.
 
  + Nested approach:
  
.tiny6.pull-left[

```r
arrange(
   summarize(
       group_by(
        filter(mpg,year==2008),
        cyl),
       Avg = mean(hwy)),
   desc(Avg)
 )
```
]

.tiny6.pull-right[

```
# A tibble: 4 × 2
    cyl   Avg
  <dbl> <dbl>
1     4  29.3
2     5  28.8
3     6  23.5
4     8  18
```
]

---
#### Topic 6: Data and Data Wrangling - Tidyverse

* For example, using our *mpg* dataset, let's say we want the average highway mpg for 2008 cars, grouped by number of cylinders.
 
  + Piping approach:
  
.tiny6.pull-left[

```r
mpg %>%
   filter(year==2008) %>%
   group_by(cyl) %>%
   summarize(Avg=mean(hwy)) %>%
   arrange(desc(Avg))
```
]

.tiny6.pull-right[

```
# A tibble: 4 × 2
    cyl   Avg
  <dbl> <dbl>
1     4  29.3
2     5  28.8
3     6  23.5
4     8  18
```
]
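
---
#### Topic 6: Data and Data Wrangling - Tidyverse

.tiny6[
* For comparison, a base R sketch of the same filter, group, summarize, and sort pattern. It uses the built-in `mtcars` data (an assumption, so it runs without `mpg.csv`), with `am == 1` standing in for `year==2008` and `mpg` standing in for `hwy`:

```r
# Filter rows: keep cars with a manual transmission
manual_cars <- subset(mtcars, am == 1)

# Group and summarize: mean mpg for each number of cylinders
avg_by_cyl <- aggregate(mpg ~ cyl, data = manual_cars, FUN = mean)
names(avg_by_cyl)[2] <- "Avg"  # rename the summary column

# Sort rows from highest to lowest average
avg_by_cyl <- avg_by_cyl[order(avg_by_cyl$Avg, decreasing = TRUE), ]
avg_by_cyl
```

* Each step overwrites or creates an intermediate object, which is the bookkeeping that the piping approach avoids.
]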

---
class: inverse

#### Review - Data and Data Wrangling:

* Working directories and data can be set and loaded programmatically, or with the RStudio IDE.

* R identifies data by row (observation) then column (variable).

* Pay attention to function arguments - missing values can cause problems!

* Tidyverse packages provide a consistent language (functions) and grammar (arguments) that integrate nicely with a piping workflow.
 
 
.pull-right6[<img src="./images/tidyverse5.png" alt="RStudio" height=200/> <img src="./images/magrittr2.png" alt="RStudio" height=200/>]

---
class: center, middle, inverse

#### Topic 7: Extras - RStudio Projects,
#### Other things to do in R, and Resources

---
#### Topic 7: Extras - RStudio Projects, Other things to do in R, and Resources
  
Basic steps to working with data in R:
  
 * Check and/or set a working directory.

* Load data.

* Wrangle data (Explore, Summarize, and Analyze)!
 
---
#### Topic 7: Extras - RStudio Projects, Other things to do in R, and Resources

Basic steps to working with data in R:

* ~~Check and/or set a working directory.~~ Set up an RStudio Project.

* Load data.
 
 * Wrangle data (Explore, Summarize, and Analyze)!

---
#### Topic 7: Extras - RStudio Projects, Other things to do in R, and Resources

RStudio projects:

.pull-right2[
<center><img src="./images/cracked_setwd.png" alt="R timeline" height=125/></center>
<center><img src="./images/rproject.png" alt="R timeline" width=200/></center>
]

.pull-left2[
* **Projects** keep all files associated with a project together.

* "Home" directory of the project becomes the current working directory.

* Projects can **enhance reproducibility** if *paths within scripts are kept relative and not absolute*.
]
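
---
#### Topic 7: Extras - RStudio Projects, Other things to do in R, and Resources

.tiny6[
* A small sketch of the difference between absolute and relative paths. The folder name `data` and both paths are hypothetical examples, not files from this presentation:

```r
# Absolute path (hypothetical): breaks on any other computer
abs_path <- "C:/Users/me/Documents/rprog/data/mpg.csv"

# Relative path: resolved against the working directory,
# which is the project's home directory when the project is open
rel_path <- file.path("data", "mpg.csv")

rel_path               # "data/mpg.csv"
getwd()                # shows the directory the relative path starts from
file.exists(rel_path)  # TRUE only if data/mpg.csv exists in the project
```

* A script that calls `read.csv(rel_path)` keeps working when the whole project folder is moved or shared.
]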

---
#### Topic 7: Extras - RStudio Projects, Other things to do in R, and Resources
Other things to do in R:

.right-column2[
<center><img src="./images/rmarkdown.png" height=200 alt="RStudio"/></center>
 
<center><img src="./images/shiny.png" height=200 alt="RStudio"/></center>
]

.left-column2[
* [R Markdown documents](<https://rmarkdown.rstudio.com>)
 - Documents
 - Websites
 - Books
 
 
 
* [Shiny Apps](<https://jeremymack.shinyapps.io/purpleair/>)
 - Web applications
 - Websites
 - Dashboards 
]

---
#### Topic 7: Extras - RStudio Projects, Other things to do in R, and Resources
  
Resources:
  
.pull-left4[
1. [R for Data Science](https://r4ds.had.co.nz/)

2. [RStudio Cheat Sheets](https://www.rstudio.com/resources/cheatsheets/)

3. [Twitter for R Programmers](https://www.t4rstats.com/follow-some-folks.html)
  ]

.pull-right4[
 <img src="./images/tidyverse2.png" alt="Tidyverse" height=250/>
 <img src="./images/rtwitter_blank.png" alt="Tidyverse"/>
]

---
class: center, middle, inverse, title-slide

## Questions?
<img src="./images/contact.png" alt="RStudio" height=400/>