# R Cheat Sheet for Data Science

This is not an exhaustive reference for the R language. I wrote it from my recollection of DataCamp courses and the Data Science Specialization by Johns Hopkins University on Coursera, but it should be suitable for most data analysts and data scientists.

- Data types - R Objects
- Data types - Vectors and Lists
- Data types - Matrices
- Data types - Factors
- Data types - Missing Values
- Data types - Data Frames
- Data Types - Names Attribute
- Reading Tabular Data
- Reading Large Tables
- Connections: Interfaces to the Outside World
- Subsetting - Basics
- Subsetting - Lists
- Subsetting - Matrices
- Subsetting - Partial Matching
- Subsetting - Removing Missing Values
- Vectorized Operations
- Control Structures - Introduction
- Functions
- Scoping Rules - Symbol Binding
- Scoping Rules - R Scoping Rules
- Scoping Rules - Optimization Example (OPTIONAL)
- Coding Standards
- Dates and Times

# Data types - R Objects

There are 5 atomic data types (classes):

- character
- numeric (real numbers)
- integer
- complex
- logical

A vector can only contain elements of the same class; a list can contain elements of different classes.

Most numbers in R are treated as numeric (double-precision real numbers). If you explicitly want an integer, add the `L` suffix, e.g. `1L`.

Inf is infinity, \(1/0=Inf\), \(1/Inf=0\).

NaN is “Not a Number”, e.g. \(0/0=NaN\). Both Inf and NaN are numeric.
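A short sketch of the points above; the variable names are only illustrative:

```r
x <- 1    # numeric (double) by default
y <- 1L   # the L suffix requests an integer
class(x)  # "numeric"
class(y)  # "integer"

1 / 0     # Inf
1 / Inf   # 0
0 / 0     # NaN
is.numeric(Inf) && is.numeric(NaN)  # TRUE
```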

**Mixing objects**: when objects of different classes are mixed in a vector, coercion occurs so that every element in the vector is of the same class.

```
c(1, "a") # character
c(1, TRUE) # numeric
c("a", TRUE) # character
```

# Data types - Vectors and Lists

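A vector’s elements are atomic; when printed, they are indexed with single square brackets, e.g. `[1]`. A list’s elements can be any R object, including other lists (lists are recursive); when printed, each element is labeled with double square brackets, e.g. `[[1]]`.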
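To see the difference in printing, here is a small sketch with illustrative values:

```r
v <- c(1, 2, 3)          # atomic vector: every element is numeric
l <- list(1, "a", TRUE)  # list: elements may differ in class

print(v)
# [1] 1 2 3
print(l)
# [[1]]
# [1] 1
#
# [[2]]
# [1] "a"
#
# [[3]]
# [1] TRUE

is.atomic(v)     # TRUE
is.recursive(l)  # TRUE
```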

# Data types - Matrices

A matrix can be constructed directly from a vector by setting its `dim` attribute; the values fill the matrix column by column.

```
x <- 1:6
dim(x) <- c(2, 3)
x
# 1 3 5
# 2 4 6
```

Another way to create a matrix is column-binding or row-binding with `cbind` and `rbind`.
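For example, binding two short vectors (the names `x` and `y` are only illustrative):

```r
x <- 1:3
y <- 10:12

cbind(x, y)  # each vector becomes a column
#      x  y
# [1,] 1 10
# [2,] 2 11
# [3,] 3 12

rbind(x, y)  # each vector becomes a row
#   [,1] [,2] [,3]
# x    1    2    3
# y   10   11   12
```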

# Data types - Factors

```
x <- factor(c("yes", "yes", "no", "yes")) # levels are sorted alphabetically: "no" "yes"
table(x) # counts per level: no = 1, yes = 3
```

# Data types - Missing Values

`is.na()` tests whether objects are NA; `is.nan()` tests for NaN.

NA values also have a class, e.g. integer NA, character NA, etc.

A NaN value is also NA, but the converse is not true.
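A minimal sketch of these rules, using an illustrative vector:

```r
x <- c(1, NA, NaN, 4)
is.na(x)   # FALSE  TRUE  TRUE FALSE  (NaN counts as missing)
is.nan(x)  # FALSE FALSE  TRUE FALSE  (NA is not NaN)

class(NA)             # "logical"
class(NA_integer_)    # "integer"
class(NA_character_)  # "character"
```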

# Data types - Data Frames

```
data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE)) # columns may have different classes
```

# Data Types - Names Attribute

R objects can also have names:

```
x <- 1:3 # vector
names(x) <- c("foo", "bar", "baz")
x <- list(a=1, b=2, c=3) # list
m <- matrix(1:4, nrow=2, ncol=2) # matrix
dimnames(m) <- list(c("a", "b"), c("c", "d"))
```

# Reading Tabular Data

Commonly used functions:

Reading data

```
read.table, read.csv
readLines
source # inverse of dump
dget # inverse of dput
load
unserialize
```

Writing data

```
write.table
writeLines
dump
dput
save
serialize
```
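The reading and writing functions come in pairs. As a hedged sketch, the round trips below use temporary files; the object names and file paths are only illustrative:

```r
# dput / dget: one object, written as R code and read back
y <- data.frame(a = 1, b = "a")
dput_path <- tempfile(fileext = ".R")
dput(y, file = dput_path)
y2 <- dget(dput_path)
all.equal(y, y2)  # TRUE

# dump / source: dump takes object names as strings,
# and source re-creates the objects under the same names
x <- "foo"
dump_path <- tempfile(fileext = ".R")
dump("x", file = dump_path)
rm(x)
source(dump_path)
x  # "foo"
```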

# Reading Large Tables

Specifying the `colClasses` parameter can make `read.table` much faster, because R does not have to work out the class of each column itself.

```
initial <- read.table("datatable.txt", nrows=100)
classes <- sapply(initial, class)
tabAll <- read.table("datatable.txt", colClasses=classes)
```

# Connections: Interfaces to the Outside World

Commonly used connections: `file`, `gzfile`, `bzfile`, `url`.
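A connection lets you read part of a file instead of all of it. The sketch below assumes a small temporary text file created just for the example:

```r
path <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line", "third line"), path)

con <- file(path, "r")              # open a read-only connection
first_two <- readLines(con, n = 2)  # read only the first two lines
close(con)                          # close connections you open

first_two
# [1] "first line"  "second line"
```

`gzfile`, `bzfile` and `url` are used the same way for compressed files and web addresses.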

# Subsetting - Basics

- `[]` returns an object of the same class as the original and can be used to extract more than one element.
- `[[]]` extracts a single element of a list or data frame; the returned object is not necessarily of the same class as the original.
- `$` extracts an element of a list or data frame by name; its semantics are similar to those of `[[]]`.
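A small sketch of the three operators; the vector and data frame are illustrative:

```r
x <- c(10, 20, 30, 40)
x[1]       # 10
x[2:3]     # 20 30 (more than one element)
x[x > 15]  # 20 30 40 (logical index)

df <- data.frame(foo = 1:3, bar = c("a", "b", "c"))
class(df["foo"])    # "data.frame": [] keeps the class of the original
class(df[["foo"]])  # "integer": [[]] returns the element itself
class(df$foo)       # "integer": $ behaves like [[]]
```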

# Subsetting - Lists

```
x <- list(foo=1:4, bar=0.6)
x[1]
# $foo
# [1] 1 2 3 4
x[[1]]
# [1] 1 2 3 4
x$bar
# [1] 0.6
x[["bar"]]
# [1] 0.6
x["bar"]
# $bar
# [1] 0.6
```

`[]` can take a vector of indices to extract several elements; `[[]]` can take an integer sequence, which indexes recursively into nested elements.

```
x <- list(a=list(1,2,3), b=c(4,5,6))
x[[c(1,3)]]
# [1] 3
x[[1]][[3]]
# [1] 3
x[[c(2,1)]]
# [1] 4
```

# Subsetting - Matrices

```
x <- matrix(1:6, 2, 3)
x[1, 2]
# [1] 3
x[2, 1]
# [1] 2
x[1,]
# [1] 1 3 5
x[,2]
# [1] 3 4
x[1, 2, drop=FALSE]
#      [,1]
# [1,]    3
x[1,, drop=FALSE]
#      [,1] [,2] [,3]
# [1,]    1    3    5
```

# Subsetting - Partial Matching

```
x <- list(abc=1:5)
x$a
# [1] 1 2 3 4 5
x[["a"]]
# NULL
x[["a", exact=FALSE]]
# [1] 1 2 3 4 5
```

# Subsetting - Removing Missing Values

```
x <- c(1,2,NA,4,NA,6)
bad <- is.na(x)
x[!bad]
# [1] 1 2 4 6
y <- c("a", "b", NA, "d", NA, "f")
good <- complete.cases(x, y)
x[good]
# [1] 1 2 4 6
y[good]
# [1] "a" "b" "d" "f"
```

# Vectorized Operations

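Arithmetic operators work element by element. For matrix objects, `*` is element-wise and `%*%` is true matrix multiplication: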

```
x <- matrix(1:4, 2, 2) # a square matrix, so both products are defined
x * x # element-wise multiplication
x %*% x # true matrix multiplication
```
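The same idea applies to plain vectors, so no explicit loop is needed. A sketch with illustrative vectors `a` and `b`:

```r
a <- 1:4
b <- 6:9

a + b   # 7 9 11 13
a * b   # 6 14 24 36
a > 2   # FALSE FALSE TRUE TRUE
a / b   # element-wise division
```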

# Control Structures - Introduction

Common structures are:

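- `if`, `else`: test a condition
- `for`: execute a loop a fixed number of times
- `while`: execute a loop while a condition is true
- `repeat`: execute an infinite loop
- `break`: exit a loop
- `next`: skip an iteration of a loop
- `return`: exit a function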
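A minimal sketch that combines several of these structures; the variable names are illustrative:

```r
total <- 0
for (i in 1:10) {
  if (i %% 2 == 0) next  # skip even numbers
  if (i > 7) break       # leave the loop entirely
  total <- total + i
}
total  # 1 + 3 + 5 + 7 = 16

count <- 0
while (count < 3) {
  count <- count + 1
}
count  # 3

n <- 0
repeat {                 # runs until break is called
  n <- n + 1
  if (n >= 5) break
}
n  # 5
```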

# Functions

You can mix positional matching with matching by name. When an argument is matched by name, it is removed from the argument list, and the remaining unnamed arguments are matched in the order in which they appear in the function definition.
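As an illustration, here is a hypothetical function `power` (not part of base R) called in several equivalent ways:

```r
# Hypothetical helper used only to illustrate argument matching
power <- function(base, exponent = 2) {
  base^exponent
}

power(3)                       # 9: exponent takes its default
power(3, 3)                    # 27: both matched by position
power(base = 2, exponent = 3)  # 8: both matched by name
power(exponent = 3, 2)         # 8: exponent by name, 2 fills base
```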

The `...` argument indicates a variable number of arguments that are usually passed on to other functions. It is often used when extending another function, so that you don’t have to copy the entire argument list of the original function.

The `...` argument is also needed when the number of arguments can’t be known in advance; think of functions such as `paste` and `cat`.
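Both uses can be sketched as follows. The helpers `mean_no_na` and `count_args` are hypothetical names made up for this example:

```r
# Hypothetical wrapper: changes one default of mean() and
# forwards any other arguments untouched
mean_no_na <- function(x, ...) {
  mean(x, na.rm = TRUE, ...)
}
mean_no_na(c(1, 2, NA, 9))  # 4

# Hypothetical function taking any number of arguments
count_args <- function(...) {
  length(list(...))
}
count_args("a", 1, TRUE)    # 3

# Arguments that come after ... must be named in full
paste("a", "b", sep = ":")  # "a:b"
paste("a", "b", se = ":")   # "a b :" (se is not matched to sep)
```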

# Scoping Rules - Symbol Binding

When R looks up a symbol, it searches `.GlobalEnv` first. If it doesn’t find the symbol there, it continues searching through the packages attached to the session, in the order shown by `search()`.

R has separate namespaces for functions and non-functions, so you can have a function and a variable with the same name.
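A small sketch of this: a variable named `c` does not hide the base function `c()` when it is called. The name `combined` is only illustrative:

```r
c <- 5                 # a non-function object named c
combined <- c(c, 10)   # the call still finds the function c()
combined               # 5 10
rm(c)                  # remove the variable again
```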

# Scoping Rules - R Scoping Rules

When a free variable in a function can’t be found in the function’s own environment, R searches the parent environment (the environment in which the function was defined) and continues up the sequence of parent environments until the variable is found. The top-level environment is the global environment or a package namespace, depending on where the function was defined. If the variable still can’t be found, R continues down the `search()` list until `.GlobalEnv` and all attached packages have been searched. If the variable is not found by then, an error occurs.
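One consequence is that a function returned by another function remembers the environment in which it was defined. A sketch, with `make.power`, `cube` and `square` as illustrative names:

```r
make.power <- function(n) {
  pow <- function(x) {
    x^n  # n is a free variable, found in make.power's environment
  }
  pow
}

cube <- make.power(3)
square <- make.power(2)
cube(3)    # 27
square(3)  # 9

ls(environment(cube))        # "n" "pow"
get("n", environment(cube))  # 3
```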

# Scoping Rules - Optimization Example (OPTIONAL)

R uses lexical scoping instead of dynamic scoping.

```
y <- 10
f <- function(x) {
  y <- 2
  y^2 + g(x)
}
g <- function(x) {
  x*y
}
```

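`f(3)` returns 34: `g` looks up its free variable `y` in the environment where it was defined (the global environment, where `y` is 10), not in the environment from which it was called. Under dynamic scoping, `y` would be 2 and the result would be 10. Scheme, Perl, Python and Common Lisp all support lexical scoping.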