R Basics

Learn the basics of R and RMarkdown

This post was written by Victoria Bolowsky as part of her coursework for RS 932 at the MGH Institute of Health Professions. Victoria created this tutorial to help introduce new users of R to some of the basic functions you’ll need to know to become more comfortable working in R.

Let’s Get Started!

First, let’s cover some basics you’ll need to know to work in R.

At its most basic level, R is a giant calculator. You can use it to add, subtract, multiply, or divide.

5*75
[1] 375
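For completeness, here is a small sketch of all four operations mentioned above. The numbers are just illustrative; any values would work the same way.

```r
5 + 75   # addition        -> 80
75 - 5   # subtraction     -> 70
5 * 75   # multiplication  -> 375
75 / 5   # division        -> 15
```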

You were probably hoping to use R for more than just its calculator powers, so let’s move on to some of its other capabilities.

Function calls

At the heart of running analyses in R are Functions. The general form for calling R functions is

FunctionName(arg.1 = value.1, arg.2 = value.2, ..., arg.n = value.n)

Arguments can be matched by name; unnamed arguments will be matched by position. What you enter as arguments depends on what analyses you are trying to run (and the documentation for the function will tell you what information you need to provide to run the function).

Clear as mud, right? Here is an example that might help clarify things:

summary(mtcars)
  • The summary part of the above code is the function. It is telling R to give a summary of what is in the dataset.

  • The mtcars part is the argument required for the summary function, in this case, the name of the dataset. It is telling R what dataset to reference when running summary. In this example, we are using a dataset mtcars that comes pre-loaded with R. Normally, you’ll want to load your data into your environment before running analyses.
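To make the idea of matching arguments by name or by position more concrete, here is a minimal sketch using two base R functions, round() and seq(). These functions are not part of the original tutorial; they were chosen only because they take more than one argument.

```r
# round() takes the number to round (x) and the number of decimal places (digits)
by_position <- round(3.14159, 2)               # matched by position
by_name     <- round(x = 3.14159, digits = 2)  # matched by name
reordered   <- round(digits = 2, x = 3.14159)  # named arguments can go in any order

by_position  # 3.14
by_name      # 3.14
reordered    # 3.14

# seq() builds a sequence; naming the arguments makes the call easier to read
seq(from = 1, to = 10, by = 3)  # 1 4 7 10
```

Naming arguments takes a little more typing, but it tends to make code easier to read later. The help page for any function (for example, ?round) lists the argument names it accepts.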

Assignment

In R, we can run a function and look at the results immediately. Run the following code:

sqrt(10)
[1] 3.162278

R calculates the square root of 10, and presents the results under the code chunk. But what if we want to save those results to use them in another function? You can save the results of a function call to an “object” through the “Assignment” operator.

  • The <- operator (less than followed by a dash) is used to save values
  • The name on the left (called an “object”) gets “assigned” the value on the right.
  • Names of objects should start with a letter, and contain only letters, numbers, underscores, and periods.

    sqrt(10) ## calculate square root of 10; result is not stored anywhere
    
    [1] 3.162278
    
    x <- sqrt(10) # assign result to a variable named x
    

Note that when you run the code for x <- sqrt(10) nothing appears in your document. This is because the result was stored in x instead of being printed. If you want to see what is stored in an object, all you need to do is type the name of the object and run the code.

x
[1] 3.162278
  1. Create an object called “y” that is equal to 8*4.

You can also add objects together. Try adding x + y and storing it as z, and then printing the results stored in z.

y <- 8*4

z <- x + y 
z
[1] 35.16228
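Saved objects can also be passed to other functions, which is the main reason to store results in the first place. The sketch below rebuilds x, y, and z so that it runs on its own, and then uses round(), which is my choice of example function rather than something from the original tutorial.

```r
x <- sqrt(10)  # same objects as above
y <- 8 * 4
z <- x + y

# Use the stored value as the argument to another function
z_rounded <- round(z, digits = 2)
z_rounded  # 35.16
```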

Data structures

There are two basic data structures in R: vectors and lists.

Vectors are of a particular type, e.g., integer, double, or character.

The concatenate function c() allows us to bind variables, numbers, or strings together.
Vectors can be created using the c function, like this:

x <- c(1, 2, 3) # numeric vector
x
[1] 1 2 3

Note that because we used the same variable name that we used earlier, we have overwritten what we previously saved to object “x”. Our “x” object is now a numeric vector. We can also create a character vector:

y <- c("1", "2", "3") # character vector
y
[1] "1" "2" "3"
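If you are ever unsure what type a vector is, typeof() will tell you. The sketch below uses new object names (num_vec, int_vec, chr_vec, mixed) that are only for illustration, so it does not overwrite the x and y created above.

```r
num_vec <- c(1, 2, 3)        # numbers are stored as "double" by default
int_vec <- c(1L, 2L, 3L)     # the L suffix asks for integers
chr_vec <- c("1", "2", "3")  # quotes make these characters, not numbers

typeof(num_vec)  # "double"
typeof(int_vec)  # "integer"
typeof(chr_vec)  # "character"

# A vector holds only one type, so mixing types converts everything to character
mixed <- c(1, "a")
typeof(mixed)    # "character"

# Character digits can be converted back to numbers when needed
as.numeric(chr_vec) + 1  # 2 3 4
```

This matters in practice because arithmetic such as chr_vec + 1 fails on a character vector, even when the values look like numbers.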

Lists are not restricted to a single type and can be used to hold just about anything. They can be created using the list function, like this:

z <- list(1, c(1, 2, 3, 4), list(c(1, 2), c("a", "b")))
z
[[1]]
[1] 1

[[2]]
[1] 1 2 3 4

[[3]]
[[3]][[1]]
[1] 1 2

[[3]][[2]]
[1] "a" "b"
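The double brackets in the printed output are also how you pull individual pieces back out of a list. Here is a short sketch that recreates the same list so it can run on its own; the names second_item, nested_item, and third_value are only for illustration.

```r
z <- list(1, c(1, 2, 3, 4), list(c(1, 2), c("a", "b")))

second_item <- z[[2]]       # the second element: a numeric vector
second_item                 # 1 2 3 4

nested_item <- z[[3]][[2]]  # second element of the list stored in position 3
nested_item                 # "a" "b"

third_value <- z[[2]][3]    # third value inside the second element
third_value                 # 3

# Single brackets keep the result wrapped in a list
is.list(z[2])               # TRUE
is.list(z[[2]])             # FALSE
```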

Creating a Dataframe

When working in R, you’ll probably work the most with dataframes. We can create a tiny sample dataset using the data.frame function. Here, I’ll create a dataframe with two variables, x and y and save it to an object called a_new_dataframe.

a_new_dataframe <- data.frame(x=2, y=4)

When you run the code above, a new object will appear in your environment called a_new_dataframe. If you click on it, you will then see the tiny dataset we have created.

We can use the $ operator to print specific elements of our dataframe. Here, we will print “x” from our dataframe.

a_new_dataframe$x
[1] 2
  1. How can you print y from the dataframe a_new_dataframe?

We can also use the c() function to create another dataframe that now has three numbers per variable.

another_dataframe <- data.frame(x = c(2,4,6), y = c(3,5,7))
  1. Take a look at the dataframe you just created. What do you see?
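Besides clicking on the object in your environment, a few base R functions can describe a dataframe for you. This is just one possible way to inspect it; the sketch recreates another_dataframe so it runs on its own.

```r
another_dataframe <- data.frame(x = c(2, 4, 6), y = c(3, 5, 7))

nrow(another_dataframe)   # number of rows: 3
ncol(another_dataframe)   # number of columns (variables): 2
names(another_dataframe)  # variable names: "x" "y"

str(another_dataframe)    # compact overview of each variable and its type
head(another_dataframe)   # prints the first rows (here, all three)
```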

Loading Packages

So far, we have been working with functions that come pre-loaded with Base R. However, a lot of what you’ll likely want to do requires packages. Packages are designed by R users to perform different types of tasks and analyses. If you are looking for a package for a specific task, https://cran.r-project.org/web/views/ and https://r-pkg.org are good places to start.

You can install a package in R using the install.packages() function. Once a package is installed, you have to “attach” it (i.e., activate it) in order to use the functions it contains. To attach a package, use the library function. You only need to install a package once, but you will need to attach it every time you open a new session of R. The code below installs the package ‘readr’, which can be used to load datasets.

install.packages("readr")  ## installs the package, only need to do once
library(readr)  ##attaches the package so it is ready to be used

I suggest installing packages directly in the console instead of within your .Rmd file. If you do install a package from within your .Rmd file, be sure to comment out (#) the install line afterward so the package isn’t reinstalled every time you knit your file.
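Another option, if you are not sure whether a package is already installed, is to check first. The sketch below defines a small helper called is_installed(); that name is hypothetical and not part of R or this workshop. It relies on the base R function requireNamespace(), and the install line is left commented out so nothing is downloaded when the code runs.

```r
# Hypothetical helper: returns TRUE if a package is installed, FALSE otherwise
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

is_installed("stats")  # TRUE, because stats ships with R

# Install readr only when it is missing (uncomment the middle line to use)
if (!is_installed("readr")) {
  # install.packages("readr")
  message("readr is not installed yet")
}
```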

For this workshop, I have pre-installed all of the packages we will need into our RStudio Cloud environment. You can see what packages are available by looking in the “Packages” tab on the right.


Troubleshooting

One thing that you’ll need to get really comfortable with when using R is that things often go wrong and don’t run when you expect them to, or as you expect them to. There are often very simple fixes to get you back on track. Here are some things to help troubleshoot when things go wrong:

Do I have the necessary packages loaded to execute this function?

Make sure that you have the packages you need loaded. If you do, you may be experiencing what is called “package masking”. Sometimes, different packages contain functions with the same name. If the conflicting packages are loaded at the same time, R uses the function from the package you loaded most recently, and that may not be the one you actually need. If you suspect that masking is occurring, you can specify which package you want to use for a specific function by writing package::function:

library(dplyr)
mtcars %>%
  dplyr::select(mpg, am)
                     mpg am
Mazda RX4           21.0  1
Mazda RX4 Wag       21.0  1
Datsun 710          22.8  1
Hornet 4 Drive      21.4  0
Hornet Sportabout   18.7  0
Valiant             18.1  0
Duster 360          14.3  0
Merc 240D           24.4  0
Merc 230            22.8  0
Merc 280            19.2  0
Merc 280C           17.8  0
Merc 450SE          16.4  0
Merc 450SL          17.3  0
Merc 450SLC         15.2  0
Cadillac Fleetwood  10.4  0
Lincoln Continental 10.4  0
Chrysler Imperial   14.7  0
Fiat 128            32.4  1
Honda Civic         30.4  1
Toyota Corolla      33.9  1
Toyota Corona       21.5  0
Dodge Challenger    15.5  0
AMC Javelin         15.2  0
Camaro Z28          13.3  0
Pontiac Firebird    19.2  0
Fiat X1-9           27.3  1
Porsche 914-2       26.0  1
Lotus Europa        30.4  1
Ford Pantera L      15.8  1
Ferrari Dino        19.7  1
Maserati Bora       15.0  1
Volvo 142E          21.4  1

When you load packages, R will also warn you about which functions are being masked due to naming conflicts.
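To see masking in action without installing anything, here is a small sketch that uses a stand-in: instead of two packages, it defines its own function named sd(), which masks the sd() function from the stats package that comes with R. The mechanism is the same as with two packages, and the package::function syntax still reaches the original.

```r
# Our own sd() now masks stats::sd() for the rest of the session
sd <- function(x) "my own sd()"

masked_result   <- sd(c(1, 2, 3))         # calls our version
explicit_result <- stats::sd(c(1, 2, 3))  # explicitly asks for the stats version

masked_result    # "my own sd()"
explicit_result  # 1

# Removing our version makes the original visible again
rm(sd)
restored_result <- sd(c(1, 2, 3))
restored_result  # 1
```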

Have I imported the dataset? Is it in my Global Environment?

Make sure you have the data loaded. If you have recently cleared your environment, you may need to reload the data before proceeding with your analyses.

Am I referencing the dataset correctly?

Spelling and capitalization matter. Make sure you are typing the dataset name exactly as it appears in your environment.

Am I referencing the variables within the dataset correctly?

Again, spelling and capitalization matter. Make sure you are typing the variable names exactly as they appear in the dataset.
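If you are not sure whether a name is right, you can ask R directly. This is a minimal sketch using the built-in mtcars dataset and the base R functions exists(), names(), and %in%; the misspelled names are deliberate examples.

```r
# Does an object with this exact name exist?
exists("mtcars")  # TRUE
exists("MTCARS")  # FALSE, because R is case sensitive

# What are the variables in the dataset actually called?
names(mtcars)

# Is a particular variable name present?
"mpg" %in% names(mtcars)  # TRUE
"MPG" %in% names(mtcars)  # FALSE
```

Running ls() with no arguments also lists every object currently in your environment.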

Do I have punctuation in the correct place?

Make sure that you have the correct number of open and close parentheses (), the correct number of open and close quotation marks " ", and that commas are in the right place.

What if my code is exactly right and I am totally sure of it?

Google, Google, Google. When I get an error message that I don’t recognize and I’ve checked the code itself, I copy and paste the error message directly into Google’s search bar. More often than not, one of the first five search results gives me a solution to the problem. If it doesn’t, I save my work, clear the Environment and Console, restart R, and try again.

Depending on the source of your error, you can also post a message to RStudio, StackOverflow, or even Twitter (#RStats, #RStudio). You’ll find that the large community of R users is more than willing to help you troubleshoot! I strongly suggest taking advantage of this.

I also suggest following #RStats on Twitter. Many R users create websites, tutorials, and videos showing how to do things in R. I bookmark anything that I think could be helpful in the future; it is a great resource and a way to stay connected to the R Community.


When You Want to Know More

When I started using R, I frequently referenced Dr. Danielle Navarro’s book (and still do!). She offers in-depth and clear explanations of statistical concepts and how to use R. Her Introduction to R section is wonderful.

Otherwise, as I said just above, Google anything and everything that you need help with in R, and consider every error message as a learning opportunity to help yourself and future R-users.


Portions of this tutorial were adapted by Annie B. Fox from the "Introduction to R" workshop created by the Data Science Services team at Harvard University. The original source is released under a Creative Commons Attribution-ShareAlike 4.0 Unported license: https://creativecommons.org/licenses/by-sa/4.0/

Portions of this tutorial were created by Annie B. Fox and Victoria Bolowsky at MGH Institute of Health Professions and adapted by Annie B. Fox for the "Intro to R" Workshop at MGH IHP. It is released under a Creative Commons Attribution-ShareAlike 4.0 Unported https://creativecommons.org/licenses/by-sa/4.0/.
Annie B. Fox
Assistant Professor of Quantitative Methods

My research interests include mental illness stigma, women’s health, and quantitative methods