Skip to main content

R Packages

The Beginner’s Guide to R Packages

Now that we’ve installed R and R Studio and have a basic understanding of how they work together, we can get at what makes R so special: packages.


What is an R package?

So far, anything we’ve played around with in R uses the “base” R system. Base R, or everything included in R when you download it, has rather basic functionality for statistics and plotting but it can sometimes be limiting. To expand upon R’s basic functionality, people have developed packages. A package is a collection of functions, data, and code conveniently provided in a nice, complete format for you. At the time of writing, there are just over 19,266 packages available to download at CRAN- each with their own specialized functions and code, all for some different purpose. For a really in depth look at R Packages (what they are, how to develop them), check out Hadley Wickham’s book from O’Reilly, “R Packages.” Link will be provided in our blog post, so definitely heck that and please follow our blogpost too.

A package is not to be confused with a library (these two terms are often conflated in colloquial speech about R). A library is the place where the package is located on your computer. To think of an analogy, a library is, well, a library… and a package is a book within the library. The library is where the books/packages are located.

Packages are what make R so unique. Not only does base R have some great functionality but these packages greatly expand its functionality. And perhaps most special of all, each package is developed and published by the R community at large and deposited in repositories.

What are repositories?

A repository is a central location where many developed packages are located and available for download.

There are three big repositories:
1. 
CRAN (Comprehensive R Archive Network): R’s main repository (>19,000 packages available!)
2. BioConductor: A repository mainly for bioinformatic-focused packages
3. 
GitHub: A very popular, open source repository (not R specific!)

Take a second to explore the links above and check out the various packages that are out there!

How do you know what package is right for you?

So you know where to find packages… but there are so many of them, how can you find a package that will do what you are trying to do in R? There are a few different avenues for exploring packages.

First, CRAN groups all of its packages by their functionality/topic into 35 “themes.” It calls this its “Task view.” This at least allows you to narrow the packages you can look through to a topic relevant to your interests.


Second, there is a great website, RDocumentation, which is a search engine for packages and functions from CRAN, BioConductor, and GitHub (i.e.: the big three repositories). If you have a task in mind, this is a great way to search for specific packages to help you accomplish that task! It also has a “task” view like CRAN, that allows you to browse themes.

More often, if you have a specific task in mind, Googling that task followed by “R package” is a great place to start! From there, looking at tutorials, vignettes, and forums for people already doing what you want to do is a great way to find relevant packages.

How do you install packages?

Great! You’ve found a package you want… How do you install it?

Installing from CRAN

If you are installing from the CRAN repository, use the install.packages() function, with the name of the package you want to install in quotes between the parentheses (note: you can use either single or double quotes). For example, if you want to install the package “ggplot2”, you would use: install.packages("ggplot2")

Try doing so in your R console! This command downloads the “ggplot2” package from CRAN and installs it onto your computer.

If you want to install multiple packages at once, you can do so by using a character vector, like: install.packages(c("ggplot2", "devtools", "lme4"))

If you want to use RStudio’s graphical interface to install packages, go to the Tools menu, and the first option should be “Install packages…” If installing from CRAN, select it as the repository and type the desired packages in the appropriate box.

Various methods to install packages within R/R Studio

Installing packages from CRAN through R/R Studio

Installing from Bioconductor

The BioConductor repository uses their own method to install packages. First, to get the basic functions required to install through BioConductor, use: source("https://bioconductor.org/biocLite.R")

This makes the main install function of BioConductor, biocLite(), available to you. Following this, you call the package you want to install in quotes, between the parentheses of the biocLite command, like so: biocLite("GenomicFeatures")

Installing packages with BioConductor

Installing from GitHub

This is a more specific case that you probably won’t run into too often. In the event you want to do this, you first must find the package you want on GitHub and take note of both the package name AND the author of the package. Check out this guide for installing from GitHub, but the general workflow is:

  1. install.packages("devtools") - only run this if you don’t already have devtools installed. If you’ve been following along with this lesson, you may have installed it when we were practicing installations using the R console
  2. library(devtools) - more on what this command is doing immediately below this
  3. install_github("author/package") replacing “author” and “package” with their GitHub username and the name of the package.
Installing packages from GitHub

Loading packages

Installing a package does not make its functions immediately available to you. First you must load the package into R; to do so, use the library() function. Think of this like any other software you install on your computer. Just because you’ve installed a program, doesn’t mean it’s automatically running - you have to open the program. Same with R. You’ve installed it, but now you have to “open” it. For example, to “open” the “ggplot2” package, you would run: library(ggplot2)

NOTE: Do not put the package name in quotes! Unlike when you are installing the packages, the library() command does not accept package names in quotes!

Step one of getting a package is installing it, but to use it, you must load it using library(); similar to installing R and then loading it by opening the .exe file

There is an order to loading packages - some packages require other packages to be loaded first (dependencies). That package’s manual/help pages will help you out in finding that order, if they are picky.

If you want to load a package using the R Studio interface, in the lower right quadrant there is a tab called “Packages” that lists out all of the packages and a brief description, as well as the version number, of all of the packages you have installed. To load a package just click on the checkbox beside the package name

Using the R Studio interface to load a package

Updating, removing, unloading packages

Once you’ve got a package, there are a few things you might need to know how to do:

Checking what packages you have installed

If you aren’t sure if you’ve already installed a package, or want to check what packages are installed, you can use either of: installed.packages() or library() with nothing between the parentheses to check!

In R Studio, that package tab introduced earlier is another way to look at all of the packages you have installed.

Updating packages

You can check what packages need an update with a call to the function old.packages() This will identify all packages that have been updated since you installed them/last updated them.

To update all packages, use update.packages(). If you only want to update a specific package, just use once again install.packages("packagename")

Functions used to see what packages are installed and update them

Within the R Studio interface, still in that Packages tab, you can click “Update,” which will list all of the packages that are not up to date. It gives you the option to update all of your packages, or allows you to select specific packages.

Using the R Studio interface to update your packages

You will want to periodically check in on your packages and check if you’ve fallen out of date - be careful though! Sometimes an update can change the functionality of certain functions, so if you re-run some old code, the command may be changed or perhaps even outright gone and you will need to update your code too!

Unloading packages

Sometimes you want to unload a package in the middle of a script - the package you have loaded may not play nicely with another package you want to use.

To unload a given package you can use the detach() function. For example, detach("package:ggplot2", unload=TRUE) would unload the ggplot2 package (that we loaded earlier). Within the R Studio interface, in the Packages tab, you can simply unload a package by unchecking the box beside the package name.

Unloading or “detaching” a package

Uninstalling packages

If you no longer want to have a package installed, you can simply uninstall it using the function remove.packages(). For example, remove.packages("ggplot2")

(Try that, but then actually re-install the ggplot2 package - it’s a super useful plotting package!)

Within R Studio, in the Packages tab, clicking on the “X” at the end of a package’s row will uninstall that package.

Uninstalling packages

How do you know what version of R you have?

Sometimes, when you are looking at a package that you might want to install, you will see that it requires a certain version of R to run. To know if you can use that package, you need to know what version of R you are running!

One way to know your R version is to check when you first open R/R Studio - the first thing it outputs in the console tells you what version of R is currently running. If you didn’t pay attention at the beginning, you can type version into the console and it will output information on the R version you are running. Another helpful command is sessionInfo() - it will tell you what version of R you are running along with a listing of all of the packages you have loaded. The output of this command is a great detail to include when posting a question to forums - it tells potential helpers a lot of information about your OS, R, and the packages (plus their version numbers!) that you are using.

Ways to see what version of R you are running

Using the commands in a function

In all of this information about packages, we haven’t actually discussed how to use a package’s functions!

First, you need to know what functions are included within a package. To do this, you can look at the man/help pages included in all (well-made) packages. In the console, you can use the help() function to access a package’s help files. Try help(package = "ggplot2") and you will see all of the many functions that ggplot2 provides. Within the R Studio interface, you can access the help files through the Packages tab (again) - clicking on any package name should open up the associated help files in the “Help” tab, found in that same quadrant, beside the Packages tab. Clicking on any one of these help pages will take you to that functions help page, that tells you what that function is for and how to use it.

The help functions available to you

Once you know what function within a package you want to use, you simply call it in the console like any other function we’ve been using throughout this lesson. Once a package has been loaded, it is as if it were a part of the base R functionality.

If you still have questions about what functions within a package are right for you or how to use them, many packages include “vignettes.” These are extended help files, that include an overview of the package and its functions, but often they go the extra mile and include detailed examples of how to use the functions in plain words that you can follow along with to see how to use the package. To see the vignettes included in a package, you can use the browseVignettes() function. For example, let’s look at the vignettes included in ggplot2:browseVignettes("ggplot2") . You should see that there are two included vignettes: “Extending ggplot2” and “Aesthetic specifications.” Exploring the Aesthetic specifications vignette is a great example of how vignettes can be helpful, clear instructions on how to use the included functions.

How to browse vignettes for packages

Summary

In this lesson, we’ve explored R packages in depth. We examined what a packages is (and how it differs from a library), what repositories are, and how to find a package relevant to your interests. We investigated all aspects of how packages work: how to install them (from the various repositories), how to load them, how to check which packages are installed, and how to update, uninstall, and unload packages. We took a small detour and looked at how to check what version of R you have, which is often an important detail to know when installing packages. And finally, we spent some time learning how to explore help files and vignettes, which often give you a good idea of how to use a package and all of its functions.

If you still want to learn more about R packages, here are two great resources! R Packages: A Beginner’s Guide from Adolfo Álvarez on DataCamp and a lesson from the University of Washington, on an Introduction to R Packages from Ken Rice and Timothy Thornton. I’ll provide those links on my blogpost for this lecture.

Comments

Popular posts from this blog

Debugging Your R Code: Indications and Best Practices

The Beginner’s Guide to Debugging Tools: As with any programming language, it's important to debug your code in R to ensure it is functioning correctly. Here are some indications that there may be something wrong with your R code, along with examples of common mistakes that can cause these issues: Error messages:   If R encounters an error in your code, it will often provide an error message indicating the source of the problem. For example, if you forget to close a parenthesis, you may get an error message like "Error: unexpected ')' in 'my_function'". Here, R is indicating that there is a syntax error in your function. Unexpected output:  If the output of your code is unexpected or doesn't match your expectations, there may be an issue with your code. For example, if you are trying to calculate the mean of a vector of numbers, but the output is much higher or lower than expected, there may be an issue with the code you used to calculate the mean. L...

Getting Started with R Programming

The Beginner’s Guide to R Programming. I'm very excited to start R Programming and I hope you are too. This is the second course in the Data Science Specialization and it focuses on the nuts and bolts of using R as a programming language. The recommended background for this course is the course The Data Scientist's Toolbox . It is possible to take this class concurrently with that class but you may have to read ahead in the prerequisite class to get the relevant background for this class. For a complete set of course dependencies in the Data Science Specialization please see the course dependency chart , that has been posted on our blogpost. The primary way to interact with me and the other students in this course is through the discussion forums which in our case are comments section under the lectures, social media and blogpost . Here, you can start new threads by asking questions or you can respond to other people's questions. If you have a question about any aspect...

Mastering R Programming: Best Coding Practices for Readable and Maintainable Code

The Beginner’s Guide to Coding Standards: When it comes to programming, writing code that is easy to read and maintain is just as important as writing code that works. This is especially true in R programming, where it's common to work with large datasets and complex statistical analyses. In this blog post, we'll go over some coding standards that you should follow when writing R code to ensure that your code is easy to read and maintain . Indenting One of the most important coding standards to follow is to use consistent indenting. Indenting makes your code more readable by visually indicating the structure of your code. In R programming, it's common to use two spaces for each level of indentation. For example: if (x > y) {   z <- x + y } else {   z <- x - y } Column Margins Another important coding standard is to use consistent column margins. This means that you should avoid writing code that extends beyond a certain number of characters (often 80 or 100). Th...