How to make a boxplot in R

The complete beginner’s tutorial on boxplots in R

In this tutorial, I’m going to show you how to plot and customize boxplots (also known as box and whisker plots). Boxplots are a common type of graph that allow you to look at the relationships between a continuous variable and various categorical groups. They are super common in ecology because we often need to compare values between different categories.

An example boxplot showing all the different elements including color, axis labels, line type and weight, and boxplot orientation.


For this tutorial, we’re going to use the built-in R dataset PlantGrowth, which might seem familiar to you because we used it in a few other data visualization tutorials.

To refresh your memory, PlantGrowth has 30 rows and two columns. The “weight” column represents the dry biomass of each plant in grams, while the “group” column describes the experimental treatment that each plant was given.

# Load the data
data(PlantGrowth)

# View the data
head(PlantGrowth)
##   weight group
## 1   4.17  ctrl
## 2   5.58  ctrl
## 3   5.18  ctrl
## 4   6.11  ctrl
## 5   4.50  ctrl
## 6   4.61  ctrl

Let’s say we want to compare the weight of plants among the different treatments. A boxplot is perfect for this type of visualization.

We’ve already learned about the plot() function in our earlier scatterplot tutorial (see our previous blog post). Something neat about plot() is that if the variable on the X axis is categorical (stored in R as a factor), the function recognizes that and automatically draws a boxplot instead of a scatterplot.

If we look at the levels in the “group” column, we can see that “group” is indeed a categorical variable, with three different levels:

# Look at the levels of the "group" column
levels(PlantGrowth$group)
## [1] "ctrl" "trt1" "trt2"

So if we plot weight as a function of group (y as a function of x), we should get a boxplot.

# Make a boxplot of weight as a function of treatment group
plot(weight ~ group, data = PlantGrowth)

Awesome! We can see plant weight across the three different treatment groups, allowing us to easily compare groups.
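Under the hood, plot() notices that group is a factor and hands the drawing over to boxplot(), so you can also call boxplot() directly. A nice side effect is that boxplot() invisibly returns the numbers behind each box. Here’s a quick sketch (bp is just a name I’m using here):

```r
# Load the built-in data
data(PlantGrowth)

# plot() draws a boxplot because "group" is a factor
is.factor(PlantGrowth$group)
nlevels(PlantGrowth$group)

# Calling boxplot() directly draws the same kind of plot and
# invisibly returns the summary statistics behind each box
bp <- boxplot(weight ~ group, data = PlantGrowth)
bp$names   # group labels, in level order
bp$n       # number of plants in each group
```

The default axis labels may differ a little between plot() and boxplot(), but the boxes themselves come from the same function.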

Boxplot components

Now, let’s quickly go over the components of a boxplot.

  • The thick black line inside each box represents the median of the data.
  • The box represents the “interquartile range” (IQR) of your data, or the range between the 1st and 3rd quartiles. Values below the 1st quartile represent the lowest 25% of your data points, while values above the 3rd quartile represent the highest 25% of your data. The interquartile range contains the middle 50% of your data points.
  • The “whiskers” of a box and whisker plot are the dashed lines extending outside of the box. These end at the minimum and maximum values of each group, excluding outliers.
  • Sometimes, you will have outliers in your data, which are shown as individual points in the plot. Outliers are points that are more than 1.5 x IQR below the 1st quartile or above the 3rd quartile.

Image showing the different components of a boxplot.

Quick note about the min and max whiskers: the whisker end markers (the staples, or “T”s) only mark the actual maximum or minimum of the data if there are no outliers on that side. That is, the upper staple is the maximum only if the 3rd quartile + 1.5 x IQR is at least the maximum value, and the lower staple is the minimum only if the 1st quartile - 1.5 x IQR is at most the minimum value. Otherwise, the whisker stops at the most extreme data point that is not an outlier. In other words, the whiskers exclude outliers, which are all points more than 1.5 x IQR above the 3rd quartile or below the 1st quartile.
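To see these rules in action, here’s a sketch that runs boxplot.stats(), the base R helper that computes the numbers boxplot() draws, on the “trt1” group. One detail worth knowing: base R places the box edges at Tukey’s “hinges” from fivenum(), which are close to, but not always identical to, the quartiles that quantile() reports by default.

```r
data(PlantGrowth)

# Weights for one treatment group
trt1 <- PlantGrowth$weight[PlantGrowth$group == "trt1"]

# stats: lower whisker end, lower hinge, median, upper hinge, upper whisker end
# out:   points drawn individually as outliers
trt1_stats <- boxplot.stats(trt1)
trt1_stats$stats   # 3.59 4.17 4.55 4.89 5.87
trt1_stats$out     # 6.03

# Reproduce the 1.5 x IQR rule by hand from the hinges
hinges <- fivenum(trt1)[c(2, 4)]
iqr <- diff(hinges)
lower_fence <- hinges[1] - 1.5 * iqr
upper_fence <- hinges[2] + 1.5 * iqr
trt1[trt1 < lower_fence | trt1 > upper_fence]   # 6.03, the same outlier

# The upper whisker stops at the largest point that is not an outlier
max(trt1[trt1 <= upper_fence])                  # 5.87

# quantile()'s default quartiles differ slightly from the hinges
quantile(trt1, c(0.25, 0.75))
```

So in the default PlantGrowth boxplot, the “trt1” upper staple sits at 5.87 rather than at the group’s true maximum of 6.03, which is drawn as a separate point.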

Modifying the axes

Now that we understand all the parts of a boxplot, let’s play around with the different components of the plot, starting with the axes. Customizing the axes is the same as for scatterplots, where we’ll use the arguments xlab and ylab to change the axis labels.

# Adding axis labels
plot(weight ~ group,
     data = PlantGrowth, 
     xlab = "Treatment Group", 
     ylab = "Dried Biomass Weight (g)")

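Great, now we have axis labels! But the individual treatment group labels on our X axis (“ctrl”, “trt1”, and “trt2”) are still pretty vague. To change them, let’s go back to our data and rename “ctrl” to “Control”, “trt1” to “High light”, and “trt2” to “Low light”. (The PlantGrowth documentation only describes these as two different treatment conditions, so the new names are just for illustration.)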

# Look at the levels of the group column
levels(PlantGrowth$group)
## [1] "ctrl" "trt1" "trt2"
# Change the names of the treatments in the data set itself
levels(PlantGrowth$group) <- c("Control", "High light", "Low light")

# View the group column again
PlantGrowth$group
##  [1] Control    Control    Control    Control    Control    Control   
##  [7] Control    Control    Control    Control    High light High light
## [13] High light High light High light High light High light High light
## [19] High light High light Low light  Low light  Low light  Low light 
## [25] Low light  Low light  Low light  Low light  Low light  Low light 
## Levels: Control High light Low light
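Because levels<- renames levels by position, the new names have to be listed in the same order as the old ones. If you’d rather spell out which old name becomes which new name, levels<- also accepts a named list (new name = old name). Here’s a sketch that works on a copy called pg (a name I’m introducing here) so the original data keeps its labels:

```r
data(PlantGrowth)

# Work on a copy so PlantGrowth itself keeps its original labels
pg <- PlantGrowth

# Named list: each new level name points to the old level it replaces
levels(pg$group) <- list("Control"    = "ctrl",
                         "High light" = "trt1",
                         "Low light"  = "trt2")

levels(pg$group)
table(pg$group)   # still 10 plants per group
```

This form makes the mapping explicit, so a typo in the order can’t silently swap two treatments.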

Now that we’ve changed the names of our treatments, let’s run the plot again.

plot(weight ~ group,
     data = PlantGrowth,
     xlab = "Treatment Group", 
     ylab = "Dried Biomass Weight (g)")

Modifying the boxes and whiskers

Our plot is looking pretty good so far. Now let’s see how we can change the appearance of the boxes and whiskers, starting with color. The col argument sets the fill color of the boxes and accepts any color name or hex code in quotes. You can also set col to a number, which R uses as an index into its current color palette.

plot(weight ~ group,
     data = PlantGrowth, 
     xlab = "Treatment Group", 
     ylab = "Dried Biomass Weight (g)",
     col = 4) # or something like "blue" or a hex code like "#f234f9"
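If you’re curious which color a number stands for, you can look it up in the current palette. Note that R’s default palette changed in R 4.0.0, so the exact shade behind col = 4 depends on your R version.

```r
# Numbers passed to col are positions in the current color palette
palette()        # all colors in the palette, in order
palette()[4]     # the color that col = 4 refers to
```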

It can be fun to use colors, but it’s good data visualization practice to keep your figures black and white (or greyscale) unless you need color to signify something in particular. In our figure, there isn’t really a reason to color the boxes; we’re only doing it here for demonstration.
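If you do want the groups to look different, shades of grey are one way to stay close to that advice. col accepts a vector with one color per box, applied in level order. This sketch uses gray.colors() from base R’s grDevices package to generate three greys (the start and end values are just my picks):

```r
data(PlantGrowth)

# Three greys from darker to lighter (0 = black, 1 = white)
box_fills <- gray.colors(3, start = 0.4, end = 0.9)

# One fill per box, in the order of levels(PlantGrowth$group)
plot(weight ~ group,
     data = PlantGrowth,
     xlab = "Treatment Group",
     ylab = "Dried Biomass Weight (g)",
     col = box_fills)
```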

We can also change the appearance of the box borders using boxlty, which stands for “box line type”. This argument accepts integers that represent different line types: 0 means no line, 1 a solid line, and 2 a dashed line. You can test out other numbers, too! For now, let’s get rid of the box borders.

plot(weight ~ group,
     data = PlantGrowth, 
     xlab = "Treatment Group",
     ylab = "Dried Biomass Weight (g)",
     col = 4,
     boxlty = 0)
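Since boxlty takes the same line type codes as R’s general lty setting, a quick reference chart can help you pick a number. This is just a throwaway sketch in base graphics; codes 0 to 6 are the standard line types, and lty-style arguments also accept the names shown (for example, "dashed").

```r
# Codes 0-6 and their line type names
lty_codes <- 0:6
lty_names <- c("blank", "solid", "dashed", "dotted",
               "dotdash", "longdash", "twodash")

# Empty canvas with room for one sample line per code
plot.new()
plot.window(xlim = c(0, 1), ylim = c(-0.5, 6.5))

for (code in lty_codes) {
  # Sample line drawn in this line type (code 0 draws nothing)
  segments(0.35, code, 1, code, lty = code, lwd = 2)
  # Left-aligned label with the code and its name
  text(0, code, paste(code, lty_names[code + 1]), adj = 0)
}
```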

To change the whisker line type, you can use the argument whisklty, which works the same way as boxlty. You can also change whisker line thickness using whisklwd.

plot(weight ~ group,
     data = PlantGrowth, 
     xlab = "Treatment Group",
     ylab = "Dried Biomass Weight (g)",
     col = 4,
     boxlty = 0,
     whisklty = 3,
     whisklwd = 1.5)

Lastly, you can change the line thickness of the ends of the whiskers (these are called staples) using the staplelwd argument.

plot(weight ~ group,
     data = PlantGrowth, 
     xlab = "Treatment Group",
     ylab = "Dried Biomass Weight (g)",
     col = 4,
     boxlty = 0,
     whisklty = 3,
     whisklwd = 1.5,
     staplelwd = 1.5)

You’ll notice that the arguments boxlty and whisklty look similar, as do whisklwd and staplelwd. That’s because the names follow a pattern: to change a plot component and its attributes, you can mix and match box, whisk, and staple with lty (line type), lwd (line width), and col (line color). For example, boxcol sets the color of the box borders, while col on its own sets the box fill.
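Here’s a sketch that leans on that naming pattern to restyle every part of the plot in one call. The med prefix (for the median line) is one more prefix that base R’s boxplot drawing function understands, beyond box, whisk, and staple; the colors and widths are arbitrary choices.

```r
data(PlantGrowth)

plot(weight ~ group,
     data = PlantGrowth,
     xlab = "Treatment Group",
     ylab = "Dried Biomass Weight (g)",
     col = "grey90",        # box fill
     boxcol = "grey40",     # box border color
     boxlwd = 1,            # box border width
     whiskcol = "grey40",   # whisker color
     whisklty = 1,          # solid whiskers instead of the default dashed
     staplecol = "grey40",  # staple color
     staplelwd = 2,         # staple width
     medcol = "black",      # median line color
     medlwd = 3)            # median line width
```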

Changing the boxplot orientation

The last thing we’ll modify is the orientation of the boxplot. Right now, the boxes and whiskers are oriented vertically. If you want them to be horizontal, just add the argument horizontal = TRUE. This can be especially helpful if you have a lot of groups to compare.

plot(weight ~ group,
     data = PlantGrowth, 
     xlab = "Treatment Group",
     ylab = "Dried Biomass Weight (g)",
     col = 4,
     boxlty = 0,
     whisklty = 3,
     whisklwd = 1.5,
     staplelwd = 1.5,
     horizontal = TRUE)
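Horizontal boxes pay off most when there are many groups or long group names. As a sketch, here is the built-in chickwts dataset (chick weights for six feed types) drawn horizontally. The las = 1 setting is a standard base graphics option that turns the tick labels upright, so the feed names on the vertical axis are easy to read; I’ve blanked the axis labels and used a title instead.

```r
data(chickwts)

# Six feed types: a case where horizontal boxes help
levels(chickwts$feed)

plot(weight ~ feed,
     data = chickwts,
     horizontal = TRUE,
     las = 1,                            # draw all tick labels horizontally
     xlab = "",
     ylab = "",
     main = "Chick weight by feed type")
```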

And that’s it! Now we have a good-looking boxplot. In this tutorial I went over what the different parts of a boxplot mean, as well as how to modify the axes, the boxes and whiskers, and the orientation of the plot.

I hope you enjoyed this post! If you liked it and want to learn more, you can check out my full course on the complete basics of R for ecology.
