Learning about data structures in R

How to use and manipulate data structures such as matrices, data frames, and vectors!

Last week, we posted a tutorial on the different types of data in R (check it out here). In this tutorial, we’re going to talk about the different structures that R provides to help you organize your data.

Data structures go hand-in-hand with data types, as both of these form the foundation for the work we do in R. You may have already worked with many of the structures that we describe in this blog post, but I wanted to take the time to describe them in depth and show you how they relate to or are different from one another.

Let’s jump in!


The different data structures

R provides several data structures that we commonly use as ecologists:

  1. Vectors

  2. Lists

  3. Matrices

  4. Data frames

    4a. Tibbles

Vectors

Vectors are one of the most common data structures. You can create a vector using the function c(). c() combines all of its arguments into a vector like so:

# Create a vector
vec <- c("this", "is", "a", "vector")

# View the vector
vec
## [1] "this"   "is"     "a"      "vector"

You can create a vector using any data type (numeric, character, logical, etc.). However, if you combine data types in a vector, R will coerce all elements to the same type. The type that R chooses for the vector will be the most “flexible” data type present. Data types in order from least to greatest flexibility are: logical, integer, numeric, and character. For example, in the vector below, I combined numbers and characters into one vector.

# Create a vector
ex <- c(1, "species", 10)

# View vector
class(ex)
## [1] "character"

When we check the data type of the vector, it says character because we can change 1 and 10 to be “1” and “10”, but we can’t change “species” into a number. What number would “species” represent?? So here, R has chosen the more flexible data type — characters.
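The coercion order described above can be checked directly. This is only a minimal sketch: the vectors are made-up values chosen to trigger each step of the hierarchy, and the object name back is hypothetical.

```r
# Each vector adds one more "flexible" type than the one before it
class(c(TRUE, FALSE))
## [1] "logical"
class(c(TRUE, 2L))               # 2L is an integer
## [1] "integer"
class(c(TRUE, 2L, 3.5))
## [1] "numeric"
class(c(TRUE, 2L, 3.5, "four"))
## [1] "character"

# Going the other way fails for "species": it has no numeric meaning,
# so R returns NA for that element (normally with a warning)
back <- suppressWarnings(as.numeric(c("1", "species", "10")))
back
## [1]  1 NA 10
```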

You can also examine certain attributes of the vector such as length() (i.e., number of elements) or, if you have a character vector, number of characters in each element (nchar()).

# View vector
ex 
## [1] "1"       "species" "10"
# Length of vector
length(ex)
## [1] 3
# Number of characters
nchar(ex)
## [1] 1 7 2

Vector elements can also be given names. You do this by assigning a character vector to names(my.vector).

# Create a vector
crabs <- c(10, 15, 26)

# Give the vector names
names(crabs) <- c("Blue crab", "Mud crab", "Ghost crab")

# View named vector
crabs
##  Blue crab   Mud crab Ghost crab 
##         10         15         26

You can subset a vector by specifying the element number in square brackets. You could also subset a vector using the element name.

# Choose element number 3
crabs[3]
## Ghost crab 
##         26
# Choose element named "Ghost crab"
crabs["Ghost crab"]
## Ghost crab 
##         26
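
Square brackets also accept more than a single position or name. As a rough sketch (the object names first_and_last, no_mud, and abundant are made up for illustration), you can pass several positions, a negative position, or a logical condition:

```r
# Re-create the named vector from above
crabs <- c(10, 15, 26)
names(crabs) <- c("Blue crab", "Mud crab", "Ghost crab")

# Choose several elements at once with a vector of positions
first_and_last <- crabs[c(1, 3)]
names(first_and_last)
## [1] "Blue crab"  "Ghost crab"

# Drop an element by using a negative position
no_mud <- crabs[-2]
names(no_mud)
## [1] "Blue crab"  "Ghost crab"

# Keep only the elements that meet a condition
abundant <- crabs[crabs > 12]
abundant
##   Mud crab Ghost crab 
##         15         26
```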

Lastly, you can view the structure of a vector using the str() function. This will tell us that the vector is a named numeric vector with 3 elements: 10, 15, and 26. On the next line, it also says that the “names” attribute of the vector is a character vector with the elements “Blue crab”, “Mud crab”, and “Ghost crab”.

str(crabs)
##  Named num [1:3] 10 15 26
##  - attr(*, "names")= chr [1:3] "Blue crab" "Mud crab" "Ghost crab"

Lists

Lists are similar to vectors, but their elements do not all have to be the same type. List elements can even be other lists, which means you can nest lists within other lists.

To create a list, you use list() instead of c().

# Create a list
animals <- list(c("Eastern elliptio", "Diamondback terrapin", "Spring peeper", "American eel"),
                c(25, 3, 0, 10),
                "Maryland",
                c(TRUE, TRUE, FALSE, TRUE))

# View the structure of the list
str(animals)
## List of 4
##  $ : chr [1:4] "Eastern elliptio" "Diamondback terrapin" "Spring peeper" "American eel"
##  $ : num [1:4] 25 3 0 10
##  $ : chr "Maryland"
##  $ : logi [1:4] TRUE TRUE FALSE TRUE

Here, my list contains a vector of animal names (character), a vector of numbers (numeric), the U.S. state that these animals can be found in (character), and a logical vector. The vectors don’t all need to be the same length: the third element has only one value, “Maryland”, while all the other elements have a length of 4.

If we view the list, you’ll notice that each element is identified within double square brackets [[these]].

# View list
animals
## [[1]]
## [1] "Eastern elliptio"     "Diamondback terrapin" "Spring peeper"       
## [4] "American eel"        
## 
## [[2]]
## [1] 25  3  0 10
## 
## [[3]]
## [1] "Maryland"
## 
## [[4]]
## [1]  TRUE  TRUE FALSE  TRUE

You can subset elements of a list using double square brackets, and further subset that list element using single square brackets.

# View animal names (element 1 in the list)
animals[[1]]
## [1] "Eastern elliptio"     "Diamondback terrapin" "Spring peeper"       
## [4] "American eel"
# View the second animal name (element 2 of element 1 in the list)
animals[[1]][2]
## [1] "Diamondback terrapin"
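
It can help to see why double square brackets are needed here. The sketch below uses a shortened, unnamed version of the list (an assumption made to keep the example small) to compare what single and double brackets return:

```r
# A shortened version of the animals list
animals <- list(c("Eastern elliptio", "Diamondback terrapin"),
                c(25, 3))

# Single brackets keep the list wrapper: the result is a list of length 1
class(animals[1])
## [1] "list"

# Double brackets pull out the element itself
class(animals[[1]])
## [1] "character"

# Only the extracted vector can be used directly in calculations
total <- sum(animals[[2]])
total
## [1] 28
```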

As with vectors, you can give list elements names. Let’s create the same list that we did above, but give it some more descriptive names by writing name.of.element = element within the list() function. In the code below, I named the list elements “common.name”, “abundance”, “state”, and “presence”.

# Create a list
animals <- list(common.name = c("Eastern elliptio", "Diamondback terrapin", 
                                "Spring peeper", "American eel"),
                abundance = c(25, 3, 0, 10),
                state = "Maryland",
                presence = c(TRUE, TRUE, FALSE, TRUE))

# View list
animals
## $common.name
## [1] "Eastern elliptio"     "Diamondback terrapin" "Spring peeper"       
## [4] "American eel"        
## 
## $abundance
## [1] 25  3  0 10
## 
## $state
## [1] "Maryland"
## 
## $presence
## [1]  TRUE  TRUE FALSE  TRUE

Now, instead of numbers inside of double square brackets, each element is identified by $name. You can still subset the list using the element number in double square brackets, like this: [[1]], but you can also subset the list using this dollar sign notation:

# View whether the animals were present in our survey
animals$presence
## [1]  TRUE  TRUE FALSE  TRUE

Lists are really useful for storing lots of data, but it can get confusing if you have several lists nested in other lists. Naming your elements can help you keep things straight when subsetting your data.
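
As a small illustration of naming in a nested list, here is a sketch with one list stored inside another. The object survey, the site name, and the values are all made up for this example:

```r
# A hypothetical survey: the animals element is itself a list
survey <- list(
  site = "Marsh A",
  animals = list(
    common.name = c("Eastern elliptio", "American eel"),
    abundance = c(25, 10)
  )
)

# Chain $ to move down one level at a time
survey$animals$abundance
## [1] 25 10

# The same path written with double square brackets,
# followed by single brackets to pick the second value
eel_count <- survey[["animals"]][["abundance"]][2]
eel_count
## [1] 10
```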

Matrices

The next data structure I want to introduce is the matrix. Matrices are two-dimensional, rectangular objects whose elements must all be the same type, just like a vector. They are most useful for mathematical operations, but are also common for species abundance data, where the rows represent sites and the columns represent species (or the other way around). Each cell holds the abundance of one species at one site, a layout that is useful for multivariate analyses.

You can create matrices using matrix(data = your.data, nrow = num.rows, ncol = num.cols, byrow = T/F, dimnames = your.names).

data accepts a vector of the data you want to use. nrow is the number of rows you want in your matrix, while ncol is the number of columns you want. The byrow argument can be set to TRUE or FALSE depending on whether you want R to fill the matrix row by row or column by column; the default is FALSE (fill by columns). dimnames accepts a list of 2 elements, where the first element holds the row names and the second holds the column names.

The byrow argument is best understood through demonstration:

# Create a matrix that is filled by rows
m1 <- matrix(data = 1:12, nrow = 4, ncol = 3, byrow = TRUE)
m1
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
## [4,]   10   11   12
# Create a matrix that is filled by columns
m2 <- matrix(data = 1:12, nrow = 4, ncol = 3, byrow = FALSE)
m2
##      [,1] [,2] [,3]
## [1,]    1    5    9
## [2,]    2    6   10
## [3,]    3    7   11
## [4,]    4    8   12

You can see that the first bit of code fills in the table row by row — it fills it in from left to right, then moves down. The second chunk of code fills in the table by columns — it fills it in from top to bottom, then moves to the right.

You can access matrix elements using single square brackets, where the first number represents the row and the second represents the column. So m1[2,3] would access the element in the 2nd row and 3rd column. You could also type m1[2, ], leaving the column space blank. This will return the entire 2nd row of the matrix. Conversely, you could type m1[ , 3], which leaves the row space blank and returns the entire 3rd column of the matrix. Let’s see these in action.

# Return element in 2nd row, 3rd column
m1[2,3]
## [1] 6
# Return 2nd row
m1[2, ]
## [1] 4 5 6
# Return 3rd column
m1[ , 3]
## [1]  3  6  9 12

We can also look at the number of rows and columns of a matrix by using nrow() and ncol(); these functions are analogous to the length() function that we used for vectors. Alternatively, we can use dim(), which will tell us both the number of rows and columns.

# Number of rows
nrow(m1)
## [1] 4
# Number of columns
ncol(m1)
## [1] 3
# View matrix dimensions
dim(m1)
## [1] 4 3
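
The dimnames argument mentioned earlier was not used in the examples above, so here is a minimal sketch of it. The matrix abund and its counts are invented for illustration; it follows the species-by-site layout described at the start of this section.

```r
# Hypothetical counts of 2 species (rows) at 3 sites (columns)
abund <- matrix(data = c(4, 0, 7,
                         1, 9, 2),
                nrow = 2, ncol = 3, byrow = TRUE,
                dimnames = list(c("Blue crab", "Mud crab"),
                                c("site1", "site2", "site3")))
abund
##           site1 site2 site3
## Blue crab     4     0     7
## Mud crab      1     9     2

# Names can be used in place of row and column numbers
abund["Mud crab", "site2"]
## [1] 9

# Total abundance of each species across all sites
rowSums(abund)
## Blue crab  Mud crab 
##        11        12
```

Naming the rows and columns makes the subsetting code easier to read than bare numbers, which matters once a matrix has more than a handful of species or sites.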

Data frames

Data frames are the most common way to store and display tabular data in R and are the standard format for applying any analyses to your data. Like matrices, these are two-dimensional objects with rows and columns. But data frames are also like lists, in that you can have elements of several types within them. In fact, a data frame is a type of list where each list element has the same length (this is what makes them rectangular / tabular).

You have likely encountered data frames before, for example when importing data into R using functions such as read.csv().

You can create a data frame using the function data.frame(col1 = vector1, col2 = vector2, etc.), where each vector should be the same length. You could also have a vector of length 1 or a length that is a divisor of the other vector lengths — this shorter vector will then get recycled until it reaches the length of the other columns.
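
The recycling rule is easier to see with a quick sketch. The column names and values here are made up, and the printed output assumes R 4.0 or later, where character columns are not converted to factors by default:

```r
# A length-1 vector is repeated for every row
df1 <- data.frame(plot = 1:4, state = "Maryland")
df1$state
## [1] "Maryland" "Maryland" "Maryland" "Maryland"

# A length-2 vector divides evenly into 4 rows, so it is recycled twice
df2 <- data.frame(plot = 1:4, treatment = c("control", "burn"))
df2$treatment
## [1] "control" "burn"    "control" "burn"

# A length-3 vector does not divide evenly into 4 rows,
# so data.frame() stops with an error (caught here with tryCatch)
result <- tryCatch(data.frame(plot = 1:4, bad = 1:3),
                   error = function(e) "error")
result
## [1] "error"
```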

In the code below, I created a data frame of species, whether or not they were present, and their abundance. Each column consists of different data types. The 1st column is a character vector, the 2nd is logical, and the 3rd is numeric. This is really useful and allows us to store much more information than in a matrix.

# Create a data frame
species_dat <- data.frame(species = c("Callinectes sapidus", 
                                      "Sciaenops ocellatus",
                                      "Anchoa mitchilli",
                                      "Micropognias undulatus",
                                      "Menidia menidia"),
                          presence = c(TRUE, FALSE, TRUE, FALSE, TRUE),
                          abundance = c(2, 0, 10, 0, 9))

# View data frame
species_dat
##                  species presence abundance
## 1    Callinectes sapidus     TRUE         2
## 2    Sciaenops ocellatus    FALSE         0
## 3       Anchoa mitchilli     TRUE        10
## 4 Micropognias undulatus    FALSE         0
## 5        Menidia menidia     TRUE         9

You also have the option to add an argument row.names = c("vector", "of", "names", "for", "rows"), though adding row.names is less common for data frames.
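
If you do want row names, a minimal sketch looks like this (crab_dat and its values are invented for the example):

```r
# Create a data frame with row names
crab_dat <- data.frame(abundance = c(10, 15, 26),
                       row.names = c("Blue crab", "Mud crab", "Ghost crab"))
crab_dat
##            abundance
## Blue crab         10
## Mud crab          15
## Ghost crab        26

# Row names can then be used to pick out rows
crab_dat["Mud crab", "abundance"]
## [1] 15
```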

As with matrices, you can view number of rows and columns using nrow(my.dataframe) or ncol(my.dataframe), or use dim(my.dataframe) to view the full dimensions.

And like matrices, you can subset your data frame into its rows or columns using single square brackets: my.dataframe[row.num, col.num].

# View the third item in the first column
species_dat[3, 1]
## [1] "Anchoa mitchilli"
# View the first column
species_dat[ , 1]
## [1] "Callinectes sapidus"    "Sciaenops ocellatus"    "Anchoa mitchilli"      
## [4] "Micropognias undulatus" "Menidia menidia"
# View the third row
species_dat[3, ]
##            species presence abundance
## 3 Anchoa mitchilli     TRUE        10

Alternatively, you can subset your data frame in the same way as lists, by using the dollar sign symbol or double square brackets. Each column is essentially a list element, so you can easily choose a data frame column using my.dataframe$col.name.

# View the abundance column in three different ways
species_dat$abundance
## [1]  2  0 10  0  9
species_dat[[3]]
## [1]  2  0 10  0  9
species_dat[["abundance"]]
## [1]  2  0 10  0  9
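
Because a data frame is a list of columns, functions that work on lists treat each column as one element. The sketch below uses a shortened copy of the species table; the object name col_types is made up, and the output assumes R 4.0 or later:

```r
# A shortened version of the species data frame
species_dat <- data.frame(species = c("Callinectes sapidus", "Anchoa mitchilli"),
                          presence = c(TRUE, TRUE),
                          abundance = c(2, 10))

# A data frame is a list under the hood
is.list(species_dat)
## [1] TRUE

# So length() counts columns, not rows
length(species_dat)
## [1] 3

# And sapply() visits one column at a time
col_types <- sapply(species_dat, class)
col_types
##     species    presence   abundance 
## "character"   "logical"   "numeric"
```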

The function str() is also useful. It shows you the structure of your data frame. This will tell you the number of rows and columns in your data frame and will tell you the data types of each column.

# View structure
str(species_dat)
## 'data.frame':	5 obs. of  3 variables:
##  $ species  : chr  "Callinectes sapidus" "Sciaenops ocellatus" "Anchoa mitchilli" "Micropognias undulatus" ...
##  $ presence : logi  TRUE FALSE TRUE FALSE TRUE
##  $ abundance: num  2 0 10 0 9

These are a few functions that are very useful for getting to know your data frames.

  • head() or tail() to view the first 6 or last 6 rows of your data frame

  • dim(), nrow(), or ncol() to view the number of rows or columns (or both!) of your data frame

  • rownames() or colnames() to view or set the row or column names of your data frame. Note that just names() will also give you the column names of a data frame.

  • str() to view the structure of your data frame
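
Here is a short sketch of a few of these functions in use. The data frame counts is made up, with 8 rows so that the 6-row default of head() is visible:

```r
# A small made-up data frame with 8 rows
counts <- data.frame(site = paste0("site", 1:8),
                     abundance = c(3, 0, 12, 7, 1, 0, 5, 9))

# head() and tail() return 6 rows by default; n changes that
nrow(head(counts))
## [1] 6
tail(counts, n = 2)
##    site abundance
## 7 site7         5
## 8 site8         9

# names() and colnames() give the same answer for a data frame
names(counts)
## [1] "site"      "abundance"

# colnames() can also be used to set new column names
colnames(counts) <- c("site_id", "n_individuals")
names(counts)
## [1] "site_id"       "n_individuals"
```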

As you can see, data frames are very useful for organizing complex, multi-attribute data sets that contain data of different types. No wonder we use them so often!

Tibbles

I added in tibbles as a side data structure — even though it isn’t an official data structure in R, it’s something that comes up often if you use the tidyverse set of packages. Tibbles come with the tibble package (which comes with the tidyverse) and are basically data frames with a few added benefits!

Functionally, tibbles are the same as data frames when you manipulate them. They can do everything that data frames can do, but they have slightly different properties that make them more convenient. In fact, ‘tibble’ stands for ‘tidy table’ :) Let’s find out what makes tibbles different.

First, let’s load up the tidyverse set of packages.

library(tidyverse)

To create a tibble, all you have to do is use the function tibble(), which works the same way as the function data.frame(). When you’re creating a tibble, you can only use vectors that are either all the same length, or have length of 1. The vector with a length of 1 will just be recycled until it fills all of the rows in its column. Tibbles also don’t use row.names(), which keeps things simpler.
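
To make the stricter recycling rule concrete, here is a minimal sketch (it assumes the tibble package is installed; the column names and values are made up):

```r
library(tibble)

# A length-1 vector is recycled to fill the column
tb <- tibble(plot = 1:4, state = "Maryland")
nrow(tb)
## [1] 4

# A length-2 vector is NOT recycled, even though 2 divides evenly into 4.
# data.frame() would accept this, but tibble() stops with an error
# (caught here with tryCatch)
result <- tryCatch(tibble(plot = 1:4, treatment = c("control", "burn")),
                   error = function(e) "error")
result
## [1] "error"
```

The stricter rule means that a vector of the wrong length is reported right away instead of being silently repeated.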

Let’s create the same species table that we did earlier, but this time as a tibble.

# Create a tibble
species_dat <- tibble(species = c("Callinectes sapidus", 
                                  "Sciaenops ocellatus",
                                  "Anchoa mitchilli",
                                  "Micropognias undulatus",
                                  "Menidia menidia"),
                      presence = c(TRUE, FALSE, TRUE, FALSE, TRUE),
                      abundance = c(2, 0, 10, 0, 9))

# View the tibble
species_dat
## # A tibble: 5 × 3
##   species                presence abundance
##   <chr>                  <lgl>        <dbl>
## 1 Callinectes sapidus    TRUE             2
## 2 Sciaenops ocellatus    FALSE            0
## 3 Anchoa mitchilli       TRUE            10
## 4 Micropognias undulatus FALSE            0
## 5 Menidia menidia        TRUE             9

When we print the tibble, it clearly tells us that it’s a tibble. It also tells us the table dimensions and the column names and data types.

You might be thinking: okay…and? The tibble doesn’t look that different from the data frame we originally created.

Let’s try another example.

This time, let’s load up an example data set that comes with the ggplot2 package. This data set is called msleep, and describes the sleep times and brain weights of several different types of mammals. This data set already comes as a tibble, so let’s turn it into a data frame for the purposes of demonstration, using the as.data.frame() function.

# Load data
data("msleep")

# Turn data into class data frame
msleep <- as.data.frame(msleep)

# View data
msleep
##                              name         genus    vore
## 1                         Cheetah      Acinonyx   carni
## 2                      Owl monkey         Aotus    omni
## 3                 Mountain beaver    Aplodontia   herbi
## 4      Greater short-tailed shrew       Blarina    omni
## 5                             Cow           Bos   herbi
## 6                Three-toed sloth      Bradypus   herbi
## 7               Northern fur seal   Callorhinus   carni
## 8                    Vesper mouse       Calomys    <NA>
## 9                             Dog         Canis   carni
## 10                       Roe deer     Capreolus   herbi
## 11                           Goat         Capri   herbi
## 12                     Guinea pig         Cavis   herbi
## 13                         Grivet Cercopithecus    omni
## 14                     Chinchilla    Chinchilla   herbi
## 15                Star-nosed mole     Condylura    omni
## 16      African giant pouched rat    Cricetomys    omni
## 17      Lesser short-tailed shrew     Cryptotis    omni
## 18           Long-nosed armadillo       Dasypus   carni
## 19                     Tree hyrax   Dendrohyrax   herbi
## 20         North American Opossum     Didelphis    omni
## 21                 Asian elephant       Elephas   herbi
## 22                  Big brown bat     Eptesicus insecti
## 23                          Horse         Equus   herbi
## 24                         Donkey         Equus   herbi
## 25              European hedgehog     Erinaceus    omni
## 26                   Patas monkey  Erythrocebus    omni
## 27      Western american chipmunk      Eutamias   herbi
## 28                   Domestic cat         Felis   carni
## 29                         Galago        Galago    omni
## 30                        Giraffe       Giraffa   herbi
## 31                    Pilot whale Globicephalus   carni
## 32                      Gray seal  Haliochoerus   carni
## 33                     Gray hyrax   Heterohyrax   herbi
## 34                          Human          Homo    omni
## 35                 Mongoose lemur         Lemur   herbi
## 36               African elephant     Loxodonta   herbi
## 37           Thick-tailed opposum    Lutreolina   carni
## 38                        Macaque        Macaca    omni
## 39               Mongolian gerbil      Meriones   herbi
## 40                 Golden hamster  Mesocricetus   herbi
## 41                          Vole       Microtus   herbi
## 42                    House mouse           Mus   herbi
## 43               Little brown bat        Myotis insecti
## 44           Round-tailed muskrat      Neofiber   herbi
## 45                     Slow loris     Nyctibeus   carni
## 46                           Degu       Octodon   herbi
## 47     Northern grasshopper mouse     Onychomys   carni
## 48                         Rabbit   Oryctolagus   herbi
## 49                          Sheep          Ovis   herbi
## 50                     Chimpanzee           Pan    omni
## 51                          Tiger      Panthera   carni
## 52                         Jaguar      Panthera   carni
## 53                           Lion      Panthera   carni
## 54                         Baboon         Papio    omni
## 55                Desert hedgehog   Paraechinus    <NA>
## 56                          Potto  Perodicticus    omni
## 57                     Deer mouse    Peromyscus    <NA>
## 58                      Phalanger     Phalanger    <NA>
## 59                   Caspian seal         Phoca   carni
## 60                Common porpoise      Phocoena   carni
## 61                        Potoroo      Potorous   herbi
## 62                Giant armadillo    Priodontes insecti
## 63                     Rock hyrax      Procavia    <NA>
## 64                 Laboratory rat        Rattus   herbi
## 65          African striped mouse     Rhabdomys    omni
## 66                Squirrel monkey       Saimiri    omni
## 67          Eastern american mole      Scalopus insecti
## 68                     Cotton rat      Sigmodon   herbi
## 69                       Mole rat        Spalax    <NA>
## 70         Arctic ground squirrel  Spermophilus   herbi
## 71 Thirteen-lined ground squirrel  Spermophilus   herbi
## 72 Golden-mantled ground squirrel  Spermophilus   herbi
## 73                     Musk shrew        Suncus    <NA>
## 74                            Pig           Sus    omni
## 75            Short-nosed echidna  Tachyglossus insecti
## 76      Eastern american chipmunk        Tamias   herbi
## 77                Brazilian tapir       Tapirus   herbi
## 78                         Tenrec        Tenrec    omni
## 79                     Tree shrew        Tupaia    omni
## 80           Bottle-nosed dolphin      Tursiops   carni
## 81                          Genet       Genetta   carni
## 82                     Arctic fox        Vulpes   carni
## 83                        Red fox        Vulpes   carni
##              order conservation sleep_total sleep_rem
## 1        Carnivora           lc        12.1        NA
## 2         Primates         <NA>        17.0       1.8
## 3         Rodentia           nt        14.4       2.4
## 4     Soricomorpha           lc        14.9       2.3
## 5     Artiodactyla domesticated         4.0       0.7
## 6           Pilosa         <NA>        14.4       2.2
## 7        Carnivora           vu         8.7       1.4
## 8         Rodentia         <NA>         7.0        NA
## 9        Carnivora domesticated        10.1       2.9
## 10    Artiodactyla           lc         3.0        NA
## 11    Artiodactyla           lc         5.3       0.6
## 12        Rodentia domesticated         9.4       0.8
## 13        Primates           lc        10.0       0.7
## 14        Rodentia domesticated        12.5       1.5
## 15    Soricomorpha           lc        10.3       2.2
## 16        Rodentia         <NA>         8.3       2.0
## 17    Soricomorpha           lc         9.1       1.4
## 18       Cingulata           lc        17.4       3.1
## 19      Hyracoidea           lc         5.3       0.5
## 20 Didelphimorphia           lc        18.0       4.9
## 21     Proboscidea           en         3.9        NA
## 22      Chiroptera           lc        19.7       3.9
## 23  Perissodactyla domesticated         2.9       0.6
## 24  Perissodactyla domesticated         3.1       0.4
## 25  Erinaceomorpha           lc        10.1       3.5
## 26        Primates           lc        10.9       1.1
## 27        Rodentia         <NA>        14.9        NA
## 28       Carnivora domesticated        12.5       3.2
## 29        Primates         <NA>         9.8       1.1
## 30    Artiodactyla           cd         1.9       0.4
## 31         Cetacea           cd         2.7       0.1
## 32       Carnivora           lc         6.2       1.5
## 33      Hyracoidea           lc         6.3       0.6
## 34        Primates         <NA>         8.0       1.9
## 35        Primates           vu         9.5       0.9
## 36     Proboscidea           vu         3.3        NA
## 37 Didelphimorphia           lc        19.4       6.6
## 38        Primates         <NA>        10.1       1.2
## 39        Rodentia           lc        14.2       1.9
## 40        Rodentia           en        14.3       3.1
## 41        Rodentia         <NA>        12.8        NA
## 42        Rodentia           nt        12.5       1.4
## 43      Chiroptera         <NA>        19.9       2.0
## 44        Rodentia           nt        14.6        NA
## 45        Primates         <NA>        11.0        NA
## 46        Rodentia           lc         7.7       0.9
## 47        Rodentia           lc        14.5        NA
## 48      Lagomorpha domesticated         8.4       0.9
## 49    Artiodactyla domesticated         3.8       0.6
## 50        Primates         <NA>         9.7       1.4
## 51       Carnivora           en        15.8        NA
## 52       Carnivora           nt        10.4        NA
## 53       Carnivora           vu        13.5        NA
## 54        Primates         <NA>         9.4       1.0
## 55  Erinaceomorpha           lc        10.3       2.7
## 56        Primates           lc        11.0        NA
## 57        Rodentia         <NA>        11.5        NA
## 58   Diprotodontia         <NA>        13.7       1.8
## 59       Carnivora           vu         3.5       0.4
## 60         Cetacea           vu         5.6        NA
## 61   Diprotodontia         <NA>        11.1       1.5
## 62       Cingulata           en        18.1       6.1
## 63      Hyracoidea           lc         5.4       0.5
## 64        Rodentia           lc        13.0       2.4
## 65        Rodentia         <NA>         8.7        NA
## 66        Primates         <NA>         9.6       1.4
## 67    Soricomorpha           lc         8.4       2.1
## 68        Rodentia         <NA>        11.3       1.1
## 69        Rodentia         <NA>        10.6       2.4
## 70        Rodentia           lc        16.6        NA
## 71        Rodentia           lc        13.8       3.4
## 72        Rodentia           lc        15.9       3.0
## 73    Soricomorpha         <NA>        12.8       2.0
## 74    Artiodactyla domesticated         9.1       2.4
## 75     Monotremata         <NA>         8.6        NA
## 76        Rodentia         <NA>        15.8        NA
## 77  Perissodactyla           vu         4.4       1.0
## 78    Afrosoricida         <NA>        15.6       2.3
## 79      Scandentia         <NA>         8.9       2.6
## 80         Cetacea         <NA>         5.2        NA
## 81       Carnivora         <NA>         6.3       1.3
## 82       Carnivora         <NA>        12.5        NA
## 83       Carnivora         <NA>         9.8       2.4
##    sleep_cycle awake brainwt   bodywt
## 1           NA 11.90      NA   50.000
## 2           NA  7.00 0.01550    0.480
## 3           NA  9.60      NA    1.350
## 4    0.1333333  9.10 0.00029    0.019
## 5    0.6666667 20.00 0.42300  600.000
## 6    0.7666667  9.60      NA    3.850
## 7    0.3833333 15.30      NA   20.490
## 8           NA 17.00      NA    0.045
## 9    0.3333333 13.90 0.07000   14.000
## 10          NA 21.00 0.09820   14.800
## 11          NA 18.70 0.11500   33.500
## 12   0.2166667 14.60 0.00550    0.728
## 13          NA 14.00      NA    4.750
## 14   0.1166667 11.50 0.00640    0.420
## 15          NA 13.70 0.00100    0.060
## 16          NA 15.70 0.00660    1.000
## 17   0.1500000 14.90 0.00014    0.005
## 18   0.3833333  6.60 0.01080    3.500
## 19          NA 18.70 0.01230    2.950
## 20   0.3333333  6.00 0.00630    1.700
## 21          NA 20.10 4.60300 2547.000
## 22   0.1166667  4.30 0.00030    0.023
## 23   1.0000000 21.10 0.65500  521.000
## 24          NA 20.90 0.41900  187.000
## 25   0.2833333 13.90 0.00350    0.770
## 26          NA 13.10 0.11500   10.000
## 27          NA  9.10      NA    0.071
## 28   0.4166667 11.50 0.02560    3.300
## 29   0.5500000 14.20 0.00500    0.200
## 30          NA 22.10      NA  899.995
## 31          NA 21.35      NA  800.000
## 32          NA 17.80 0.32500   85.000
## 33          NA 17.70 0.01227    2.625
## 34   1.5000000 16.00 1.32000   62.000
## 35          NA 14.50      NA    1.670
## 36          NA 20.70 5.71200 6654.000
## 37          NA  4.60      NA    0.370
## 38   0.7500000 13.90 0.17900    6.800
## 39          NA  9.80      NA    0.053
## 40   0.2000000  9.70 0.00100    0.120
## 41          NA 11.20      NA    0.035
## 42   0.1833333 11.50 0.00040    0.022
## 43   0.2000000  4.10 0.00025    0.010
## 44          NA  9.40      NA    0.266
## 45          NA 13.00 0.01250    1.400
## 46          NA 16.30      NA    0.210
## 47          NA  9.50      NA    0.028
## 48   0.4166667 15.60 0.01210    2.500
## 49          NA 20.20 0.17500   55.500
## 50   1.4166667 14.30 0.44000   52.200
## 51          NA  8.20      NA  162.564
## 52          NA 13.60 0.15700  100.000
## 53          NA 10.50      NA  161.499
## 54   0.6666667 14.60 0.18000   25.235
## 55          NA 13.70 0.00240    0.550
## 56          NA 13.00      NA    1.100
## 57          NA 12.50      NA    0.021
## 58          NA 10.30 0.01140    1.620
## 59          NA 20.50      NA   86.000
## 60          NA 18.45      NA   53.180
## 61          NA 12.90      NA    1.100
## 62          NA  5.90 0.08100   60.000
## 63          NA 18.60 0.02100    3.600
## 64   0.1833333 11.00 0.00190    0.320
## 65          NA 15.30      NA    0.044
## 66          NA 14.40 0.02000    0.743
## 67   0.1666667 15.60 0.00120    0.075
## 68   0.1500000 12.70 0.00118    0.148
## 69          NA 13.40 0.00300    0.122
## 70          NA  7.40 0.00570    0.920
## 71   0.2166667 10.20 0.00400    0.101
## 72          NA  8.10      NA    0.205
## 73   0.1833333 11.20 0.00033    0.048
## 74   0.5000000 14.90 0.18000   86.250
## 75          NA 15.40 0.02500    4.500
## 76          NA  8.20      NA    0.112
## 77   0.9000000 19.60 0.16900  207.501
## 78          NA  8.40 0.00260    0.900
## 79   0.2333333 15.10 0.00250    0.104
## 80          NA 18.80      NA  173.330
## 81          NA 17.70 0.01750    2.000
## 82          NA 11.50 0.04450    3.380
## 83   0.3500000 14.20 0.05040    4.230

Okay, wow. When we print the data frame it’s pretty overwhelming. Printing the data frame shows us every row and every column. And because the columns don’t all fit across the width of the console, the remaining columns wrap around and are printed as extra blocks of rows, making the output even longer. This is a very messy and confusing way to view our data.

Let’s turn the data back into a tibble using the as_tibble() function, and let’s see what that looks like.

# Turn data into a tibble
msleep <- as_tibble(msleep)

# View data
msleep
## # A tibble: 83 × 11
##    name        genus   vore  order  conservation sleep_total
##    <chr>       <chr>   <chr> <chr>  <chr>              <dbl>
##  1 Cheetah     Acinon… carni Carni… lc                  12.1
##  2 Owl monkey  Aotus   omni  Prima… <NA>                17  
##  3 Mountain b… Aplodo… herbi Roden… nt                  14.4
##  4 Greater sh… Blarina omni  Soric… lc                  14.9
##  5 Cow         Bos     herbi Artio… domesticated         4  
##  6 Three-toed… Bradyp… herbi Pilosa <NA>                14.4
##  7 Northern f… Callor… carni Carni… vu                   8.7
##  8 Vesper mou… Calomys <NA>  Roden… <NA>                 7  
##  9 Dog         Canis   carni Carni… domesticated        10.1
## 10 Roe deer    Capreo… herbi Artio… lc                   3  
## # … with 73 more rows, and 5 more variables:
## #   sleep_rem <dbl>, sleep_cycle <dbl>, awake <dbl>,
## #   brainwt <dbl>, bodywt <dbl>

The printed tibble is much neater than the printed data frame! Although there are ways to print data frames more neatly, tibbles are automatically formatted so that the columns are abbreviated to fit on one row (or are not printed), and you only see the first ten rows of data instead of every single row. This makes it way more convenient to view your data sets.
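
For reference, here is a rough sketch of a few ways to control printing. It uses mtcars, a data frame that ships with R, instead of msleep so that it runs without ggplot2; the object names first_rows and cars_tb are made up, and the tibble package is assumed to be installed.

```r
library(tibble)

# For a data frame, head() limits the output to the first few rows
first_rows <- head(mtcars, n = 3)
nrow(first_rows)
## [1] 3

# For a tibble, print() lets you ask for more rows (n)
# or for every column (width = Inf)
cars_tb <- as_tibble(mtcars)
print(cars_tb, n = 15, width = Inf)

# glimpse() turns the table on its side, with one line per column
glimpse(cars_tb)
```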

Tibbles also reduce errors when subsetting your data. For example, when subsetting with single square brackets [ ], tibbles always return another tibble. In contrast, subsetting data frames will sometimes return a vector instead of another data frame.
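
A minimal sketch of that difference, using a small made-up table (the tibble package is assumed to be installed):

```r
library(tibble)

df <- data.frame(species = c("Blue crab", "Mud crab"), abundance = c(10, 15))
tb <- as_tibble(df)

# Selecting one column from a data frame drops down to a plain vector
class(df[, "abundance"])
## [1] "numeric"

# The same code on a tibble returns a one-column tibble
class(tb[, "abundance"])
## [1] "tbl_df"     "tbl"        "data.frame"

# Adding drop = FALSE makes the data frame keep its shape too
class(df[, "abundance", drop = FALSE])
## [1] "data.frame"
```

This matters most inside functions or loops, where code written for a table can fail if it unexpectedly receives a vector.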

And if you try to subset a tibble using a column that does not exist, you’ll receive a warning that the column does not exist. In contrast, subsetting a data frame using a column that doesn’t exist will only return NULL, and you don’t receive an explanation of why.

# See if msleep (the tibble) has a column called "abc"
msleep$abc
## Warning: Unknown or uninitialised column: `abc`.
## NULL
# Turn msleep into a data frame
msleep <- as.data.frame(msleep)

# See if msleep (the data frame) has a column called "abc"
msleep$abc
## NULL

One other advantage of tibbles is that they allow your column names to contain spaces. Normally you wouldn’t go out of your way to add spaces to your column names, since it’s much better practice to use underscores “_” in place of spaces to begin with. However, sometimes the data you import into R will contain spaces in the column names. While regular data frames replace spaces with periods “.” by default, tibbles keep the original column names, and you refer to them by wrapping the name in back ticks. (The back tick, also known as the grave accent, is the apostrophe-like character that usually shares a key with the tilde ‘~’, just above the Tab key.) When importing data into R, you can read it in directly as a tibble, with all column names kept as they were in the original CSV, by using read_csv(). Note the underscore between ‘read’ and ‘csv’; the function read.csv(), with a period, reads in your data as a regular data frame instead.
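
Here is a small sketch of how column names with spaces behave. The column names and values are made up, the tibble package is assumed to be installed, and the text argument of read.csv() stands in for a real CSV file:

```r
library(tibble)

# data.frame() repairs the name by swapping the space for a period
df <- data.frame(`common name` = c("Blue crab", "Mud crab"))
names(df)
## [1] "common.name"

# tibble() keeps the name exactly as written
tb <- tibble(`common name` = c("Blue crab", "Mud crab"))
names(tb)
## [1] "common name"

# Back ticks are needed whenever you refer to that column
tb$`common name`
## [1] "Blue crab" "Mud crab"

# read.csv() makes the same repair when importing
csv_df <- read.csv(text = "common name,count\nBlue crab,10")
names(csv_df)
## [1] "common.name" "count"
```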

In short, tibbles make a number of changes to normal data frames that can help reduce errors in your data analysis. These improvements in printing and subsetting are small, but useful!

And that’s it for our blog post on data structures in R! I hope this post taught you a few useful tips and tricks for working with your data. Happy coding!



If you enjoyed this tutorial and want to learn more about data frames and tibbles, and how to use them, you can check out Luka Negoita's full course on the complete basics of R for ecology here:

