# R Data types 101, or What kind of data do I have?

How to identify and convert between different data types in R

Most of us are pretty familiar with data types in our daily lives — we can easily tell that things like 1, 2, 3, and 4 are numbers (in this case, integers). 15.7 is still a number, but has a decimal. We know that every single word I’m typing in this sentence is composed of characters, and we know that in math, “true” and “false” are the answers to logical statements.

Just as we do in our heads, R also categorizes our data into different classes. These categories are similar to the real-life ones I described above, but can be a little different in terms of syntax and things to watch out for in your code.

To work in R and perform data analyses, you’ll need to have a solid understanding of data types. In this tutorial, I’m going to introduce several different types of data, explain how to use and manipulate each of them, and show you how to check what type of data you have. Let’s dive in.

## Types of data

There are five main types of data in R that you’d come across as an ecologist. I’ll discuss all of them below except complex numbers, which are rarely used for data analysis in R.

- **Numeric** (`1.2, 5, 7, 3.14159`)
- **Integer** (`1, 2, 3, 4, 5`)
- **Complex** (`4 + 1i`)
- **Logical** (`TRUE / FALSE`)
- **Character** (`"a", "apple"`)

I’m also going to discuss a sixth, related category that helps you work with categorical variables:

**Factor**

### Numeric

Numeric data types are pretty straightforward. These are just numbers, written as either integers or decimals. We can check if our vector is numeric by using the function `is.numeric()`.

```
# Create a numeric vector
x <- c(3, 5, 6, 10.7)
# Is our vector numeric? Yes!
is.numeric(x)
```

```
## [1] TRUE
```

We can check our data type by using the functions `class()` and `typeof()`. `class()` tells us that we’re working with numeric values, while `typeof()` is more specific and tells us we’re working with doubles (i.e., double-precision numbers, which is how R stores numbers by default, with or without decimals).

```
# Check the type of data class we have
class(x)
```

```
## [1] "numeric"
```

```
# Check the specific type of data that you have
typeof(x)
```

```
## [1] "double"
```

You can, of course, perform mathematical operations with numeric values.

```
# Add 4 to all the values in the vector
x + 4
```

```
## [1] 7.0 9.0 10.0 14.7
```

### Integer

You can also do math with integers, which represent numbers without decimal places. These are usually used if you’re counting something — for example, you can observe 7 butterflies in a plot, but you can’t observe 7.2 butterflies (or at least I hope not!).

If you create a vector manually and don’t have any decimal values, R will still identify your vector as the class “numeric”.

```
# Create a vector with only integers
x <- c(1, 4, 2, 7, 8)
# Look at the class
class(x)
```

```
## [1] "numeric"
```

You can change this vector to be an integer by using the function `as.integer()`.

```
# Change the vector class
x <- as.integer(x)
# Look at the class
class(x)
```

```
## [1] "integer"
```
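One thing to watch for with `as.integer()`: it drops the decimal part rather than rounding. Here is a minimal sketch of that behavior, using made-up values (`counts` is a hypothetical vector, not part of the examples above).

```r
# Hypothetical measurements with decimal values
counts <- c(7.2, 7.9, -1.6)

# as.integer() truncates toward zero; it does not round
truncated <- as.integer(counts)
truncated
# [1]  7  7 -1

# Round first if the nearest whole number is what you want
rounded <- as.integer(round(counts))
rounded
# [1]  7  8 -2
```

If your values are supposed to be counts already, a difference between these two results is a hint that something went wrong upstream.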

Alternatively, you can generate an integer vector like this. The “L” after each number tells R that you want it to be an integer.

```
# Create an integer vector
x <- c(1L, 2L, 5L, 3L, 10L)
# View vector
x
```

```
## [1] 1 2 5 3 10
```

```
# View class
class(x)
```

```
## [1] "integer"
```

You could also create an integer vector like this. The colon (`:`) tells R to generate a sequence of integers from 1 to 10, going up by 1 each time.

```
# Create a sequence of integers
x <- c(1:10)
# View vector
x
```

```
## [1] 1 2 3 4 5 6 7 8 9 10
```

```
# View data class
class(x)
```

```
## [1] "integer"
```

Some functions will also automatically generate integer vectors, like the function `sample()`. This function randomly draws a given number of values from a vector, which here is the integers 1 to 10. I asked `sample()` to choose ten values between 1 and 10. Because it samples without replacement by default, the result is those ten integers in a random order.

```
# Create a random sequence of integers from 1 to 10:
set.seed(123) # use set.seed to get the same random values as me
x <- sample(1:10, 10)
# View vector
x
```

```
## [1] 3 10 2 8 6 9 1 7 5 4
```

```
# View data class
class(x)
```

```
## [1] "integer"
```
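To make the numeric/integer distinction a bit more concrete, here is a small sketch comparing the two storage types side by side. The vector names (`whole_dbl`, `whole_int`) are just illustrative.

```r
whole_dbl <- c(1, 2, 3)     # stored as double, despite having no decimals
whole_int <- c(1L, 2L, 3L)  # stored as integer

typeof(whole_dbl)      # "double"
typeof(whole_int)      # "integer"

# is.numeric() is TRUE for both storage types
is.numeric(whole_int)  # TRUE

# is.integer() checks the storage type, not whether the values are whole
is.integer(whole_dbl)  # FALSE

# Mixing integers with doubles gives a double result
typeof(whole_int + 0.5)  # "double"
```

In practice this means `is.numeric()` is the safer check when you only care whether you can do math with a vector.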

### Complex

I’m not going to discuss this one because complex numbers aren’t used much in R for data analysis, though they exist. These are just numbers with real and imaginary components (containing the number *i*, or the square root of -1).

### Character

Characters are another common data type. These are used to store text in R (also called “strings”). To indicate something is a character, we put quotation marks (`""`) around it.

```
# Create a vector of characters
x <- c("These", "are", "characters")
# View class
class(x)
```

```
## [1] "character"
```

Putting quotation marks around numbers will also turn them into characters, which can get confusing.

```
# Create a vector of characters
x <- c("1", "4", "5", "7", "8")
# View vector
x
```

```
## [1] "1" "4" "5" "7" "8"
```

You can’t do math with a vector of numbers that are classed as characters.

```
# Try to do math
mean(x)
```

```
## Warning in mean.default(x): argument is not numeric or logical: returning NA
```

```
## [1] NA
```

Why? Because R views them as text!

```
# View class
class(x)
```

```
## [1] "character"
```

You can turn this character vector of numbers into a numeric vector using the `as.numeric()` function. This is handy when a column of numbers has been read in as text. Any values that can’t be interpreted as numbers will be converted to `NA`s (with a warning). In that scenario you’ll probably want to go back and fix your raw CSV file, but at least now the `NA`s will help you find where the problem was.

```
# Turn it into a numeric vector
x <- as.numeric(x)
# View vector
x
```

```
## [1] 1 4 5 7 8
```

```
# View class
class(x)
```

```
## [1] "numeric"
```
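Here is a rough sketch of what happens when one of the values isn’t really a number, and how you might track it down. The vector `raw_values` is a hypothetical example, and the warning is silenced only to keep the output tidy.

```r
# Hypothetical column where one entry was typed as text
raw_values <- c("1", "4", "five", "7")

# "five" cannot be parsed as a number, so it becomes NA. R would normally
# warn "NAs introduced by coercion"; suppressWarnings() hides that here.
converted <- suppressWarnings(as.numeric(raw_values))
converted
# [1]  1  4 NA  7

# Find entries that became NA during conversion, then look at the originals
bad <- which(is.na(converted) & !is.na(raw_values))
raw_values[bad]
# [1] "five"
```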

And then you can turn it back into a character using `as.character()`.

```
# Turn it back into a character
x <- as.character(x)
# View vector
x
```

```
## [1] "1" "4" "5" "7" "8"
```

```
# View class
class(x)
```

```
## [1] "character"
```

### Logical

The logical class is represented by only two possible values: `TRUE` or `FALSE` (these can also be written `T` / `F`, but never `true` / `false` or `t` / `f`).

These values result from any logical statements that are made. For example, in the code below I asked R if the elements of my vector were greater than 5. This returns a logical vector where each element is either `TRUE` or `FALSE`.

```
# Create a vector
x <- c(1, 5, 6, 7, 2, 8)
# Are the elements of vector x greater than 5? Store results in vector y
y <- x > 5
# View y
y
```

```
## [1] FALSE FALSE TRUE TRUE FALSE TRUE
```

```
# View class
class(y)
```

```
## [1] "logical"
```

You can also create a logical vector directly.

```
# Create logical vector
x <- c(T, F, T, F, F, T)
# View vector
x
```

```
## [1] TRUE FALSE TRUE FALSE FALSE TRUE
```

And you can convert logical values to numeric values, and back. `FALSE` is the same as `0`, while `TRUE` is the same as `1`.

```
# Convert to numeric vector
x <- as.numeric(x)
# View vector
x
```

```
## [1] 1 0 1 0 0 1
```

```
# Convert back to logical vector
x <- as.logical(x)
# View vector again
x
```

```
## [1] TRUE FALSE TRUE FALSE FALSE TRUE
```

This also means that you can do math with logical values. This is useful if, for example, you’re trying to see how many `TRUE` values you have in your vector. In fact, applying any math operations to a logical vector will automatically convert the values to 1s and 0s.

```
# View vector
x
```

```
## [1] TRUE FALSE TRUE FALSE FALSE TRUE
```

```
# Count how many "TRUE" values there are. There are 3!
sum(x)
```

```
## [1] 3
```
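The same trick works directly on a comparison, so you can count or summarize without storing the logical vector first. A minimal sketch, using a hypothetical `heights` vector and an arbitrary cutoff of 11:

```r
# Hypothetical heights
heights <- c(15, 10, 12, 9, 17)

# Logical vector: which heights are greater than 11?
tall <- heights > 11

# How many are greater than 11?
n_tall <- sum(tall)
n_tall
# [1] 3

# What proportion are greater than 11? (the mean of 1s and 0s)
prop_tall <- mean(tall)
prop_tall
# [1] 0.6

# Which positions are they in?
which(tall)
# [1] 1 3 5
```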

### Factor

Factors are a special data type that is primarily used to represent repeating categories (i.e., categorical variables). When you specify an object as a factor, you’re telling R to think of it as a categorical variable, with different levels. This can be helpful when analyzing your data, as categorical variables and continuous variables are often handled differently in statistical analyses.

In the code below, I created a data frame showing the height and sex of five individuals.

```
# Create an example data frame
example <- data.frame(indiv = c("A", "B", "C", "D", "E"),
height = c(15, 10, 12, 9, 17),
sex = c("female", "female", "female", "male", "female"))
# View structure of data frame
str(example)
```

```
## 'data.frame': 5 obs. of 3 variables:
## $ indiv : chr "A" "B" "C" "D" ...
## $ height: num 15 10 12 9 17
## $ sex : chr "female" "female" "female" "male" ...
```

Right now, the `sex` column is a character vector because I entered the data in quotation marks. But really what I want to do is tell R that `sex` is a categorical variable, with “female” and “male” as levels. To do that, all I have to do is use the `as.factor()` function.

```
# Change the sex column to be a factor
example$sex <- as.factor(example$sex)
# View the factor
example$sex
```

```
## [1] female female female male female
## Levels: female male
```

You can see that R listed the vector and then beneath that, has figured out on its own that the levels are “female” and “male”. When writing the levels, R will sort them in alphabetical order. That’s why the levels are `female male` instead of `male female`.
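If you want to inspect a factor more closely, a few base R helpers are useful. The sketch below rebuilds the same five values in a standalone vector (`sex` here is just an illustrative name) and looks at its levels, the counts per level, and the integer codes R stores behind the scenes.

```r
# Standalone factor with the same values as the sex column
sex <- factor(c("female", "female", "female", "male", "female"))

# The levels, in the order R stores them
levels(sex)
# [1] "female" "male"

# How many levels there are
nlevels(sex)
# [1] 2

# How many observations fall in each level
table(sex)

# Under the hood, each value is an integer code pointing at a level
as.integer(sex)
# [1] 1 1 1 2 1
```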

You may want to change the order of your factor levels (this can be useful when plotting your data and determining the order in which they appear).

For example, you might have a vector like this:

```
# Create vector
places <- factor(c("first", "first", "second", "third", "fifth", "fourth", "second"))
# View factor
places
```

```
## [1] first first second third fifth fourth second
## Levels: fifth first fourth second third
```

The order of the levels doesn’t make sense. We want it to go from first through fifth in the implied numeric order, not alphabetically. So let’s change the order using `factor(vector, levels = c("first", "second", "third", etc.))`.

```
# Change level order
places <- factor(places, levels = c("first", "second", "third", "fourth", "fifth"))
# View factor
places
```

```
## [1] first first second third fifth fourth second
## Levels: first second third fourth fifth
```

Much better!
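You can also set the level order when you first create the factor, rather than fixing it afterwards. This is only a sketch with illustrative names (`place_values`, `place_order`, `places2`); it also shows that `sort()` follows the level order rather than the alphabet.

```r
# The raw values and the order we want the levels in
place_values <- c("first", "first", "second", "third", "fifth", "fourth", "second")
place_order  <- c("first", "second", "third", "fourth", "fifth")

# Create the factor with the levels specified up front
places2 <- factor(place_values, levels = place_order)
levels(places2)
# [1] "first"  "second" "third"  "fourth" "fifth"

# Sorting a factor uses the level order, not alphabetical order
sorted_places <- as.character(sort(places2))
sorted_places
# [1] "first"  "first"  "second" "second" "third"  "fourth" "fifth"
```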

Factors don’t just have to be text. They can also be integers. For example, in the code below I created a data frame describing the stream width and order of several stream sites. Stream order is *not* a continuous variable, even though it’s represented by numbers. It would be best to convert stream order to a factor.

```
# Create data frame
example2 <- data.frame(stream = c("Patuxent", "Patapsco", "Deer Creek", "Town Creek", "Browns Branch"),
width = c(37, 42, 25, 32, 22),
order = c(6, 6, 4, 5, 3))
# View data frame structure
str(example2)
```

```
## 'data.frame': 5 obs. of 3 variables:
## $ stream: chr "Patuxent" "Patapsco" "Deer Creek" "Town Creek" ...
## $ width : num 37 42 25 32 22
## $ order : num 6 6 4 5 3
```

R sees stream order as being numeric, which makes sense. But let’s tell R that stream order is a factor.

```
# Change stream order to a factor
example2$order <- as.factor(example2$order)
# View stream order
example2$order
```

```
## [1] 6 6 4 5 3
## Levels: 3 4 5 6
```

Looks good. Since these are numbers, R just orders the levels in ascending order.
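One caveat if you ever need to turn a factor of numbers back into actual numbers: calling `as.numeric()` directly returns the internal level codes, not the original values. A minimal sketch with a standalone vector (`stream_order` is an illustrative name):

```r
# Stream orders stored as a factor (levels are "3", "4", "5", "6")
stream_order <- factor(c(6, 6, 4, 5, 3))

# Converting directly gives the position of each value among the levels
codes <- as.numeric(stream_order)
codes
# [1] 4 4 2 3 1

# Converting to character first recovers the original numbers
values <- as.numeric(as.character(stream_order))
values
# [1] 6 6 4 5 3
```

Going through `as.character()` first is the usual way around this.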

## How to check and manipulate data types

As demonstrated throughout this tutorial, it can be useful to check the type of data you’re working with and be able to change it to another type if you need. You might need this especially in situations where you’re reading in data from a .csv, and need to check that all your numbers are numeric instead of characters.

The main way to check your data type is to use the function `class()`. If you have a data frame, another easy way to check data types is to use the `str()` function. This displays the structure of your data frame and tells you what data type each of your columns is. The example below lists heights over time for five individuals.

```
# Create an example data frame
example <- data.frame(indiv = c("A", "B", "C", "D", "E"),
height_0 = c(15, 10, 12, 9, 17),
height_10 = c(20, 18, 14, 15, 19),
height_20 = c(23, 24, 18, 17, 26))
str(example)
```

```
## 'data.frame': 5 obs. of 4 variables:
## $ indiv : chr "A" "B" "C" "D" ...
## $ height_0 : num 15 10 12 9 17
## $ height_10: num 20 18 14 15 19
## $ height_20: num 23 24 18 17 26
```

You can see that the column `indiv` is a character vector (abbreviated “chr”), while each successive column is numeric (abbreviated “num”).
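If you’d rather get the column types back as a vector you can work with, one option is to apply `class()` to every column with `sapply()`. This is a sketch on a smaller, hypothetical data frame (`growth`), not the only way to do it.

```r
# Small hypothetical data frame
growth <- data.frame(indiv = c("A", "B", "C"),
                     height_0 = c(15, 10, 12),
                     height_10 = c(20, 18, 14),
                     stringsAsFactors = FALSE)

# Class of every column, returned as a named character vector
col_classes <- sapply(growth, class)
col_classes

# Names of just the numeric columns
numeric_cols <- names(growth)[sapply(growth, is.numeric)]
numeric_cols
# [1] "height_0"  "height_10"
```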

You also noticed me using functions like `is.numeric()` or `as.character()`. All of the data types have `is.` and `as.` functions. The `is.` functions are logical tests that check for a specific data type (“is this object of the class XXX?”) and return `TRUE` or `FALSE`. The `as.` functions are actions that convert objects into a new data type. You may find yourself using these often when you’re first formatting your data and preparing it for analysis.
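Putting the two together, here is a rough sketch of the check-then-convert workflow on data read from a CSV. To keep it self-contained, the “file” is a made-up string passed to `read.csv()` through its `text` argument. The column names and the `n/a` entry are assumptions for illustration.

```r
# A tiny made-up CSV where one height was entered as "n/a"
csv_text <- "indiv,height\nA,15\nB,10\nC,n/a\nD,9"
survey <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Check: the stray "n/a" caused the whole column to be read as character
was_numeric <- is.numeric(survey$height)
was_numeric
# [1] FALSE

# Convert: "n/a" becomes NA (the coercion warning is silenced here)
survey$height <- suppressWarnings(as.numeric(survey$height))

# Check again
is.numeric(survey$height)
# [1] TRUE
survey$height
# [1] 15 10 NA  9
```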

That’s it for data types in R! Keep an eye out for our next tutorial, which will go over different data structures in R like vectors, lists, data frames, and tibbles. I hope this tutorial was helpful! Happy coding!
