Find and Replace in R, Part 1: recode in the library car

Background

Recoding a variable in R has never seemed like a straightforward task to me. In the next few entries, I will present the dos and don’ts of different find/replace (recoding) solutions in R. Today: car::recode.

1. Recoding vectors:

Load the library car, and create a vector of random integers from 1 to 4:

library(car)
x <- sample(1:4, size = 10, replace = TRUE)
x

##  [1] 4 2 3 1 1 1 4 1 1 3

The two main arguments of recode are:

  • The vector we wish to change
  • The recodes: a single character string of rules of the form old = new, separated by semicolons. The string itself can be enclosed in single or double quotes.

Let’s start by recoding 1 2 3 4 in x to 2 5 8 10:

x1 <- recode(x, "1 = 2 ; 2 = 5 ; 3 = 8 ; 4 = 10")
x1

##  [1] 10  5  8  2  2  2 10  2  2  8
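Note in the output above that the 1s became 2 and stayed 2, even though another rule maps 2 = 5. As far as I can tell from car’s behaviour, every rule is matched against the original values, so rules do not chain. One handy consequence, sketched here on a made-up vector v (a hypothetical name), is that two values can be swapped in a single call:

```r
library(car)

v <- c(1, 2, 2, 1, 3)

# Each rule is matched against the original v, not against the partly
# recoded result, so 1 -> 2 and 2 -> 1 swap cleanly; 3 has no rule and is kept
swapped <- recode(v, "1 = 2 ; 2 = 1")
print(swapped)
```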

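The same approach recodes 1 2 3 4 to A B C D. Because the rules are a single string, the new character values need the other kind of quote, here single quotes around the rules and double quotes around each value:

x2 <- recode(x, '1 = "A" ; 2 = "B" ; 3 = "C" ; 4 = "D"')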
x2

##  [1] "D" "B" "C" "A" "A" "A" "D" "A" "A" "C"

To recode 1 2 3 4 to 0 0 5 5, there are several equivalent approaches:

x3.1 <- recode(x, '1 = 0 ; 2 = 0 ; 3 = 5 ; 4 = 5')     # specify individual values
x3.2 <- recode(x, 'c(1, 2) = 0 ; c(3, 4) = 5')         # vectors of values
x3.3 <- recode(x, 'lo : 2 = 0 ; 3 : hi = 5')           # lo and hi
x3.4 <- recode(x, '1 : 2 = 0 ; else = 5')              # specify some values and else for remaining values

All four give:

##  [1] 5 0 5 0 0 0 5 0 0 5
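The range syntax (low : high, with lo and hi standing for the extremes) is a numeric comparison, so as far as I know it is not limited to integers. Here is a sketch that bins some made-up, non-integer measurements (lengths_cm is a hypothetical name). I deliberately avoid a value exactly on the shared boundary (3), since which rule wins there depends on rule order:

```r
library(car)

lengths_cm <- c(0.5, 2.7, 4.1, 7.9)

# lo : 3 catches everything up to 3; 3 : hi catches everything from 3 upwards
size_class <- recode(lengths_cm, "lo : 3 = 'small' ; 3 : hi = 'large'")
print(size_class)
```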

If we only need to change some of the values, only those values need to be specified. For example, to recode 1 2 3 4 to 1 2 3 5:

x4 <- recode(x, "4=5")
x4

##  [1] 5 2 3 1 1 1 5 1 1 3
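Missing values can be handled the same way. This is a sketch, assuming (as I read the car documentation) that NA on the left of a rule matches missing values; the vectors are made up, and 99 stands for a hypothetical missing-data code:

```r
library(car)

# Replace missing values with 0
with_na <- c(1, NA, 3)
filled <- recode(with_na, "NA = 0")
print(filled)

# Turn a (hypothetical) missing-data code of 99 into a real NA
coded <- c(1, 99, 3)
cleaned <- recode(coded, "99 = NA")
print(cleaned)
```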

We can do the same with character vectors. Let’s create one with some fish species:

y <- sample(c("Perch", "Goby", "Trout", "Salmon"), size = 10, replace = T)
y

##  [1] "Goby"   "Salmon" "Trout"  "Perch"  "Salmon" "Perch"  "Trout"
##  [8] "Goby"   "Perch"  "Salmon"

We wish to recode each fish into a broader taxonomic group, i.e. Perch, Goby, Trout, Salmon will become Perciform, Perciform, Salmonid, Salmonid:

y1 <- recode(y, 'c("Perch", "Goby") = "Perciform" ; c("Trout", "Salmon") = "Salmonid"')
y1

##  [1] "Perciform" "Salmonid"  "Salmonid"  "Perciform" "Salmonid"
##  [6] "Perciform" "Salmonid"  "Perciform" "Perciform" "Salmonid"

2. Recoding data frames:

Let’s create a simple data frame, z:

z <- data.frame(Letter = sample(factor(letters[1:4]), size = 10, replace = TRUE),
                Count1 = sample(1:4, size = 10, replace = TRUE),
                Count2 = sample(1:4, size = 10, replace = TRUE),
                Count3 = sample(1:4, size = 10, replace = TRUE))
z

##    Letter Count1 Count2 Count3
## 1       a      1      2      3
## 2       a      2      1      3
## 3       a      3      4      4
## 4       c      3      3      2
## 5       b      1      2      4
## 6       a      4      3      3
## 7       d      3      4      3
## 8       b      3      2      4
## 9       a      4      3      4
## 10      b      3      3      2

In this data frame, we wish to recode 1 2 3 4 values to 2 5 8 10. Let’s do this by recoding the data frame to create a new object, z1:

z1 <- recode(z, "1 = 2 ; 2 = 5 ; 3 = 8 ; 4 = 10")

## Error: (list) object cannot be coerced to type 'double'

z1

## Error: object 'z1' not found

No, this didn’t work: recode expects a single vector (numeric, character or factor), whereas a data frame is a list of such vectors.
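It can, however, be applied to one column at a time. A minimal sketch on a small, made-up data frame (z_small is a hypothetical name):

```r
library(car)

z_small <- data.frame(Letter = factor(c("a", "b", "c")),
                      Count1 = c(1, 2, 3))

# Each column is a plain vector, so recoding a single column works
z_small$Count1 <- recode(z_small$Count1, "1 = 2 ; 2 = 5 ; 3 = 8 ; 4 = 10")
print(z_small)
```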

We could try a for loop across all columns:

z1 <- z
for (i in seq_along(z1)) z1[[i]] <- recode(z1[[i]], "1 = 2 ; 2 = 5 ; 3 = 8 ; 4 = 10")

gives:

z1

##    Letter Count1 Count2 Count3
## 1       a      2      5      8
## 2       a      5      2      8
## 3       a      8     10     10
## 4       c      8      8      5
## 5       b      2      5     10
## 6       a     10      8      8
## 7       d      8     10      8
## 8       b      8      5     10
## 9       a     10      8     10
## 10      b      8      8      5

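A more compact workaround is lapply(), which applies a function to each column in turn. Here foo is simply the name of the anonymous function’s argument: lapply() hands each column to the function as foo, which then passes it on to recode: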

z1 <- lapply(z, FUN = function(foo) recode(foo, "1 = 2 ; 2 = 5 ; 3 = 8 ; 4 = 10"))
z1

## $Letter
##  [1] a a a c b a d b a b
## Levels: a b c d
##
## $Count1
##  [1]  2  5  8  8  2 10  8  8 10  8
##
## $Count2
##  [1]  5  2 10  8  5  8 10  5  8  8
##
## $Count3
##  [1]  8  8 10  5 10  8  8 10 10  5
##

But lapply() returns a list, so we have lost our data frame! Let’s fix this with data.frame():

z1 <- data.frame(z1)
z1

##    Letter Count1 Count2 Count3
## 1       a      2      5      8
## 2       a      5      2      8
## 3       a      8     10     10
## 4       c      8      8      5
## 5       b      2      5     10
## 6       a     10      8      8
## 7       d      8     10      8
## 8       b      8      5     10
## 9       a     10      8     10
## 10      b      8      8      5
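Two variations on the lapply() approach may save a step. Assigning into z1[] fills the existing columns in place, so the result stays a data frame, and indexing by column name recodes only the count columns, leaving Letter untouched. This is a sketch on a small made-up data frame (counts and rules are hypothetical names); the extra recodes = argument is passed through lapply() to recode, so no anonymous function is needed:

```r
library(car)

z <- data.frame(Letter = factor(c("a", "b", "c")),
                Count1 = c(1, 2, 3),
                Count2 = c(4, 3, 2))
rules <- "1 = 2 ; 2 = 5 ; 3 = 8 ; 4 = 10"

# z1[] <- ... replaces every column but keeps z1 a data frame
z1 <- z
z1[] <- lapply(z1, recode, recodes = rules)

# Recode only the named columns; Letter is left as it was
counts <- c("Count1", "Count2")
z2 <- z
z2[counts] <- lapply(z2[counts], recode, recodes = rules)
print(z2)
```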

Conclusions

car::recode is a simple and powerful find/replace solution. However, it can be frustrating to write out recode lists when there are many values that need to be changed. In the next entry, I’ll present some alternative recoding solutions using gsub, indexing and others.
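One way to ease that pain, sketched here under the assumption that the old and new values sit in two equal-length vectors (old and new are hypothetical names), is to build the recodes string with paste0() rather than typing it. This simple version assumes the values contain no quotes, semicolons, colons or equals signs, since those would confuse the recode syntax:

```r
library(car)

# Lookup table as two parallel vectors
old <- c("Perch", "Goby", "Trout", "Salmon")
new <- c("Perciform", "Perciform", "Salmonid", "Salmonid")

# Build  'Perch' = 'Perciform' ; 'Goby' = 'Perciform' ; ...
rules <- paste0("'", old, "' = '", new, "'", collapse = " ; ")
print(rules)

y <- c("Goby", "Salmon", "Trout", "Perch")
y1 <- recode(y, rules)
print(y1)
```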


7 thoughts on “Find and Replace in R, Part 1: recode in the library car”

  1. Susan

    You could just use apply(), with MARGIN = 2 for columns, on the data.frame:

    znew<-data.frame(apply(z, MARGIN = 2, FUN = function(foo) recode(foo, "1 = 2 ; 2 = 5 ; 3 = 8 ; 4 = 10")))

    Andrew


  3. Hello Susan,
    Thank you for the tutorials. They are very practical and easy to follow.
    Question: What is the purpose of passing foo into the recode function within lapply?
