Background
Recoding a variable in R has never seemed like a straightforward task to me. In the next few entries, I will present the dos and don’ts of different find/replace (recoding) solutions in R. Today: car::recode.
1. Recoding Vectors
Load the car package and create a vector of random integers from 1 to 4:
library(car)
x <- sample(1:4, size = 10, replace = TRUE)
x
## [1] 4 2 3 1 1 1 4 1 1 3
Put simply, the arguments of recode are:
- The vector we wish to change
- A character string of recode specifications, enclosed in single or double quotes, with the individual recodes separated by semicolons.
Let’s start by recoding 1 2 3 4 in x to 2 5 8 10:
x1 <- recode(x, "1 = 2 ; 2 = 5 ; 3 = 8 ; 4 = 10")
x1
## [1] 10 5 8 2 2 2 10 2 2 8
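The same approach recodes 1 2 3 4 to A B C D. Because the new values are double-quoted strings, the recode specification as a whole goes in single quotes:
x2 <- recode(x, '1 = "A" ; 2 = "B" ; 3 = "C" ; 4 = "D"')
x2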
## [1] "D" "B" "C" "A" "A" "A" "D" "A" "A" "C"
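Since the recode specification is itself a quoted string, any quoted targets inside it need either the other quote character or escaping; nesting the same unescaped quote is a syntax error. The sketch below tries three quoting styles that should be equivalent. The short fixed vector x and the names a, b and d are only for illustration.

```r
library(car)

x <- c(4, 2, 3, 1)  # small fixed vector, chosen for illustration

# Outer single quotes, inner double quotes
a <- recode(x, '1 = "A" ; 2 = "B" ; 3 = "C" ; 4 = "D"')

# Outer double quotes, inner single quotes
b <- recode(x, "1 = 'A' ; 2 = 'B' ; 3 = 'C' ; 4 = 'D'")

# Outer double quotes, inner double quotes escaped with a backslash
d <- recode(x, "1 = \"A\" ; 2 = \"B\" ; 3 = \"C\" ; 4 = \"D\"")

# Check that all three styles produce the same character vector
identical(a, b) && identical(b, d)
```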
To recode 1 2 3 4 to 0 0 5 5, we can use several different approaches to achieve the same result:
x3.1 <- recode(x, '1 = 0 ; 2 = 0 ; 3 = 5 ; 4 = 5') # specify individual values
x3.2 <- recode(x, 'c(1, 2) = 0 ; c(3, 4) = 5') # vectors of values
x3.3 <- recode(x, 'lo : 2 = 0 ; 3 : hi = 5') # lo and hi
x3.4 <- recode(x, '1 : 2 = 0 ; else = 5') # specify some values and else for remaining values
All four solutions give:
## [1] 5 0 5 0 0 0 5 0 0 5
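One way to confirm that the four specifications really are interchangeable is to compare their results element by element. This is a minimal sketch that uses the vector printed earlier as a fixed input, so it can be rerun without depending on the random sample; the list name results is hypothetical.

```r
library(car)

# The vector printed earlier in the post, hard-coded for reproducibility
x <- c(4, 2, 3, 1, 1, 1, 4, 1, 1, 3)

results <- list(
  individual = recode(x, '1 = 0 ; 2 = 0 ; 3 = 5 ; 4 = 5'),
  vectors    = recode(x, 'c(1, 2) = 0 ; c(3, 4) = 5'),
  lo_hi      = recode(x, 'lo : 2 = 0 ; 3 : hi = 5'),
  with_else  = recode(x, '1 : 2 = 0 ; else = 5')
)

# Compare every result against the first one, element by element
same <- sapply(results, function(r) all(r == results$individual))
same
```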
If we only need to change some of the values, only those values need to be specified. For example, to recode 1 2 3 4 to 1 2 3 5:
x4 <- recode(x, "4 = 5")
x4
## [1] 5 2 3 1 1 1 5 1 1 3
We can do the same with character vectors. Let’s create one with some fish species:
y <- sample(c("Perch", "Goby", "Trout", "Salmon"), size = 10, replace = TRUE)
y
## [1] "Goby" "Salmon" "Trout" "Perch" "Salmon" "Perch" "Trout"
## [8] "Goby" "Perch" "Salmon"
We wish to recode each fish into its biological order, i.e. Perch, Goby, Trout, Salmon will become Perciform, Perciform, Salmonid, Salmonid:
y1 <- recode(y, 'c("Perch", "Goby") = "Perciform" ; c("Trout", "Salmon") = "Salmonid"')
y1
## [1] "Perciform" "Salmonid" "Salmonid" "Perciform" "Salmonid"
## [6] "Perciform" "Salmonid" "Perciform" "Perciform" "Salmonid"
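A quick way to check a recode like this is to cross-tabulate the original values against the recoded ones with base R's table(): each original value should land in exactly one new category. The sketch below hard-codes the fish vector printed above; the name check is hypothetical.

```r
library(car)

# The fish vector printed above, hard-coded for reproducibility
y <- c("Goby", "Salmon", "Trout", "Perch", "Salmon",
       "Perch", "Trout", "Goby", "Perch", "Salmon")

y1 <- recode(y, 'c("Perch", "Goby") = "Perciform" ; c("Trout", "Salmon") = "Salmonid"')

# Rows are the original species, columns are the recoded orders;
# every row should have counts in only one column
check <- table(original = y, recoded = y1)
check
```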
2. Recoding Data Frames
Let’s create a simple data frame, z:
z <- data.frame(Letter = sample(factor(letters[1:4]), size = 10, replace = TRUE),
                Count1 = sample(c(1:4), size = 10, replace = TRUE),
                Count2 = sample(c(1:4), size = 10, replace = TRUE),
                Count3 = sample(c(1:4), size = 10, replace = TRUE))
z
##    Letter Count1 Count2 Count3
## 1       a      1      2      3
## 2       a      2      1      3
## 3       a      3      4      4
## 4       c      3      3      2
## 5       b      1      2      4
## 6       a      4      3      3
## 7       d      3      4      3
## 8       b      3      2      4
## 9       a      4      3      4
## 10      b      3      3      2
In this data frame, we wish to recode the values 1 2 3 4 to 2 5 8 10. Let’s do this by recoding the data frame to create a new object, z1:
z1 <- recode(z, "1 = 2 ; 2 = 5 ; 3 = 8 ; 4 = 10")
## Error: (list) object cannot be coerced to type 'double'
z1
## Error: object 'z1' not found
No, this didn’t work. It seems that recode only works with vectors.
We could try a for loop across all columns:
z1 <- z
for (i in 1:ncol(z)) z1[, i] <- recode(z1[, i], "1 = 2 ; 2 = 5 ; 3 = 8 ; 4 = 10")
which gives:
z1
##    Letter Count1 Count2 Count3
## 1       a      2      5      8
## 2       a      5      2      8
## 3       a      8     10     10
## 4       c      8      8      5
## 5       b      2      5     10
## 6       a     10      8      8
## 7       d      8     10      8
## 8       b      8      5     10
## 9       a     10      8     10
## 10      b      8      8      5
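The loop above also passes the Letter factor to recode. That is harmless here, since none of its levels match 1 to 4, but a variant that only touches the numeric columns may be safer in general. The sketch below is one way to do that, using a small fixed data frame for illustration; the names num_cols and col are hypothetical.

```r
library(car)

# Small fixed data frame for illustration (not the random one from the post)
z <- data.frame(Letter = factor(c("a", "b", "c", "d")),
                Count1 = c(1, 2, 3, 4),
                Count2 = c(4, 3, 2, 1))

z1 <- z

# Names of the numeric columns only; the factor column is left alone
num_cols <- names(z)[sapply(z, is.numeric)]

for (col in num_cols) {
  z1[[col]] <- recode(z1[[col]], "1 = 2 ; 2 = 5 ; 3 = 8 ; 4 = 10")
}

z1
```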
By far the best workaround is to use lapply:
z1 <- lapply(z, FUN = function(foo) recode(foo, "1 = 2 ; 2 = 5 ; 3 = 8 ; 4 = 10"))
z1
## $Letter
## [1] a a a c b a d b a b
## Levels: a b c d
##
## $Count1
## [1] 2 5 8 8 2 10 8 8 10 8
##
## $Count2
## [1] 5 2 10 8 5 8 10 5 8 8
##
## $Count3
## [1] 8 8 10 5 10 8 8 10 10 5
##
But now we have lost our data frame! Let’s fix this with data.frame():
z1 <- data.frame(z1)
z1
##    Letter Count1 Count2 Count3
## 1       a      2      5      8
## 2       a      5      2      8
## 3       a      8     10     10
## 4       c      8      8      5
## 5       b      2      5     10
## 6       a     10      8      8
## 7       d      8     10      8
## 8       b      8      5     10
## 9       a     10      8     10
## 10      b      8      8      5
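As an alternative to rebuilding the data frame afterwards, base R lets us assign the lapply result into the empty-bracket form z1[], which replaces the columns while keeping the data frame structure. This is a sketch of that idiom on a small fixed data frame, not something the workflow above requires; the argument name col is hypothetical.

```r
library(car)

# Small fixed data frame for illustration (not the random one from the post)
z <- data.frame(Letter = factor(c("a", "b", "c", "d")),
                Count1 = c(1, 2, 3, 4),
                Count2 = c(4, 3, 2, 1))

z1 <- z

# Assigning into z1[] replaces every column but keeps z1 a data frame
z1[] <- lapply(z, function(col) recode(col, "1 = 2 ; 2 = 5 ; 3 = 8 ; 4 = 10"))

is.data.frame(z1)
z1
```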
Conclusions
car::recode is a simple and powerful find/replace solution. However, it can be frustrating to write out recode lists when there are many values that need to be changed. In the next entry, I’ll present some alternative recoding solutions using gsub, indexing and others.
Susan
You could just use apply(), with MARGIN = 2 for columns, on the data frame...
znew <- data.frame(apply(z, MARGIN = 2, FUN = function(foo) recode(foo, "1 = 2 ; 2 = 5 ; 3 = 8 ; 4 = 10")))
Andrew
Hey Andrew, thanks for the comment! I had used lapply in the post above… is there a reason that apply would be preferable over lapply?
I can’t seem to recreate some of your results. I’ve posed my question on stackoverflow here: http://stackoverflow.com/questions/27556717/recode-in-car-package-returns-unexpected-symbol-when-recoding-strings .
Your how-to is very helpful. Have package updates made previous commands obsolete?
Hi Matt – there was a mistake in my code. Have fixed it and answered your question in StackOverflow. Thanks for flagging.
Hello Susan,
Thank you for the tutorials. They are very practical and easy to follow.
Question: What is the purpose of passing foo into the recode function within lapply?