# Find and Replace in R, Part 1: `recode` in the library `car`

### Background

Recoding a variable in R has never seemed like a straightforward task to me. In the next few entries, I will present the do’s and dont’s of different find/replace (recoding) solutions in R. Today: `car::recode`.

### 1. Recoding vectors:

Load the library `car`, and create a vector of random integers from 1 to 4:

``````library(car)
x <- sample(1:4, size = 10, replace = T)
x
```
```##  [1] 4 2 3 1 1 1 4 1 1 3
``````

Simply, the arguments of `recode` are:

• The vector we wish to change
• A list of the recodes, enclosed in single or double quotes, and separated by semi-colons.

Let’s start by recoding `1 2 3 4` in `x` to `2 5 8 10`:

``````x1 <- recode(x, "1 = 2 ; 2 = 5 ; 3 = 8 ; 4 = 10")
x1
```
```##  [1] 10  5  8  2  2  2 10  2  2  8
``````

The same is true to recode `1 2 3 4` to `A B C D`:

``````x2 <- recode(x, "1 = "A" ; 2 = "B" ; 3 = "C" ; 4 = "D"")
x2
```
```##  [1] "D" "B" "C" "A" "A" "A" "D" "A" "A" "C"
``````

To recode `1 2 3 4` to `0 0 5 5`, we can use several different approaches to achieve the same result:

``````x3.1 <- recode(x, '1 = 0 ; 2 = 0 ; 3 = 5 ; 4 = 5')     # specify individual values
x3.2 <- recode(x, 'c(1, 2) = 0 ; c(3, 4) = 5')         # vectors of values
x3.3 <- recode(x, 'lo : 2 = 0 ; 3 : hi = 5')           # lo and hi
x3.4 <- recode(x, '1 : 2 = 0 ; else = 5')              # specify some values and else for remaining values
``````

Where all solutions give:

``````##  [1] 5 0 5 0 0 0 5 0 0 5
``````

If we only need to change some of the values, only those values need to be specified. For example, to recode `1 2 3 4` to `1 2 3 5`:

``````x4 <- recode(x, "4=5")
x4
```
```##  [1] 5 2 3 1 1 1 5 1 1 3
``````

We can do the same with character vectors. Let’s create one with some fish species:

``````y <- sample(c("Perch", "Goby", "Trout", "Salmon"), size = 10, replace = T)
y
```
```##  [1] "Goby"   "Salmon" "Trout"  "Perch"  "Salmon" "Perch"  "Trout"
##  [8] "Goby"   "Perch"  "Salmon"
``````

We wish to recode each fish into its biological order, i.e. `Perch, Goby, Trout, Salmon` will become `Perciform, Perciform, Salmonid, Salmonid`:

``````y1 <- recode(y, 'c("Perch", "Goby") = "Perciform" ; c("Trout", "Salmon") = "Salmonid"')
y1
```
```##  [1] "Perciform" "Salmonid"  "Salmonid"  "Perciform" "Salmonid"
##  [6] "Perciform" "Salmonid"  "Perciform" "Perciform" "Salmonid"
``````

### 2. Recoding Data Frames

Let’s create a simple dataframe, `z`:

``````z <- data.frame(Letter = sample(factor(letters[1:4]), size = 10, replace = T),
Count1 = sample(c(1:4), size = 10, replace = T),
Count2 = sample(c(1:4), size = 10, replace = T),
Count3 = sample(c(1:4), size = 10, replace = T))
z
```
```##    Letter Count1 Count2 Count3
## 1       a      1      2      3
## 2       a      2      1      3
## 3       a      3      4      4
## 4       c      3      3      2
## 5       b      1      2      4
## 6       a      4      3      3
## 7       d      3      4      3
## 8       b      3      2      4
## 9       a      4      3      4
## 10      b      3      3      2
``````

In this data frame, we wish to recode `1 2 3 4` values to `2 5 8 10`. Let’s do this by recoding the data frame to create a new object, `z1`:

``````z1 <- recode(z, "1 = 2 ; 2 = 5 ; 3 = 8 ; 4 = 10")
```
```## Error: (list) object cannot be coerced to type 'double'
```
```z1
```
```## Error: object 'z1' not found
``````

No, this didn’t work. It seems that `recode` only works with vectors.

We could try a `for` loop across all columns:

``````z1 <- z
for (i in 1:ncol(z)) z1[, i] <- recode(z1[, i], "1 = 2 ; 2 = 5 ; 3 = 8 ; 4 = 10")
``````

gives:

``````z1
```
```##    Letter Count1 Count2 Count3
## 1       a      2      5      8
## 2       a      5      2      8
## 3       a      8     10     10
## 4       c      8      8      5
## 5       b      2      5     10
## 6       a     10      8      8
## 7       d      8     10      8
## 8       b      8      5     10
## 9       a     10      8     10
## 10      b      8      8      5
``````

By far, the best workaround is to use `lapply`

``````z1 <- lapply(z, FUN = function(foo) recode(foo, "1 = 2 ; 2 = 5 ; 3 = 8 ; 4 = 10"))
z1
```
```## \$Letter
##  [1] a a a c b a d b a b
## Levels: a b c d
##
## \$Count1
##  [1]  2  5  8  8  2 10  8  8 10  8
##
## \$Count2
##  [1]  5  2 10  8  5  8 10  5  8  8
##
## \$Count3
##  [1]  8  8 10  5 10  8  8 10 10  5
##
``````

But now we have lost our data frame! Let’s fix this with `data.frame()`:

``````z1 <- data.frame(z1)
z1
```
```##    Letter Count1 Count2 Count3
## 1       a      2      5      8
## 2       a      5      2      8
## 3       a      8     10     10
## 4       c      8      8      5
## 5       b      2      5     10
## 6       a     10      8      8
## 7       d      8     10      8
## 8       b      8      5     10
## 9       a     10      8     10
## 10      b      8      8      5
``````

### Conclusions

`car::recode` is a simple and powerful find/replace solution. However, it can be frustrating to write out recode lists when there are many values that need to be changed. In the next entry, I’ll present some alternative recoding solutions using gsub, indexing and others.

# Creating a large scale map using ggplot2: a step by step guide.

* UPDATED 2nd SEPTEMBER 2013 *

Whenever I have needed a map in a presentation, I have always slipped in a cheeky screenshot from Google and prayed that the copyright police won’t burst into the door and arrest me (talks are stressful enough). However, I was determined to change this, and after extensive trawling of the web, I’ve put together a quick template to make a simple large scale map for talks, presentations and posters using GIS shape files and ggplot2.

Here is the finished object (full code at the end of this page):

````library(maptools) `
`gpclibPermit() `
`## [1] TRUE `
`worldmap <- readShapeSpatial("world_borders.shp") ````

*** NB: You will require the library `gpclib `to read in the shapefile – if gpclibPermit() returns FALSE, then try installing gpclib again. As of Sep 2013, gpclib was not working in R v3.0.1. Argh. ***

The `fortify` function in ggplot2 will convert this to a data frame:

``````library(ggplot2)
worldmap <- fortify(worldmap) ```
`## Using CAT to define regions. ````

Let’s have a quick look:

````head(worldmap) `
```## long lat order hole piece group id
## 1 -69.88 12.41 1 FALSE 1 1.1 1
## 2 -69.95 12.44 2 FALSE 1 1.1 1
## 3 -70.06 12.53 3 FALSE 1 1.1 1
## 4 -70.06 12.54 4 FALSE 1 1.1 1
## 5 -70.06 12.54 5 FALSE 1 1.1 1
## 6 -70.06 12.62 6 FALSE 1 1.1 1 ``````

Essentially, this data frame is a list of paths/polygons (each one is part of a unique ‘group’) that we can plot using `geom_path` and `geom_polygon`. There are more than 400,000 points in this map which makes plotting quite slow, so I created a smaller data frame, `worldmap2`, to create rough maps for initial tweaking and formatting.

``worldmap2 <- worldmap[seq(1, nrow(worldmap), 200), ] ``

A quick plot using `geom_polygon`, creating groups based on the variable ‘group’:

``````ggplot(worldmap2, aes(x = long, y = lat, group = group)) +
geom_polygon() ``````

The same with `geom_path`:

``````ggplot(worldmap2, aes(x = long, y = lat, group = group)) +
geom_path() ``````

And using both (Put geom_path second in order to plot the borders):

``````ggplot(worldmap2, aes(x = long, y = lat, group = group)) +
geom_polygon() +
geom_path() ``````

Changing the colour of the polygons and path is straightforward:

``````ggplot(worldmap2, aes(x = long, y = lat, group = group)) +
geom_polygon(fill = "darkseagreen") +
geom_path(colour = "grey40") ``````

Creating a blue blackground required delving further into the mysterious world of `theme()`. I also changed the colour of the plot border and major grid lines to reduce the contrast slightly:

``````ggplot(worldmap2, aes(x = long, y = lat, group = group)) +
geom_polygon(fill = "darkseagreen") +
geom_path(colour = "grey40") +
theme(panel.background = element_rect(fill = "lightsteelblue2", colour = "grey"),
panel.grid.major = element_line(colour = "grey90")) ``````

I wanted a map of Northern Europe, so I defined my latitude and longitude limits, and used `coord_cartesian` to zoom in onto the graph. With this mapfile, you can change `latlimits` and `loglimits` to your specific region of interest:

``````latlimits <- c(40, 75)
longlimits <- c(-25, 50)

ggplot(worldmap2, aes(x = long, y = lat, group = group)) +
geom_polygon(fill = "darkseagreen") +
geom_path(colour = "grey40") +```
```  theme(panel.background = element_rect(fill = "lightsteelblue2", colour = "grey"),
panel.grid.major = element_line(colour = "grey90")) +```
`  coord_cartesian(xlim = longlimits, ylim = latlimits) ````

The scales are still more suited to the full graph, so now we can define our scale to be more relevant.

``````ggplot(worldmap2, aes(x = long, y = lat, group = group)) +
geom_polygon(fill = "darkseagreen") +
geom_path(colour = "grey40") +```
```  theme(panel.background = element_rect(fill = "lightsteelblue2", colour = "grey"),
panel.grid.major = element_line(colour = "grey90")) ``````+
scale_y_continuous(breaks = seq(40, 70, 10)) +
scale_x_continuous(breaks = seq(-20, 40, 20)) ``````+
````  coord_cartesian(xlim = longlimits, ylim = latlimits) ````

Now time to customise the graph a little more: removing the axis titles, relabelling the axes, removing the axis ticks, and making the text larger. Then we can use the original `worldmap` data frame to produce the final product:

``````# Final code:

library(maptools)
require(gpclib)
gpclibPermit()

library(ggplot2)
worldmap <- fortify(worldmap)

latlimits <- c(40, 75)
longlimits <- c(-25, 50)

ggplot(worldmap, aes(x = long, y = lat, group = group)) +
geom_polygon(fill = "darkseagreen") +
geom_path(colour = "grey40") +
theme(panel.background = element_rect(fill = "lightsteelblue2", colour = "grey"),
panel.grid.major = element_line(colour = "grey90"),
panel.grid.minor = element_blank(),
axis.ticks = element_blank(),
axis.text.x = element_text (size = 14, vjust = 0),
axis.text.y = element_text (size = 14, hjust = 1.3)) +
coord_cartesian(xlim = longlimits, ylim = latlimits) +
scale_y_continuous(breaks=seq(40,70,10), labels = c("40ºN", "50ºN", "60ºN", "70ºN")) +
scale_x_continuous(breaks=seq(-20,40,20), labels = c("20ºW", "0º" ,"20ºE", "40ºE")) +
labs(y="",x="")

``````