R de jeu: Search and replace: Are you tired of nested `ifelse`?

Sunday, January 6, 2013

It happens all the time: you have a vector of fruits and you want to replace all bananas with apples, all oranges with pineapples, and leave all the other fruits as-is, or maybe change them all to figs. The usual solution? A big old nested `ifelse`:
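
As a minimal sketch of that pattern, assuming a character vector named `basket` (the name and its contents are illustrative assumptions), the two variants might look like this:

```r
# Hypothetical example data; the vector name `basket` is an assumption.
basket <- c("apple", "banana", "orange", "kiwi", "banana")

# Replace bananas and oranges, leave every other fruit as-is.
keep_others <- ifelse(basket == "banana", "apple",
               ifelse(basket == "orange", "pineapple",
                                          basket))

# Same replacements, but change every other fruit to figs.
figs_for_others <- ifelse(basket == "banana", "apple",
                   ifelse(basket == "orange", "pineapple",
                                              "fig"))

print(keep_others)
print(figs_for_others)
```

Each additional fruit adds one more level of nesting and one more closing parenthesis, which is where the pattern starts to hurt.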

Ok, that didn't look too bad, especially with the code and fruits nicely aligned. But what if I had a lot of fruits to change and little patience? Wouldn't it be nice if R had a built-in function for doing multiple search and replace? Someone please tell me if there is already such a function. If not, here is one I wrote that builds a nested `ifelse` function by recursion:
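
A minimal sketch of how such a recursive builder might look. The argument names (`x`, `search`, `replace`, `default`) and the helper `build` are assumptions made for this illustration:

```r
# Sketch of a recursive multiple search-and-replace. The argument names
# (x, search, replace, default) are assumptions made for this illustration.
decode <- function(x, search, replace, default = NULL) {
  stopifnot(length(search) == length(replace))

  # Recursively build a function equivalent to a nested ifelse():
  # each level tests one search value and defers the rest to the next level.
  build <- function(search, replace) {
    if (length(search) == 0L) {
      # Base case: no pairs left, so return x itself or the default value.
      function(x) if (is.null(default)) x else rep(default, length(x))
    } else {
      rest <- build(search[-1L], replace[-1L])
      function(x) ifelse(x == search[1L], replace[1L], rest(x))
    }
  }

  build(search, replace)(x)
}

# Hypothetical example data.
basket <- c("apple", "banana", "orange", "kiwi", "banana")

# Leave the other fruits as-is.
print(decode(basket, c("banana", "orange"), c("apple", "pineapple")))

# Change all the other fruits to figs.
print(decode(basket, c("banana", "orange"), c("apple", "pineapple"), default = "fig"))
```

With `default = NULL` the unmatched values pass through unchanged; any other value replaces every unmatched element.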

Note that I named the function after the `decode` SQL function. Here are a couple examples:

Feel free to use it with your favorite fruits or vegetables! Cheers!

P.S.: I wrote this function as an answer to this S.O. question. Thank you to Matthew Lundberg for sharing ideas.

Photo source: http://www.istockphoto.com/stock-photo-19534475-mixed-fruit.php

22 comments:

Tal Galili, January 7, 2013 at 12:05 AM
Hi There,
Cool post.
The image was removed from r-bloggers since you didn't give any copyright credits. Please do so in the future.

Cheers,
Tal

Anonymous, January 7, 2013 at 12:31 AM
How about this two-liner:

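# lapply() always returns a list (sapply() may simplify to a matrix), and
# == matches whole strings (grep() would also match substrings).
WHICH <- lapply(c("banana", "orange"), function(x) which(basket == x))

for (i in seq_along(WHICH)) basket[WHICH[[i]]] <- c("apple", "pineapple")[i]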

Cheers,
Andrej

mpiktas, January 7, 2013 at 12:49 AM
You can use factors for that

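decode <- function(x, i, o) {
  x <- factor(x)
  # Pair each level with its replacement by its position in i, so that
  # i and o do not have to follow the (alphabetical) order of the levels.
  m <- match(levels(x), i)
  levels(x)[!is.na(m)] <- o[m[!is.na(m)]]
  as.character(x)
}

I suspect, though, that you may lose out on the conversion from factor to character. But if you already have a factor, then no problem.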

Anonymous, January 7, 2013 at 6:19 AM
flodel,
Ignoring the StackOverflow question and addressing the problem you pose in this post (which is subtly different), there is a way to get around your nesting issue while still being easy to code/read. I have to do this every so often when recoding data, and I find using a transformation legend is a quick and straightforward approach for 1:1 replacements. I coded up a quick example function for your case:

# Function to replace a data array by evaluating each
# item individually and in series.
# data_Array is expected to be a list or array with
# all of the data to be replaced.
# original_List is expected to be a list or array with
# a list of item values in data_Array.
# replacement_List is expected to be a list or array with
# a list of new item values that correspond in placement
# with original_List.
# For example, to transform "banana" into "pineapple", we
# would see:
# original_List[n] = "banana" and
# replacement_List[n] = "pineapple"

# For this to work, you may have to use unlist(data_Array)
# if data_Array is using factors and replacement_List is
# introducing new values. Adding the new array into a
# data frame will then resolve the new factors automatically.

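replace_By_Item <- function(data_Array, original_List, replacement_List){
  # Items that are not found in original_List become NA.
  # seq_along() is safe when data_Array is empty (1:length() is not).
  for (i in seq_along(data_Array)){
    data_Array[i] <- replacement_List[match(data_Array[i], original_List)]
  }
  return(data_Array)
}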

# Demo of function.
# Makes a random list of "banana", "orange", and "fig"
# values, and replaces all "banana" with "apple" and
# all "orange" with "pineapple".
test_Basket <- sample(c("banana","orange","fig"), 100, replace=TRUE)
new_Test_Basket <- replace_By_Item(test_Basket, c("banana","orange","fig"), c("apple","pineapple","fig"))

# Regarding your default value: I don't ever use default values
# when transforming my data. I find that it is too easy to
# overlook something and lose a bunch of data, but I've coded it
# up anyway as an example.
replace_By_Item <- function(data_Array, original_List, replacement_List, default_Value){
  for (i in seq_along(data_Array)){
    if (data_Array[i] %in% original_List){
      data_Array[i] <- replacement_List[match(data_Array[i], original_List)]
    } else {
      data_Array[i] <- default_Value
    }
  }
  return(data_Array)
}

Anonymous, January 7, 2013 at 7:46 AM
flodel,
I realized I should perhaps return here to address your question about built-in functions.

If you really want to use a built-in function to replace your nested ifelse statements, you might want to consider using a switch. A switch is essentially a chain of if/else tests collapsed into a single call; unlike ifelse, it is not vectorized, so it has to be applied to one element at a time. In R, switch works in two different ways, and for your fruit basket, we would use the "character string" method. In the following, we specify the switch values and what they should return. The final unnamed value is the default. These could be expressions, but for our case, we're just using the fruit names.

for (i in seq_along(data_Array)){
  data_Array[i] <- switch(data_Array[i], banana="apple", orange="pineapple", "fig")
}

For the StackOverflow question, we would have to use the "integer" method for our switch. Note that the "integer" method switch does not have a default. I'm using plyr here purely for convenience, since ddply takes a data frame and returns a data frame automatically.

# z as specified in the SO question:
z <- data.frame(x=1:10, y=11:20, t=21:30)

library(plyr)
ddply(z, .(x), summarize, y=y, t=t, q=switch(x, 1, 2, 1, 4, 1, 1, 3, 1, 1, 1) * t)
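
A small sketch of the difference between the two switch modes described above. The helper names `by_name` and `by_position` are made up for this illustration:

```r
# Character mode: the trailing unnamed value acts as the default.
by_name <- function(fruit) {
  switch(fruit, banana = "apple", orange = "pineapple", "fig")
}

# Integer mode: values are picked by position and there is no default;
# an out-of-range index makes switch() return NULL.
by_position <- function(i) {
  switch(i, "apple", "pineapple")
}

print(by_name("banana"))
print(by_name("kiwi"))
print(by_position(2))
print(is.null(by_position(3)))

# switch() handles one value at a time, so apply it element by element.
basket <- c("banana", "orange", "kiwi")
recoded <- vapply(basket, by_name, character(1), USE.NAMES = FALSE)
print(recoded)
```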

GoldGuy, January 7, 2013 at 12:43 PM
There is also the "recode()" function in the car package that essentially does the same thing. It is meant to provide the utility of the RECODE statement of SPSS.

GoldGuy, January 7, 2013 at 2:41 PM
To do the same thing with the recode function of the car package:

library(car)
recode(basket, "'banana' = 'apple'; 'orange' = 'pineapple'; else = 'fig'")

Unknown, January 7, 2013 at 9:24 PM
There's a new function in plyr 1.8 called revalue:
revalue(basket, replace = c(banana="apple", orange="pineapple"))

It is implemented using the new mapvalues function, which you can use this way:
mapvalues(basket,
          from = c("banana", "orange"),
          to = c("apple", "pineapple"))

They both work with character vectors and, notably, factors. If you have a numeric vector, you'll have to use mapvalues(), because revalue() uses a named vector for the replacements, and the names are always strings, not numbers.

Nettle, May 2, 2016 at 6:52 PM
I know this is an old post, but the function str_replace seems to do this (stringr package).

®γσ, ξηg（雷欧）, November 3, 2016 at 6:57 AM
## weighted parameter estimation
#'@ mbase %<>% mutate(theta = suppressAll(
#'@ ifelse(Result == 'Win', 1,
#'@ ifelse(Result == 'Half Win', 0.5,
#'@ ifelse(Result == 'Push'|Result == 'Cancelled', 0,
#'@ ifelse(Result == 'Half Loss', -0.5,
#'@ ifelse(Result == 'Loss', -1, NA)))))),
#'@ dWin = ifelse(Result == 'Win', 1, 0),
#'@ dwhf = ifelse(Result == 'Half Win', 1, 0),
#'@ dpus = ifelse(Result == 'Push'|Result == 'Cancelled', 1, 0),
#'@ dlhf = ifelse(Result == 'Half Loss', 1, 0),
#'@ dlos = ifelse(Result == 'Loss', 1, 0))

mbase %<>% mutate(
  theta = decode(c('Win', 'Half Win', 'Push', 'Cancelled', 'Half Loss', 'Loss'),
                 c(1, 0.5, 0, 0, -0.5, -1)),
  dWin = ifelse(Result == 'Win', 1, 0),
  dwhf = ifelse(Result == 'Half Win', 1, 0),
  dpus = ifelse(Result == 'Push'|Result == 'Cancelled', 1, 0),
  dlhf = ifelse(Result == 'Half Loss', 1, 0),
  dlos = ifelse(Result == 'Loss', 1, 0))

## I tried to knit the ifelse() version in R Markdown files. It was working fine yesterday but not today, although it still works fine if I run it independently. ifelse() is a vectorized conditional function, not a single-element conditional like if {} else {}, unless we use if (x[i] == y[i]) {} else {}. When I try to use decode(), I get the error message: "Error: character string is not in a standard unambiguous format".