Sunday, March 31, 2013

R and the last comma

In R, every comma matters. When creating a vector, c(1, 2, 5) will do the right thing, but add one unfortunate comma and c(1, 2, 5,) will greet you with a deadly Error in c(1, 2, 5, ) : argument 4 is empty.
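
It is worth noting that the extra comma is not a syntax error: R parses the call and only complains once it is evaluated. Here is a small sketch of that distinction; the variable names `cl`, `n.slots` and `msg` are mine, chosen for illustration.

```r
# The parser accepts the trailing comma: the call holds the function name
# plus four argument slots, the last of which is empty.
cl <- quote(c(1, 2, 5,))
n.slots <- length(cl) - 1L   # number of argument slots in the call

# The error only shows up when the call is evaluated.
msg <- tryCatch(
  eval(cl),
  error = function(e) conditionMessage(e)
)
print(n.slots)
print(msg)
```

Because the empty slot survives parsing, a function that inspects its own call before evaluating anything has a chance to discard it.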

Other languages like Perl are less strict when defining basic data structures: having a comma after the last item is allowed. This can be particularly useful when items are specified on multiple lines as in this example:

my @cities = (
  "New York",
  "Washington",
  "Atlanta",
);

Because the last item is syntactically no different from any other, I can comment, uncomment, add, remove, or swap items anywhere within the definition without having to worry about which lines should have a comma (they all should).

Can R's default behavior be overridden? Yes, using the following functional:

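ok.comma <- function(FUN) {
  function(...) {
    # Capture the call unevaluated and drop the function name
    arg.list <- as.list(sys.call())[-1L]
    len <- length(arg.list)
    if (len > 1L) {
      last <- arg.list[[len]]
      # A trailing comma leaves an empty last argument: drop it
      if (missing(last)) {
        arg.list <- arg.list[-len]
      }
    }
    # Evaluate in the caller's frame so the caller's variables are found
    do.call(FUN, arg.list, envir = parent.frame())
  }
}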

The functional acts as a wrapper around any function: it returns a new function that behaves like the original, except that the last argument is dropped when it is missing.
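
To see what the wrapper has to work with, here is a minimal sketch of how `sys.call()` exposes the empty slot. The helper `count.slots` is a hypothetical name used only for this illustration.

```r
# Hypothetical helper: count the argument slots supplied in a call,
# without evaluating any of them.
count.slots <- function(...) length(as.list(sys.call())[-1L])

n.plain    <- count.slots("New York", "Atlanta")
n.trailing <- count.slots("New York", "Atlanta",)
print(c(plain = n.plain, trailing = n.trailing))
```

The trailing comma adds one extra, empty slot to the captured call, and that slot is what gets thrown out before the wrapped function is called.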

This way, I can call functions like c, list, data.frame indirectly and with an optional extra comma at the end:

cities <- ok.comma(c)(
  "New York",
  "Washington",
  "Atlanta",
)

I can even put the definition of ok.comma into my .Rprofile file and redefine functions:

c          <- ok.comma(base::c)
list       <- ok.comma(base::list)
data.frame <- ok.comma(base::data.frame)

so I can seamlessly do:

cities <- c(
  "New York",
  "Washington",
  "Atlanta",
)
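
Named arguments and local variables should work too. The sketch below uses a compact variant of the wrapper, restated so the snippet runs on its own; passing `envir = parent.frame()` to `do.call` is an assumption I make so that symbols are looked up in the caller rather than inside the wrapper. The names `list2` and `make.row` are hypothetical.

```r
# Compact variant of the wrapper, restated so this snippet is self-contained.
ok.comma <- function(FUN) {
  function(...) {
    arg.list <- as.list(sys.call())[-1L]
    len <- length(arg.list)
    if (len > 1L) {
      last <- arg.list[[len]]
      if (missing(last)) arg.list <- arg.list[-len]
    }
    # Assumption: evaluate the arguments in the caller's frame
    do.call(FUN, arg.list, envir = parent.frame())
  }
}

# Hypothetical name, used here to avoid masking base::list
list2 <- ok.comma(base::list)

# Argument names are kept, and the trailing comma is ignored
settings <- list2(
  host = "localhost",
  port = 8080,
)
print(names(settings))

# A local variable that shares its name with one used inside the wrapper
make.row <- function() {
  len <- 3
  list2(len, "x",)
}
print(make.row())
```

Without the `envir` argument, `len` in the last call would be resolved inside the wrapper, where a variable of the same name already exists.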

I hope you find this useful.

Sunday, January 6, 2013

Search and replace: Are you tired of nested `ifelse`?

It happens all the time: you have a vector of fruits and you want to replace all bananas with apples, all oranges with pineapples, and leave all the other fruits as-is, or maybe change them all to figs. The usual solution? A big old nested `ifelse`:

basket <- c("apple", "banana", "lemon", "orange",
            "orange", "pear", "cherry")
ifelse(basket == "banana", "apple",
ifelse(basket == "orange", "pineapple",
                           basket)) # or "fig"))
# [1] "apple"     "apple"     "lemon"     "pineapple"
# [5] "pineapple" "pear"      "cherry"

OK, that didn't look too bad, especially with the code and fruits nicely aligned. But what if I had a lot of fruits to change and little patience? Wouldn't it be nice if R had a built-in function for multiple search and replace? Someone please tell me if there is already such a function. If not, here is one I wrote that builds a nested `ifelse` function by recursion:

decode <- function(x, search, replace, default = NULL) {

  # build a nested ifelse function by recursion
  decode.fun <- function(search, replace, default = NULL)
    if (length(search) == 0L) {
      function(x) if (is.null(default)) x else rep(default, length(x))
    } else {
      function(x) ifelse(x == search[1L], replace[1L],
                         decode.fun(tail(search, -1L),
                                    tail(replace, -1L),
                                    default)(x))
    }

  return(decode.fun(search, replace, default)(x))
}

Note that I named the function after the `decode` SQL function. Here are a couple of examples:

decode(basket, search  = c("banana", "orange"),
               replace = c("apple", "pineapple"))
# [1] "apple" "apple" "lemon" "pineapple" "pineapple" "pear" "cherry"
decode(basket, search  = c("banana", "orange"),
               replace = c("apple", "pineapple"),
               default = "fig")
# [1] "fig" "apple" "fig" "pineapple" "pineapple" "fig" "fig"
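
For comparison, here is a sketch of a different approach that avoids the recursion by using base R's `match()`. This is not the function above and not necessarily what was discussed on S.O.; `decode.match` is a hypothetical name. One difference to be aware of is that `match()` can match `NA` values, while `==` inside `ifelse` propagates them.

```r
# Hypothetical alternative to decode(), built on match() instead of ifelse().
decode.match <- function(x, search, replace, default = NULL) {
  idx <- match(x, search)      # position of each x in `search`, NA if absent
  out <- if (is.null(default)) x else rep(default, length(x))
  hit <- !is.na(idx)           # elements of x that were found in `search`
  out[hit] <- replace[idx[hit]]
  out
}

basket <- c("apple", "banana", "lemon", "orange",
            "orange", "pear", "cherry")
res.keep <- decode.match(basket, c("banana", "orange"),
                         c("apple", "pineapple"))
res.fig  <- decode.match(basket, c("banana", "orange"),
                         c("apple", "pineapple"), default = "fig")
print(res.keep)
print(res.fig)
```

On the fruit basket, both calls return the same vectors as the two `decode` examples.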

Feel free to use it with your favorite fruits or vegetables! Cheers!

P.S.: I wrote this function as an answer to this S.O. question. Thank you to Matthew Lundberg for sharing ideas.

Photo source: http://www.istockphoto.com/stock-photo-19534475-mixed-fruit.php