Sunday, March 31, 2013

R and the last comma

In R, every comma matters. When creating a vector, c(1, 2, 5) will do the right thing, but add one unfortunate comma and c(1, 2, 5,) will greet you with a deadly Error in c(1, 2, 5, ) : argument 4 is empty.

Other languages like Perl are less strict when defining basic data structures: having a comma after the last item is allowed. This can be particularly useful when items are specified on multiple lines as in this example:

my @cities = (
  "New York",
  "Washington",
  "Atlanta",
);

Because the last item is syntactically no different from any other, I can comment, uncomment, add, remove, or swap items anywhere within the definition without having to worry about which lines should have a comma (they all should).

Can R's default behavior be overridden? Yes, using the following functional:
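The listing is not reproduced here, so what follows is only a minimal sketch of how such a functional might be written, not necessarily the original definition. It assumes that sys.call() keeps the trailing empty argument in the captured call; the local names args, n, and caller are illustrative.

```r
# Sketch: wrap FUN so that a trailing empty argument is silently dropped.
ok.comma <- function(FUN) {
  function(...) {
    # Frame of whoever called the wrapper; arguments are evaluated there.
    caller <- parent.frame()

    # Unevaluated arguments of the current call, without the function itself.
    args <- as.list(sys.call())[-1L]
    n <- length(args)

    # A trailing comma shows up as the empty symbol in the last position.
    if (n > 0L && identical(args[[n]], quote(expr = ))) {
      args <- args[-n]
    }

    # Call the wrapped function with the remaining arguments.
    do.call(FUN, args, envir = caller)
  }
}
```

Evaluating the arguments in the caller's frame is a design choice in this sketch, so that local variables still resolve when the wrapper is called from inside another function.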

The functional acts as a wrapper around any function: it returns a function that behaves identically, except that the last argument, if empty, is dropped.

This way, I can call functions like c, list, or data.frame indirectly, with an optional extra comma at the end:

cities <- ok.comma(c)(
  "New York",
  "Washington",
  "Atlanta",
)

I can even put the definition of ok.comma into my .Rprofile file and redefine the functions themselves:

c          <- ok.comma(base::c)
list       <- ok.comma(base::list)
data.frame <- ok.comma(base::data.frame)

so I can seamlessly do:

cities <- c(
  "New York",
  "Washington",
  "Atlanta",
)

I hope you find this useful.