Sunday, March 31, 2013

R and the last comma

In R, every comma matters. When creating a vector, c(1, 2, 5) will do the right thing, but add one unfortunate comma and c(1, 2, 5,) will greet you with a deadly Error in c(1, 2, 5, ) : argument 4 is empty.

Other languages like Perl are less strict when defining basic data structures: having a comma after the last item is allowed. This can be particularly useful when items are specified on multiple lines as in this example:

my @cities = (
  "New York",

Because the last line item is syntactically no different than any other, I can comment, uncomment, add, remove, swap items anywhere within the definition without having to worry about which lines should have a comma or not (they all should.)

Can R's default behavior be overridden? Yes, using the following functional:

The functional acts as a wrapper around any function: it creates an identical function but such that the last argument, if missing, is thrown out.

This way, I can call functions like c, list, data.frame indirectly and with an optional extra comma at the end:

cities <- ok.comma(c)(
  "New York",

I can even put the definition of ok.comma into my .Rprofile file and redefine functions

c          <- ok.comma(base::c)
list       <- ok.comma(base::list)
data.frame <- ok.comma(base::data.frame)

so I can seamlessly do:

cities <- c(
  "New York",

I hope you find this useful.

Sunday, January 6, 2013

Search and replace: Are you tired of nested `ifelse`?

It happens all the time: you have a vector of fruits and you want to replace all bananas with apples, all oranges with pineapples, and leave all the other fruits as-is, or maybe change them all to figs. The usual solution? A big old nested `ifelse`:

Ok, that didn't look too bad, especially with the code and fruits nicely aligned. But what if I had a lot of fruits to change and little patience? Wouldn't it be nice if R had a built-in function for doing multiple search and replace? Someone please tell me if there is already such a function. If not, here is one I wrote that builds a nested `ifelse` function by recursion:

Note that I named the function after the `decode` SQL function. Here are a couple examples:

Feel free to use it with your favorite fruits or vegetables! Cheers!

P.S.: I wrote this function as an answer to this S.O. question. Thank you to Matthew Lundberg for sharing ideas.

Photo source:

Sunday, November 11, 2012

How I cracked Troyis (the online flash game)

Troyis™ is an addictive online flash game where you move a chess knight through increasingly difficult puzzles. After hours and hours of playing, sometimes late into the night, I decided I'd waste even more of my time and write a little program that can play the game on its own. The result is quite amazing -- imagine Garry Kasparov riding Seabiscuit! See for yourself:

The rest of this post will show you how this little guy (a mix of R and AutoHotKey) works but first, let's review the rules and go through a little theory.

Game rules

The rules are quite simple: as the game starts, a chess knight is placed at the top left corner of a square board where only a few squares are painted in white. The goal is for you to move the knight with your mouse, and make it visit all the white squares, as quickly as possible.

As you progress through the levels, the size of the board increases and the time limit for solving each puzzle decreases.

Finding an optimal path

In the beginning, learning how the knight moves and how to go from one square to another faster and faster can get you far into the game. But eventually, as you start mastering the game, what will really make a difference is how much time you can save by not re-visiting squares you once visited before. In other words, how short your path to a solution can be. Ideally, the shortest path would be one that visits each square only once, as shown below.

In graph theory, the problem of finding a path that visits each vertex of an undirected graph exactly once is a well-studied problem known as the Hamiltonian path problem, a special case of another classic: the travelling salesman problem. The problem is NP-complete, a class of problems so difficult that there is no known efficient algorithm for systematically finding a solution, especially as the problem's size increases. Luckily for us, Troyis puzzles are always small enough (at most 46 vertices and a maximum degree of 8) that we can come with very decent algorithms that will do just fine in most situations.

What I implemented is a backtracking algorithm. All possible paths are explored recursively: at each step, a path of length n can give rise to multiple paths of length n+1 by considering all allowed moves. I also used a number of rules for abandoning bad paths as early as possible. For example, since it is easy to see that if a non-visited square has only one non-visited neighbour then it has to be the final vertex of a hamiltonian path, then we can cut short any path that has two such squares as it won't provide a solution. To increase the chances of finding a hamiltonian path quickly, I also included a simple heuristic: at each step of the recursion, we first try to move the knight to the square that has the fewest possible onward moves, as an attempt to preserve good overall graph connectivity.

It is only later, while researching for this blog post that I found my algorithm is an implementation of the Warnsdorff's rule for solving an eerily similar problem to ours: the Knight's tour. Oh well...

The program

To me, writing the Hamiltonian path solver seemed like the easy part. I did it in R. What I thought would be really hard is to embed the solver in a higher level program that can interface with my internet browser. That was until I found AutoHotKey, a simple programming language for writing automation scripts under Windows. The main program is thus an AutoHotkey script GUI (main.ahk) with two main actions:

  • SelectFrame which asks the user to locate the chessboard by drawing a rectangle in a click, hold and drag fashion. Why? I thought that asking the user where the chessboard is would save me a lot of work trying to figure it out programmatically from a full screenshot.
  • Solve which does all the heavy work, in this order:
    1. Takes a snapshot of the chessboard.
    2. Calls a R script (solver.R) that successively:
      • parses the chessboard snapshot into a matrix of booleans (to be visited or not),
      • finds a Hamiltonian path, i.e., an optimal path that visits each square only once,
      • writes a AutoHotkey script (a series of mouse clicks) for solving the puzzle.
    3. Runs the AutoHotkey script just created above.

The code


People interested in checking out the code are free to do so on github.

Comments welcome!

Monday, July 16, 2012

Optical Art with R

Last week, in a post entitled Bridget Riley exhibition in London, the author Markus Gesmann wrote an R script reproducing one of Riley's famous art pieces: Movement in Squares.

This reminded me of my own first "brush" with Op art. It was in art class years ago, our professor (Madame VitrĂ©) had asked that we recreate an interesting piece. The concept in itself was easy: pick two points and draw a number of lines through each of them. Where they intersect, the two sets of lines create a multitude of polygons that can be filled in black or white alternatively, for quite a dramatic result.

As you can imagine, it was tedious work on paper. Years later, I'd get a thrill at doing it in just a couple minutes using Paint on Windows. Well, here I am again in 2012, now a grown-up programmer:

The example above was run using P = 2, N = 30, and colors = c("black", "white"). And here is a nice series using three points and colors from the brewer palette:
P = 3, N = 12,
colors = brewer.pal(3, "Reds")
P = 3, N = 15,
colors = brewer.pal(3, "Purples")
P = 3, N = 15,
colors = brewer.pal(3, "Blues")
P = 3, N = 9,
colors = brewer.pal(3, "Greens")

Madame VitrĂ© would be proud. (or horrified?)

Tuesday, May 1, 2012

A gallery view for Craigslist

A gallery view for Craigslist

As much as I love Craigslist, I sometimes find the interface a bit limited.

My biggest wish? That there was an option for showing the search results as an image gallery, like eBay has. This could prove quite useful for browsing things like antiques, furniture, art: categories where I only need a quick glance at an object's picture and I immediately know if I am interested or not. Craigslist does have an option for embedding pictures to the search results, but their size and resolution are so small it is quite impractical.

Here is a short R program I have put together that does the job. As much as possible, I have tried to break it into small and meaningful modules so it will be easy to maintain and build upon.

First, a function for turning a query into a Craigslist URL:

Then a function that reads the search results and recombines them into an image gallery:

Finally, an interactive function for wrapping everything together:

Not quite a full R/Craigslist API, but one step in the right direction!

Thursday, April 5, 2012

The 50 most used R packages

Ask anyone what makes R a great language, one argument that often comes back is its very active community. Proof is the impressive number of packages contributed by developers from all horizons and backgrounds. The CRAN website alone lists 3,725 packages but are they all reliable or even useful? Certainly not.

The website is a great resource for finding out which packages may be more interesting or useful  than others. There, users can rate packages, provide reviews, also just list the ones they are using. While packages can be sorted alphabetically or by their average user rating, there is surprisingly no option for sorting them by number of users, a nice feature that could provide a "quick" answer to this bold question: "What are the 50 most used R packages?".

As I tried today to ask for help on, my question was politely turned down as not being constructive. Lesson learned! So I decided to take the bull by the horns and solve the question by myself, programmatically that is. And what a better tool than R for solving this? Here is what I came up with:

Voila! Personally, I find this little program so ridiculously short, I think it speaks for itself about how great it is to work with R and its contributed packages!

A comment: If you go on, you'll notice that each page is limited to 50 packages. By scraping only 10 pages, my resulting data.frame has 500 packages which is only a subset of the 4,020 packages currently listed on But since the pages I scraped had the packages sorted by user ratings, it should be plenty: it is safe to assume that the 50 packages with the most users are within the 500 packages with best ratings. This way, the script runs a bit faster and we're not using too much of the server's bandwidth.

As a bonus, the word cloud that illustrates this article was built as follows:

Last word. If you enjoyed this article, please consider joining and adding what packages YOU like to use. I think it will serve the whole community. (And if you were wondering, no, I am not affiliated; just a happy user!)