"R de jeu" is a play on words between R, the statistical programming language, and "aire de jeu", French for playground. Here, I will be posting a few articles about R programming, very irregularly, as ideas come up and time allows.
Thursday, April 5, 2012
The 50 most used R packages
Ask anyone what makes R a great language, and one argument that often comes up is its very active community. Proof is the impressive number of packages contributed by developers from all horizons and backgrounds. The CRAN website alone lists 3,725 packages, but are they all reliable or even useful? Certainly not.
The crantastic.org website is a great resource for finding out which packages may be more interesting or useful than others. There, users can rate packages, write reviews, or simply list the ones they use. While packages can be sorted alphabetically or by their average user rating, there is surprisingly no option for sorting them by number of users, a nice feature that could provide a "quick" answer to this bold question: "What are the 50 most used R packages?".
When I tried today to ask for help on stackoverflow.com, my question was politely turned down as not being constructive. Lesson learned! So I decided to take the bull by the horns and answer the question myself, programmatically that is. And what better tool than R for this? Here is what I came up with:
Voilà! Personally, I find this little program so ridiculously short that it speaks for itself about how great it is to work with R and its contributed packages!
A comment: if you go to crantastic.org, you'll notice that each page is limited to 50 packages. By scraping only 10 pages, my resulting data.frame has 500 packages, which is only a subset of the 4,020 packages currently listed on crantastic.org. But since the pages I scraped had the packages sorted by user rating, it should be plenty: it seems safe to assume that the 50 packages with the most users are among the 500 best-rated packages. This way, the script runs a bit faster and we're not using too much of the server's bandwidth.
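For readers who want to see the shape of that approach (collect 10 pages of 50 packages, stack them into one data.frame, sort by number of users, keep the first 50), here is a minimal sketch in base R. It is not the script from this post: `fetch_page()` is a hypothetical stand-in for the scraping step and returns made-up data, and the column names `package` and `users` are assumptions.

```r
# Hypothetical stand-in for the scraping step. A real version would download
# and parse one results page; this one returns made-up data so the sketch
# runs offline. Column names `package` and `users` are assumptions.
fetch_page <- function(page, per_page = 50) {
  idx <- seq_len(per_page) + (page - 1) * per_page
  data.frame(
    package = sprintf("pkg%03d", idx),
    users = (idx * 37) %% 101,   # arbitrary fake user counts
    stringsAsFactors = FALSE
  )
}

# Collect the first 10 pages (50 packages each) into a list of data.frames.
pages <- lapply(1:10, fetch_page)

# Stack the pages into a single data.frame of 500 packages.
all_packages <- do.call(rbind, pages)

# Sort by number of users, most used first, and keep the top 50.
ranked <- all_packages[order(all_packages$users, decreasing = TRUE), ]
top50 <- head(ranked, 50)

print(head(top50))
```

Swapping the body of `fetch_page()` for real download-and-parse code would leave the remaining lines unchanged, which is probably the main appeal of keeping the scraping step in its own function.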
As a bonus, the word cloud that illustrates this article was built as follows:
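A minimal sketch of such a word cloud, assuming the contributed wordcloud package is installed. The package names and user counts below are made up for illustration and are not the real crantastic.org figures.

```r
# Made-up packages and user counts, for illustration only
# (not real crantastic.org data).
top <- data.frame(
  package = c("pkgA", "pkgB", "pkgC", "pkgD", "pkgE"),
  users = c(120, 95, 60, 33, 12),
  stringsAsFactors = FALSE
)

# Draw the cloud only when the contributed wordcloud package is available.
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(
    words = top$package,
    freq = top$users,
    min.freq = 1,          # keep every package, even the least used
    random.order = FALSE   # place the most used packages near the centre
  )
}
```

Each word is sized in proportion to its `freq` value, so the most used packages stand out at a glance.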
One last word: if you enjoyed this article, please consider joining crantastic.org and adding the packages YOU like to use. I think it will serve the whole community. (And if you were wondering, no, I am not affiliated; just a happy user!)
flodel.
Labels: R
Great post, thanks for sharing. data.table ranks first today (5 June 2016).
This is great. Have you ever data mined one or more GitHub R repositories, like Hadley Wickham's, for the most frequently used R commands?