First go and read An R wish list for 2012. None of the wishes came true in 2012. Fix the R website? No, it is the same this year. In fact, it is the same as in 2005. Easy to find help? Sorry, next year. Consistency and sane defaults? Coming soon to a theater near you (one day). Thus my wish list for 2012 is, very handily, still the wish list for 2013.
R as social software
The strength of R is not the software itself, but the community surrounding the software. Put another way, there are several languages that could offer the core functionality, but the whole ‘ecosystem’, that’s another thing. Softening @gappy3000’s comment: innovation is (mostly) happening outside the core.
This prompts some questions: Why aren’t ggplot2 and plyr in the default download? I don’t know if some people realize that ggplot2 is now one of the main attractions for R as a data visualization language. Why isn’t Hadley’s name on this page? (Sorry I’m picking on him; his was the first name that came to mind.) How come there is not one woman on that page? I’m not saying there is an evil plan, but I’m wondering if (and how) the site and core reflect the R community and the diversity of interests (and uses). I’m also wondering what the process is to raise these questions beyond a blog post. Perhaps on the developers’ email list?
I think that, in summary, my R wish for 2013 is that ‘The R project’ (whoever that is) recognizes that the project is much more than the core download. I wish the list of contributors went beyond the fairly small number of people with write access to the source. I’d include those who write packages, those who explain, those who market and, yes, those who sell R. Finally, I wish all readers of Quantum Forest Palimpsest a great 2013.
P.S. Just in case: no, I’m not suggesting that I be included in any list.
11 responses to “An R wish list for 2013”
I can live with R as it is. It’s not a bad idea to have a robust base package and provide the means and the freedom to extend the base functionality to all levels and directions.
What really bothers me are some paleolithic remnants still plaguing the base core. Why is there no native multicore support for the base functions? Why should one have to use some external package in order to properly utilize the massive CPU resources cheaply available nowadays? Try to read and manipulate a multi-gigabyte CSV file (broadly available in the Big Data age we are entering). Even Excel PowerPivot read-to-aggregation times are sometimes faster.
My wish list for 2013: Multicore base R functions.
Multicore in the core would be great.
My wish list: functions to bridge the gap between the world of vectors and lists without awkward conversion, and Unicode support. And, even though it would break backward compatibility, I’d love to see the use of the ‘[[‘ operator swapped with the ‘[‘ operator.
Ha! That ‘[[‘ is one of my pet peeves too. Agree on the other wish too!
My R wish list for 2013 is for total CRAN integration into Debian/Ubuntu/Linux Mint, Red Hat/Oracle/CentOS/Fedora, SUSE/openSUSE and Gentoo package dependency structures. Much of CRAN is in Debian / Ubuntu already, a bit less is in Fedora and openSUSE. But, barring licensing issues, there’s no reason nearly all of CRAN can’t be in the major Linux distros’ packaging dependency structures.
I’d also like to see source packages for RStudio. I asked the RStudio team about this and they said it wasn’t on their priority queue. You can’t get into most distros without a source package.
Whose job is it to get all of CRAN into those distros? Concerning RStudio, isn’t the source available here? https://github.com/rstudio/rstudio
Have a look at C2D4U. It’s pretty complete, for Ubuntu users at least.
For the casual user, better integration of GUIs into the core package would be useful. Perhaps the option of choosing one to download along with the core? RStudio does an excellent job, for example, and support for interfaces would greatly increase R uptake. The R learning curve is worth the power it brings, but why not make the transition just a little easier?
How about compiling to machine code? Like so, for example: https://github.com/jtalbot/riposte. Also, the memory management could be modernized (R doesn’t defragment its memory after objects get deleted, so only smaller and smaller objects can be created as a session goes on; this eventually clutters memory and hampers working with data objects of, say, 200-500M [all depending on your PC’s memory of course, but you get my drift]).
But some great things happened in 2012 as well!
– beginning support for 64-bit integers (yay to big matrices!)
– parallel package now in the core (or was that 2011?)
– the grid package is in the core now (and Paul Murrell is writing awesomely about it in the last couple of issues of the R Journal)
As for who is in the core team: Brian Ripley was quite clear about that during his useR!2010 keynote: people get included in the R core team when it is more work (for the core team) to leave them out than to include them. Although R is open source, the governance of its development certainly is not. See for example this interesting blog: http://www.r-bloggers.com/the-open-governance-index-results-for-the-r-project/
For me, my wishes for R came true in Python. No joking! I have been on Python like an addict for 10 years now. I tried quitting multiple times, but only R got me close to almost a year of hardly looking at Python. Now, though, we have Pandas.
Pandas gets a lot of stuff right that R missed. Being “Pythonic”, Pandas isn’t as elegant in some regards as R, but it is more consistent, and a lot faster.
For some computations, numpy is still king. And the next big thing might be “blaze”, which claims to be a “next generation” of numpy, allowing parallel computations more easily, and being able to compute on disk, in a database or in memory.
The ‘contributors’ list is based on contributions to base R, which means that it’s largely about contributions in the 1995-2000 period before R got popular and reliable. If you look at the mailing lists for that time, there aren’t any obvious candidates for people who were missed. There certainly were women active in statistical computing and graphics in that period, but they weren’t involved with R. For example, Di Cook and Deborah Swayne and Heike Hofmann were all working on interactive graphics (which you couldn’t do with R).
I do agree it would be good to have some official pointer to people who have made major package contributions, such as Hadley. I had hoped that crantastic would grow into that function, but it hasn’t really taken off.