
Tall big data, wide big data

2011-12-13 / Luis

After attending two one-day workshops last week I spent most days paying attention to (well, at least listening to) presentations at this biostatistics conference. Most presenters were R users, although Genstat, Matlab and SAS fans were also present, and not once did I hear “I can’t deal with the current size of my data sets”. However, there were some complaints about the speed of R, particularly when dealing with simulations or some genomic analyses.

R, academia and the democratization of statistics

2011-12-12 / Luis

I am not a statistician but I use statistics, teach statistics and write about applications of statistics in biological problems.

Last week at this biostatistics conference I was talking with a Ph.D. student who was surprised by this situation because I didn’t have any statistical training. I corrected that to “any formal training”. On the first day one of the invited speakers was musing about the growing number of “amateurs” using statistics (many times wrongly) and about what value biostatisticians could add as professionals. Yes, he was talking about people like me spoiling the party.

On the (statistical) road, workshops and R

2011-12-03 / Luis

Things have been a bit quiet at Quantum Forest during the last ten days. Last Monday (Sunday for most readers) I flew to Australia to attend a couple of one-day workshops: one on spatial analysis (in Sydney) and another on modern applications of linear mixed models (in Wollongong). These will be followed by The International Biometric Society Australasian Region Conference in Kiama.

I would like to comment on the workshops to look for commonalities and differences. First, both workshops heavily relied on R, supporting the idea that if you want to reach a lot of people and get them using your ideas, R is pretty much the vehicle to do so. It is almost trivial to get people to install R and RStudio before the workshop so they are ready to go. “Almost” because you have to count on someone having a bizarre software configuration or draconian security policies for their computer.

Generic radiata pine in Northern Tasmania.

Do we need to deal with ‘big data’ in R?

2011-11-22 / Luis

David Smith at the Revolutions blog posted a nice presentation on “big data” (oh, how I dislike that term). It is a nice piece of work and the Revolution guys managed to process a large number of records, starting with a download of 70GB and ending up with a series of linear regressions.

I’ve spent the last two weeks traveling (including a visit to the trial below) and finishing marking for the semester, which has somewhat affected my perception of dealing with large amounts of data. The thing is that dealing with hotel internet caps (100MB) or even with my lowly home connection monthly cap (5GB) does get one thinking… Would I spend several months of internet connection just downloading data so I could plot some regression lines for 110 data points? Or does it make sense to run a linear regression with two predictors using 100 million records?

Teaching with R: the tools

2011-11-02 / Luis

I bought an Android phone, nothing fancy, just my first foray into the smartphone world, which is a big change coming from the dumb phone world^(*). Everything is different and I am back to being a newbie; this is what many students feel the first time they are exposed to R. However, and before getting into software, I find it useful to think of teaching from several points of view, considering that there are several use cases: