I try to be economical when writing code; for example, I tend to use single quotes over double quotes for characters because it saves me one keystroke. One area where I don’t do that is when typing TRUE and FALSE (R accepts T and F as well), just because it is clearer to see in code and syntax highlighting kicks in. That’s why I was surprised to read in Jason Morgan’s post that it is possible to redefine T and F and get undesirable behavior.
Playing around, I found that it is quite easy to redefine other fundamental constants in R. For example, I posted on Twitter:
> pi
[1] 3.141593
> pi <- 2
> pi*2
[1] 4
Ouch, dangerous! I tend to muck around with matrices quite a bit and, being a friend of parsimony, I often use capital letters to represent them. This would have eventually bitten me if I had used the abbreviated TRUE and FALSE. As Kevin Ushey replied to my tweet, one can redefine even basic functions like ‘+’ and be pure evil; over the top, sure, but possible.
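A minimal sketch of what is going on, assuming a fresh session: the assignment does not change the built-in value, it creates a new variable in the current environment that masks the one in base. The original stays reachable as base::pi, and removing the mask restores normal behavior. The same applies to an operator such as ‘+’. The variable names doubled, original, and evil_sum are only for illustration.

```r
# Assigning to pi creates a variable that masks base::pi.
pi <- 2
doubled  <- pi * 2      # uses the masking value: 4
original <- base::pi    # the built-in constant is still there: 3.141593
rm(pi)                  # remove the mask; pi refers to base::pi again

# The same trick works for operators (pure evil, as noted above).
`+` <- function(e1, e2) e1 - e2
evil_sum <- 1 + 1       # now computes 1 - 1, i.e. 0
rm("+")                 # restore normal addition

print(c(doubled = doubled, original = original, evil_sum = evil_sum))
```

Code inside packages is not affected by such a mask, because it looks names up through its own namespace before it ever reaches the user's workspace.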
9 responses to “R pitfalls #4: redefining the basics”
I was doing some Tukey’s HSD results and innocently decided to call my vector letters, then wondered why my vector was 26 characters long rather than the 6 it should have been. Guess where R stores its alphabet! Easily done!
R is a very large language with lots of reserved (although redefinable) keywords. It pays to pay attention all the time.
It certainly does. I always check to make sure that I get what I wanted (which is how I discovered the letters reservation).
Yeah, well, this is true in pretty much every language out there. Heck, I knew folks who thought it was funny to add this line to their coworkers’ .login file: “alias ls=logout”.
I doubt that someone would change “pi”, but T and F might change by accidental reassignment.
Running some error checking on at least the most common possibilities might be of use. I guess, though, that these statements would have to come at the end of your script to make sure nothing was overwritten. Something along the lines of:
if (!identical(T, TRUE)) stop("'T' has been reassigned to ", T)
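The same idea can be extended to several built-ins at once. The sketch below uses a hypothetical helper, masked_builtins(), that compares what each name resolves to from the calling environment with the value stored in base; the list of names checked is an assumption and can be changed to suit.

```r
# Hypothetical helper: returns the names of built-in constants that resolve
# to something other than their base value when looked up from `env`.
masked_builtins <- function(names = c("T", "F", "pi", "letters", "LETTERS"),
                            env = parent.frame()) {
  force(env)  # fix the lookup environment before entering vapply()
  is_masked <- vapply(names, function(nm) {
    !identical(get(nm, envir = env), get(nm, envir = baseenv()))
  }, logical(1))
  names[is_masked]
}

T <- diag(2)              # accidental reassignment, e.g. a transition matrix
print(masked_builtins())  # reports "T"
rm(T)                     # remove the mask again
```

Called at the end of a script, for example as if (length(masked_builtins()) > 0) stop(...), it flags any of these names that were overwritten along the way.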
I doubt that someone would do it on purpose, but I can think of a number of acronyms related to my area of work for which it would make sense to use pi as a variable name.
I’ve had some nasty bugs due to redefining T to a transition matrix in my early days of R programming; I was pretty annoyed when I found out the language would let me do something so dangerous.
One redefinition in R is very useful for cross-platform work, e.g. when you are developing a script on a Mac but someone else will also be using it on Windows. It lets every call to quartz(), which opens a new graph window on a Mac, keep working on Windows (and it could easily be swapped to go the other way):
# when collaborating, need to swap windows() vs quartz() calls. This does that nicely:
# (courtesy of https://stat.ethz.ch/pipermail/r-help/2008-December/181899.html)
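A minimal sketch of that kind of redefinition, assuming the script was written on a Mac and calls quartz() throughout. This is an illustration of the idea, not necessarily the code from the linked r-help message.

```r
# Assumption: the script calls quartz() to open new graph windows.
# On Windows, define quartz() as a thin wrapper around windows(),
# so those calls open a Windows graphics device instead.
if (.Platform$OS.type == "windows") {
  quartz <- function(...) grDevices::windows(...)
}
```

Arguments shared by both devices, such as width and height, pass straight through; arguments specific to quartz() would need to be dropped or translated.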
That’s handy indeed.