Category: rblogs (Page 21 of 22)

All combinations for levelplot

2011-10-08 / Luis

In a previous post I explained how to create all possible combinations of the levels of two factors using expand.grid(). Another use for this function is to create a regular grid for two variables to create a levelplot or a contour plot.

For example, let’s say that we have fitted a multiple linear regression to predict wood stiffness (stiff, the response) using basic density (bd) and a measure of microfibril angle (t) as explanatory variables. The regression equation could be something like stiffness = 3.439 + 0.009 bd - 0.052 t. In our dataset bd had a range of 300 to 700 kg m^-3, while t had a range from 50 to 70.
Continue reading

On R versus SAS

2011-10-07 / Luis

A short while ago there was a discussion on linkedin about the use of SAS versus R for the enterprise. I have thought a bit about the issue but, as I do not use Linkedin, I did not make any comments there.

Disclaimer: I did use SAS a lot between 1992 and 1997, mostly for genetic evaluation, heavily relying on BASE, STAT, IML and GRAPH. From that point on, I was a light SAS user (mostly STAT and IML) until 2009. The main reason I left SAS was that I started using ASReml in 1997 and, around two years ago asreml-R, the R package version of ASReml. Through my job I can access any statistical software; if the university does not have a license, I can buy an academic one without any issues.
Continue reading

Linear regression with correlated data

2011-10-06 / Luis

I started following the debate on differential minimum wage for youth (15-19 year old) and adults in New Zealand. Eric Crampton has written a nice series of blog posts, making the data from Statistics New Zealand available. I will use the nzunemployment.csv data file (with quarterly data from March 1986 to June 2011) and show an example of multiple linear regression with autocorrelated residuals in R.

A first approach could be to ignore autocorrelation and fit a linear model that attempts to predict youth unemployment with two explanatory variables: adult unemployment (continuous) and minimum wage rules (categorical: equal or different). This can be done using:
Continue reading

R pitfall #1: check data structure

2011-10-05 / Luis

A common problem when running a simple (or not so simple) analysis is forgetting that the levels of a factor has been coded using integers. R doesn’t know that this variable is supposed to be a factor and when fitting, for example, something as simple as a one-way anova (using lm()) the variable will be used as a covariate rather than as a factor.

There is a series of steps that I follow to make sure that I am using the right variables (and types) when running a series of analyses. I always define the working directory (using setwd()), so I know where the files that I am reading from and writing to are.
Continue reading

All combinations of levels for two factors

2011-10-04 / Luis

There are circumstances when one wants to generate all possible combinations of levels for two factors. For example, factor one with levels ‘A’, ‘B’ and ‘C’, and factor two with levels ‘D’, ‘E’, ‘F’. The function expand.grid() comes very handy here:

combo <- expand.grid(factor1 = LETTERS[1:3], factor2 = LETTERS[4:6])
combo

factor1 factor2
1       A       D
2       B       D
3       C       D
4       A       E
5       B       E
6       C       E
7       A       F
8       B       F
9       C       F

Omitting the variable names (factor1 and factor 2) will automatically name the variables as Var1 and Var2. Of course we do not have to use letters for the factor levels; if you have defined a couple of factors (say Fertilizer and Irrigation) you can use levels(Fertilizer) and levels(Irrigation) instead of LETTERS...