Evolving notes, images and sounds by Luis Apiolaza

Category: r (Page 19 of 20)

Operating on datasets inside a function

There are times when we need to write a function that makes changes to a generic data frame that is passed as an argument. Let’s say, for example, that we want to write a function that converts to factor any variable with names starting with a capital letter. There are a few issues involved in this problem, including:

  • Obtaining a text version of the name of the dataset (using the substitute() function).
  • Looping over the variable names and checking if they start with a capital letter (comparing with the LETTERS vector of constants).
  • Generating the plain text version of the factor conversion, glueing the dataset and variable names (using paste()).
  • Parsing the plain text version of the code to R code (using parse()) and evaluating it (using eval()). This evaluation has to be done in the parent environment or we will lose any transformation when we leave the function, which is the reason for the envir() specification.

Continue reading

All combinations for levelplot

In a previous post I explained how to create all possible combinations of the levels of two factors using expand.grid(). Another use for this function is to create a regular grid for two variables to create a levelplot or a contour plot.

For example, let’s say that we have fitted a multiple linear regression to predict wood stiffness (stiff, the response) using basic density (bd) and a measure of microfibril angle (t) as explanatory variables. The regression equation could be something like stiffness = 3.439 + 0.009 bd - 0.052 t. In our dataset bd had a range of 300 to 700 kg m-3, while t had a range from 50 to 70.
Continue reading

On R versus SAS

A short while ago there was a discussion on linkedin about the use of SAS versus R for the enterprise. I have thought a bit about the issue but, as I do not use Linkedin, I did not make any comments there.

Disclaimer: I did use SAS a lot between 1992 and 1997, mostly for genetic evaluation, heavily relying on BASE, STAT, IML and GRAPH. From that point on, I was a light SAS user (mostly STAT and IML) until 2009. The main reason I left SAS was that I started using ASReml in 1997 and, around two years ago asreml-R, the R package version of ASReml. Through my job I can access any statistical software; if the university does not have a license, I can buy an academic one without any issues.
Continue reading

Linear regression with correlated data

I started following the debate on differential minimum wage for youth (15-19 year old) and adults in New Zealand. Eric Crampton has written a nice series of blog posts, making the data from Statistics New Zealand available. I will use the nzunemployment.csv data file (with quarterly data from March 1986 to June 2011) and show an example of multiple linear regression with autocorrelated residuals in R.

A first approach could be to ignore autocorrelation and fit a linear model that attempts to predict youth unemployment with two explanatory variables: adult unemployment (continuous) and minimum wage rules (categorical: equal or different). This can be done using:
Continue reading

« Older posts Newer posts »

© 2024 Palimpsest

Theme by Anders NorenUp ↑