Author: Luis (Page 65 of 71)

On R, bloggers, politics, sex, alcohol and rock & roll

2011-10-20 / Luis

Yesterday morning at 7 am I was outside walking the dog before getting a taxi to go to the airport to catch a plane to travel from Christchurch to Blenheim (now I can breath after reading without a pause). It was raining cats and dogs while I was walking doggyo, thinking of a post idea for Quantum Forest; something that I could work on without a computer. Then I remembered that I told Tal Galili that I would ‘mention r-bloggers’ in a future post. Well, Tal, this is it.

Large applications of linear mixed models

2011-10-19 / Luis

In a previous post I summarily described our options for (generalized to varying degrees) linear mixed models from a frequentist point of view: nlme, lme4 and ASReml-R^†, followed by a quick example for a split-plot experiment.

But who is really happy with a toy example? We can show a slightly more complicated example assuming that we have a simple situation in breeding: a number of half-sib trials (so we have progeny that share one parent in common), each of them established following a randomized complete block design, analyzed using a ‘family model’. That is, the response variable (dbh: tree stem diameter assessed at breast height—1.3m from ground level) can be expressed as a function of an overall mean, fixed site effects, random block effects (within each site), random family effects and a random site-family interaction. The latter provides an indication of genotype by environment interaction.
Continue reading

Lattice when modeling, ggplot when publishing

2011-10-18 / Luis

When working in research projects I tend to fit several, sometimes quite a few, alternative models. This model fitting is informed by theoretical considerations (e.g. quantitative genetics, experimental design we used, our understanding of the process under study, etc.) but also by visual inspection of the data. Trellis graphics—where subsets of data are plotted in different ‘panels’ defined by one or more factors—are extremely useful to generate research hypotheses.

There are two packages in R that have good support for trellis graphics: lattice and ggplot2. Lattice is the oldest, while ggplot2 is probably more consistent (implementing a grammar of graphics) and popular with the cool kids and the data visualization crowd. However, lattice is also quite fast, while ggplot2 can be slow as a dog (certainly way slower than my dog).
Continue reading

Linear mixed models in R

2011-10-17 / Luis

A substantial part of my job has little to do with statistics; nevertheless, a large proportion of the statistical side of things relates to applications of linear mixed models. The bulk of my use of mixed models relates to the analysis of experiments that have a genetic structure.

A brief history of time

At the beginning (1992-1995) I would use SAS (first proc glm, later proc mixed), but things started getting painfully slow and limiting if one wanted to move into animal model BLUP. At that time (1995-1996), I moved to DFREML (by Karen Meyer, now replaced by WOMBAT) and AIREML (by Dave Johnson, now defunct—the program I mean), which were designed for the analysis of animal breeding progeny trials, so it was a hassle to deal with experimental design features. At the end of 1996 (or was it the beginning of 1997?) I started playing with ASReml (programed by Arthur Gilmour mostly based on theoretical work by Robin Thompson and Brian Cullis). I was still using SAS for data preparation, but all my analyses went through ASReml (for which I wrote the cookbook), which was orders of magnitude faster than SAS (and could deal with much bigger problems). Around 1999, I started playing with R (prompted by a suggestion from Rod Ball), but I didn’t really use R/S+ often enough until 2003. At the end of 2005 I started using OS X and quickly realized that using a virtual machine or dual booting was not really worth it, so I dropped SAS and totally relied on R in 2009.
Continue reading

Maximum likelihood

2011-10-14 / Luis

This post is one of those ‘explain to myself how things work’ documents, which are not necessarily completely correct but are close enough to facilitate understanding.

Background

Let’s assume that we are working with a fairly simple linear model, where we only have a response variable (say tree stem diameter in cm). If we want to ‘guess’ the diameter for a tree (y_i) our best bet is the average (μ) and we will have a residual (ε_i). The model equation then looks like:
Continue reading