Evolving notes, images and sounds by Luis Apiolaza

Category: stats (Page 7 of 8)

Teaching with R: the tools

I bought an Android phone, nothing fancy, just my first foray into the smartphone world, which is a big change coming from the dumb phone world(*). Everything is different and I am back to being a newbie; this is what many students feel the first time they are exposed to R. However, before getting into software, I find it useful to think of teaching from several points of view, considering that there are several use cases:
Continue reading

Covariance structures

In most mixed linear model packages (e.g. asreml, lme4, nlme, etc.) one needs to specify only the model equation (the bit that looks like y ~ factors...) when fitting simple models. We say nothing explicit about the covariances that complete the model specification. This is because most linear mixed model packages assume that, in the absence of any additional information, the covariance structure is the product of a scalar (a variance component) by a design matrix. For example, the residual covariance matrix in simple models is R = I σe², the additive genetic variance matrix is G = A σa² (where A is the numerator relationship matrix), and the covariance matrix for a random effect f with incidence matrix Z is Z Z' σf².
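As a minimal sketch of what this looks like in practice (assuming the lme4 package and a hypothetical data frame trees with columns diameter and block):

library(lme4)

# Only the model equation is specified; lme4 assumes by default that
# the block effects are N(0, I σb²) and the residuals N(0, I σe²)
m1 <- lmer(diameter ~ 1 + (1 | block), data = trees)

# The scalar variance components behind those identity structures
VarCorr(m1)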
Continue reading

Maximum likelihood

This post is one of those ‘explain to myself how things work’ documents, which are not necessarily completely correct but are close enough to facilitate understanding.

Background

Let’s assume that we are working with a fairly simple linear model, where we only have a response variable (say tree stem diameter in cm). If we want to ‘guess’ the diameter for a tree (yi) our best bet is the average (μ), and we will have a residual (εi). The model equation then looks like yi = μ + εi.
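As a quick illustration of the idea (a sketch using simulated diameters, not data from the post), maximizing the normal likelihood numerically recovers the sample mean:

set.seed(42)
diam <- rnorm(100, mean = 30, sd = 5)  # simulated stem diameters (cm)

# Negative log-likelihood of a normal model; sd is parameterized on
# the log scale so it stays positive during optimization
nll <- function(p) -sum(dnorm(diam, mean = p[1], sd = exp(p[2]), log = TRUE))

fit <- optim(c(mu = 20, log_sigma = 0), nll)
fit$par["mu"]  # essentially equal to mean(diam)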
Continue reading

Simulating data following a given covariance structure

Every year there are at least a couple of occasions when I have to simulate multivariate data that follow a given covariance matrix. For example, let’s say that we want to create an example of the effect of collinearity when fitting multiple linear regressions: we want to create one variable (the response) that is correlated with a number of explanatory variables, and the explanatory variables have different correlations with each other.

There is a matrix operation called the Cholesky decomposition, sort of equivalent to taking a square root with scalars, that is useful to produce correlated data. If we have a covariance matrix M, the Cholesky decomposition is a lower triangular matrix L such that M = L L'. How does this connect to our simulated data? Let’s assume that we generate a vector z of independent, normally distributed numbers with mean zero and variance one (with length equal to the dimension of M); we can then create a realization of our multivariate distribution using the product L z.
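A minimal sketch of the recipe in R (with a made-up 3 × 3 covariance matrix M):

set.seed(42)

M <- matrix(c(1.0, 0.7, 0.3,
              0.7, 1.0, 0.5,
              0.3, 0.5, 1.0), nrow = 3)

# chol() returns the upper triangular factor U with M = U'U, so the
# lower triangular L of the text is its transpose
L <- t(chol(M))

n <- 10000
z <- matrix(rnorm(3 * n), nrow = 3)  # independent N(0, 1) deviates

x <- L %*% z          # each column is one multivariate realization
round(cov(t(x)), 2)   # close to M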
Continue reading

On R versus SAS

A short while ago there was a discussion on LinkedIn about the use of SAS versus R in the enterprise. I have thought a bit about the issue but, as I do not use LinkedIn, I did not make any comments there.

Disclaimer: I did use SAS a lot between 1992 and 1997, mostly for genetic evaluation, heavily relying on BASE, STAT, IML and GRAPH. From that point on, I was a light SAS user (mostly STAT and IML) until 2009. The main reason I left SAS was that I started using ASReml in 1997 and, around two years ago, asreml-R, the R package version of ASReml. Through my job I can access any statistical software; if the university does not have a license, I can buy an academic one without any issues.
Continue reading
