Evolving notes, images and sounds by Luis Apiolaza


Time for correlations

A few posts ago I was talking about heritabilities (like here) and it’s time to say something about genetic correlations. This is how I explain correlations to myself or in meetings with colleagues. Nothing formal, mostly an analogy.

Say we have to draw a distribution of breeding values for one trait (X) and, rather than looking from the side, we look at it from the top. It looks like a straight line, where the length gives an idea of variability and the cross marks the mean. We can have another distribution (Y), perhaps not as long (so not so variable) or maybe longer.

Often variables will vary together (co-vary, vary at the same time) and we can show that by drawing the lines at an angle, where they cross at their means. If you look at the formula for the covariance (co-variance, because traits co-vary, get it?), we take the deviation from the mean for each of the two traits for each observation, multiply them, add them all up and get their average. We get positive values for the product when both traits are above or below the mean; we get negative values when one trait is below the mean and the other above it. Covariances are a pain, as they can take any value. Instead we can use “standardised” covariances, which vary between -1 and 1: we call these things *correlations*.
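
A minimal sketch of that recipe in R, with made-up numbers: take the deviations from each mean, multiply them pair by pair, average them, and then standardise to get the correlation.

  # Two made-up traits measured on the same six individuals
  x <- c(10, 12, 9, 14, 11, 13)
  y <- c(30, 34, 28, 37, 31, 36)

  # Covariance "by hand": deviations from the means, multiplied, then averaged
  # (like cov(), dividing by n - 1 rather than n)
  cov_xy <- sum((x - mean(x)) * (y - mean(y))) / (length(x) - 1)

  # Standardising by the standard deviations gives a correlation between -1 and 1
  cor_xy <- cov_xy / (sd(x) * sd(y))

  # Same numbers as the built-in functions
  c(cov_xy, cov(x, y))
  c(cor_xy, cor(x, y))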

If the angle between the distributions is less than 90 degrees, increasing the values of one of the traits is (on average) accompanied by an increase in the other trait. Then we have a positive covariance and, therefore, a positive correlation. The smaller the angle, the closer to a correlation of 1.

If the angle is 90 degrees (or close to it), changing the value of one trait has no (or very little) effect on the other trait. Zero correlation.

If the angle is greater than 90 degrees, changing the value of one trait tends to reduce the values of the other trait. The closer the angle to 180 degrees (so the positive values of one distribution are closer to the negative values of the other distribution), the closer to a -1 correlation.
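
To make the analogy a bit more concrete, the correlation behaves like the cosine of the angle between the two lines. A quick check in R:

  # Cosine of the angle between the distributions, from 0 to 180 degrees
  angles <- c(0, 30, 60, 90, 120, 150, 180)
  round(cos(angles * pi / 180), 2)
  # 1.00  0.87  0.50  0.00 -0.50 -0.87 -1.00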

Why do we care about these correlations? We use them all over the place in breeding. Sometimes as a measure of trade-off, as in “if I increase X, what will happen to Y?”, that is, the correlated response to selection. We also use them to understand how much information about one trait is contained in another trait, as in “can I use X as a selection criterion for Y?”. And a bunch of other uses as well. But that’s another post.

Diagram showing correlations as angles.

When heritability is high but the phenotype is dominated by the environment

I was reading a LinkedIn post that said “heritability is the extent to which differences in observed phenotypes can be attributed to genetic differences”.

There is this idea floating around that if a trait is highly heritable, then genetics explains most of the differences we observe. I have seen it many times, both when people discuss breeding and even in political discussions. I vividly remember a think tank commentator stating that, given IQ is highly heritable, millionaires probably make more money because their parents were more intelligent, or something along those lines.

I created the figure below using a dataset with wood basic density measurements (how much solid “stuff” you have in a set volume of wood) for trees growing in 17 different environments. The heritability of wood density is around 0.6; however, the differences between some environments are larger than the differences within environments.

We have to remember that heritabilities apply to specific populations and specific environments. Moreover, if we think of the mixed model analysis, we are fitting both fixed and random effects, so we are “correcting/controlling/putting individuals on the same footing” with our fixed effects before having a look at the variation that is left over. We are then saying that, out of that leftover variation, genetics explains a proportion (which is much smaller than the variation before accounting for other sources of variability).
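
As a rough sketch of that idea in R (not the analysis behind the figure: I am assuming a hypothetical data frame density_data with columns trial, family and density, half-sib families, and using lme4 here rather than ASReml):

  library(lme4)

  # Trial (environment) as a fixed effect, family as a random effect
  m1 <- lmer(density ~ trial + (1 | family), data = density_data)

  # Variance components for what is left over after accounting for trial
  vc <- as.data.frame(VarCorr(m1))
  sigma2_fam <- vc$vcov[vc$grp == "family"]
  sigma2_res <- vc$vcov[vc$grp == "Residual"]

  # Within-trial narrow-sense heritability, assuming half-sib families
  # (additive variance approximated as 4 times the family variance)
  h2 <- 4 * sigma2_fam / (sigma2_fam + sigma2_res)
  h2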

In the case of wood density of radiata pine, the environment (particularly temperature, explained by latitude and elevation, and soil nutrients like boron) has a larger effect than genetics when looking across multiple trials. The trials with higher density are farther north in New Zealand, which is warmer. Once we are inside one of the trials, genetics explains 60% of the variability. In the same way, once we account for all other social differences, we are left with a much smaller level of variability to try to explain income differences with genetics.

Wood variability for trees in 17 progeny trials in New Zealand.

You realise that New Zealand is small when…

I was chatting with a colleague (Salvador Gezan) in April about teaching, learning and books, and he suggested “have a look at Rex Bernardo’s book”. I searched for it in my university library, no luck. I thought “well, just ask for it as an interlibrary loan. Surely we can borrow a copy from another university in the country and get it in a few days”.

I forgot all about the interloan for three months (!), until I received an email from my university library saying something like “Hey, we couldn’t find the book you were looking for in NZ. However, we ordered a copy for you”. Then another email today: “Hey, please come and pick up your book from the central library”.

And here I am, at the library with THE copy of the book. We are that small in NZ.

P.S. Salvador and I both did our undergrad at the Universidad de Chile, just with a few years’ difference.

Start with the programming language and statistical approach used by your community

I have been very busy with the start of the semester, teaching regression modelling. The craziest thing was that the R installation was broken in the three computer labs I was allocated to use. It would not have been surprising if I were talking about Python ( 🤣 ), but the installation script had a major bug. Argh!

Anyhow, I was talking with a student who was asking me why we were using R in the course (she already knew how to use Python). If you work in research for a while, particularly in statistics/data analysis, you are bound to bump into long-lived discussions. It isn’t the Text Editor Wars or the Operating System Wars. I am referring to two questions that come up all the time in long threads:

  1. What language should I learn or use for my analyses?
  2. Should I be a Bayesian or a Frequentist? You are supposed to choose a statistical church.

The easy answer to the first one is “because I say so”: it’s my course. A longer answer is that a Domain Specific Language makes life a lot easier, as it is optimised for the tasks performed in that domain. An even longer answer points to something deeper: a single language is never enough. My head plays images of Minitab, SAS, Genstat, S-Plus, R, ASReml, etc. that I had to use at some point just to deal with statistics. Or Basic, Fortran, APL (crazy, I know), Python, Matlab, C++, etc. that I had to use as more general languages at some point. The choice of language will depend on the problem and the community/colleagues you end up working with. Over your career you become a polyglot.

As an agnostic (on my good days) or an atheist (on my bad ones) I am not prone to joining churches. In my research I tend to use mostly frequentist stats (of the REML persuasion) but, sometimes, Bayesian approaches feel like the right framework. In most of my problems both schools tend to give very similar, if not identical, results.
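
A toy comparison of the two churches, using a hypothetical data frame dat with a response y, a covariate x and a random block effect (just a sketch, not anything from my actual research):

  library(lme4)
  library(MCMCglmm)

  # Frequentist: REML estimates of the fixed effect and variance components
  freq <- lmer(y ~ x + (1 | block), data = dat)
  summary(freq)

  # Bayesian: the same model fitted with MCMCglmm's default priors
  bayes <- MCMCglmm(y ~ x, random = ~ block, data = dat, verbose = FALSE)
  summary(bayes)

  # With a reasonable amount of data, REML estimates and posterior means
  # usually land very close to each other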

I have chosen to be an interfaith polyglot.

The purpose of a system is what it does (POSIWID)

This is a popular* dictum by systems theorist Stafford Beer, pointing out that the self-described purpose of a system (or an organisation) is not the same as its actual purpose. I am often reminded of POSIWID when companies or universities state their “values” and we then contrast them with what they actually value, revealed by how they apply carrots and sticks.

Famously, Google used “Don’t be evil” in its corporate code of conduct, but fired employees complaining about the ethics of its AI projects. Or your organisation states that employee wellbeing is a priority, but uses an “ambulance at the bottom of a cliff” approach: there is no prevention; instead, you are told to use mindfulness and meditation to reduce stress.

I tend to be sceptical about people and organisations insisting too much on their values; I’d rather see their results, which tend to reflect their true purpose in what they do.

*Popular in the sense of nerd popular, not pop-star popular.

Note: In the early 1970s Stafford Beer was involved in the development of Cybersyn, an attempt to plan the whole Chilean economy from a room connected to industry via 500 telex machines. Replica of Cybersyn in Centro Cultural La Moneda, Santiago.
