Palimpsest

Evolving notes, images and sounds by Luis Apiolaza

Even worse than pedigree errors: selecting for the wrong thing

Imagine this: you have been patiently assessing trees for wood density (selection criterion) thinking that you’re improving wood stiffness (objective trait). Stiffer trees produce stiffer and more dimensionally stable wood: higher value, more profit. Your choice of density sounds reasonable, as on average denser wood tends to be stiffer… except that the wood is not getting stiffer.

If you look in the literature, wood density is this sort of universal, canonical trait: it is correlated with pretty much everything, easy to assess and highly heritable. However, there is a big problem: the relationship between density and stiffness changes with age. This points to something important: it is essential to have a good understanding of the traits we want to breed for.

Wood nerdiness: Stiffness depends on the product of density and the square of acoustic velocity (density × velocity²). Velocity is a proxy for the angle of cellulose microfibrils in the secondary cell walls (MFA): the faster the velocity, the lower the angle, and a smaller angle leads to higher stiffness. The variability of stiffness in young trees is dominated by the variability of MFA, while the variability of stiffness in older trees is dominated by the variability of density. This is the root of the problem:

We were selecting for density when, to improve the stiffness of the first rings of the trees, we needed to select for velocity. Wasted time, wasted effort. On the plus side, trees got denser, which was good news for carbon sequestration. A positive, unintended consequence.
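The density × velocity² relationship above can be sketched numerically. This is a minimal illustration, not an analysis: the function name and the density and velocity values are hypothetical, and the units assume density in kg/m³ and velocity in m/s. It shows why the squared term matters: the same proportional change in velocity moves stiffness roughly twice as much as the same change in density.

```python
def dynamic_stiffness_gpa(density_kg_m3: float, velocity_m_s: float) -> float:
    """Dynamic stiffness in GPa, computed as density * velocity^2."""
    return density_kg_m3 * velocity_m_s ** 2 / 1e9


# Hypothetical young tree: illustrative values, not measured data.
base = dynamic_stiffness_gpa(450.0, 3000.0)
denser = dynamic_stiffness_gpa(450.0 * 1.10, 3000.0)  # +10% density
faster = dynamic_stiffness_gpa(450.0, 3000.0 * 1.10)  # +10% velocity

gain_from_density = denser / base - 1.0
gain_from_velocity = faster / base - 1.0

print(f"base stiffness: {base:.2f} GPa")
print(f"+10% density  -> {gain_from_density:+.0%} stiffness")
print(f"+10% velocity -> {gain_from_velocity:+.0%} stiffness")
```

With these made-up numbers, a 10% gain in density gives 10% more stiffness, while a 10% gain in velocity gives 21%.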

In between plant and animal breeding

Let’s start with the obvious: trees are plants and—unless you are breeding Ents—they do not walk around. Therefore, the first obvious statement is that tree breeding heavily relies on experimental designs to account for environmental variability.

But trees are much larger and longer-lived than corn or potatoes, so we need much larger trials, in land that’s often not flat, because if it were flat you’d be growing a plant crop. So our trials often need incomplete blocks, and within-trial spatial analysis tends to make sense.

Trees are also much closer to the wild populations. Most tree breeding programmes are not beyond a 4th generation from undomesticated trees, so we have shallow pedigrees, unlike crop plants and livestock.

Another difference is that we normally don’t test a few cultivars; we test thousands of genotypes, which puts us closer to animal breeding. Animal breeders often assume a set of “standard” genetic parameters for genetic evaluation; instead, like other plant people, we estimate genetic parameters from the trials under analysis. This means we may end up trying to run a multivariate analysis with, say, 100 trials (sites) and half a million genotypes, using a tree (animal) model BLUP, which doesn’t work unless we move to a factor analytic, reduced tree (animal) model, etc., with all sorts of compromises. We are still looking for the right level of complexity for genetic analyses.

Trees also have a very clear distinction between the objective traits and selection criteria. Objective traits have an effect on profit and normally are expressed at rotation age (say ~25 years in radiata pine), like volume per ha, wood stiffness, stem form, etc. Selection criteria are easier, earlier and cheaper to assess, usually at 1/4 to 1/3 of rotation, like stem diameter and height, acoustic velocity, etc.

Trees are heterogeneous, anisotropic, hard to quantify sometimes. We invest substantial efforts developing phenotyping tools to describe these gigantic organisms. Sometimes, in the wind, they look just like Ents.

Do I have pedigree errors?

We just finished a genetic analysis, got breeding values for all our selection criteria, combined them with genetic parameters and economic weights in a selection index (I) that predicts the total genetic-economic value in dollars (H). We rank our trees from best to worst, perhaps with some constraint on relatedness, and go to the field to collect genetic material.

  1. Perhaps the trial is not well labelled, so we are in the wrong block.
  2. Or we are entering perpendicular to it, so the rows are the columns.
  3. Or the tree labels are gone, so we are picking up the wrong tree.
  4. Or we identified the right tree, but set the wrong label when collecting seeds or cuttings.
  5. We grafted the cuttings and set a new, incorrect label. Or we sowed the seeds in the wrong part of the nursery.
  6. Maybe the grafts ended up in the wrong part of our orchard, or with the wrong labels.
  7. Did we properly label the pollen we just collected? And dried? And stored?
  8. Is the pollen from the male parent going to the right female?
  9. Once we collected the cones (or capsules, or fruits), are they kept separate in the kiln and in the machine that removes the seeds?
  10. Are the seeds sown with the right labels?
  11. Are the seedlings going to the right place in the trial? Go to 1.

We could genotype the trees to check for pedigree errors. But see 1, 2, 3 & 4. Is the pedigree wrong, or the sample for genotyping, or both?
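One simple way to sketch such a genotype check, assuming biallelic SNPs coded as 0, 1 or 2 copies of an allele: a true parent and its offspring must share at least one allele at every locus, so "opposing homozygotes" (0 in one, 2 in the other) point to a mismatch. The function name, the genotype vectors and the tolerance are all hypothetical. As noted above, a mismatch cannot tell us whether the pedigree or the genotyped sample is the one that is wrong.

```python
from typing import Sequence


def opposing_homozygote_rate(parent: Sequence[int], offspring: Sequence[int]) -> float:
    """Proportion of compared SNPs where parent and offspring share no allele.

    Genotypes are assumed coded as 0, 1 or 2 copies of an allele; any other
    value (e.g. -1) is treated as a missing call and skipped.
    """
    if len(parent) != len(offspring):
        raise ValueError("genotype vectors must have the same length")
    compared = 0
    conflicts = 0
    for p, o in zip(parent, offspring):
        if p not in (0, 1, 2) or o not in (0, 1, 2):
            continue  # missing call, nothing to compare
        compared += 1
        if abs(p - o) == 2:  # 0 vs 2: opposing homozygotes
            conflicts += 1
    return conflicts / compared if compared else float("nan")


# Hypothetical genotypes at 8 SNPs, for illustration only.
mother = [0, 2, 1, 2, 0, 1, 2, 0]
seedling = [1, 2, 0, 1, 0, 2, 2, 1]  # shares an allele with mother everywhere
stranger = [2, 0, 1, 0, 2, 1, 0, 2]  # opposing homozygote at 6 of 8 SNPs

MAX_RATE = 0.05  # assumed tolerance for genotyping errors

for label, candidate in (("seedling", seedling), ("stranger", stranger)):
    rate = opposing_homozygote_rate(mother, candidate)
    verdict = "consistent" if rate <= MAX_RATE else "check pedigree or sample"
    print(f"{label}: conflict rate {rate:.2f} -> {verdict}")
```

A real check would use many more markers and a tolerance based on the genotyping error rate of the platform in use.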

The question is not whether we have errors in the pedigree. One would need superhuman luck to avoid any errors in a small breeding programme, and a real miracle with close to a million trees with records. The real questions are:

  • Can we reduce future errors?
  • Do the pedigree errors we have make a significant (in the sense of important, not statistically) difference to the breeding and deployment programmes?

For example, over the years I have seen huge improvements in Proseed (the largest tree seed producer in New Zealand): multiple validation steps, QR codes for orchard blocks, pollen, cones under processing, etc. There are similar changes going on in parts of the breeding programme.

What’s the percentage of errors that remain in the system? I don’t know. Does it make a substantial difference? Maybe.

Are there other problems that could be even bigger? Yes. Maybe breeding for the wrong thing, but that’s another post.

How old is your favourite language?

We often forget how long we’ve been writing code in specific languages. For example, I started using SAS in 1992 for the analysis of progeny trials, Python to control Fortran sampling simulations in 1997, and R for general statistics in 1998. Your favourite language could be fairly old:

Fortran: 66 years old
COBOL: 64 yo
Lisp: 63 yo
BASIC: 59 yo
C: 51 yo
SAS: 51 yo
SQL: 49 yo
MATLAB: 44 yo
C++: 38 yo
Python: 32 yo
R: 30 yo
Java: 28 yo
Ruby: 28 yo
JavaScript: 27 yo
Clojure: 16 yo
Julia: 11 yo
Elixir: 11 years old

All that glitters is not GxE interaction

We are going over the fifth version(*) of a manuscript with a Ph.D. student and colleagues, polishing some details before submission, thinking about one of the results: basically there is little, if any, Genotype by Environment (GxE) interaction.

There is a long history of studying GxE interaction in radiata pine, with parallels in the development of statistical techniques: from basic models with a few sites with common parents to massive multivariate, factor analytic, animal (tree) model BLUP with huge levels of imbalance, showing substantial GxE in some cases.

This time, however, we have a number of extremely well-connected sites, clonal replication, SNP-based pedigree, factor analytic covariance matrices, etc. And there is almost no GxE interaction for stem diameter (a low heritability trait) or wood basic density (a high heritability trait).

Could it be that a large proportion of reported GxE interaction relates to data structure? If that’s the case, is there much point in trying to explain the “interaction” with environmental variables? 🤔 I’m not saying that this happens all the time, but I have seen the issue quite a few times already.

(*) One of those cases where the journey—learning how to pitch the problem, emphasising the most relevant parts, etc.—is almost more important than the destination. We are training researchers 😉.

© 2024 Palimpsest
