How does it fit?

There is a question, or group of questions, that often comes up when chatting with colleagues and students: How does X fit in the breeding program? or How do you keep track of X, Y and Z in the breeding program? or, even, How do you cope with so much stuff that keeps on coming to the breeding program? By the way, when I say breeding program, I mostly mean the genetic evaluation part of the program, which I consider the central (dare I say most important?) part.

Now, to answer the question I have to acknowledge that I’m more of a mathematics than a biology person; I like to learn a few basics from which I can derive everything else. Therefore, unsurprisingly, I think of only two relationships when talking about new(ish) technologies, drumroll…

the breeder’s equation (gain = intensity x accuracy x variability / time) and the mixed model equations (y = Xb + Za + … + e). Boring, I know.

I turn whatever new element comes along into two questions:

How will it affect gain by potentially changing variability, accuracy, selection intensity or time to selection? and
Where does it enter the mixed model equations?

I have talked about the breeder’s equation before, and keep in mind that I often think of the multivariate version, because we are selecting for a selection index, and working on improving a multitrait breeding objective.
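The first side of the napkin can be sketched in a few lines of Python. This is a minimal, univariate sketch, assuming that "variability" means the additive genetic standard deviation; the function name and every number below are hypothetical, chosen only to show how a change in accuracy and in time to selection flows through to gain per year.

```python
def gain_per_year(intensity, accuracy, sigma_a, years):
    """Breeder's equation: gain = intensity x accuracy x variability / time.

    sigma_a stands for "variability" (assumed here to be the additive
    genetic standard deviation); years is the time to selection.
    """
    if years <= 0:
        raise ValueError("years must be positive")
    return intensity * accuracy * sigma_a / years


# Hypothetical baseline: all values are made up for illustration.
baseline = gain_per_year(intensity=1.75, accuracy=0.6, sigma_a=1.0, years=10)

# Hypothetical "shiny new thing": better accuracy and earlier selection,
# with intensity and variability left unchanged.
with_new_tech = gain_per_year(intensity=1.75, accuracy=0.7, sigma_a=1.0, years=8)

ratio = with_new_tech / baseline
print(f"baseline gain/year: {baseline:.4f}")
print(f"new gain/year:      {with_new_tech:.4f}")
print(f"relative change:    {ratio:.2f}x")
```

The multivariate version replaces these scalars with matrices for the index and the breeding objective, but the napkin logic is the same: which of the four terms moves, and by how much?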

When talking about the mixed models, I think at two levels: single environment and multienvironment. At the single-environment level, the model basically accounts for an experimental design, a mating design (via pedigree/genomics) and a potentially very long etc. Say we get LiDAR from drones: then we can get more Xs in the model (topographic indices from the DTM, competition from other individuals from the crown height model, etc.), but sometimes we get more Ys (tree heights, for example). Or we get new assessments from Near Infrared Spectroscopy of core samples: more Ys. Or we get hyperspectral imaging, which can produce more Xs describing aspects of the terrain, and more Ys as selection criteria for drought or disease, etc.
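To make "more Xs" concrete, here is a toy sketch of the mixed model equations for y = Xb + Za + e, solved with NumPy. It assumes the variance ratio lam = sigma2_e / sigma2_a is known, uses a hand-built relationship matrix for six trees in two half-sib families, and invents both the heights and the topographic covariate. The function name solve_mme and all the data are hypothetical; real evaluations use dedicated software and estimate the variances.

```python
import numpy as np


def solve_mme(y, X, Z, A, lam):
    """Solve Henderson's mixed model equations for y = Xb + Za + e.

    A is the additive relationship matrix and lam = sigma2_e / sigma2_a,
    which is assumed known in this sketch.
    """
    A_inv = np.linalg.inv(A)
    lhs = np.block([[X.T @ X, X.T @ Z],
                    [Z.T @ X, Z.T @ Z + lam * A_inv]])
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    sol = np.linalg.solve(lhs, rhs)
    p = X.shape[1]
    return sol[:p], sol[p:]  # fixed effects b, breeding values a


# Six trees in two half-sib families: 0.25 between half-sibs, 1 on the diagonal.
A = np.eye(6)
for family in ([0, 1, 2], [3, 4, 5]):
    for i in family:
        for j in family:
            if i != j:
                A[i, j] = 0.25

Z = np.eye(6)                                          # one record per tree
y = np.array([12.1, 11.4, 13.0, 9.8, 10.5, 10.1])      # made-up heights
lam = 3.0                                              # assumes h2 = 0.25

# Model 1: overall mean only.
X1 = np.ones((6, 1))
# Model 2: the new data source adds a column to X (hypothetical topographic index).
topo_index = np.array([0.2, -0.1, 0.5, -0.4, 0.0, -0.2])
X2 = np.column_stack([np.ones(6), topo_index])

b1, a1 = solve_mme(y, X1, Z, A, lam)
b2, a2 = solve_mme(y, X2, Z, A, lam)
print("breeding values, mean only:     ", np.round(a1, 3))
print("breeding values, with covariate:", np.round(a2, 3))
```

A new X is just another column on the fixed-effects side. A new Y is a bigger change: it turns the evaluation into a multivariate model with genetic and residual covariances between traits, which this sketch does not attempt.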

Once we move across environments, there is access to a huge number of environmental datasets (both climate and soil related) that can be used to explain changes in ranking GIVEN that we have good connectedness between environments. If we don’t, my assumption is that we are chasing statistical noise.
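A very crude first check of connectedness is to count how many families each pair of trials has in common. The sketch below does only that; the trial names, family codes and the function shared_families are hypothetical, and a real assessment would also account for links through the pedigree or genomic relationships, not just identical families.

```python
from itertools import combinations


def shared_families(trials):
    """Count the families in common for every pair of trials."""
    return {
        (a, b): len(trials[a] & trials[b])
        for a, b in combinations(sorted(trials), 2)
    }


# Hypothetical trials and the families tested in each one.
trials = {
    "site_A": {"F01", "F02", "F03", "F04"},
    "site_B": {"F03", "F04", "F05"},
    "site_C": {"F06", "F07"},
}

overlap = shared_families(trials)
for pair, n_shared in overlap.items():
    print(pair, n_shared)
```

In this made-up example site_C shares nothing with the other two trials, so any ranking change involving it cannot be separated from a plain site effect using common families alone.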

After trying this for a while, it turns into a reflex. Hey, what if we get data from “shiny new thing” now that it’s a lot cheaper? Use one side of a napkin to figure out how it affects the breeder’s equation, and then the other side of the napkin to check the implications for the genetic evaluation system.

In the old days you could have used an envelope or a cigarette pack for writing and figuring out the problem, but who posts stuff or smokes these days?