Palimpsest

Evolving notes, images and sounds by Luis Apiolaza

Are PhDs a pyramid scheme?

If you are a professor in academia you are supposed to form/train new PhDs. In the past, those new PhDs would go to other universities and then train new PhDs, and their PhD students would go to other universities repeating this process ad infinitum. At some point, 20 years ago, 30 years ago—I don’t really know—we reached PhD saturation in academia. There were way too many PhDs for the number of available positions in universities.

It should be obvious by now that believing in the package [getting a PhD + job in academia] is close to believing in the Tooth Fairy, Santa Claus or whoever is your favourite imaginary character. This doesn’t mean that you should not do a PhD, but that you should clearly consider that you will (most likely) have to work outside academia.

I struggle with this as a supervisor. I cannot in good conscience supervise a project *unless* I can see that the project/topic will set the student up with skills to work in industry or some other avenue that is not a university.

In my personal opinion, PhDs for future academics are indeed a pyramid scheme.

Why is this trait I like getting worse in the breeding programme?

The short answer: because the trait you like is not part of the breeding objective and, therefore, does not have an economic weight assigned to it. And if it doesn’t have an economic weight it has 0 (zero) economic importance.

A longer answer: in breeding there is a distinction between objective traits (which have an impact on profit), and the selection criteria (variables that are easy and cheap to assess, and that are correlated with the objective traits). They may even happen at different ages. For example, in forestry stem volume and wood stiffness at rotation age (say 25 years) can be objective traits for the production of structural timber. Stem diameter, wood density and standing tree velocity at age 8 can be selection criteria.

Some of the confusion may come from when people like a selection criterion (like wood density) and think the breeding programme is trying to improve that. In this example, we weren’t (at least at that time). We cared about volume and stiffness. Sacrificing levels of some selection criterion while pursuing the objective traits is perfectly fine if I am maximising value. And in a modern breeding programme you are pretty much always looking at value, not at a single trait.
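A toy sketch of the point above, assuming the value of a tree is simply the sum of economic weight x breeding value across traits. All numbers, and the names `ECONOMIC_WEIGHTS`, `aggregate_value`, `tree_a` and `tree_b`, are made up for illustration; they do not come from any real breeding programme.

```python
# Hypothetical economic weights: only the objective traits carry value.
# Wood density is not in the objective, so its weight is zero.
ECONOMIC_WEIGHTS = {"volume": 1.0, "stiffness": 2.0, "density": 0.0}


def aggregate_value(breeding_values, weights=ECONOMIC_WEIGHTS):
    """Sum of economic weight x breeding value over all traits.

    Traits without an assigned weight contribute nothing.
    """
    return sum(weights.get(trait, 0.0) * bv
               for trait, bv in breeding_values.items())


# Two hypothetical trees (breeding values in arbitrary standardised units).
tree_a = {"volume": 0.5, "stiffness": 0.5, "density": 1.0}   # great density
tree_b = {"volume": 1.0, "stiffness": 1.0, "density": -0.5}  # poor density

value_a = aggregate_value(tree_a)
value_b = aggregate_value(tree_b)

# Tree B is selected on value even though its density is worse.
best = "B" if value_b > value_a else "A"
print(f"Tree A: {value_a}, Tree B: {value_b}, selected: {best}")
```

In a real programme the breeding values for the objective traits would be predicted from the selection criteria, but the ranking logic is the same: a trait with zero weight cannot pull a tree up the list.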

If you find this interesting, you may also like Why did my breeding values go down?

Last day of teaching

…for this semester. I was really trying to keep my head above water, but gulp, glug, I kept on taking water in. There is a pile of marking and two exams coming my way in a few weeks. Anyone who could invite me to their home in Rarotonga? I need to recover from the always brutal semester steamroller.

Despite all the teaching, ideas keep on living for free in my head. Where to next? This is a common question when working in research: at some point the project has to be completed. Perhaps everything went well and the objectives were achieved, the findings were published, the student completed their PhD, etc. Or, perhaps, the whole thing was messy, or unattainable, or the experiment didn’t work out, or we ran out of money.

Last weekend one of our students submitted his PhD, with chapters either published or somewhere in the publication pipeline. There is a sense of Where to next? From an implementation point of view, it is a matter of using the results, perhaps tweaking things here and there, but now it is an operational breeding programme issue. That topic will have to wait before I revisit it.

In conversations with a colleague in Chile (A) we talk ideas. Another colleague (B) informs us that our frontrunner was “too applied” for funding. It could make a significant practical difference, at least in my opinion, but the funding body has a strong preference for more “fundamental” research. The same funding body that does not like forestry too much, because it is “too slow”. When you put fundamental + forestry together it is hard to get results in 3 years of funding. Go figure.

B suggested another idea that I am still getting my head around. Not quite my topic BUT I am a sucker for interesting problems and learning. Now I am reading about stuff that’s new for me, and seeing if I can connect it in a meaningful way to #breeding and #woodquality, so I don’t have to go all the way to Kevin Bacon’s degrees of separation.

I dislike (or should I say hate?) the push of Large Language Models (LLM) for writing. I can’t see the point, because where is the therapeutic value of asking ‘write 300 words in Luis’ style’? I can, pardon, I need to write this because I can’t stop writing. I have to empty my head: it is 4:30 pm, Friday afternoon, the last day of teaching of this semester. Phew! And that’s how #academia feels today, ladies and gentlemen.

Back of the envelope calculations: pulp mill

Imagine that someone stops you on the street and asks “How many hectares of plantations do we need for a pulp mill that produces 1 million tonnes per year of Eucalyptus pulp in Chile?” They don’t need a highly accurate result but a ballpark figure, the right order of magnitude. A Fermi estimate.

How many assumptions do we need?

  1. We need 4 cubic metres of wood for a metric tonne of pulp (wood density 0.5 ton/m3 and 0.5 pulp yield)
  2. Harvest age 12 years
  3. Productivity 25 m3/year/ha

Using 1. we need 4 m3/ton x 1,000,000 ton = 4,000,000 m3 of wood per year. Using 2. and 3. we see that 1 ha produces 25 m3/year/ha x 12 year = 300 m3/ha.

Therefore we need 4,000,000 m3/year / (300 m3/ha) = 13,333.33 ha/year and because we need the same amount in year 1, 2, …, 12 (Harvest age) and we keep on planting forever, the total is 13,333.33 ha/year x 12 year = 160,000 ha.

If you have been paying attention, you’ll notice that we divide and multiply by the rotation (12 years) so we can simplify the calculation back to: 

product conversion (4 m3/ton) x capacity (1,000,000 ton/year) / productivity (25 m3/year/ha) = 160,000 ha.

We know that none of those numbers is perfectly correct, but put together they give us an idea of the magnitude of the problem. We can play with them: change site productivity, conversion rate, add safety margins, etc.
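The arithmetic above can be sketched in a few lines of Python, which makes it easy to play with the assumptions. The function and parameter names are mine, and the default values are the three assumptions listed earlier, not measured data.

```python
def area_long_form(pulp_t_per_year, wood_m3_per_t=4.0,
                   productivity_m3_ha_year=25.0, rotation_years=12):
    """Step-by-step version: annual harvest area times rotation length."""
    wood_m3_per_year = pulp_t_per_year * wood_m3_per_t
    yield_m3_per_ha = productivity_m3_ha_year * rotation_years
    harvest_ha_per_year = wood_m3_per_year / yield_m3_per_ha
    return harvest_ha_per_year * rotation_years


def area_short_form(pulp_t_per_year, wood_m3_per_t=4.0,
                    productivity_m3_ha_year=25.0):
    """Simplified version: the rotation cancels out."""
    return pulp_t_per_year * wood_m3_per_t / productivity_m3_ha_year


base = area_short_form(1_000_000)
print(f"Base case: {base:,.0f} ha")  # 160,000 ha

# Both routes agree (up to floating point rounding).
print(f"Long form: {area_long_form(1_000_000):,.0f} ha")

# Playing with site productivity (m3/ha/year).
for productivity in (20.0, 25.0, 30.0):
    area = area_short_form(1_000_000, productivity_m3_ha_year=productivity)
    print(f"Productivity {productivity:>4}: {area:,.0f} ha")
```

Because the rotation cancels, changing the harvest age only matters if it also changes the assumed productivity.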

Now let’s say that we read of people complaining because a Chilean company announces a 2.5 million tonnes short fibre mill in Brasil. That would need 160,000 ha x 2.5 = 400,000 ha. Massive. As a comparison, INFOR tells us that the whole Eucalyptus estate in Chile is about 900,000 ha and that’s already used by the existing pulp mills, bioenergy producers, etc.

Just from the resource access point of view, building a pulp mill that size in Chile would require increasing the country’s Eucalyptus forest estate by close to half (400,000 ha on top of about 900,000 ha). That gives some context to the speculation about the reasons for the investment in Brasil.

Exposing rather than hiding complexity

In the mid-1990s I was at Massey University in Palmerston North, centre of the known universe, where I was doing my PhD. During a short course I met Arthur Gilmour, the creator of ASReml (plain vanilla version, there was no R package back then). I was really impressed by two things: 1. the software was insanely fast, particularly compared to the SAS scripts I was used to, and 2. how strange the syntax was for anything but the simplest cases.

I was stuck while coding some multivariate analysis, hitting my head against the wall when I complained to Arthur about the syntax. He told me that my problem was not with the syntax but with the matrices. That the syntax represented direct sums and Kronecker products. After that I read the code again, thinking of matrices(*) and suddenly the syntax made sense: there was complexity because the underlying matrix operations were quite exposed in the notation. Exposing these operations was one of the keys that made ASReml so powerful.
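For readers who have not met the two operations Arthur mentioned, here is a minimal sketch in plain Python. This is not ASReml syntax and says nothing about how ASReml works internally; the matrices and their names (`trait_cov`, `relationship`) are made-up examples, and in practice one would use something like `numpy.kron` rather than hand-written loops.

```python
def kronecker(a, b):
    """Kronecker product of two matrices stored as lists of rows.

    Every element of `a` is replaced by that element times the whole of `b`.
    """
    return [
        [a_ij * b_kl for a_ij in row_a for b_kl in row_b]
        for row_a in a
        for row_b in b
    ]


def direct_sum(a, b):
    """Direct sum: `a` and `b` on the diagonal, zeros elsewhere."""
    cols_a, cols_b = len(a[0]), len(b[0])
    top = [list(row) + [0.0] * cols_b for row in a]
    bottom = [[0.0] * cols_a + list(row) for row in b]
    return top + bottom


# Hypothetical covariance between two traits.
trait_cov = [[1.0, 0.5],
             [0.5, 2.0]]

# Hypothetical relationship matrix for two half-sibs.
relationship = [[1.0, 0.25],
                [0.25, 1.0]]

# Covariance for both traits on both trees, assuming the records are
# ordered as trees within traits.
full_cov = kronecker(trait_cov, relationship)
for row in full_cov:
    print(row)

# Two unrelated blocks stacked along the diagonal.
blocks = direct_sum([[1.0]], [[2.0, 0.5], [0.5, 3.0]])
for row in blocks:
    print(row)
```

A 2 x 2 matrix and a 2 x 2 matrix give a 4 x 4 structure from one short expression, which is why a notation built on these operations looks strange at first and scales so well afterwards.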

Morals of the story:

  • It helps to have a clue of what the software is supposed to be doing.
  • Genetic analyses are turtles/matrices all the way down.
  • Ask if you don’t understand. There is no point in suffering in silence.

(*) Good thing that I had gone through Searle’s “Matrix Algebra Useful for Statistics” guided/pushed by Dorian Garrick. It was hard work, but excellent background for dealing with linear mixed models.


© 2024 Palimpsest
