Evolving notes, images and sounds by Luis Apiolaza

Category: data curious

Imagine that someone stops you on the street and asks “How many hectares of plantations do we need for a pulp mill that produces 1 million tonnes per year of Eucalyptus pulp in Chile?” They don’t need a highly accurate result but a ballpark figure, the right order of magnitude. A Fermi estimate.

How many assumptions do we need?

1. We need 4 cubic metres of wood for a metric tonne of pulp (wood density 0.5 ton/m3 and pulp yield 0.5, so 1 m3 of wood gives 0.25 ton of pulp)
2. Harvest age 12 years
3. Productivity 25 m3/year/ha

Using 1., we need 4 m3/ton x 1,000,000 ton/year = 4,000,000 m3 of wood per year. Using 2. and 3., we see that at harvest 1 ha produces 25 m3/year/ha x 12 years = 300 m3/ha.

Therefore we need 4,000,000 m3/year / (300 m3/ha) = 13,333.33 ha/year and because we need the same amount in year 1, 2, …, 12 (Harvest age) and we keep on planting forever, the total is 13,333.33 ha/year x 12 year = 160,000 ha.

If you have been paying attention, you’ll notice that we divide and multiply by the rotation (12 years) so we can simplify the calculation back to:

product conversion (4 m3/ton) x capacity (1,000,000 ton/year) / productivity (25 m3/year/ha) = 160,000 ha.

We know that none of those numbers is perfectly correct, but put together they give us an idea of the magnitude of the problem. We can play with them: change site productivity, conversion rate, add safety margins, etc.
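To make that kind of playing around easier, here is a minimal Python sketch of the simplified calculation. The function and parameter names are mine (hypothetical, not from any library), and the defaults are just the three assumptions listed above.

```python
def plantation_area_ha(capacity_t_per_year,
                       wood_density_t_per_m3=0.5,
                       pulp_yield=0.5,
                       productivity_m3_per_ha_year=25.0):
    """Ballpark plantation area (ha) needed to feed a pulp mill.

    All defaults are the rough assumptions from the text, not measured values.
    """
    # Product conversion: m3 of wood needed per tonne of pulp (4 m3/ton by default).
    wood_m3_per_tonne = 1.0 / (wood_density_t_per_m3 * pulp_yield)
    # Annual wood demand divided by annual growth per hectare.
    # The rotation length cancels out, so it does not appear here.
    wood_demand_m3_per_year = capacity_t_per_year * wood_m3_per_tonne
    return wood_demand_m3_per_year / productivity_m3_per_ha_year


# Base case: 1 million tonnes of pulp per year.
print(f"{plantation_area_ha(1_000_000):,.0f} ha")

# Play with site productivity to see how sensitive the answer is.
for productivity in (20.0, 25.0, 30.0):
    area = plantation_area_ha(1_000_000, productivity_m3_per_ha_year=productivity)
    print(f"{productivity:.0f} m3/year/ha -> {area:,.0f} ha")
```

A safety margin is just a multiplier on the result (for example, `1.1 * plantation_area_ha(1_000_000)`).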

Now let’s say that we read of people complaining because a Chilean company announces a 2.5 million tonnes short fibre mill in Brazil. That would need 160,000 ha x 2.5 = 400,000 ha. Massive. As a comparison, INFOR tells us that the whole Eucalyptus estate in Chile is about 900,000 ha and that’s already used by the existing pulp mills, bioenergy producers, etc.

Just from the resource access point of view, having a pulp mill that size in Chile would require increasing the country’s Eucalyptus forest estate by nearly half (400,000 ha on top of about 900,000 ha). That gives some context to the speculation about the reasons for the investment in Brazil.

Over on the birdsite dumpster fire, Emily Harvey was asking:

do you know of any good guidelines/advice for what one should do to sense check and make sure they understand any data before using it?

I replied the following:

Typically, I might be very familiar with the type of data and its variables (if it is one of my trials) or chat/email multiple times with the owner of the dataset(s) so I can check that:

• units and recorded values match. If units are mm, for example, the magnitudes should make sense in mm.
• the order of assessments and experimental/sampling design match: people often get lost in trials or when doing data collection, recording the wrong sampling unit codes.
• dates are OK. I prefer 2023-04-07; anyway, this is often a problem when dealing with Excel data.
• environmental data, if we are using them, match my expectations about the site. Have found a few weather station problems doing that, where rainfall was too low (because there was a sensor failure).
• the relationships between variables are OK. Examples of problems: tall and too skinny trees, suspiciously fat and short ones (unless broken, etc.), diameter under bark bigger than over bark, etc.
• levels of factors match the planned levels (typically there are spelling mistakes, so there are more levels than planned). Same issue with locality names.
• map coverage/orientation is OK (sometimes maps are sideways). Am I using the right projection?
• joins retain the appropriate number of rows (I mean table joins using merge or left_join in R, etc).
• Missing values! Are NA coded correctly or with zeros, negative numbers? Are they “random”?
• If longitudinal data: are later measurements larger than earlier ones (or do we get shrinking trees)?
• etc

Of course these questions are dataset dependent and need to be adapted to each separate situation. Finally: Do results make any sense?
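A few of the checks above can be sketched in a handful of lines. This is only an illustration in plain Python, not the R workflow (merge, left_join) I mentioned; the records, column names, plausible ranges and helper functions are all made up for the example.

```python
# Hypothetical tree assessments (all names and values are invented).
trees = [
    {"tree": "T1", "site": "SiteA", "year": 2020, "height_m": 5.2, "dbh_mm": 80},
    {"tree": "T1", "site": "SiteA", "year": 2022, "height_m": 7.9, "dbh_mm": 121},
    {"tree": "T2", "site": "SiteA", "year": 2020, "height_m": 4.8, "dbh_mm": 7.5},
    {"tree": "T2", "site": "Site A", "year": 2022, "height_m": 4.1, "dbh_mm": 110},
    {"tree": "T3", "site": "SiteA", "year": 2020, "height_m": -99, "dbh_mm": 95},
]


def out_of_range(rows, column, low, high):
    """Rows whose value is outside a plausible range (wrong units, missing-value codes)."""
    return [r for r in rows if not (low <= r[column] <= high)]


def unexpected_levels(rows, column, planned):
    """Factor levels found in the data that were not in the plan (typos, etc.)."""
    return {r[column] for r in rows} - set(planned)


def shrinking_units(rows, unit, time, value):
    """Ids of units whose value decreases between consecutive assessments."""
    flagged, last = set(), {}
    for r in sorted(rows, key=lambda r: (r[unit], r[time])):
        prev = last.get(r[unit])
        if prev is not None and r[value] < prev:
            flagged.add(r[unit])
        last[r[unit]] = r[value]
    return flagged


def left_join(left, right, key):
    """Naive left join: keeps unmatched left rows, repeats rows on duplicated right keys."""
    lookup = {}
    for r in right:
        lookup.setdefault(r[key], []).append(r)
    return [{**l, **match} for l in left for match in lookup.get(l[key], [{}])]


print(out_of_range(trees, "dbh_mm", 20, 1000))    # diameter probably typed in cm
print(out_of_range(trees, "height_m", 0.5, 60))   # -99 used as a missing-value code
print(unexpected_levels(trees, "site", {"SiteA"}))
print(shrinking_units(trees, "tree", "year", "height_m"))

# A duplicated key in the lookup table silently inflates the join.
sites = [{"site": "SiteA", "rain_mm": 650}, {"site": "SiteA", "rain_mm": 655}]
joined = left_join(trees, sites, "site")
print(len(trees), len(joined))  # these should be equal for a clean left join
```

None of these replace talking to the owner of the data; they just point to the rows worth asking about.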

There is a lot of talk about the skills needed for working in Statistics/Data Science, with the discussion often focusing on theoretical understanding, programming languages, exploratory data analysis, and visualization. There are many good blog posts dealing with how you get data, process it with your favorite language and then create some good-looking plots. However, in my opinion, one important skill is curiosity; more specifically, being data curious.

Oftentimes being data curious doesn’t require statistics or coding, but just searching for and looking at graphs. A quick example comes from Mike Dickinson’s tweet: “This is extraordinary: within a decade, NZers basically stopped eating lamb. 160 years of tradition scrapped almost overnight.”

After writing a blog post about the paper “Sustainability and innovation in staple crop production in the US Midwest” I decided to submit a formal comment to the International Journal of Agricultural Sustainability in July 2013, which was published today. As far as I know, Heinemann et al. provided a rebuttal to my comments, which I have not seen but which should be published soon. This post is an example of how we can use open data (in this case from the USDA and FAO) and free software (R) to participate in scientific discussion (see supplementary material below).

The text below the *** represents my author’s version provided as part of my Green Access rights. The article published in the International Journal of Agricultural Sustainability [copyright Taylor & Francis] is freely available online at http://dx.doi.org/10.1080/14735903.2014.939842.

A few days ago I came across Jack Heinemann and collaborators’ article (Sustainability and innovation in staple crop production in the US Midwest, Open Access) comparing the agricultural sectors of USA and Western Europe. While the article is titled around the word sustainability, the main comparison stems from the use of Genetically Modified crops in USA versus the absence of them in Western Europe.

I was curious about part of the results and discussion which, in a nutshell, suggest that “GM cropping systems have not contributed to yield gains, are not necessary for yield gains, and appear to be eroding yields compared to the equally modern agroecosystem of Western Europe”. The authors relied on several crops for the comparison (maize/corn, rapeseed/canola (see P.S.6), soybean and cotton); however, I am going to focus on a single one (corn) for two reasons: 1. I can’t afford a lot of time for blog posts when I should be preparing lectures and 2. I like eating corn.

© 2024 Palimpsest
