Evolving notes, images and sounds by Luis Apiolaza

Should I reject a manuscript because the analyses weren’t done using open source software?

“Should I reject a manuscript because the analyses weren’t done using open software?” I overheard a couple of young researchers discussing. Initially I thought it was a joke but, to my surprise, it was not funny at all.

There is an unsettling, underlying idea in that question: the value of a scientific work can be reduced to its computability. If I, the reader, cannot replicate the computation, the work is of little, if any, value. Even further, my verification must involve no software cost, because otherwise we are limiting the possibility of computation to only those who can afford it. Therefore, the almost unavoidable conclusion is that we should force the use of open software in science.

What happens if the analyses were run using a point-and-click interface? For example, SPSS, JMP, Genstat, Statistica, and a few other programs allow access to fairly complex analytical algorithms via a system of menus and icons. Most of them are not open source, nor do they generate code for the analyses. Should we ban their use in science? One could argue that if users would only spend the time to learn a programming language (e.g. R or Python) they would be free of the limitations of point-and-click. Nevertheless, we would be shifting accessibility from people who can pay for an academic software license to people who can learn and moderately enjoy programming. Are we better off as a research community by that shift?

There is another assumption: open software will always provide good (or even appropriate) analytical tools for any problem. I assume that in many cases OSS is good enough and that there is a subset of problems where it is the best option. However, there is another subset where it is suboptimal. For example, I deal a lot with linear mixed models used in quantitative genetics, an area where R is seriously deficient. In fact, I would have to ignore the last 15 years of statistical development to run large problems. Given that some of the data sets are worth millions of dollars and decades of work, should I sacrifice the use of the best models so a hypothetical someone, somewhere can actually run my code without paying for an academic software license? This was a rhetorical question, by the way, as I would not do it.

There are trade-offs and unintended consequences in all research policies. This is one case where I think the negative effects would outweigh the benefits.

Gratuitous picture: I smiled when I saw the sign with the rightful place for forestry (Photo: Luis).

P.S. 2013-12-20 16:13 NZST Timothée Poisot provides some counterarguments for a subset of articles: papers about software.

5 Comments

  1. drannmaria

    When I saw this on twitter, I thought it was a joke. I couldn’t believe someone asked that question. When I review articles my comments are based on my evaluation of the scientific merit of the work – as my teenaged daughter would add – duh!

  2. Karl Haro von Mogel

    What’s wrong with Forestry?

    • Luis

      Nothing wrong. Usually we are in a different building but, due to post-earthquake repairs, we’ll be sharing a ‘hard’ sciences building for over a year.

  3. Tim (@tpoi)

    Question: how can you prove that point and click analyses have been done correctly? I think you are confusing computability and the ability to reproduce and validate results. I don’t care about computability, but I do care about the ability to reproduce science.

    • Luis

      In some cases you can; e.g. Genstat & Stata generate code from clicking in the menu, so it’s possible to run it exactly again. An obvious answer is that, even if you can rerun the analysis, that doesn’t prove the analysis is in fact correct; it only shows that you can get exactly the same numbers.

      Furthermore, you are confusing the ability to reproduce numbers with reproducing science. There are many components to science and the analysis is in fact one of the last parts. Replicating some of my research would require access to genetic material, ten years to grow it under my experimental design and $0.25M to assess the samples. I can give you the code for free, the rest, well that would be a bit of a problem.

© 2024 Palimpsest