Evolving notes, images and sounds by Luis Apiolaza

Category: python (Page 1 of 3)

Become an interfaith polyglot

I have been very busy with the start of the semester, teaching regression modelling. The craziest thing was that the R installation was broken in the three computer labs I was allocated. It would not have been surprising if I were talking about Python (🤣), but the installation script had a major bug. Argh!

Anyhow, I was talking with a student who asked me why we were using R in the course (she already knew how to use Python). If you work in research for a while, particularly in statistics/data analysis, you are bound to bump into long-lived discussions. It isn’t the Text Editor Wars nor the Operating System Wars. I am referring to two questions that come up all the time in long threads:

  1. What language should I learn or use for my analyses?
  2. Should I be a Bayesian or a Frequentist? You are supposed to choose a statistical church.

The easy answer for the first one is “because I say so”: it’s my course. A longer answer is that a Domain Specific Language makes life a lot easier, as it is optimised for tasks performed in that domain. An even longer answer points to something deeper: a single language is never enough. My head plays images of Minitab, SAS, Genstat, Splus, R, ASReml, etc. that I had to use at some point just to deal with statistics. Or Basic, Fortran, APL (crazy, I know), Python, Matlab, C++, etc. that I had to use as more general languages at some point. The choice of language will depend on the problem and the community/colleagues you end up working with. Over the course of your career you become a polyglot.

As an agnostic (on my good days) or an atheist (on my bad ones) I am not prone to join churches. In my research, I tend to use mostly frequentist stats (of the REML persuasion) but, sometimes, Bayesian approaches feel like the right framework. In most of my problems both schools tend to give very similar, if not identical, results.

I have chosen to be an interfaith polyglot.

Python not suitable platform for reproducible research

While [Active Papers] has achieved its mission of demonstrating that unifying computational reproducibility and provenance tracking is doable and useful, it has also demonstrated that Python is not a suitable platform to build on for reproducible research. Breaking changes at all layers of the software stack are too frequent.

Konrad Hinsen in Archiving Active Papers

I started using Python for my PhD around 1997, to control simulations I wrote using Fortran 90. I chose Python based on Konrad Hinsen’s writings at the time on a long-disappeared website. A few years later I moved all my work to R, which I found much more stable. I have some 20-year-old base R code that still runs. 😇

Incidentally, last year I wrote a series of posts on Some love for base R.

Flotsam 13: early July links

Man flu kept me at home today, so I decided to do something ‘useful’ and go for a linkathon:

Over and out.

Pythonic links

Before I forget: a few links about starting up in Python for scientific projects:

Now, if we had a great Python library for linear mixed models, life would be easier.

Late-April flotsam

It has been a month and a half since I compiled a list of statistical/programming internet flotsam and jetsam.

  • Via Lambda The Ultimate: Evaluating the Design of the R Language: Objects and Functions For Data Analysis (PDF). A very detailed evaluation of the design and performance of R. HT: Christophe Lalanne. If you are into statistical genetics and on Twitter, Christophe is the man to follow.
  • Attributed to John Tukey, “without assumptions there can be no conclusions” is an extremely important point, which comes to mind when listening to the fascinating interview with Richard Burkhauser on the changes of income for the middle class in the USA. Changes to the definition of the unit of analysis may give a completely different result. By the way, does someone have a first-hand reference to Tukey’s quote?
  • Nature news publishes RNA studies under fire: High-profile results challenged over statistical analysis of sequence data. I expect to see this happening more often once researchers get used to uploading the data and code for their papers.
  • Bob O’Hara writes on Why simple models are better, which is not positive towards the machine learning crowd.
  • A Matlab Programmer’s Take On Julia, and a Python developer interacts with Julia developers. Not everything is smooth. HT: Mike Croucher.
  • Dear NASA: No More Rainbow Color Scales, Please. HT: Mike Dickinson. Important: this applies to R graphs too.
  • Rafael Maia asks “are programmers trying on purpose to come up with names for their languages that make it hard to google for info?” These are the suggestions if one searches Google for Julia:
Unhelpful search suggestions.

That’s all folks.

© 2024 Palimpsest