Evolving notes, images and sounds by Luis Apiolaza

Category: r (Page 3 of 20)

Reading a folder with many small files

One of the tools we use in our research is NIR (Near-Infrared Spectroscopy), which we apply to thousands of samples to predict their chemical composition. Each NIR spectrum is contained in a CSV text file with two numerical columns: wavelength and reflectance. All files have the same number of rows (1296 in our case), which corresponds to the number of wavelengths assessed by the spectrometer. One last thing: the sample ID is encoded in the file name.

As an example, the contents of file A1-4-999-H-L.0000.csv look like this:

8994.82461,0.26393
8990.96748,0.26391
8987.11035,0.26388
8983.25322,0.26402
8979.39609,0.26417
...
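A minimal Python sketch of reading such a folder, not the code from the full post: it stacks every CSV file into long-format rows of (sample ID, wavelength, reflectance). It assumes the files have no header row, as in the example above, and that the sample ID is the part of the file name before the first dot (e.g. `A1-4-999-H-L`). Both that ID rule and the function name `read_spectra` are hypothetical.

```python
import csv
from pathlib import Path


def read_spectra(folder, pattern="*.csv"):
    """Stack every two-column spectrum file in `folder` into long-format rows.

    Returns a list of (sample_id, wavelength, reflectance) tuples.
    """
    rows = []
    for path in sorted(Path(folder).glob(pattern)):
        # Assumption: the sample ID is the file name up to the first dot,
        # e.g. "A1-4-999-H-L.0000.csv" -> "A1-4-999-H-L".
        sample_id = path.name.split(".")[0]
        with path.open(newline="") as fh:
            for record in csv.reader(fh):
                if not record:  # skip blank lines
                    continue
                wavelength, reflectance = record  # files have no header row
                rows.append((sample_id, float(wavelength), float(reflectance)))
    return rows


# Usage (hypothetical folder name):
# spectra = read_spectra("nir_files")
```

Keeping the sample ID on every row means no information lives only in the file names once the data are combined; since all files share the same 1296 wavelengths, the long table can be reshaped to one row per sample if needed.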


From character to numeric pedigrees

In quantitative genetic analyses we often use a pedigree to represent the relatedness between individuals, so that it can be accounted for in the analyses, because the observations are not independent of each other. Often this pedigree contains alphanumeric labels, and most software can cope with that.

Sometimes, though, we want to use numeric identities because we would like to make the data available to third parties (other researchers, publication) and there is commercial sensitivity about the original labels. Or we just want to use a piece of software that can't deal with character identities.
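One simple way to do the recoding, sketched in Python under stated assumptions and not taken from the full post: each pedigree row is (individual, dam, sire), unknown parents are written as `0`, `NA` or left blank, and every label gets a positive integer in order of first appearance. The function `recode_pedigree` and the `MISSING` set are hypothetical.

```python
# Hypothetical markers for an unknown parent; adjust to your pedigree.
MISSING = {"", "0", "NA", None}


def recode_pedigree(pedigree):
    """Recode (individual, dam, sire) rows with character labels as integers.

    Unknown parents become 0; every other label gets a unique positive
    integer in order of first appearance. Returns the recoded rows and the
    label -> number lookup table.
    """
    codes = {}

    def code(label):
        if label in MISSING:
            return 0
        if label not in codes:
            codes[label] = len(codes) + 1
        return codes[label]

    recoded = []
    for individual, dam, sire in pedigree:
        dam_code, sire_code = code(dam), code(sire)  # parents coded first
        recoded.append((code(individual), dam_code, sire_code))
    return recoded, codes
```

The returned lookup table is the only link back to the original labels, so it is the piece to keep private when the recoded pedigree is shared. This sketch does not guarantee that parents receive smaller numbers than their offspring, which some programs require.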


Reducing friction in R to avoid Excel

When you have students working on a project there is always an element of quality control. Sometimes the results just make sense, while at other times we suspect that something has gone wrong. This means going back to check the whole analysis process: can we retrace all the steps in a calculation (going back to data collection) and see if there is anything funny going on? So we sat with the student and started running code (in RStudio, of course), and I noticed something interesting: there was a lot of redundancy, with pieces of code that didn't do anything or were weirdly placed. These are typical signs of code copied from several sources, which, together with the presence of setwd(), showed unfamiliarity with R and RStudio (we have a mix of students with a broad range of R skills).

But the part that really caught my eye was that the script read many near-infrared spectra files, column-bound them together with the sample ID (which was 4 numbers separated by hyphens) and saved the resulting 45 MB data set to a CSV file. Then the student opened the file in Excel, split the sample ID into 4 columns, deleted the top row, saved the file and read it back into R to continue the process.
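Splitting the ID can stay inside the script instead of making a round trip through a spreadsheet. A minimal Python sketch, assuming each row is a dict with a hypothetical `sample_id` key holding four integers separated by hyphens; the output column names `id1` to `id4` are also made up for illustration.

```python
def split_ids(rows, id_key="sample_id", names=("id1", "id2", "id3", "id4")):
    """Return copies of `rows` with the hyphenated ID split into integer columns."""
    out = []
    for row in rows:
        parts = row[id_key].split("-")
        if len(parts) != len(names):
            raise ValueError(f"expected {len(names)} parts in {row[id_key]!r}")
        new_row = dict(row)  # copy so the input rows are left untouched
        new_row.update(zip(names, (int(p) for p in parts)))
        out.append(new_row)
    return out


# Usage: split_ids([{"sample_id": "12-3-456-7"}])
# gives [{"sample_id": "12-3-456-7", "id1": 12, "id2": 3, "id3": 456, "id4": 7}]
```

Raising an error on a malformed ID makes a bad file name fail loudly, rather than silently shifting values between columns.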

Keeping track of research

If you search for data analysis workflows for research there are lots of blog posts on using R + databases + git, etc. While in some cases I may end up working with a combination like that, it’s much more likely that reality is closer to a bunch of emailed Excel or CSV files.

Some may argue that one should move the whole group of collaborators to work the right way. In practice, well, not everyone has the interest and/or the time to do so. In one of our collaborations we are dealing with a trial established in 2009, and I was tracking a field coding mistake (as in happening outdoors, doing field work, assigning codes to trees), so I had to trace back where the errors were introduced. After checking emails from three collaborators, I think I have put together the story and found the correct code values in a couple of files going back two years.


Calculating parliament seats allocation and quotients

I was having a conversation about dropping the minimum threshold (currently 5% of the vote) for political parties to get representation in Parliament. The obvious question is how seat allocation would change, which of course involves a calculation. There is a calculator on the Electoral Commission website, but trying to understand how things work (and therefore coding) is my thing, and the Electoral Commission has a handy explanation of the Sainte-Laguë allocation formula used in New Zealand. So I had to write my own seat allocation function.
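A minimal Python sketch of the Sainte-Laguë method, not the function from the full post: each eligible party's votes are divided by 1, 3, 5, 7, …, and the seats go to the largest quotients. It assumes votes come as a dict mapping party to vote count and applies only the percentage threshold. The New Zealand electorate-seat exception, overhang seats and the drawing of lots for ties are left out; ties here are broken arbitrarily by party name. The name `sainte_lague` is hypothetical.

```python
def sainte_lague(votes, seats, threshold=0.05):
    """Allocate `seats` among parties with the Sainte-Lague divisor method.

    votes: dict mapping party -> vote count. Parties below `threshold`
    share of the total vote are excluded. Electorate-seat exceptions and
    overhang seats are ignored in this sketch.
    """
    total = sum(votes.values())
    eligible = {p: v for p, v in votes.items() if v / total >= threshold}

    # Quotients v / d for odd divisors d = 1, 3, 5, ...; a party can win
    # at most `seats` seats, so `seats` divisors per party are enough.
    quotients = []
    for party, v in eligible.items():
        for s in range(seats):
            quotients.append((v / (2 * s + 1), party))

    # The `seats` largest quotients each win one seat (ties broken by name).
    quotients.sort(reverse=True)
    allocation = {p: 0 for p in eligible}
    for _, party in quotients[:seats]:
        allocation[party] += 1
    return allocation
```

Calling it with `threshold=0` gives the allocation without a minimum threshold, which is the comparison the conversation was about.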


© 2024 Palimpsest

Theme by Anders Noren