Category: stats

Odds ratios for the different levels of stomach inflammation.

GM-fed pigs, chance and how research works

2013-06-21 / Luis

Following my post on GM-fed pigs I received several comments, mostly through Twitter. Some people liked having access to an alternative analysis, while others replied with typical anti-GM slogans, completely ignoring that I was posting about the technical side of the paper. This post is not for the slogan crowd (who are clearly not interested in understanding) but for people who would like to know more about how to evaluate claims in a scientific article. While I refer to the pig paper, most issues apply to any paper that uses statistics.

In general, researchers want to isolate the effect of the treatments under study (diets in this case) from any other extraneous influence. We want control over the experimental conditions, so we can separate the effects of interest from all other issues that could create differences between our experimental units (pigs in this case). What could create ‘noise’ in our results? Animals could have different genetic backgrounds (for example with different parents), they could be exposed to different environmental conditions, they could be treated differently (more kindly or harshly), etc.
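The point about noise can be illustrated with a small simulation in R. This is a sketch under made-up assumptions: 70 pigs per group and a 30% chance of severe stomach inflammation for every pig, whatever the diet. These numbers are hypothetical and are not taken from the paper. Because the two "diets" have no effect by construction, any gap between the groups is produced by chance alone.

```r
set.seed(2013)  # arbitrary seed, only to make the sketch reproducible

n.pigs <- 70          # hypothetical number of pigs per group
p.severe <- 0.3       # hypothetical chance of severe inflammation, same for both diets
n.experiments <- 10000

# Each simulated experiment compares two groups whose diets have no real
# effect, so the difference in proportions reflects animal-to-animal noise
gap <- replicate(n.experiments, {
  control <- rbinom(n.pigs, size = 1, prob = p.severe)
  treated <- rbinom(n.pigs, size = 1, prob = p.severe)
  mean(treated) - mean(control)
})

# Share of no-effect experiments in which the groups differ by
# 15 percentage points or more, purely by chance
share.big.gap <- mean(abs(gap) >= 0.15)
share.big.gap
```

The share is small but not negligible, which is why a single difference between groups has to be weighed against the variation we would expect from chance (and from any uncontrolled influences) before attributing it to the treatment.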

Ordinal logistic GM pigs

2013-06-14 / Luis

This week another ‘scary GMOs cause disease’ story was doing the rounds on the internet: A long-term toxicology study on pigs fed a combined genetically modified (GM) soy and GM maize diet. Andrew Kniss, a non-smokable weeds expert, mentioned on Twitter that the statistical analyses in the study appeared to be kind of dodgy.

Curious, I decided to have a quick look, and I was surprised, first, by the points the authors decided to highlight in their results, second, by the pictures and captions used in the article and, last, by the way the analysis was run. As I’m in the middle of marking assignments and exams, I’ll only have a quick go at part of the analysis. As I see it, the problem can be described as ‘there is a bunch of pigs that were fed either non-GM feed or GM feed. After some time (approximately 23 weeks) they were killed and went through a CSI-like autopsy’, where part of the exam involved the following process:

Gratuitous picture: building blocks for research (Photo: Luis).

Matrix Algebra Useful for Statistics

2012-12-17 / Luis

I was having a conversation with an acquaintance about courses that were particularly useful in our work. My forestry degree involved completing 50 compulsory + 10 elective† courses; if I had to choose the courses that were influential and/or really useful, they would be Operations Research, Economic Evaluation of Projects, Ecology, three Calculus and two Algebra courses. Subsequently my PhD was almost entirely research based, but I sort of did Matrix Algebra: Dorian lent me his copy of Searle’s Matrix Algebra Useful for Statistics and passed me a pile of assignments that Shayle Searle used to give in his course at Cornell. I completed the assignments at my own pace and then sat a crazy 24-hour take-home exam.

A word of caution: the sample may have an effect

2012-09-27 / Luis

This week I’ve tried to i- stay mostly in the descriptive statistics realm and ii- surround any simple(istic) models with caveats, pointing out that they are very preliminary. We are working with a sample of ~1,000 schools that replied to Fairfax’s request, while a number of schools either ignored the request or told Fairfax to go and F themselves. Why am I saying this? If one gets a simple table of the number of schools by type and decile, there is something quite interesting: different types of schools are represented in the sample in different proportions, and the reporting to Fairfax may be biased if schools with low performance were less likely to reply (references to datasets correspond to the ones I used in this post):

summary(standards$school.type)
#         Composite (Year 1-10)          Composite (Year 1-15)        Contributing (Year 1-6)
#                             1                             29                            403
#       Full Primary (Year 1-8)    Intermediate (year 7 and 8) Restricted Composite (Yr 7-10)
#                           458                             62                              1
#         Secondary (Year 7-15)
#                            56

Now let’s compare these numbers with the school directory:

summary(factor(directory$school.type))
#         Composite (Year 1-10)          Composite (Year 1-15)        Contributing (Year 1-6)
#                             4                            149                            775
#         Correspondence School        Full Primary (Year 1-8)    Intermediate (year 7 and 8)
#                             1                           1101                            122
#Restricted Composite (Yr 7-10)         Secondary (Year 11-15)          Secondary (Year 7-10)
#                             4                              2                              2
#         Secondary (Year 7-15)          Secondary (Year 9-15)                 Special School
#                           100                            238                             39
#              Teen Parent Unit
#                            20
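To make the comparison explicit, we can compute the coverage of each school type: schools in the sample divided by schools in the directory. This is a sketch with the counts typed in from the two outputs above, comparing only the types that appear in both listings; with the actual data frames the same vectors could be built with table(standards$school.type) and table(directory$school.type).

```r
# Counts copied from summary(standards$school.type) above
sample.n <- c('Composite (Year 1-10)'          = 1,
              'Composite (Year 1-15)'          = 29,
              'Contributing (Year 1-6)'        = 403,
              'Full Primary (Year 1-8)'        = 458,
              'Intermediate (year 7 and 8)'    = 62,
              'Restricted Composite (Yr 7-10)' = 1,
              'Secondary (Year 7-15)'          = 56)

# Counts copied from summary(factor(directory$school.type)) above,
# restricted to the types that also appear in the sample
directory.n <- c('Composite (Year 1-10)'          = 4,
                 'Composite (Year 1-15)'          = 149,
                 'Contributing (Year 1-6)'        = 775,
                 'Full Primary (Year 1-8)'        = 1101,
                 'Intermediate (year 7 and 8)'    = 122,
                 'Restricted Composite (Yr 7-10)' = 4,
                 'Secondary (Year 7-15)'          = 100)

# Proportion of each school type in the directory that made it into the sample
coverage <- round(sample.n / directory.n[names(sample.n)], 2)
coverage
```

With these counts the coverage ranges from about 0.19 for Composite (Year 1-15) to 0.56 for Secondary (Year 7-15), so school types are far from equally represented.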

~~As a proportion we are missing more secondary schools~~. Because some composite school types contain very few schools, which is a pain, we can use the following code to get an idea of how similar the school types are and whether some of them could be merged.

# qplot() comes from the ggplot2 package
library(ggplot2)

# Performance of Contributing (Year 1-6) and
# Full Primary (Year 1-8) looks pretty much the
# same. Composites could be safely merged
qplot(school.type, reading.OK,
      data = standards, geom = 'jitter')

qplot(school.type, writing.OK,
      data = standards, geom = 'jitter')

qplot(school.type, math.OK,
      data = standards, geom = 'jitter')

# Merging school types and plotting them colored
# by decile. The new labels are matched by position
# to the seven original (alphabetically sorted) levels;
# repeated labels collapse those levels into one
standards$school.type.4 <- standards$school.type
levels(standards$school.type.4) <- c('Composite', 'Composite', 'Primary',
                                     'Primary', 'Intermediate',
                                     'Composite', 'Secondary')

qplot(school.type.4, reading.OK, colour = decile,
      data = standards, geom = 'jitter')

Representation of different school types and deciles is uneven.

Different participation of school types in the sample. This time the plot shows performance in mathematics.

I’m using jittering rather than box and whisker plots to i- depict all the schools and ii- get an idea of the different participation of school types in the dataset. Sigh. Another caveat to add in the discussion.
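The merge of school types in the code above works because assigning a vector with repeated labels to levels() collapses the matching levels, but it silently depends on the alphabetical order of the original levels. The sketch below uses a hypothetical toy factor (not the real data) to compare that positional approach with the named-list form of levels<- described in base R's ?levels, which states explicitly which old levels go into each new one.

```r
# Toy factor with made-up school types (hypothetical values, not the real data)
school.type <- factor(c('Contributing (Year 1-6)', 'Full Primary (Year 1-8)',
                        'Composite (Year 1-15)', 'Secondary (Year 7-15)',
                        'Contributing (Year 1-6)'))

# Levels are sorted alphabetically, so positional labels must follow
# this order: Composite, Contributing, Full Primary, Secondary
print(levels(school.type))

# Positional merge: repeated labels collapse the matching levels
by.position <- school.type
levels(by.position) <- c('Composite', 'Primary', 'Primary', 'Secondary')

# Named-list merge: each new level lists the old levels it absorbs,
# so the result does not depend on the order of the original levels
by.name <- school.type
levels(by.name) <- list(Composite = 'Composite (Year 1-15)',
                        Primary   = c('Contributing (Year 1-6)',
                                      'Full Primary (Year 1-8)'),
                        Secondary = 'Secondary (Year 7-15)')

print(table(by.position))
print(table(by.name))
```

Both versions give the same grouping here; the named list is simply harder to get wrong if the set of school types in the data changes.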

P.S. 2012-09-27 16:15. Originally I mentioned in this post the lack of secondary schools (Year 9-15) but, well, they are not supposed to be here, because National Standards apply to years 1 to 8 (thanks to Michael MacAskill for pointing out my error).

New Zealand school performance: beyond the headlines

2012-09-24 / Luis

I like the idea of having data on school performance, not to rank schools directly (hard, to say the least, at this stage) but because we can start looking at the factors influencing test results. I can imagine that, in the not-so-distant future, we will be able to run hierarchical models combining Ministry of Education data with Census/Statistics New Zealand data.

At the same time, there is the temptation to come up with very simple analyses that would make appealing newspaper headlines. I’ll read the data, create a headline and then move on to something that, personally, seems more important. In my previous post I combined the national standards for around 1,000 schools with decile information to create the standards.csv file.