Some weeks ago I received a couple of emails on the same day: one asked me to submit a paper to an open access journal, while the other invited me to be the editor of a ‘special issue’ of my choice for another journal. I had not heard of either publication before; both follow pretty much the same model: submit a paper for $600 and, if they like it, it will be published. However, the special issue email had a ‘buy your way in’ feeling: find ten contributors (i.e. $6,000) and you get to be an editor. Now, there is nothing wrong per se with open access journals; some of my favorite ones (e.g. PLoS ONE) follow that model. However, I was surprised by the increasing number of new journals that aim to fill the gap for ‘I need to publish soon, somewhere’. Surprised, that is, until one remembers the incentives at play in academic environments.
I’ve ignored my quantitative geneticist side of things for a while (at least in this blog) so this time I’ll cover some code I was exchanging with a couple of colleagues who work for other organizations.
It is common to use diallel mating designs in plant and tree breeding, where a small number of parents act as both males and females. For example, with 5 parents we can have 25 crosses, including reciprocals and selfing (crossing an individual with itself). Decades ago this mating design was tricky to fit and, considering an experimental layout with randomized complete blocks, one would have something like y = mu + blocks + dads + mums + cross + error. In this model dads and mums each estimated a fraction of the additive genetic variance. With the advent of animal model BLUP, it became possible to fit something like y = mu + blocks + individual (using a pedigree) + cross + error. Another, less computationally demanding, alternative (at least with unrelated parents) is to fit a parental model, overlaying the design matrices for parents with something like y = mu + blocks + (dad + mum) + cross + error.
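To make the "overlaying" idea concrete, here is a minimal base R sketch. It is not the code exchanged with my colleagues: the parent labels (`P1` to `P5`) and the object names (`crosses`, `Z.mum`, `Z.dad`, `Z.parent`) are made up for illustration, and it only builds the design matrix rather than fitting the model.

```r
# Hypothetical full diallel: 5 parents, all 25 crosses
# (reciprocals and selfs included)
parents <- paste0('P', 1:5)
crosses <- expand.grid(mum = parents, dad = parents)

# One indicator (0/1) column per parent, separately for each role
Z.mum <- model.matrix(~ mum - 1, data = crosses)
Z.dad <- model.matrix(~ dad - 1, data = crosses)

# Overlaying: add the two matrices so each parent gets a single
# column, whichever role it played in the cross
Z.parent <- Z.mum + Z.dad
colnames(Z.parent) <- parents

# Every cross involves two parental contributions; selfs show
# up as a single 2 in the row
head(cbind(crosses, Z.parent))
```

With a matrix like `Z.parent` there is a single parental effect (and a single variance component) instead of separate ones for dads and mums. How the overlay is requested depends on the mixed model software used.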
This week I’ve tried to (i) stay mostly in the descriptive statistics realm and (ii) surround any simple(istic) models with caveats, pointing out that they are very preliminary. We are working with a sample of ~1,000 schools that did reply to Fairfax’s request, while a number of schools either ignored the request or told Fairfax to go and F themselves. Why am I saying this? If one gets a simple table of the number of schools by type and decile there is something quite interesting: different types of schools are represented in the sample in different percentages, and there is the possibility of bias in the reporting to Fairfax due to potential low performance (references to datasets correspond to the ones I used in this post):
Now let’s compare this number with the school directory:
summary(factor(directory$school.type))
# School type                      Count
# Composite (Year 1-10)                4
# Composite (Year 1-15)              149
# Contributing (Year 1-6)            775
# Correspondence School                1
# Full Primary (Year 1-8)           1101
# Intermediate (year 7 and 8)        122
# Restricted Composite (Yr 7-10)       4
# Secondary (Year 11-15)               2
# Secondary (Year 7-10)                2
# Secondary (Year 7-15)              100
# Secondary (Year 9-15)              238
# Special School                      39
# Teen Parent Unit                    20
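Counts from the sample and from the directory are easier to compare as proportions. The sketch below shows one way of doing it in base R. The two data frames are toy stand-ins with made-up counts (they are not the Fairfax or directory data), and the column names in `comparison` are my own choice.

```r
# Toy stand-ins for the real data frames; counts are made up
directory <- data.frame(
  school.type = rep(c('Full Primary', 'Contributing', 'Secondary'),
                    times = c(50, 30, 20)))
standards <- data.frame(
  school.type = rep(c('Full Primary', 'Contributing', 'Secondary'),
                    times = c(30, 15, 2)))

# Use the same levels in both tables so the rows line up
types <- sort(unique(directory$school.type))
in.directory <- table(factor(directory$school.type, levels = types))
in.sample <- table(factor(standards$school.type, levels = types))

# Share of each type in the directory and in the sample, plus
# the percentage of directory schools present in the sample
comparison <- data.frame(
  type = types,
  directory.pct = as.vector(round(100 * prop.table(in.directory), 1)),
  sample.pct = as.vector(round(100 * prop.table(in.sample), 1)),
  coverage.pct = as.vector(round(100 * in.sample / in.directory, 1)))
print(comparison)
```

A type with a low `coverage.pct` is under-represented in the sample relative to the directory.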
As a proportion we are missing more secondary schools. We can use the following code to get an idea of how similar the school types are, because the small number of different composite schools is a pain.
library(ggplot2)  # provides qplot()

# Performance of Contributing (Year 1-6) and
# Full Primary (Year 1-8) looks pretty much the
# same. Composites could be safely merged
qplot(school.type, reading.OK,
      data = standards, geom = 'jitter')
qplot(school.type, writing.OK,
      data = standards, geom = 'jitter')
qplot(school.type, math.OK,
      data = standards, geom = 'jitter')

# Merging school types and plotting them colored
# by decile. The new names are matched by position
# to levels(standards$school.type)
standards$school.type.4 <- standards$school.type
levels(standards$school.type.4) <- c('Composite', 'Composite', 'Primary',
                                     'Primary', 'Intermediate',
                                     'Composite', 'Secondary')
qplot(school.type.4, reading.OK, colour = decile,
      data = standards, geom = 'jitter')
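The merging step relies on a feature of R factors: assigning repeated names through `levels()` collapses the corresponding levels, matching by position. A toy sketch with a made-up factor (not the school data):

```r
# Made-up factor with three levels, sorted as a1, a2, b
x <- factor(c('a1', 'a2', 'b', 'a1'))
levels(x)

# Repeating a name merges the levels in those positions
merged <- x
levels(merged) <- c('A', 'A', 'B')
table(merged)

# A named list does the same without depending on level order
merged2 <- x
levels(merged2) <- list(A = c('a1', 'a2'), B = 'b')
table(merged2)
```

The positional form is shorter, but it silently mislabels groups if the order of the levels changes, so it is worth checking `levels()` before merging.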
I’m using jittering rather than box and whisker plots to (i) depict all the schools and (ii) get an idea of the different participation of school types in the dataset. Sigh. Another caveat to add to the discussion.
P.S. 2012-09-27 16:15. Originally I mentioned in this post the lack of secondary schools (Year 9-15) but, well, they are not supposed to be here, because National Standards apply to years 1 to 8 (Thanks to Michael MacAskill for pointing out my error.)
Eric and I have been exchanging emails about potential analyses for the school data and he published a first draft model in Offsetting Behaviour. I have kept on doing mostly data exploration while we get a definitive full dataset, and looking at some of the pictures I thought we could present a model with fewer predictors.
Suicide is a tragic and complex problem. This week New Zealand’s Chief Coroner released the annual statistics on suicide, which come with several tables and figures. One of those figures refers to monthly suicides in the Christchurch region (where I live) and comes with an interesting comment:
Suicides in the Christchurch region (Timaru to Kaikoura) have risen from 67 (2010/11) to 81 (2011/12). The average number of suicides per year for this region over the past four years is 74. The figure of 67 deaths last year reflected the drop in suicides post-earthquake. The phenomenon of a drop in the suicide rate after a large scale crisis event, such as a natural disaster, has been observed elsewhere. [my emphasis]