I’ve been having a conversation for a while with @kamal_hothi and @aschiff on maps, schools, census, making NZ data available, etc. This post documents some basic steps I used for creating a map on ethnic diversity in schools at the census-area-unit level. This “el quicko” version requires 3 ingredients:
- Census area units shape files (available from Statistics New Zealand for free here).
- School directory (directory-school-current.csv available for free here).
- R with some spatial packages (also free).
We’ll read the school directory data, aggregate ethnicity information to calculate the Herfindahl–Hirschman Index of diversity and then plot it.
# School directory direc <- read.csv('directory-school-current.csv', skip = 3) # Total number of students for each ethnicity by Census Area Unit hhi <- aggregate(cbind(European..Pakeha, Maori, Pasifika, Asian, MELAA, Other) ~ Census.Area.Unit, data = direc, FUN = sum, na.rm = TRUE) # Function to calculate index <- function(x) { total <- sum(x, na.rm = TRUE) frac <- x/total h <- sum(frac^2) hn <- if(total > 1) (h - 1/total)/(1 - 1/total) else 1 return(hn) } # Calculate the index for each area hhi$hn <- apply(hhi[,2:7], 1, index) # Write data to use in QGis later write.csv(hhi, 'diversity-index-census-area-unit.csv', quote = FALSE, row.names = FALSE)
Then I moved to create a map in R, for the sake of it:
library(rgdal) # for readOGR library(sp) # for spplot # Reading shapefile cau <- readOGR(dsn='/Users/lap44/Dropbox/research/2015/census/2014 Digital Boundaries Generlised Clipped', layer='AU2014_GV_Clipped') # Joining with school ethnicity data (notice we refer to @data, as cau contains spatial info as well) cau@data <- data.frame(cau@data, hhi[match(cau@data[,"AU2014_NAM"], hhi[,"Census.Area.Unit"]),]) # Limiting map to the area around Christchurch spplot(cau, zcol = "hn", xlim = c(1540000, 1590000), ylim= c(5163000, 5198000))
And we get a plot like this:
Just because it is Monday down under.
P.S. Using the diversity-index-census-area-unit.csv
and joining it with the shapefile in QGIS one can get something prettier (I have to work on matching the color scales):
Map rendering is so much faster in QGIS than in R! Clearly the system has been optimized for this user case.
Hi, I am having trouble trying to reproduce your code. Especifically, you define a “Function to calculate” and I get this error message:
Error: unexpected ‘,’ in:
” h = sum(frac^2)
hn = if(total > 1,”
I would appreciate if you could explain to me what is this line supposed to do: hn = if(total > 1, (h – 1/total)/(1 – 1/total), 1)
Thanks.
Hi Adriana,
The code was seriously mangled by a problem with a WordPress plugin; I have cleaned the code and it should now run. Thanks for letting me know of the problem. That line was supposed to calculate and index when total was greater than 1 or just use 1 as the index.