I was reading a piece by Graeme Edgeler who, near the end, asked “Where are New Zealand’s bellwether electorates?”. I didn’t know where the data came from or how was the “index of disproportionality for each electorate” calculated, but I saw it mostly as an opportunity to whip up some quick code to practice the use of R and look at other packages that play well with the tidyverse.
The task can be described as: fetch Wikipedia page with results of the 2014 parliamentary election, extract the table with results by electorate, calculate some form of deviation from the national results, get the top X electorates with lowest deviation from national results.
A web search revealed that this page contains a whole bunch of results for the 2014 election and that the specific results I’m interested in are in table number 17 of the list created by html_nodes('table')
. Besides the tidyverse, I needed the packages rvest
for web scraping, magrittr
for using %<>%
(pipe and assign to original data frame) and lucid
for pretty printing the final table.
library(tidyverse) library(rvest) library(magrittr) library(lucid) election14 <- read_html('https://en.wikipedia.org/wiki/New_Zealand_general_election,_2014') election14 %>% html_nodes('table') %>% .[[17]] %>% html_table() %>% filter(Electorate != 'Electorate') -> electorate14 glimpse(electorate14)
Rather than reading the national results directly from Wikipedia I just typed them in code, as I already had them from some other stuff I was working on. My measure of “disproportionality for each electorate” was as sophisticated as the sum of squared deviations.
# Results for National, Labour, Green, NZ First, Conservative, Internet Mana & Māori national_results <- c(47.04, 25.13, 10.7, 8.66, 3.99, 1.42, 1.32) electorate14 %>% mutate(total_vote = apply(.[,2:8], 1, sum), dev = apply(.[,2:8], 1, function(x) sum((x - national_results)^2))) %>% arrange(dev) %>% slice(1:15) %>% lucid
# A tibble: 15 x 10 # Electorate National Labour Green `NZ First` Conservative `Internet Mana` Māori total_vote dev ## 1 Ōtaki 49.1 24.8 9.46 9.96 4.41 0.65 0.44 98.8 9.02 # 2 Hamilton West 47.7 25.7 8.21 10.8 4.67 0.72 0.56 98.4 13.2 # 3 Hamilton East 50 23.8 11 7.14 4.81 1 0.64 98.4 14.5 # 4 West Coast-Tasman 44.8 23.5 13 8.71 5.12 0.76 0.28 96.2 15.7 # 5 Napier 49.4 26 8.77 7.43 6.23 0.6 0.44 98.8 17.9 # 6 Hutt South 45.3 28 12.8 7.48 3.57 0.72 0.53 98.3 18 # 7 East Coast 48.6 22.7 9.21 11.8 4.08 1.17 0.95 98.6 20.7 # 8 Nelson 44.4 24.7 14.1 7.67 5.5 0.83 0.33 97.6 23.4 # 9 Invercargill 49.5 25.1 7.57 11.2 3.68 0.62 0.32 97.9 23.7 #10 Whanganui 47.3 25.5 7.21 12 5.02 0.73 0.58 98.3 25.4 #11 Northcote 50.7 22.1 11.6 7.32 4.31 0.95 0.46 97.5 26.3 #12 Wigram 42.9 28.7 12.8 8.56 3.61 0.76 0.47 97.8 35.4 #13 Christchurch Central 44.7 26.2 15.8 7.19 3.11 1.03 0.46 98.5 37 #14 Tukituki 52 22.8 8.57 7.6 6.56 0.68 0.52 98.8 43.3 #15 Port Hills 47 23.9 17.1 6.62 3.11 0.75 0.4 98.8 48.7
I’m sure there must be a ‘more idiomatic’ way of doing the squared deviation using the tidyverse. At the same time, using apply
came naturally in my head when writing the code, so I opted for keeping it and not interrupting the coding flow. The results are pretty similar to the ones presented by Graeme in his piece.
I’m getting increasingly comfortable with this mestizo approach of using the tidyverse and base R for completing tasks. Whatever it takes to express what I need to achieve quickly and more or less in a readable way.