Tuesday 13 March 2018

Decision trees for UK voting

The next data-analysis method I'm playing with is decision-tree regression.

It's a method usually filed under a branch of statistical computing that goes by many names (machine learning, statistical learning, artificial intelligence, data mining, supervised learning, etc). Decision trees split a dataset into groups, usually by asking a YES/NO question at each split.
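
To make the YES/NO splitting idea concrete, here is a minimal sketch of how a single split might be chosen for a regression tree: try each candidate threshold and keep the one that leaves the smallest squared error around the mean of each side. The function name `best_split` and the toy rows are made up for illustration; this is not the code used for the trees in this post.

```python
def best_split(rows, target, feature):
    """Find the YES/NO question `row[feature] <= threshold` that minimises
    the summed squared error around the mean of each side of the split."""
    def sse(values):
        # Sum of squared deviations from the mean (0 for an empty group).
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best_threshold, best_score = None, float("inf")
    # Skip the largest value: splitting there would leave the NO side empty.
    for t in sorted({r[feature] for r in rows})[:-1]:
        yes = [r[target] for r in rows if r[feature] <= t]
        no = [r[target] for r in rows if r[feature] > t]
        score = sse(yes) + sse(no)
        if score < best_score:
            best_threshold, best_score = t, score
    return best_threshold, best_score


# Made-up toy data: 1 = intends to vote for some party, 0 = does not.
rows = [
    {"age": 22, "vote": 0},
    {"age": 30, "vote": 0},
    {"age": 41, "vote": 0},
    {"age": 58, "vote": 1},
    {"age": 67, "vote": 1},
    {"age": 74, "vote": 1},
]
threshold, error = best_split(rows, "vote", "age")
print(f"Best question: is age <= {threshold}? (squared error {error})")
```

A full tree simply repeats this search inside each resulting group, over every regressor, until some stopping rule (maximum depth, minimum group size) kicks in.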

I'm using data from Qriously (date 2017-06-07) in the run-up to the UK general election. I'm looking only at England & Wales, and I've only considered 3 regressors: gender (0=F, 1=M), age, income. I've considered a YES/NO voting intention for the 5 biggest political parties.
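
For anyone wanting to try something similar, below is a rough sketch of this kind of fit using scikit-learn's `DecisionTreeRegressor`. Everything about the data here is an assumption: the rows are randomly generated rather than taken from the Qriously survey, the income bands (1 to 10) and the rule linking age and income to a YES vote are invented, and the tree settings (`max_depth=3`, `min_samples_leaf=50`) are arbitrary choices, not the ones behind the trees in this post.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-ins for the three regressors (NOT the survey data).
gender = rng.integers(0, 2, size=n)    # 0=F, 1=M
age = rng.integers(18, 90, size=n)     # ages 18..89
income = rng.integers(1, 11, size=n)   # hypothetical income band 1..10

# Invented rule: chance of a YES rises with age and slightly with income.
p_yes = np.clip(0.1 + 0.008 * (age - 18) + 0.01 * income, 0, 1)
vote = (rng.random(n) < p_yes).astype(float)  # 1.0 = YES, 0.0 = NO

X = np.column_stack([gender, age, income])

# A shallow tree keeps the groups large enough to read and interpret.
tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=50, random_state=0)
tree.fit(X, vote)

# Each leaf value is the share of YES answers in that group.
report = export_text(tree, feature_names=["gender", "age", "income"])
print(report)
```

Because the target is coded 0/1, the value at each leaf can be read as the proportion of that group intending to vote for the party, which is what makes a regression tree usable on a YES/NO outcome.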

The trees are below. Here are some key aspects that jump out:
  • Age seems to be the most important regressor for most parties.
  • CON seems to get many votes from older voters (except if they're poor).
  • CON gets few votes from younger voters (especially poorer voters).
  • CON's best group were older females (not males as one might expect - maybe this is simply a bias of longer life expectancy for females).
  • LAB/CON results are fairly inverted (as we might expect), i.e. poorer and younger voters favouring LAB. 
  • LAB's best group were young, poor females.
  • LAB's worst group are the 65+.
  • LIB seems to do best among low- and middle-income voters, more so among males.
  • LIB's two worst groups are (a) elderly richer females, and (b) poorer older voters.
  • GRN's voters are generally younger (the one exception being wealthier older females) -- young males are one key group.
  • GRN does badly with (a) older, poorer voters and (b) older, richer males.
  • UKIP voters are generally poorer, with poorer younger males being one key group.
There are so many assumptions and drawbacks in these sorts of analyses - but anyway, interesting all the same.