Real-estate agents try to profit from the controversial candidates.
Image source: fox13now.com
The morning after the Brexit vote in June, I woke up and went online to see the results. I wasn’t really worried; frankly, I was looking forward to getting it over with. To me, it seemed a matter of common sense that our fellow Europeans would vote “remain”. Unless you live without any access to modern or traditional media (in which case you would not be reading this), you know that I was wrong.
Let us pretend you are trying to predict the outcome of the American presidential election, which is only a few days away. For simplicity, you will take the role of both the pollster and the forecaster.
Collecting data for an election poll is hard work, and much more complicated than tweeting out a short questionnaire to a few (thousand) followers. Polls, much like typical social science questionnaires, are designed to capture what people think, how they act or, in the case of voting, how they intend to act based on their beliefs and the available options. But who are these “people”? In national polls, researchers aim to reach a representative selection of likely voters from different groups: women, men, minorities, rural and urban citizens. Here lies the first difficulty: since you need a meaningful sample of each group, how big will your final data set be, and what kind of data will it contain? And what exactly is a good measure of “likelihood to vote”?
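To get a feel for that first difficulty, here is a minimal sketch of the textbook margin-of-error formula for a proportion. It assumes simple random sampling and the most conservative case p = 0.5; real polls use more complex designs, so treat these numbers as a lower bound. The function names and the 1,000-person poll are made up for illustration.

```python
import math

Z_95 = 1.96  # z-score for a 95% confidence level


def margin_of_error(n: int, p: float = 0.5, z: float = Z_95) -> float:
    """Half-width of the confidence interval for a proportion p
    estimated from a simple random sample of n respondents."""
    return z * math.sqrt(p * (1 - p) / n)


def required_sample_size(margin: float, p: float = 0.5, z: float = Z_95) -> int:
    """Smallest n whose margin of error does not exceed `margin`."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


# Hypothetical poll of 1,000 people in which one group makes up 10%.
print(round(margin_of_error(1000), 3))  # whole sample: 0.031 (about 3 points)
print(round(margin_of_error(100), 3))   # 100-person subgroup: 0.098 (about 10 points)

# Respondents needed in a single group for +/- 3 points within that group.
print(required_sample_size(0.03))       # 1068
```

A national sample that looks comfortably large can leave each subgroup with a margin of error several times wider, which is why a meaningful sample of every group quickly inflates the size of the whole data set.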
Younger and older citizens voted systematically differently in the 2016 Brexit referendum.
Image source: catholicherald.co.uk.
You might have addressed some of these questions in one way or another and obtained your data. Now the forecaster takes over, and the fun really begins. Predicting, or forecasting, election outcomes can be simplified as using current data and past behaviour to predict the future. The first question is: how do you reduce the biases in the sample? One way is to give more weight to some subsamples of the data than to others. But what weight do you give to which respondents? For example, your polling data might contain roughly the same number of white, Christian, college-educated men as women. However, men and women are not equally likely to actually leave the house and vote. You could give more weight to female voting intentions than to male voting intentions, as more women than men have voted in past presidential elections.
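Here is a minimal sketch of this kind of weighting, assuming a handful of invented respondents and made-up turnout rates for each group (these are not the Rutgers figures). Each respondent counts with the turnout rate of their group, so the group that votes more often pulls the estimate towards its preferences. This is a simplified turnout adjustment for illustration, not the weighting scheme of any particular pollster; `vote_shares` and the candidates "A" and "B" are hypothetical.

```python
from collections import defaultdict

# Invented respondents as (group, stated candidate); not real poll data.
respondents = [
    ("women", "A"), ("women", "A"), ("women", "B"), ("women", "A"),
    ("men", "B"), ("men", "B"), ("men", "A"), ("men", "B"),
]

# Assumed probability that a member of each group actually votes.
# Illustrative numbers only; in practice they would come from past elections.
turnout = {"women": 0.64, "men": 0.60}


def vote_shares(respondents, weights):
    """Share of each candidate (rounded to 3 decimals) when every
    respondent counts with the weight assigned to their group."""
    totals = defaultdict(float)
    for group, candidate in respondents:
        totals[candidate] += weights[group]
    grand_total = sum(totals.values())
    return {c: round(t / grand_total, 3) for c, t in totals.items()}


unweighted = vote_shares(respondents, {"women": 1.0, "men": 1.0})
weighted = vote_shares(respondents, turnout)
print(unweighted)  # {'A': 0.5, 'B': 0.5}
print(weighted)    # {'A': 0.508, 'B': 0.492}
```

The raw sample is a dead heat, but once women's higher assumed turnout is taken into account, the candidate they prefer edges ahead. The size of that shift depends entirely on the turnout numbers you choose to believe.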
Women and men participating in US presidential elections.
Source: Center for American Women and Politics, Rutgers.
But what if that isn't true this year? It doesn't look likely, but it is not impossible, so you might want to take movements specific to this election into account, such as Donald Trump's popularity among military veterans.
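One way to take such movements into account is to run the same estimate under several turnout scenarios and see how much the result moves. The sketch below assumes two invented groups with made-up population shares, support levels and turnout rates; the names `share_for_a`, `groups` and `scenarios` are hypothetical and the numbers do not describe the real electorate.

```python
# Invented group-level poll results: each group's share of the adult
# population and the share of that group backing candidate A.
groups = {
    "veterans":     {"population": 0.08, "support_a": 0.60},
    "non_veterans": {"population": 0.92, "support_a": 0.48},
}


def share_for_a(groups, turnout):
    """Candidate A's expected share of the votes cast, given an assumed
    turnout rate for each group."""
    votes_a = sum(g["population"] * turnout[name] * g["support_a"]
                  for name, g in groups.items())
    votes_total = sum(g["population"] * turnout[name]
                      for name, g in groups.items())
    return votes_a / votes_total


# Scenario 1: turnout follows a (made-up) historical pattern.
# Scenario 2: this election mobilises veterans more strongly.
scenarios = {
    "historical": {"veterans": 0.70, "non_veterans": 0.55},
    "mobilised":  {"veterans": 0.85, "non_veterans": 0.55},
}

for name, rates in scenarios.items():
    print(name, round(share_for_a(groups, rates), 3))
# historical 0.492
# mobilised 0.494
```

The poll answers are identical in both runs; only the turnout assumption changes, and with it the forecast.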
Based on your data and informed by your assumptions, you then have to create a model of the future. Again, the forecaster has to choose what will be part of this prediction. The resulting model is fairly complicated, drawing on a variety of sophisticated statistical techniques. To build such a forecasting model you will, once again, have to make a number of choices, all of which influence the resulting prediction. If you want to know, for example, how a third-party candidate changes the election forecast, this article by Nate Silver provides some insight.
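To see how such modelling choices feed into the prediction, here is a toy Monte Carlo forecast. It is a sketch under stated assumptions, not Nate Silver's model or any published forecaster's method: four invented states, made-up polling leads, and normally distributed polling errors made of one national error shared by all states plus an independent error per state. The standard deviations are arbitrary choices, a tie counts as a loss for candidate A, and `simulate_win_probability` and `STATES` are hypothetical names.

```python
import random

# Invented states: name -> (electoral votes, candidate A's polling lead in points).
STATES = {
    "State 1": (20, 2.0),
    "State 2": (15, -1.0),
    "State 3": (10, 0.5),
    "State 4": (9, -3.0),
}


def simulate_win_probability(states, national_sd=2.0, state_sd=3.0,
                             n_sims=10_000, seed=42):
    """Monte Carlo estimate of candidate A's chance of winning a majority
    of the electoral votes in `states`.

    Each run draws one national polling error shared by every state (so
    polling misses are correlated) plus an independent error per state.
    A tie counts as a loss for A."""
    rng = random.Random(seed)
    total_votes = sum(ev for ev, _ in states.values())
    votes_to_win = total_votes // 2 + 1
    wins = 0
    for _ in range(n_sims):
        national_error = rng.gauss(0.0, national_sd)
        won = 0
        for ev, lead in states.values():
            if lead + national_error + rng.gauss(0.0, state_sd) > 0:
                won += ev
        if won >= votes_to_win:
            wins += 1
    return wins / n_sims


# Same polls, two different assumptions about correlated polling error.
print(simulate_win_probability(STATES, national_sd=0.0))
print(simulate_win_probability(STATES, national_sd=4.0))
```

Changing `national_sd` or `state_sd` alone, without touching a single poll, changes the win probability. That is exactly the kind of choice that separates one forecaster's numbers from another's.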
In total, there are three steps where you can substantially influence your estimate of the 2016 US presidential election, simply by deciding whom and what you ask, how you generalise the answers to the entire population, and which factors matter in your prediction model. Knowing all this, it no longer seems surprising that the narrow pro-Brexit referendum result was not unanimously predicted.