Sunday, January 27, 2008

South Carolina Poll Errors

The polls had a bad day on Saturday, grossly underestimating the support for Barack Obama, though they nailed the Clinton and Edwards votes quite well. Not one poll came within the ten-ring, and the final poll of the primary understated Obama's vote by nearly 15 points. Several polls fell just inside the 20-ring, and one hapless example of the consequences of poor question wording, the Clemson University poll, understated Obama's support by nearly 30 points. The Clemson poll allowed 36% to remain "undecided", hopelessly biasing downward its estimates of candidate support, but especially so for Obama.
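
To see why a large "undecided" share drags every candidate's number down, here is a minimal sketch that reallocates undecided respondents in proportion to the decided ones. Proportional allocation is just one common convention and an assumption here, not something any of these pollsters is known to have done. The function name and the shares for candidates A, B and C are hypothetical; only the 36% undecided figure comes from the Clemson poll.

```python
def allocate_undecided(shares, undecided_key="undecided"):
    """Rescale decided shares so they sum to 100, dropping the undecided bucket.

    Assumes undecided voters split in the same proportions as decided voters.
    """
    decided = {name: pct for name, pct in shares.items() if name != undecided_key}
    total = sum(decided.values())
    if total <= 0:
        raise ValueError("no decided respondents to allocate against")
    return {name: 100.0 * pct / total for name, pct in decided.items()}


# Hypothetical raw poll with 36% undecided (candidate shares are made up).
raw = {"A": 32.0, "B": 20.0, "C": 12.0, "undecided": 36.0}
adjusted = allocate_undecided(raw)
for name, pct in adjusted.items():
    print(f"{name}: raw {raw[name]:.1f} -> allocated {pct:.2f}")
```

With these made-up numbers every candidate rises once the undecided are allocated, and the leader gains the most points, which is why raw shares with many undecided look worst for whoever is ahead.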

My colleague Mark Blumenthal has posted a nice comparison of these South Carolina results with those "terrible" polls from New Hampshire. Below is the same poll error chart for New Hampshire, but scaled the same as the one for South Carolina above.

The New Hampshire results were mostly inside the ten-ring, and all were inside the 15-ring. A couple even touched the five-ring. Judged by distance from the bullseye, New Hampshire doesn't look that bad, certainly not compared to South Carolina.

But there is another difference, and this is where New Hampshire was terribly wrong and South Carolina not so bad: All but one of the New Hampshire polls had the wrong leader. None of the South Carolina polls, not even Clemson's, got the leader wrong.

So while the distance from the bullseye was quite a bit worse in South Carolina, the creation of confounded expectations was not. It was the expectations that were created and then confounded that made New Hampshire a polling disaster, while little has been said about the polling errors in South Carolina. (Except here, where we care about such things all the time!)
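
The two yardsticks contrasted here, distance from the bullseye and whether the poll named the right leader, can be computed separately. The sketch below assumes the "rings" are straight-line (Euclidean) distances in the errors for the top two candidates; that is my reading of the target charts, not a stated definition. The function names and all numbers are hypothetical.

```python
import math


def bullseye_distance(poll, result, first, second):
    """Euclidean distance between poll and result in the two candidates' shares."""
    return math.hypot(poll[first] - result[first], poll[second] - result[second])


def leader_correct(poll, result):
    """True if the poll's top candidate matches the actual winner."""
    return max(poll, key=poll.get) == max(result, key=result.get)


# Hypothetical race 1: big miss on the winner's share, but the right leader.
result_1 = {"X": 55.0, "Y": 27.0}
poll_1 = {"X": 43.0, "Y": 27.0}

# Hypothetical race 2: small miss on each share, but the wrong leader.
result_2 = {"X": 36.0, "Y": 39.0}
poll_2 = {"X": 39.0, "Y": 36.0}

for label, poll, result in [("race 1", poll_1, result_1), ("race 2", poll_2, result_2)]:
    dist = bullseye_distance(poll, result, "X", "Y")
    print(f"{label}: distance {dist:.1f}, leader correct: {leader_correct(poll, result)}")
```

In this toy case the first poll is roughly three times farther from the bullseye yet calls the winner, while the second is close on both shares and still gets the headline wrong.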

The other interesting comparison is a parallel: the second-place finisher in both South Carolina and New Hampshire was quite well estimated. The South Carolina polls got Clinton within the normal margin of error. And the New Hampshire polls also got the second-place finisher there, Obama, within reasonable error.

The problem in both cases is the substantial underestimate of the first-place finisher's vote. The final choices of late-deciding voters are a challenge for all polling, and perhaps especially so in primaries, where there is no "party identification" to come home to if you can't make up your mind. In New Hampshire the Clinton win rested on significantly more voters supporting her than expected. In South Carolina it was the magnitude of the victory, rather than first place itself, that confounded the polling.

Increases in voter turnout in this cycle may be part of the story (a 75% increase in South Carolina), but here we see those late deciders breaking for different candidates, and yet in both cases for the ultimate winner. Second-place results may on average be slightly low compared to the polls, but the first-place "bonus" seems quite strong. At least for the Democrats. In the Republican South Carolina primary, both the first- and second-place finishers were a bit underestimated, so there was not the same asymmetric error for first place. The polls in the New Hampshire Republican race also understated the votes for first and second about equally. The relatively lightly polled Michigan Republican race shows somewhat greater underestimates of first place (Romney) and second place (McCain). And in Nevada, with only 3 late polls, Romney was dramatically underestimated, while Ron Paul finished second but was only moderately underestimated.

So perhaps these reflect pollsters' difficulty in discerning the likely behavior of undecided voters, or perhaps these are last minute decisions to vote by "not-so-likely" voters who are screened out of the sample but who turn out for the ultimate winner in larger than expected numbers.

Turning to 2nd and 3rd place, the chart below shows that the polls had a pretty good day predicting the Clinton and Edwards votes. Despite some chatter about a late Edwards surge and a Clinton fall (including some evidence in our sensitive trend estimates that such a movement was occurring), most of the late polls were within the five-ring for 2nd and 3rd place, and all got the order of finish right.