Polls probably aren’t a priority for you right now. We’re more than four months removed from the 2020 elections, and there are almost 20 months to go until the midterms.
With this in mind, it’s the perfect time for the latest update to our FiveThirtyEight pollster ratings, which we released today! We encourage you to take a look at the ratings – as well as our redesigned interactive, which now includes a more detailed page for each pollster explaining how we calculate the ratings.
No, but seriously … I think it’s nice to have a little distance from the heat of an election cycle when evaluating polls. When I first looked at the numbers in November, it was just after the election had been called for Joe Biden – and after several anxious days of states slowly counting their mail ballots, producing a “blue shift” in several states that the polls initially seemed to have badly misjudged. Nor did we yet know that Democrats would win control of the U.S. Senate thanks to two January runoff elections in Georgia. Meanwhile, then-President Donald Trump was still refusing to concede. In that environment, one could easily have confused a mediocre, messy year for the polls with an awful one – a conclusion the data didn’t necessarily justify.
What does 2020 look like in retrospect – now that we can compare the polls more comprehensively against the final results? The rest of this article consists of four parts:
- First, our review of overall poll accuracy in 2020, using the same format we’ve traditionally used when updating our pollster ratings.
- Second, a look at which pollsters did best and worst in 2020.
- Third, our assessment of the short- and long-term performance of polls across various methodological categories. Here we’ll also announce an important change in how our pollster ratings are calculated: it is no longer clear that live-caller phone polls outperform other methods, so they will no longer receive privileged status in FiveThirtyEight’s pollster ratings and election models.
- Finally, some other, relatively minor technical notes on changes in how the pollster ratings are calculated. Some of you may want to skip this last part.
How the polls went in 2020
Our pollster ratings database tracks all polls conducted in the final 21 days of presidential primaries since 2000, as well as general elections for president, governor, U.S. Senate and U.S. House since 1998. It also includes polls of special elections and runoffs for those offices. Technically, the data you see below covers the entire 2019-20 election cycle, although the bulk of it comes from the elections held on Nov. 3, 2020. We also classify the Georgia Senate runoffs, which took place on Jan. 5, 2021, as part of the 2019-20 cycle.
First, let’s start with our preferred method of assessing poll accuracy: calculating the average error observed in the polls. To do this, we compare the margin between the top two finishers in the poll with the margin in the actual results. For example, if a poll showed Biden leading Trump in a state by 2 percentage points and Trump actually won by 4 points, that would be a 6-point error.
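As a minimal sketch (the function and its inputs are my own illustration, not FiveThirtyEight’s code), the error calculation looks like this:

```python
def poll_error(poll_margin: float, actual_margin: float) -> float:
    """Absolute difference, in percentage points, between the polled
    margin and the actual margin between the top two finishers.
    Margins are signed: positive means the first candidate leads."""
    return abs(poll_margin - actual_margin)

# The example from the text: Biden +2 in the poll, but Trump wins
# by 4 (i.e., Biden -4) -- a 6-point error.
print(poll_error(2.0, -4.0))  # 6.0
```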
In the table below, we calculate the average error for all polls in our database for 2019-20 and compare it with previous cycles, excluding polling firms banned by FiveThirtyEight and weighting by how prolific each pollster was in a given cycle. We also break out the polling error by office.
| Cycle | Primary | General | Governor | U.S. Senate | U.S. House | Combined |
|---|---|---|---|---|---|---|
| 1998 | — | — | 8.1 | 7.5 | 7.1 | 7.7 |
| 2001-02 | — | — | 5.3 | 5.4 | 5.4 | 5.4 |
| 2005-06 | — | — | 5.1 | 5.2 | 6.5 | 5.7 |
| 2009-10 | — | — | 4.7 | 5.0 | 7.0 | 5.8 |
| 2013-14 | — | — | 4.5 | 5.3 | 6.8 | 5.3 |
| 2017-18 | — | — | 5.1 | 4.2 | 5.1 | 4.9 |
Across all polls in the 2019-20 period, the average error was 6.3 percentage points. That makes it the third worst of the 12 election cycles in our pollster ratings database, behind only 1998 (an average error of 7.7 points) and 2015-16 (6.8 points). However, don’t read too much into the difference between 2019-20 and 2015-16: each subcategory of polls (e.g., U.S. Senate polls) was as accurate or more accurate in 2015-16 than in 2019-20.
If you break the results down by election type, 2019-20 doesn’t look much better. It was the second worst of 12 gubernatorial cycles and the third worst of 12 U.S. Senate cycles. In races for the U.S. House, the 2020 performance was about average. But 2020 had the highest average error of the six presidential general election cycles covered by the ratings (if only a tenth of a point worse than 2016). And it was tied with 2016 as the worst cycle for presidential primary polls … though the primary calendar offered some decent excuses for why those races were difficult to poll.
While the accuracy of the polls in 2020 was mediocre, it wasn’t a historical outlier either. The overall average error of 6.3 points in 2019-20 is only somewhat worse than the average error across all polls since 1998, which is 6.0 points. And there were presidential years before the period our ratings cover, such as 1948 and 1980, when the polls had considerably larger errors than in 2020.
So while the polling industry does face major challenges – including the fact that live-caller phone polls may no longer be the industry’s gold standard – it is premature to conclude that the sky is falling. As you can see from the chart above, there isn’t a particularly clear statistical trend showing that polls have gotten worse over time. Yes, both 2016 and 2020 were fairly bad years, but sandwiched between them was an excellent year for the polls in 2018. And in their most recent test, the Georgia Senate runoffs, the polls were extremely accurate.
Of course, there is a lot more to unpack here. Why have polls been fairly accurate in recent years in “emerging” swing states like Georgia and Arizona, but mostly terrible in the Upper Midwest? Why did they do badly in 2016 and 2020, but pretty well in Trump-era elections – like the Georgia runoffs or the 2017 Alabama Senate special election – in which Trump himself wasn’t on the ballot? We don’t really have room to explore the landscape of theories in the middle of this already very long article, although these are topics we’ve covered a lot at FiveThirtyEight. Still, I hope this macro-level view is helpful, and an improvement on the somewhat misinformed “polling is broken!” narrative.
Next, let’s review some other metrics of poll accuracy. One of them makes 2020 look a little better – while the other makes it look worse, and gets at what we think is the biggest cause for concern going forward: not that the polls were necessarily that inaccurate, but that almost all of their errors ran in the same direction, underestimating GOP support.
First, hits and misses, or how often the polls “called” the winner correctly. By this measure, the 2019-20 cycle was historically pretty average. Polls identified the correct winner 79 percent of the time throughout the cycle, which matches the all-time hit rate of 79 percent.
| Cycle | Primary | General | Governor | U.S. Senate | U.S. House | Combined |
|---|---|---|---|---|---|---|
| 1998 | — | — | 85% | 87% | 49% | 75% |
| 2001-02 | — | — | 90 | 81 | 73 | 82 |
| 2005-06 | — | — | 90 | 92 | 72 | 84 |
| 2009-10 | — | — | 85 | 86 | 74 | 81 |
| 2013-14 | — | — | 80 | 77 | 74 | 77 |
| 2017-18 | — | — | 74 | 73 | 80 | 75 |
In fact, this hit rate has been remarkably constant over time. With the exception of 2007-08, when a remarkable 88 percent of polls identified the correct winner, every cycle since 1998 has seen between 75 percent and 84 percent of winners called correctly. So, as a rough rule of thumb, you can assume the polls will get the winner right about four times out of five. Of course, that also means they’ll miss about one time in five.
However, looking at hits and misses is not really our preferred way to gauge poll accuracy. Sure, Biden held on to win Wisconsin, for example, so the polls there were technically “correct.” But no pollster should brag about a race Biden won by less than a full percentage point when the polling average had him up by 8.4 points there. Likewise, Biden won the national popular vote, and Democrats won the popular vote for the U.S. House – but in both cases by narrower margins than expected. Meanwhile, the polls got some of the closest states in the presidential race right, like Georgia and Arizona. But the polls won’t always be so lucky.
In any case, there is another, more important metric that better captures what went wrong with the polls in 2020. That is statistical bias, which measures not the size of the polling error but the direction (Democratic or Republican) in which the polls missed.
| Cycle | General | Governor | U.S. Senate | U.S. House | Combined |
|---|---|---|---|---|---|
| 1998 | — | R+5.8 | R+4.5 | R+0.9 | R+3.8 |
| 1999-2000 | R+2.4 | R+0.2 | R+2.8 | D+1.2 | R+1.8 |
| 2001-02 | — | D+3.5 | D+2.0 | D+1.4 | D+2.6 |
| 2003-04 | D+1.1 | D+1.9 | D+0.8 | D+2.1 | D+1.4 |
| 2005-06 | — | D+0.4 | R+2.1 | D+1.1 | D+0.1 |
| 2007-08 | D+1.0 | R+0.1 | D+0.1 | D+1.4 | D+0.9 |
| 2009-10 | — | R+0.2 | R+0.8 | D+1.3 | D+0.4 |
| 2011-12 | R+2.5 | R+1.6 | R+3.1 | R+3.2 | R+2.8 |
| 2013-14 | — | D+2.3 | D+2.7 | D+3.9 | D+2.8 |
| 2015-16 | D+3.3 | D+3.1 | D+2.8 | D+3.4 | D+3.0 |
| 2017-18 | — | R+0.9 | EVEN | R+0.8 | R+0.5 |
| 2019-20 | D+4.2 | D+5.6 | D+5.0 | D+6.1 | D+4.8 |
| All years | D+1.3 | D+0.9 | D+0.7 | D+1.2 | D+1.1 |
On average in the 2019-20 cycle, polls underestimated the Republican candidate’s performance by a whopping 4.8 percentage points! So the big problem in 2020 wasn’t that the polls were all that imprecise – they were only slightly more imprecise than usual – but that they almost all missed in the same direction.
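To make the distinction between average error and bias concrete, here’s a minimal sketch (my own illustrative code, with made-up numbers): two equal-sized misses in opposite directions produce a large average error but zero bias – which is exactly what did not happen in 2020.

```python
def signed_errors(polled, actual):
    """Signed errors in the Democrat-minus-Republican margin.
    Positive values mean the poll overestimated the Democrat
    (a Democratic bias, i.e., a Republican underestimate)."""
    return [p - a for p, a in zip(polled, actual)]

def average_error(polled, actual):
    errs = signed_errors(polled, actual)
    return sum(abs(e) for e in errs) / len(errs)

def average_bias(polled, actual):
    errs = signed_errors(polled, actual)
    return sum(errs) / len(errs)

# Two hypothetical races, each off by 4 points in opposite directions:
polled = [5.0, -3.0]  # Dem +5 and Dem -3 in the polls
actual = [1.0, 1.0]   # Dem +1 in both races
print(average_error(polled, actual))  # 4.0 -- errors don't cancel
print(average_bias(polled, actual))   # 0.0 -- but the bias does
```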
Interestingly, the bias in the presidential race between Trump and Biden (4.2 points) was actually smaller than in races for Congress or governor. But either way, that’s not a good record: it’s the largest bias in either direction across the cycles covered by our pollster ratings database, beating the previous record of a 3.8-point Republican bias in 1998.
If you went back before 1998, you could likely find years with more bias. Presidential and congressional general election polls massively underrated Republicans in 1980, for example – by about 7 points in the presidential race. And final generic ballot polling averages underestimated Republicans by about 5 points in the GOP wave year of 1994, we estimate.
Historically, though, there hasn’t been much consistency in which direction the bias runs from year to year. For example, a Democratic overperformance relative to the polls in 2011-12 was followed by a Republican overperformance in 2013-14.
However, we believe there is good reason to expect that these kinds of errors that run in one direction or the other – what we sometimes call systematic polling errors – will be more of an issue in the future. Why? The systematic errors are not necessarily a function of the polls themselves. Rather, it’s that in a time of intense political polarization and low ticket-splitting, race results up and down the ballot are strongly correlated with one another.
In other words, errors that overestimate the Democrat in one state and the Republican in another – errors that would cancel each other out – are less likely. If something about the polls leads them to overestimate the Democratic presidential candidate’s performance in Iowa, for example, they will likely do the same in a similar state such as Wisconsin. And if the polls overestimate the Democratic presidential candidate’s performance in Iowa, they will likely also overestimate the Democratic Senate candidate’s performance in that state.
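This logic can be illustrated with a toy Monte Carlo simulation (entirely my own construction, with invented error magnitudes): model each state’s polling error as a shared national component plus independent local noise, and watch how the shared component keeps state-level misses from averaging away.

```python
import random

def mean_absolute_miss(shared_sd: float, local_sd: float = 3.0,
                       n_states: int = 50, trials: int = 5000,
                       seed: int = 0) -> float:
    """Average absolute value of the across-state mean polling error,
    where each state's error = shared national error + local noise
    (both drawn from normal distributions with the given std devs)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        national = rng.gauss(0.0, shared_sd)
        errors = [national + rng.gauss(0.0, local_sd) for _ in range(n_states)]
        total += abs(sum(errors) / n_states)
    return total / trials

# With no shared component, local misses mostly cancel across 50 states;
# with a shared component, the whole map tends to miss the same way.
print(mean_absolute_miss(0.0))  # small: noise averages away
print(mean_absolute_miss(3.0))  # roughly the size of the shared error
```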
The old cliché that the Electoral College is really “50 separate contests” is very misleading in our nationalized, polarized electoral climate. It’s all interrelated, and for better or worse, it takes relatively fancy math to get a decent estimate of a party’s chances of winning the presidency or the Senate.
We know this will sound a little self-serving given that we’re in the election forecasting business – and we’re not trying to turn this into an episode of Model Talk – but this is precisely why election forecasting models are so valuable. They can help us understand how polling errors behave under real-world conditions.
However, these correlations also make poll accuracy harder to evaluate. Why? Because they reduce the effective sample size. Technically, there were more than 500 races on Nov. 3 if you count races for Congress, races for governor, and each state’s Electoral College votes. That sounds like a lot of data. But if all of the results are highly correlated, they may not tell you as much as you’d think.
For example, suppose there was a polling error caused by Democrats being more likely to stay home during the COVID-19 pandemic and therefore more likely to respond to surveys. That kind of problem could make the polls lean Democratic in almost all of these races. And what looked like many failures – underestimating Republicans in dozens of contests! – might really have had just one root cause. So while it may sound like a cop-out to write off Nov. 3, 2020, as “just one bad day” for pollsters – and even I wouldn’t go quite that far – it’s closer to the truth than you might think.
Still, this raises the question of whether it matters that the polls keep missing in the same direction. In three of the last four cycles (2013-14, 2015-16 and 2019-20), the polls had a significant Democratic bias. Again, though, this is a small sample. (If you flipped a coin four times and it came up heads three times, that wouldn’t be remarkable at all.) It’s also worth noting that the polls in the cycle just before that run, 2011-12, had a significant Republican bias.
Let me be clear – and this reflects my perspective as a journalist and avid consumer of polls, not as a pollster myself – that from my seat in the cheap seats, I don’t see 2020 as particularly remarkable. I think it’s mainly other critics and journalists (who may not have spent as much time comparing 2020 with previous elections such as 1980) who lack perspective.
But the reason polls have tended not to show a consistent bias over time is that the people who actually conduct polls work really hard to keep it that way. Most pollsters won’t go into 2022 or 2024 assuming that 2020 was just bad luck. They will investigate the reasons for the polling error. They will make judgment calls about whether more problems are likely, or whether much of the error stemmed from circumstances specific to 2020, such as COVID-19. And they will correct accordingly, or maybe even overcorrect.
The industry will also make course corrections at the macro level. Techniques that worked comparatively well in 2020 will be imitated. Firms that were comparatively successful will win more business.
And as I hinted earlier, our pollster ratings are also correcting course – we will no longer be giving bonus points to live-caller polls. But before we get into that, let’s take a quick look at how various pollsters fared in 2020.
Which pollsters did the best in 2020?
All right … so which pollsters made the best of a bad 2020? In an article last year, we covered how pollsters fared in the 2020 primaries, so I’m sticking to the general election here. Here’s the average error, the percentage of races called correctly, and the statistical bias for all firms with 10 or more qualifying polls – plus ABC News/The Washington Post, which I’m including for transparency since ABC News owns FiveThirtyEight:
| Pollster | Polls | Races called correctly | Statistical bias | Average error |
|---|---|---|---|---|
| Data for Progress | 42 | 75% | +5.0 | 5.0 |
| Public Policy Polling | 31 | 63 | +7.2 | 7.2 |
| Siena College/The New York Times Upshot | 25 | 76 | +5.5 | 5.5 |
| Rasmussen Reports/POR | 17 | 68 | +1.0 | 2.8 |
| Opinion Savvy/InsiderAdvantage | 15 | 47 | +0.8 | 3.5 |
| Harris Insights & Analytics | 13 | 81 | +3.2 | 3.3 |
| ABC News/The Washington Post* | 7 | 71 | +5.5 | 5.5 |
First, let’s tip our caps to the pollsters with the lowest average error. Those were AtlasIntel (2.2 percentage points), Trafalgar Group (2.6 points), Rasmussen Reports/Pulse Opinion Research (2.8 points), Harris Insights & Analytics (3.3 points) and Opinion Savvy/InsiderAdvantage (3.5 points).
These firms have some things in common. First, none of them primarily conduct live-caller polls. Instead, they use other methods – online panels, IVR (interactive voice response, i.e., automated surveys with recorded questions) and text messaging.
In fact, live-caller polls didn’t have a great general election. There aren’t that many of them in the table above, and of those that made the list, SSRS (an average error of 7.1 percentage points), Quinnipiac University (7.1 points) and Monmouth University (10.1 points) had poor general election cycles. Siena College/The New York Times Upshot (5.5 points) and ABC News/The Washington Post (5.5 points) did somewhat better by comparison.
One thing you might notice about the non-live-caller pollsters that had a good 2020 is that some (though not all) have a reputation for leaning toward Trump or Republicans. This is partly for reasons beyond the polls themselves – for example, some of these pollsters like to appear on conservative talk shows, or conduct polls on behalf of conservative outlets. But it’s also because they tended to show more favorable results for Trump in 2020 than the average poll.
Here, however, it’s important to distinguish between house effects and bias. A house effect is measured by comparing a poll against other polls; bias is measured by comparing a poll against the actual election results. In the long run, it’s the bias that matters – there is nothing wrong with having a house effect if you turn out to be right! In a year when most polls underestimated Trump and Republicans, the pollsters with Trump-leaning house effects mostly turned out to be more accurate and less biased, although the Trafalgar Group still had a modest Republican bias (2.4 points).
One more note: some of these pollsters probably deserve a little more credit than they’ve gotten. I say this even though there’s not much love lost between FiveThirtyEight and at least one of them: the Trafalgar Group. Trafalgar Group has major problems with transparency, for example, and we’ve criticized them for it. But their polling was pretty good last cycle, and they didn’t get much credit for it because they happened to “call” some of the close states wrong. These pollsters often showed Biden narrowly losing states like Wisconsin, Michigan and Pennsylvania that he instead narrowly won. Still, a poll that had Biden losing Pennsylvania by 2 points, for example, was actually a bit closer to the mark than one that had him winning it by 7, given Biden’s final margin there (1.2 points). Yes, in some cases these pollsters were too optimistic about Republicans, but not by as much as most other pollsters were too optimistic about Democrats.
One could, of course, argue that these pollsters were lucky in another way. If your polls always lean Republican, you’ll look like a genius whenever polling averages fall short of Republican support – but you’ll be among the worst performers in other cycles.
I think that’s a valid point … but only if a polling firm really has a long track record of leaning in the same direction over and over. That’s an apt description of Rasmussen Reports/Pulse Opinion Research, for example, which has leaned Republican for many years. The Trafalgar Group is relatively new, however – its first entry in our polling database is from the 2016 primaries. It’s hard to criticize them too much when they correctly showed better results for Trump than the polling consensus in both 2016 and 2020. And for whatever it’s worth, the final Trafalgar Group polls correctly showed Democrats winning the Georgia runoffs.
Perhaps one final lesson is that it makes sense, when averaging and aggregating polls, to have inclusive rules about which ones count. Some of the pollsters mentioned above didn’t have particularly good pollster ratings heading into the 2020 general election cycle, either because they were relatively new or because they had mixed track records. But while our polling averages assign somewhat less weight to polls from firms with worse ratings, we do include them, and they can still have a meaningful impact on our numbers. If we had limited our polling averages to so-called “gold standard” pollsters only, the averages would have been less accurate. That brings us to our next topic.
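As a sketch of that design choice (the margins and weights below are invented for illustration; FiveThirtyEight’s actual weights come out of its ratings and other adjustments), an inclusive weighted average looks like this:

```python
def weighted_polling_average(polls):
    """polls: list of (margin, weight) pairs, where the weight reflects
    the pollster's rating. Lower-rated firms get less weight but are
    not thrown out of the average."""
    total_weight = sum(w for _, w in polls)
    return sum(m * w for m, w in polls) / total_weight

# Two higher-rated polls showing Dem +8 and Dem +7, plus a lower-rated
# poll showing Dem +2, which still pulls the average down a little.
polls = [(8.0, 1.0), (7.0, 1.0), (2.0, 0.4)]
print(round(weighted_polling_average(polls), 2))  # 6.58
```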
Live caller polls do not outperform other methods.
Up until this update, FiveThirtyEight’s pollster ratings were based on a combination of a pollster’s past accuracy and two methodological questions:
- Does the pollster take part in industry groups or initiatives (defined in more detail below) that are associated with greater transparency?
- And does the pollster conduct its polls using live phone calls, including calls to cellphones?
Essentially, pollsters received bonus points for meeting these criteria – not out of the goodness of our hearts (although we do think transparency is a good thing in itself) but because these characteristics have historically been associated with greater accuracy.
As I’ll describe below, the transparency criterion still works quite well. However, the live-caller-with-cellphones standard has become problematic for several reasons.
For one thing, almost all live-caller polls now include calls to cellphones. On one level, that’s good news, since the clear majority of adults are now cellphone-only. However, it removes a point of differentiation for us in calculating the pollster ratings. Calling cellphones is more expensive than calling landlines, so when some pollsters included them and others didn’t, doing so served as a proxy for the overall rigor of a pollster’s process. Now that everyone who conducts live-caller polls is calling cellphones, this proxy is no longer as useful.
Second, it no longer makes sense to label an entire polling firm based on the methodology it uses. Polling firms change methods from time to time; some former live-caller pollsters have gone online, for example. In addition, many pollsters mix and match methods over the course of an election cycle depending on the type of poll they are conducting. In other words, methodology is really a characteristic of a poll and not of a pollster, so that’s how we now classify it for the purposes of the pollster ratings.
Most importantly, there is little evidence that live-caller polls consistently outperform other methods in terms of accuracy.
In the table below, I’ve calculated the advanced plus-minus score for all polls in our database since 2016, broken down by methodology. Advanced plus-minus, described in detail here, compares a poll’s results against other polls of the same type of election (e.g., other presidential primaries) or, where possible, the exact same election (e.g., different polls of the 2020 Iowa Democratic caucuses), controlling for the poll’s sample size and how close to the election it was conducted. The most important thing to understand is that negative advanced plus-minus scores are good; they mean a poll had less error than expected given those characteristics.
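A heavily simplified sketch of the idea (the real calculation is a regression with the controls just described; this toy version, of my own construction, just compares each poll’s error to the other polls of the same race):

```python
def simple_plus_minus(errors_by_race):
    """For each poll, subtract the average error of the *other* polls
    of the same race. Negative scores mean less error than comparable
    polls. Each race needs at least two polls.

    errors_by_race: dict mapping race -> list of poll errors (points)."""
    scores = {}
    for race, errors in errors_by_race.items():
        for i, err in enumerate(errors):
            others = errors[:i] + errors[i + 1:]
            baseline = sum(others) / len(others)
            scores.setdefault(race, []).append(err - baseline)
    return scores

# Hypothetical race with three polls whose errors were 3, 5 and 7 points:
print(simple_plus_minus({"race_a": [3.0, 5.0, 7.0]}))
# {'race_a': [-3.0, 0.0, 3.0]}
```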
What type of survey worked best? It’s kind of a mess:
| Methodology | Polls | Advanced +/- |
|---|---|---|
| Any live-phone component | 1,202 | -0.0 |
| Live-phone hybrid | 210 | +0.5 |
| Live phone only | 992 | -0.2 |
| Any IVR component (interactive voice response) | 977 | +0.1 |
| Any online component | 1,706 | +0.4 |
| Any text component | 275 | -0.1 |
Polls that include a live-phone component (alone or in combination with other methods) have an advanced plus-minus of 0.0 since 2016, compared with a score of +0.1 for polls with an IVR component. That’s not much of a difference, of course – it means the live-caller polls were about a tenth of a point more accurate. Polls with an online component, meanwhile, had a score of +0.4. That’s a bit worse, but statistically speaking it isn’t that meaningful a distinction, since this category tends to be dominated by a few large polling firms with fairly divergent track records. Finally, polls with a text-message component have an advanced plus-minus of -0.1, although this is a relatively new method and a relatively small sample of polls.
All of this is complicated, of course, by the fact that many polls now use a mix of methods, such as combining IVR calls to landlines with an online panel. The mixed-mode approach to polling also seems to be fine. It is worth noting, though, that IVR-only polls – those not including an online component – have struggled, with an advanced plus-minus of +0.7 since 2016. That may be because such polls have no way of reaching voters who lack landlines, as many states prohibit automated calls to cellphones. There might be an argument, then, for excluding landline-only polls from our averages, although they have become rare enough that this could soon be a moot point.
What if we expand our sample to the entire pollster ratings database going back to 1998? Does that produce clearer methodological winners and losers?
| Methodology | Polls | Advanced +/- |
|---|---|---|
| Live-phone hybrid | 258 | +0.3 |
| Any IVR component (interactive voice response) | 3,219 | -0.3 |
No, not really. Live-caller polls (alone or in combination with other methods) have an advanced plus-minus of -0.1 since 1998, versus a score of -0.3 for IVR polls. I think you could maybe argue that phone polls in general (live or IVR) have been more successful than online polls, which have an advanced plus-minus of +0.3 across the entire sample. But again, “online” is a broad category encompassing a wide range of techniques – and some online pollsters have been considerably more accurate than others. The main takeaway seems to be that, in an environment where few voters use landlines, methodology alone – with the exception of landline-only polls – doesn’t tell you all that much.
For all of these reasons, we’re no longer giving live-caller pollsters a bonus in our pollster ratings. And we don’t think this was a particularly close call.
But that is emphatically not the same as saying “anything goes” or that all polls are created equal. For one thing, our research finds that pollsters meeting the transparency criterion still outperform others, so we’ll continue to use it. We sometimes call this the “NCPP/AAPOR/Roper” standard, because a pollster can satisfy it by belonging to the (now largely inactive) National Council on Public Polls, by participating in the American Association for Public Opinion Research’s Transparency Initiative, or by contributing data to the Roper Center for Public Opinion Research’s iPoll archive. (Unless it becomes active again, we’ll soon discontinue eligibility via NCPP membership.)
Since 2016, polls from firms that meet the NCPP/AAPOR/Roper criteria have an advanced plus-minus score of -0.1, considerably better than the score of +0.5 for polls from other firms. And across the entire sample, since 1998, polls from NCPP/AAPOR/Roper firms have an advanced plus-minus of -0.4, as compared with +0.1 for those from other pollsters. Transparency is a robust indicator of poll accuracy and still counts for a lot, in other words.
Another check on the idea that “anything goes” — which we probably haven’t emphasized enough when discussing pollster ratings in the past — is that our ratings are designed to be more skeptical toward pollsters for which we don’t have much data. In calculating our averages, a pollster that hasn’t had any polls graded in our pollster ratings database is assumed to be considerably below average if it doesn’t meet the NCPP/AAPOR/Roper criteria. But this “new pollster penalty” gradually phases out once a firm has conducted around 20 recent polls.
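That phase-out can be sketched as a simple blend of a prior with the observed record. To be clear, the prior values, the linear blend and the exact 20-poll phase-out point below are illustrative assumptions of mine, not FiveThirtyEight’s actual formula:

```python
def provisional_rating(observed_apm: float, n_recent_polls: int,
                       meets_transparency: bool) -> float:
    """Blend an assumed below-average prior with a pollster's observed
    advanced plus-minus score (higher = worse). The prior of +0.5 for
    non-transparent firms, the linear blend and the 20-poll phase-out
    are hypothetical choices for illustration."""
    prior = 0.0 if meets_transparency else 0.5
    trust = min(n_recent_polls / 20.0, 1.0)  # how much to trust the record
    return trust * observed_apm + (1.0 - trust) * prior

# A brand-new, non-transparent firm is treated as below average ...
print(provisional_rating(-0.5, 0, False))   # 0.5
# ... but after about 20 recent polls, its own record dominates.
print(provisional_rating(-0.5, 20, False))  # -0.5
```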
And, of course, in the long run, the most important factor in our pollster ratings is that a polling organization is getting good results. The more polls a pollster conducts, the more its rating is purely a function of how accurate its polls are and not any assumptions based on what its methodological practices are.
So congratulations to the pollsters who had largely accurate results despite a difficult environment in 2020. And my sympathies to the ones who didn’t. Polling remains vital to the democratic experiment, and although I’m not a pollster, I know how frustrating it can be to be producing polls for a media environment that sometimes doesn’t get that.
Most of you will probably want to drop off at this point; there are just a few, largely technical notes to follow. Before you go, though, here’s the link again to the new pollster ratings, and here’s where you can find the raw data behind them.
I thought I told you to leave and go enjoy the spring weather! But transparency is vital in our pollster ratings project, so we do want to note a few odds and ends that reflect changes in how the pollster ratings are calculated this year. These are in no particular order of importance:
- As described earlier, we’re now classifying methodology based on the individual poll rather than on the pollster. In some cases, for polls we entered in our database long ago and didn’t record the methodology, we had to go back and impute it based on the methodology that the pollster generally used at that time. If you see any methodologies that you think are listed incorrectly, drop us a note at [email protected]
- We’re now excluding presidential primary polls if a candidate receiving at least 15 percent in the poll dropped out, or if any combination of candidates receiving at least 25 percent in the poll dropped out. Previously, we only excluded polls because of dropouts if one of the top two candidates in the poll dropped out. Only a small number of polls are affected by this change.
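As a sketch of this new exclusion rule (candidate names and vote shares below are hypothetical):

```python
def exclude_primary_poll(shares, dropped_out):
    """shares: dict of candidate -> percentage in the poll.
    dropped_out: set of candidates who dropped out before the vote.
    Exclude the poll if any single dropout polled at 15 percent or
    more, or if dropouts combined polled at 25 percent or more."""
    dropped = [s for c, s in shares.items() if c in dropped_out]
    return any(s >= 15 for s in dropped) or sum(dropped) >= 25

poll = {"A": 40, "B": 30, "C": 14, "D": 12}
print(exclude_primary_poll(poll, {"C"}))       # False: 14 < 15
print(exclude_primary_poll(poll, {"C", "D"}))  # True: 14 + 12 >= 25
```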
- As described at length here, advanced plus-minus scores are calculated in several stages. In the first stage, we run a nonlinear regression analysis where we seek to predict the error in the poll based on some basic characteristics of the poll. The regression now includes the following factors: the poll’s margin of sampling error, the type of election (presidential general, presidential primary, U.S. House, U.S. Senate or governor), the number of days between the poll and the election, and the number of unique pollsters surveying the race. This set of variables has been slightly simplified from previous versions of our pollster ratings.
- Previously, in conducting the regression analysis described above, we fixed the coefficient associated with the poll’s margin of sampling error so that it matched the theoretical margin of sampling error described here. In theory, for example, in a poll of 500 voters where one candidate leads 55-45, you should know exactly how much sampling error there is. Now, however, we’re deriving the coefficient from the regression rather than treating it as a fixed parameter. This is because we’ve discovered that, empirically, a poll’s sample size is less important than it theoretically should be in contributing to a poll’s overall error. That is to say, if you take a poll of 2,000 voters and compare it with one of 500 voters, the larger poll will tend to have less error, but not as much less as you’d expect, holding other factors constant. Why is this the case? It’s probably a combination of reasons.
- Demographic weights and other decisions the pollster makes provide information above and beyond what the sample size implies.
- Pollsters may fail to publish results stemming from polls with small sample sizes that they perceive to be outliers.
- Or they may herd toward other polls.
- The sample size alone does not account for design effects.
- And an increasing number of polls (especially online polls) use non-probability sampling, under which the assumptions of traditional margin-of-error formulas do not really apply.
In short, while you should pay attention to sample size and a pollster’s margin of sampling error, there are also a lot of things that these do not tell you. At some point, we will probably also change how sample sizes are used in determining the weights assigned to polls in our polling averages.
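For reference, the textbook sampling-error calculation the discussion leans on can be sketched as follows (this is the standard statistics formula, not FiveThirtyEight’s code). Note that quadrupling the sample size only halves the theoretical error – and, per the point above, the empirical gain is smaller still:

```python
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Theoretical 95 percent margin of sampling error, in percentage
    points, for one candidate's share in a simple random sample of n,
    assuming a share near p (worst case at p = 0.5)."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

print(round(margin_of_error(500), 1))   # ~4.4 points
print(round(margin_of_error(2000), 1))  # ~2.2 points
```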
- Finally, we have slightly modified and simplified the formula for calculating predictive plus-minus, the final stage in our ratings, which is what the letter grades associated with each pollster are derived from. (Yeah, I know the formula below looks complicated, but it’s actually simpler than before.) The formula now is as follows:
In the formula, PPM stands for predictive plus-minus and APM stands for advanced plus-minus. The herding_penalty is applied when pollsters show an unnaturally low amount of variation relative to other polls of the same race that had already been conducted at the time the poll was released; see the description here.
disc_pollcount is the discounted poll count, where older polls receive a lower weight. A poll’s weight is calculated as
Thus, for example, a poll conducted in 2020 will get full weight, a poll conducted in 2012 will get a weight of 0.56, and one from 1998 will have a weight of 0.20.
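The decay implied by those examples can be sketched as an exponential in a poll’s age. The 0.56-per-8-years rate below is reverse-engineered from the three worked examples in the text (1.0 for 2020, 0.56 for 2012, 0.20 for 1998), not taken from FiveThirtyEight’s published formula:

```python
def poll_age_weight(poll_year: int, current_year: int = 2020) -> float:
    """Exponential decay weight for older polls: a poll loses weight by
    a factor of 0.56 every 8 years. This decay constant is inferred
    from the article's examples, not an official parameter."""
    years_old = current_year - poll_year
    return 0.56 ** (years_old / 8.0)

print(round(poll_age_weight(2020), 2))  # 1.0
print(round(poll_age_weight(2012), 2))  # 0.56
print(round(poll_age_weight(1998), 2))  # 0.2
```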
Finally, prior is calculated as follows:
Where _isncppaaporroper takes on a value of 1 if a pollster meets the NCPP/AAPOR/Roper transparency standard and 0 otherwise.
That’s it, folks! Don’t hesitate to drop us a line if you have any other questions.
CORRECTION (March 25, 2021, 10:53 a.m.): Two tables in this article previously flipped the data for the primary and general elections. The two tables have been updated.