Methodology

Congress Compass’s polling model weights polls based on their sample, methodology, and age.

Sample

Polls with more respondents have lower sampling errors, and the model weights these polls accordingly.

Sample size is capped at 1000 respondents, and any poll with more respondents than that does not receive a corresponding gain in weight. In tests on historical data since 2006, giving more weight to polls with more than 1000 respondents raised the average error, because the gains in sampling error reduction diminish as samples grow.
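The diminishing return behind the cap can be illustrated with the textbook formula for sampling margin of error, which shrinks only with the square root of the sample size. The sketch below is illustrative: the function names are hypothetical, and the 95 percent z-value and worst-case 50 percent proportion are assumptions rather than Congress Compass's published weighting formula.

```python
import math

SAMPLE_CAP = 1000  # cap stated in the text for Congressional races


def effective_sample_size(respondents: int, cap: int = SAMPLE_CAP) -> int:
    """Return the sample size used for weighting, capped at `cap`."""
    return min(respondents, cap)


def sampling_moe(respondents: int, z: float = 1.96, p: float = 0.5) -> float:
    """Textbook sampling margin of error for a proportion, in percentage points.

    Assumes simple random sampling; z=1.96 (95 percent) and p=0.5 are
    illustrative defaults, not values taken from the model.
    """
    return 100 * z * math.sqrt(p * (1 - p) / respondents)


for n in (500, 1000, 2000):
    print(n, effective_sample_size(n), round(sampling_moe(n), 2))
```

Under these assumptions, doubling a sample from 500 to 1000 trims the margin of error by roughly 1.3 points, while doubling it again to 2000 trims it by less than one point.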

When a poll reports results under multiple voter models, only the likely voter model most comparable to that pollster’s default or historical methodology is included in the model.

Methodology

The model weights polls more if they adhere to certain methodological practices which have historically corresponded to more accurate results.

First, the model weights polls more if the polling firm had interviewers contact respondents via mobile and home phones. One reason is that firms that use voice recordings cannot distinguish between a child and an adult, nor can they select randomly within a household.

In addition, firms must call both mobile and home phones. According to data from the CDC, a majority of households use only a cellphone, and a pollster that neglects to sample them would miss these respondents.

Phone polls as a whole are also preferable to online polls because phone samples are random while online samples are not. Some online polling firms have attempted to compensate for the lack of randomness through demographic weights or by contacting respondents through both online and phone surveys.

Secondly, pollsters who adhere to the transparency and disclosure practices of the American Association for Public Opinion Research (AAPOR) Transparency Initiative, the National Council on Public Polls, or the Roper Center for Public Opinion Research are both more accountable and more accurate. Consequently, the model weights polls from these more transparent pollsters more than polls from other firms. FiveThirtyEight’s Pollster Ratings is the source of Congress Compass’s information for this measure of methodological transparency.

Age

The methodologically adjusted sample weight is then subject to a time decay formula.

The age of the poll is taken from the median of the dates on which the pollster conducted it. The greater the distance between this median date and election day, the less the model weights the poll. The penalty for age grows gradually as election day approaches.

The model’s time decay formula is similar to the one used by FiveThirtyEight in its forecasts, with one crucial difference: Congress Compass weights later polls much more heavily to account for late-breaking events and polling shifts.
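The exact decay formula is not given here, so the following is only a minimal sketch of the general idea: date each poll by the midpoint of its fielding period, then shrink its weight with distance from election day. The exponential form, the 14-day half-life, and the function names are all assumptions for illustration, and the sketch does not model the way the age penalty grows as election day approaches.

```python
from datetime import date, timedelta


def median_field_date(start: date, end: date) -> date:
    """Midpoint of the fielding period, rounded down to a whole day."""
    return start + timedelta(days=(end - start).days // 2)


def time_decay_weight(poll_date: date, election_day: date,
                      half_life_days: float = 14.0) -> float:
    """Hypothetical exponential decay.

    The weight halves for every `half_life_days` between the poll's date
    and election day. The half-life is an assumed value, not the model's.
    """
    age = max((election_day - poll_date).days, 0)
    return 0.5 ** (age / half_life_days)


election = date(2020, 11, 3)
poll_date = median_field_date(date(2020, 10, 18), date(2020, 10, 22))
print(poll_date, time_decay_weight(poll_date, election))
```

A shorter half-life would correspond to the more aggressive weighting of later polls described above.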

Margin of Error

The margin of error produced by the Congress Compass model is the anticipated range for the results to fall within 80 percent of the time, or an 80 percent confidence interval. The margin of error for each poll is calculated by first taking the sampling margin of error and then applying additional methodological error based on historical margin of error for that pollster 21 days prior to the election. When that data is unavailable or insufficient, the average methodological error is used.
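As a rough illustration of an 80 percent interval, the sketch below computes the sampling margin of error with the z-value that leaves 10 percent in each tail, then adds a methodological error term. How the two errors are actually combined is not stated in the text; combining them in quadrature, the worst-case 50 percent proportion, and the function names are assumptions.

```python
import math
from statistics import NormalDist

# Two-sided 80 percent interval: 10 percent in each tail.
Z_80 = NormalDist().inv_cdf(0.90)  # about 1.2816


def sampling_moe_80(respondents: int, p: float = 0.5) -> float:
    """80 percent sampling margin of error, in percentage points."""
    return 100 * Z_80 * math.sqrt(p * (1 - p) / respondents)


def total_moe_80(respondents: int, methodological_error: float) -> float:
    """Combine sampling and methodological error.

    Assumption: the two errors are independent and combine in quadrature.
    The source does not specify the combination rule.
    """
    return math.hypot(sampling_moe_80(respondents), methodological_error)


print(round(sampling_moe_80(1000), 2))
print(round(total_moe_80(1000, 3.0), 2))
```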

Primaries and Uncontested Elections

Congress Compass does not create marginal forecasts for any race which does not have two major party nominees, unless the incumbent is an independent or belongs to another party, or if a minor party or independent candidate is polling above 10 percent.
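The rule above can be read as a simple predicate. The sketch below is one way to express it; the `Race` fields and function name are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Race:
    major_party_nominees: int              # Democratic and Republican nominees on the ballot (0 to 2)
    incumbent_outside_major_parties: bool  # incumbent is an independent or belongs to another party
    top_minor_candidate_share: float       # best minor party or independent polling share, in percent


def gets_marginal_forecast(race: Race, threshold: float = 10.0) -> bool:
    """Return True if the race qualifies for a marginal forecast."""
    return (
        race.major_party_nominees == 2
        or race.incumbent_outside_major_parties
        or race.top_minor_candidate_share > threshold
    )


print(gets_marginal_forecast(Race(2, False, 0.0)))   # contested by both major parties
print(gets_marginal_forecast(Race(1, False, 4.0)))   # effectively uncontested
print(gets_marginal_forecast(Race(1, False, 15.0)))  # strong minor party or independent candidate
```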

The two major parties are the Democratic Party and the Republican Party, which have together won almost every House seat in modern history. Absent polling, elections in which the incumbent faces no major party opponent are considered effectively uncontested.

The requirement for two major party nominees to produce a marginal forecast also necessarily excludes making forecasts for primaries. There are no marginal forecasts for Louisiana’s blanket primaries on the general election date, and ratings for these races are fundamentals based.

Fundamentals

The Congress Compass polling forecast does not take into account anything other than the aforementioned factors. This is to reduce the number of underlying assumptions and guard against overfitting and p-hacking.

Fundamentals models have a historically poor track record when applied to out of sample data. As FiveThirtyEight’s Nate Silver wrote, “Polling-based models are simpler and less assumption-driven, and simpler models tend to retain more of their predictive power when tested out of sample.”

In numerous noncompetitive races and a handful of competitive ones, the model does not have enough polling data to create a marginal forecast. The model’s minimum number of polls increases to two when there are 21 days until the election. Before that point, one credible poll is enough to create a rating.
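The minimum-poll rule can be sketched as follows. The function names are hypothetical, and treating day 21 itself as the point where the minimum rises to two is one reading of the text.

```python
def minimum_polls_required(days_until_election: int) -> int:
    """One credible poll suffices early; two are required from 21 days out."""
    return 2 if days_until_election <= 21 else 1


def has_enough_polls(credible_poll_count: int, days_until_election: int) -> bool:
    """Return True if the race has enough polling for a marginal forecast."""
    return credible_poll_count >= minimum_polls_required(days_until_election)


print(has_enough_polls(1, 40))  # one poll, well before the election
print(has_enough_polls(1, 14))  # one poll, inside the final 21 days
```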

However, when there is no polling, the model pools the latest forecasts from political scientists with the most extensive and accurate out of sample predictions on record to project an outcome.

Inside Elections and Sabato’s Crystal Ball have the most documentation and the highest success rate of any non-polling based forecast since 2010, and are thus favored the most by the fundamentals model. If Inside Elections designates a unique outcome contrary to all other non-polling forecasters, the fundamentals model favors the Inside Elections prediction. The model also generally prefers the forecaster rating with the least certainty short of pure toss-up.

Probabilities

Congress Compass does not present its data via numeric probabilities in the way that FiveThirtyEight and the Huffington Post did in the 2016 elections. A numeric probability is when you assign a numeric value to a probabilistic forecast, e.g. “Hillary Clinton has a 71.4 percent chance to win the presidential election.”

While this is the norm for these forecasters, a study featured by the Pew Research Center in February suggested that the presentation of these numeric probabilities confused observers and could make them less likely to vote.

This study suggests that “Rep. Jim Banks is ahead by 11” conveys the competitiveness of an election more intuitively than a numeric probability.

Nate Silver wrote, “Both probabilities and polls are usually listed as percentages, so people can confuse one for the other — they might mistake a forecast showing Clinton with a 70 percent chance of winning as meaning she has a 70-30 polling lead over Trump, which would put her on her way to a historic, 40-point blowout.”

Thus, Congress Compass forecasts do not render numeric probabilities for each race to minimize confusion and inhibit possible civic disengagement effects. Uncertainty is instead conveyed through language, such as the Tilt or Lean ratings, or by the margins themselves.

Presidential Model

No Online Pollsters

The Congress Compass presidential model is selective in its poll inclusion criteria, excluding all pollsters that exclusively conduct their surveys online, such as Civiqs, Google Surveys, Harris Insight, Ipsos, SurveyMonkey, YouGov, and Zogby. These online pollsters serve a vital role in Congressional races because of the paucity of polling in down ballot contests, but presidential elections are always high-profile and thus feature more polls to choose from. When there is a diversity of pollsters that use more traditional sampling methods, as there is in presidential contests, pollsters that exclusively use online surveys only serve to increase the average error of the forecast and can even inhibit the model from correctly forecasting swing states.

This may be because exclusively online pollsters cannot have as random a sample as a telephone survey can. In a telephone survey, all members of the public have a relatively equal chance of being selected, and thus that random sample can then be representative of a larger population. In an exclusively online survey, the respondents have to be selected by the pollster specifically or go to a certain website to be a part of the sample. This makes these polls less representative, because the factors that could make respondents more likely to go to that specific website or participate in that pollster’s panel could also make them unlike the general population being sampled in ways that impact the resulting data.

Late Breaking Adjustments

Presidential elections can change rapidly based on national news, which means that a presidential model needs to be more sensitive to recent shifts in polling than a Congressional model. There are three methods the Congress Compass presidential model uses to give more weight to late-breaking polls than the Congressional model. The first method of giving more weight to late-breaking polls is direct, as more recent polls are weighted slightly more in the presidential model than in the Congressional model. The second method is that instead of taking the median of the first and last days a poll was conducted as in the Congressional model, only the first day is used to date the polls for the model. The third and final method is that if there are at least two polls conducted beginning in the last week prior to the election, all older polling is dropped entirely from the average for that state.
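The second and third methods can be sketched as a filter over a state's polls. This is a minimal illustration, assuming a 7-day window counted back from election day; the `Poll` type, the function name, and the pollster labels are hypothetical.

```python
from datetime import date, timedelta
from typing import List, NamedTuple


class Poll(NamedTuple):
    pollster: str
    start: date  # first day in the field; the presidential model dates polls by this day
    end: date


def late_breaking_filter(polls: List[Poll], election_day: date,
                         window_days: int = 7, min_recent: int = 2) -> List[Poll]:
    """Keep only final-week polls when at least `min_recent` of them exist.

    Otherwise return every poll unchanged.
    """
    cutoff = election_day - timedelta(days=window_days)
    recent = [p for p in polls if cutoff <= p.start <= election_day]
    return recent if len(recent) >= min_recent else list(polls)


election = date(2020, 11, 3)
polls = [
    Poll("Pollster A", date(2020, 10, 20), date(2020, 10, 24)),
    Poll("Pollster B", date(2020, 10, 28), date(2020, 10, 31)),
    Poll("Pollster C", date(2020, 10, 30), date(2020, 11, 1)),
]
print([p.pollster for p in late_breaking_filter(polls, election)])
```

With two polls starting in the final week, the older Pollster A survey is dropped from the state average.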

PVI for Uncertain Races

A partisan voting index (PVI) uses the results of a national election and compares them to statewide or district-wide races to measure the partisanship of those respective states or districts. For example, if Texas was 11 points more Republican than the national margin in 2016, and we only know that the Democratic candidate is ahead by 8 points nationwide in 2020, we could assume with PVI that the Democratic candidate is behind by 3 points in Texas based on the 2016 result. When there is dramatic uncertainty in statewide polling, such as if the combined total of the polling average is less than 92 percent, a simple PVI based on the last presidential election is combined with the national polling average and applied as a ‘poll’ in the statewide average. The ‘poll’ is dated to the time of the last national poll and has respondents equal to the sample size cap for presidential races of 1200. This may be the only ‘poll’ in a statewide average for places that have had no traditional polling in recent presidential elections such as the District of Columbia or Wyoming.

States where a minor party or independent candidate is polling above 10 percent do not have a PVI ‘poll’ added to the average, as the low two-party vote share does not reflect uncertainty. If a minor party or independent candidate is polling at more than 10 percent nationwide, the expected two party combined polling average is decreased correspondingly. That is to say, if an independent is polling at 11 percent nationwide, the two-party vote share in a statewide average would have to fall below 81 percent for a PVI ‘poll’ to be added to the average.
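The PVI arithmetic and the trigger for adding a PVI ‘poll’ can be sketched as follows. Margins are expressed as Democratic minus Republican in points; the function names and the sign convention are assumptions made for illustration.

```python
def pvi_margin(national_margin: float, state_lean: float) -> float:
    """Projected state margin: national margin plus the state's lean.

    Both values are Democratic-minus-Republican margins in points, so a
    state 11 points more Republican than the nation has a lean of -11.
    """
    return national_margin + state_lean


def two_party_threshold(national_minor_share: float, base: float = 92.0,
                        minor_cutoff: float = 10.0) -> float:
    """Two-party total below which statewide polling counts as uncertain."""
    if national_minor_share > minor_cutoff:
        return base - national_minor_share
    return base


def needs_pvi_poll(state_two_party_total: float, state_minor_share: float,
                   national_minor_share: float = 0.0) -> bool:
    """Return True if a PVI 'poll' should be added to the state average."""
    if state_minor_share > 10.0:
        return False  # a low two-party share here does not reflect uncertainty
    return state_two_party_total < two_party_threshold(national_minor_share)


print(pvi_margin(8.0, -11.0))     # the Texas example from the text
print(two_party_threshold(11.0))  # an independent at 11 percent nationwide
```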

Sample Weighting

In the late-breaking form of the average, which only includes the last week of polling, polls with a greater sample size are weighted even more heavily in the presidential model than in the Congressional model. The sample size cap is 1200, and any poll that exceeds that cap is weighted as if it had 1200 respondents.
