I'm tired of hearing journalists say (or seeing them write) something like this:
The latest MajorityWatch poll was taken from October 24-26, with 1,008 likely voters and a margin of error of 3.08%. It shows Seals with a narrow 48%-46% lead, within the margin of error and thus a statistical tie.
(I took this quote from this post on Blogger News Network, for no other reason than it illustrated my point in just two sentences.)
It's clear to me that most people understand the phrase "statistical tie" or "statistical dead heat" to mean that, because the lead is within the margin of error, we really have no information about who is leading.
This is a complete misunderstanding of what the margin of error represents.
The margin of error represents roughly two standard deviations (1.96, to be more precise) on either side of the poll's result on a bell curve, and thus a 95% confidence that the "true" answer is within that margin. However, the highest likelihood (i.e. the most likely outcome) in the example above is still that Seals wins 48% to 46%.
Let me rephrase that (something I love to do). The margin of error does not mean that any outcome within that range is just as likely as any other outcome. The most likely outcome is 48% to 46%. But because polling is not perfect, we calculate a range such that we are 95% confident the actual result will fall within it: in this case, plus or minus 3.08%.
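To put rough numbers on that, here is a minimal Python sketch. It assumes a simple random sample and the usual normal approximation, neither of which I can confirm for the poll quoted above, and the function names are mine, purely for illustration. It computes the margin of error for 1,008 respondents and then compares how likely the bell curve says a result is at its center versus at the very edge of the margin.

```python
import math

Z_95 = 1.96  # standard errors that enclose 95% of a normal curve


def margin_of_error(n, p=0.5, z=Z_95):
    """Margin of error for a proportion p from a simple random sample of n.

    p=0.5 is the conservative worst case pollsters usually report.
    """
    return z * math.sqrt(p * (1 - p) / n)


def normal_pdf(x, mean, sd):
    """Height of the bell curve at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


n = 1008
moe = margin_of_error(n)      # roughly 0.031, i.e. about 3.1 points
standard_error = moe / Z_95

observed = 0.48               # Seals' share in the poll
center = normal_pdf(observed, observed, standard_error)
edge = normal_pdf(observed + moe, observed, standard_error)
ratio = center / edge         # roughly 6.8

print(f"margin of error: {moe:.2%}")
print(f"center of the curve is {ratio:.1f}x as likely as the edge")
```

Under these assumptions the margin comes out at about 3.1%, close to the 3.08% the pollster reported, and a result at the center of the curve is nearly seven times as likely as one sitting right at the edge of the margin. The outcomes inside the range are not all equally likely.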
(As an aside, the vertical lines in the bell curve on this site's logo represent two standard deviations — 95% of the population is between those lines.)
Yes, when the lead is within the margin of error, there is a much greater chance that the poll is showing the wrong winner. No, it does not mean that the poll is providing no information about who is more likely to win.
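How much information is left? As a back-of-the-envelope sketch only, the Python below estimates the chance that Seals is really ahead, given the 48%-46% result from 1,008 respondents. The assumptions are mine and fairly strong: a simple random sample, a normal approximation, no prior knowledge about the race, and no sources of error other than sampling. The function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Probability that a standard normal variable is below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def probability_leader_is_ahead(p_leader, p_trailer, n):
    """Approximate chance that the poll leader truly leads.

    Both shares come from the same sample, so the variance of their
    difference is (p1 + p2 - (p1 - p2)**2) / n, not the sum of two
    independent variances.
    """
    lead = p_leader - p_trailer
    sd_of_lead = math.sqrt((p_leader + p_trailer - lead ** 2) / n)
    return normal_cdf(lead / sd_of_lead)


prob = probability_leader_is_ahead(0.48, 0.46, 1008)
print(f"chance Seals is really ahead: {prob:.0%}")  # roughly 74%
```

Under those assumptions the answer is roughly 74%: far from certain, but also far from the coin flip that "statistical tie" suggests.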
Yes, if anyone read this blog, I would expect to hear comments back that I am wrong. No, no one reads this blog. ;)
Dr. Douglas A. Lonnstrom, professor of finance and statistics at Siena College and one of the Directors of the Siena Research Institute (the college's polling and survey department), wrote a short presentation about this very subject. His presentation starts out:
With the worldwide proliferation of political polling, especially in the United States, the news media have coined the term "statistical dead heat." The intended meaning is to call a race a tie if the two candidates have percentages that fall within the sampling error. ... I say this is not so. In fact, it is my contention that the news media make four errors when they use the term "statistical dead heat."
He then goes on to discuss those four errors. The presentation is only a single page long, so I recommend anyone interested in this subject read it.
Just for kicks, here are some other examples from major news outlets:
From the New York Times: "Roanoke College poll shows GOP Sen. George Allen leading Democratic challenger Jim Webb, 45 percent to 42 percent, a statistical tie"
From FoxNews.com: "Burns comes in at 4 percent behind Democrat Jon Tester, a statistical tie because it is within the poll's 4.5 percent margin of error."
From a transcript of Lou Dobbs Tonight at CNN.com: "Now he's at 46 percent. Democrat Jim Webb, 50 percent. A statistical dead heat according to CNN's new poll conducted by Opinion Research Corporation."
Interestingly, CNN manages to just barely avoid making this mistake in articles on its website. While I found some transcripts from broadcasts that made this mistake, as well as quotes from other news organizations, I could not find any CNN.com article that referred to a single race as a "statistical tie" or "statistical dead heat". (My search was not exhaustive, but I looked for ten or twenty minutes.) There were times when they referred to a group of races as being in "statistical ties" but avoided referring to any one race this way.
Given a set of poll results, the question is whether with all that data there is some other concise figure as relevant as the mean. After reading your article, it occurred to me that fitting to a bell curve, which has only two cumulants (mean and standard deviation), provides no other information at all.
To get more information, you'd need a fit that includes the third cumulant, skewness. The curve would have a peak that is not in the middle, and that peak (the mode) would be another relevant statistic. This could be done by fitting a quartic polynomial to the logarithm of the Fourier Cosine Transform of the data.
I think the phrase "Statistical Tie" could be kept, but with a new meaning: that the mean and the mode of this curve are on opposite sides of 50%.
Posted by: Collin | February 01, 2010 at 07:13 AM