“There are three kinds of lies: lies, damned lies, and statistics.” – popularly attributed to Benjamin Disraeli
Statistical analyses have long been a stalwart of high-tech and advanced business industries, and today they are more important than ever. With the rise of advanced technology and globalized operations, statistical analyses give businesses insight into the extreme uncertainties of the market. Statistical studies foster informed decision making, sound judgments and actions carried out on the weight of evidence, not assumptions.
As businesses are often forced to follow a difficult-to-interpret market road map, statistical methods can help with the planning necessary to navigate a landscape filled with potholes, pitfalls and hostile competition. Statistical studies can also assist in marketing goods or services, and in understanding each target market’s unique value drivers. In the digital age, these capabilities are further enhanced through advanced technology and business intelligence software. If all this is true, what is the problem with statistics?
Actually, there is no problem per se – but there can be. Statistics are notorious for how easily they can be turned into misleading or bad data.
While numbers don’t lie, they can in fact be used to mislead with half-truths. This is known as the “misuse of statistics.” It is often assumed that statistical misuse is limited to individuals or companies seeking to profit from distorting the truth, be it in economics, education or mass media.
However, the telling of half-truths through statistical study is not limited to mathematical amateurs. A 2009 meta-analysis of survey data by Dr. Daniele Fanelli of the University of Edinburgh found that up to 33.7% of scientists surveyed admitted to questionable research practices, including modifying results to improve outcomes, subjective data interpretation, withholding analytical details and dropping observations because of gut feelings… Scientists!
While numbers need not be fabricated or misleading, it is clear that even society’s most trusted numerical gatekeepers are not immune to the carelessness and bias that can arise in statistical interpretation. Given the importance of data in today’s rapidly evolving digital world, it is important to be familiar with the basics of statistical misuse and oversight. As an exercise in due diligence, we will review some of the most common forms of statistical data misuse, along with several alarming (and, sadly, common) examples of misleading statistics from public life. Do numbers lie? You can be the judge.
Misuse of Statistics – Common Quantitative Examples
Remember, misuse of statistics can be accidental or purposeful. While a malicious intent to mislead will surely magnify bias, intent is not necessary to create statistical misunderstandings. The misuse of statistics is a much broader problem that now permeates multiple industries and fields of study. Here are a few potential statistical mishaps that commonly lead to misuse:
1) Faulty Polling: The way questions are phrased can strongly influence how an audience answers them. Specific wording patterns have a persuasive effect and induce respondents to answer in a predictable manner. For example, on a poll seeking tax opinions, consider these two potential questions:
• Do you believe that you should be taxed so other citizens don’t have to work?
• Do you think that the government should help those people who cannot find work?
These two questions are likely to provoke far different responses, even though they deal with the same topic of government assistance. These are examples of “loaded questions.”
A more neutral way of wording the question would be, “Do you support government assistance programs for unemployment?” or (even more neutrally) “What is your point of view regarding unemployment assistance?”
These two rewordings eliminate any inference or suggestion from the pollster, and are thus significantly more impartial. Another unfair polling method is to precede a question with a conditional statement or a statement of fact. Staying with our example, that would look like this: “Given the rising costs to the middle class, do you support government assistance programs?”
A good rule of thumb is to always take poll results with a grain of salt, and to try to review the questions that were actually asked. The questions often provide more insight than the answers.
2) Flawed Correlations: The problem with correlations is this: if you measure enough variables, some of them will eventually appear to correlate. At the conventional 0.05 significance threshold, roughly one in twenty comparisons between genuinely unrelated variables will come out “significant” by chance alone. With enough data, studies can therefore be manipulated to “prove” a correlation that does not exist, or to present a correlation as evidence of causation when it cannot support that conclusion.
To illustrate this point further, let’s assume that a study has found a correlation between an increase in car accidents in the state of New York in the month of June (A), and an increase in bear attacks in the state of New York in the month of June (B).
There are at least six possible explanations for such a correlation:
• Car accidents (A) cause bear attacks (B)
• Bear attacks (B) cause car accidents (A)
• Car accidents (A) and bear attacks (B) partly cause each other
• Car accidents (A) and bear attacks (B) are caused by a third factor (C)
• Bear attacks (B) are caused by a third factor (C) which correlates to car accidents (A)
• The correlation is only chance
Any sensible person would easily see that car accidents do not cause bear attacks. Both are more likely the result of a third factor (C): a larger population in the area during the high tourist season in June. It would be preposterous to say that they cause each other, and that is exactly why we chose this example. The correlation is easy to see.
But what about causation? What if the measured variables were different and the link more believable, like Alzheimer’s disease and old age? There is clearly a correlation between the two, but is there causation? Many would assume so based solely on the strength of the correlation, yet the correlation alone cannot answer that question. Tread carefully: whether done knowingly or ignorantly, correlation hunting will continue to exist within statistical studies.
3) Data Fishing: This form of misleading analysis is also referred to as “data dredging” (and is related to flawed correlations). It is a data mining technique in which extremely large volumes of data are analyzed to discover relationships between data points. Seeking relationships in data is not misuse per se; however, doing so without a hypothesis is.
Data dredging is a self-serving technique often employed for the unethical purpose of circumventing traditional data mining techniques in order to reach conclusions the data do not support. This is not to say that data mining has no proper use, as it can in fact lead to surprising outliers and interesting analyses. More often than not, however, data dredging is used to assume the existence of data relationships without further study.
Often, data fishing produces studies that are highly publicized because of their important or outlandish findings, only to be contradicted soon afterward by other important or outlandish findings. These false correlations often leave the general public confused and searching for answers about the difference between correlation and causation.
4) Misleading Data Visualization: Insightful statistical graphs and charts include a basic but essential set of elements. At a minimum, visualizations must convey:
• The scales used
• The starting value (zero or otherwise)
• The method of calculation (e.g., dataset and time period)
Absent these elements, visual data representations should be viewed with a grain of salt. Intermediate data points should also be identified, and context given where it adds value to the information presented. With the growing reliance on automated tools for comparing data points, best practices (e.g., consistent design and scaling) should be applied before comparing data from different sources, datasets, times and locations.
5) Purposeful Bias: The last of our most common examples of statistical misuse and misleading data is, perhaps, the most serious. Purposeful bias is the deliberate attempt to influence data findings without even feigning professional accountability. It most often takes the form of data omissions or adjustments.
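The starting value matters because bars are drawn up from the axis baseline. One way to quantify the distortion is Edward Tufte’s “lie factor”: the size of the effect shown in the graphic divided by the size of the effect in the data. The Python sketch below assumes a simple bar chart and uses hypothetical values (a 5% increase from 100 to 105); the function names are illustrative.

```python
def relative_change(first: float, second: float) -> float:
    """Relative change from first to second, e.g. 0.05 for a 5% increase."""
    return (second - first) / first


def lie_factor(first: float, second: float, baseline: float = 0.0) -> float:
    """How much a bar chart exaggerates a change (1.0 means faithful).

    Bars are drawn up from `baseline`, so their visible heights are value - baseline.
    Assumes first != second and both values sit above the baseline.
    """
    shown = relative_change(first - baseline, second - baseline)
    actual = relative_change(first, second)
    return shown / actual


if __name__ == "__main__":
    # Hypothetical values: a 5% increase, from 100 to 105.
    for baseline in (0, 90, 95):
        print(f"axis starts at {baseline:>3}: lie factor {lie_factor(100, 105, baseline):.1f}")
```

Starting the axis at 95 makes the second bar look twice as tall as the first, roughly a twentyfold exaggeration of a 5% change.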
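A tiny Python illustration of how a quiet omission changes a headline number. The scores are hypothetical customer-satisfaction ratings on a 1–10 scale, and `drop_inconvenient` is a made-up helper standing in for any undisclosed “adjustment”.

```python
from statistics import mean


def drop_inconvenient(values: list[float], threshold: float) -> list[float]:
    """Hypothetical biased 'adjustment': quietly discard observations below threshold."""
    return [v for v in values if v >= threshold]


# Hypothetical customer-satisfaction scores on a 1-10 scale.
scores = [2, 3, 7, 8, 8, 9, 9]
kept = drop_inconvenient(scores, threshold=5)

print(f"all {len(scores)} responses: mean {mean(scores):.2f}")
print(f"after dropping {len(scores) - len(kept)}: mean {mean(kept):.2f}")
```

Nothing in the reported average reveals that two responses were removed, which is why disclosed exclusion rules matter as much as the numbers themselves.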
Examples of Misleading Statistics – A Digital Age of Blurred Lines
Now that we have reviewed several of the most common methods of data misuse, let’s look at various digital-age examples of misleading statistics across three distinct but related areas: news & politics, business & advertising, and science. While certain topics listed here are likely to stir emotion depending on one’s point of view, they are included for data demonstration purposes only.
1) News & Politics – Planned Parenthood
Misleading statistics in the news are quite common. On Sept. 29, 2015, Republicans in the U.S. Congress questioned Cecile Richards, the president of Planned Parenthood, about the alleged misappropriation of $500 million in annual federal funding. The graph above was presented as a point of emphasis.
Representative Jason Chaffetz of Utah explained: “In pink, that’s the reduction in the breast exams, and the red is the increase in the abortions. That’s what’s going on in your organization.”
Based on the structure of the chart, it does in fact appear that the number of abortions grew substantially since 2006, while the number of cancer screenings decreased substantially. The intent is to convey a shift in focus from cancer screenings to abortion. The chart’s endpoints make 327,000 abortions appear larger than 935,573 cancer screenings. Closer examination, however, reveals that the chart has no defined y-axis, which means there is no justification for where the visible lines are placed.
PolitiFact, a fact-checking website, reviewed Rep. Chaffetz’s numbers by comparing them with Planned Parenthood’s own annual reports. Using a clearly defined scale, here is what the information looks like:
And here is the same data on another valid scale:
Once placed on a clearly defined scale, it becomes evident that while the number of cancer screenings has in fact decreased, it still far outnumbers the abortion procedures performed yearly. This makes it a great example of misleading statistics, and some could argue bias, considering that the chart originated not with the Congressman but with Americans United for Life, an anti-abortion group.
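Placing both quoted figures on one y-axis that starts at zero is simple arithmetic; the Python sketch below does exactly that, using only the two numbers given in the text. The helper name and the 100-unit chart height are illustrative assumptions.

```python
def shared_axis_heights(values: dict[str, float], chart_height: float = 100.0) -> dict[str, float]:
    """Heights of each value on a single y-axis that starts at zero.

    The largest value fills the chart; every other value is drawn in proportion.
    """
    top = max(values.values())
    return {name: chart_height * v / top for name, v in values.items()}


# The two endpoint figures quoted in the discussion of the chart.
heights = shared_axis_heights({"abortions": 327_000, "cancer screenings": 935_573})
for name, height in heights.items():
    print(f"{name:>17}: {height:5.1f}% of the chart height")
```

On a shared, zero-based axis the abortion figure reaches only about 35% of the height of the cancer-screening figure, the opposite of what the unlabeled chart suggests.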
2) Business – Misleading Advertising Data
In 2007, Colgate was ordered by the U.K.’s Advertising Standards Authority (ASA) to abandon its claim: “More than 80% of Dentists recommend Colgate.” The slogan appeared on an advertising billboard in the U.K. and was deemed to be in breach of U.K. advertising rules.
The claim, which was based on surveys of dentists and hygienists carried out by the manufacturer, was found to be misleading because the survey allowed participants to select one or more toothpaste brands. The ASA stated that the claim “… would be understood by readers to mean that 80 percent of dentists recommend Colgate over and above other brands, and the remaining 20 percent would recommend different brands.”
The ASA continued, “Because we understood that another competitor’s brand was recommended almost as much as the Colgate brand by the dentists surveyed, we concluded that the claim misleadingly implied 80 percent of dentists recommend Colgate toothpaste in preference to all other brands.” The ASA also stated that the scripts used for the survey told participants the research was being performed by an independent research company, which was false.
Based on the statistical misuse techniques we covered, it is safe to say that this sleight of hand by Colgate falls under both faulty polling and outright bias.
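Why a multi-select survey can support an “80%” headline while a rival is recommended almost as often is easy to show with hypothetical data. In the Python sketch below, ten invented respondents may tick several brands; “Brand X” stands in for an unnamed competitor, and none of these numbers come from the actual Colgate survey.

```python
from collections import Counter


def mention_rates(responses: list[set[str]]) -> dict[str, float]:
    """Share of respondents who ticked each brand.

    Respondents may tick several brands, so these shares can add up to more than 100%.
    """
    counts = Counter(brand for picks in responses for brand in picks)
    return {brand: n / len(responses) for brand, n in counts.items()}


def sole_choice_rate(responses: list[set[str]], brand: str) -> float:
    """Share of respondents who ticked this brand and nothing else."""
    return sum(picks == {brand} for picks in responses) / len(responses)


# Ten hypothetical respondents; "Brand X" stands in for an unnamed competitor.
responses = [{"Colgate", "Brand X"}] * 6 + [{"Colgate"}] * 2 + [{"Brand X"}] + [{"Other"}]

rates = mention_rates(responses)
for brand in ("Colgate", "Brand X", "Other"):
    print(f"{brand}: ticked by {rates[brand]:.0%}")
print(f"Colgate as the only choice: {sole_choice_rate(responses, 'Colgate'):.0%}")
```

Here 80% of respondents tick Colgate and 70% tick Brand X, yet only 20% recommend Colgate exclusively, the distinction the ASA said readers would miss.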
3) Science – Misleading Global Warming Statistics
Much like abortion, global warming is another politically charged topic that is likely to arouse emotions. It is also a topic that both opponents and proponents vigorously argue via statistical studies. Let’s take a look at some of the evidence for and against.
It is generally agreed that the global mean temperature in 1998 was 58.3 degrees Fahrenheit, according to NASA’s Goddard Institute for Space Studies. In 2012, the global mean temperature was measured at 58.2 degrees. Global warming opponents therefore argue that, since the global mean temperature decreased by 0.1 degrees over a 14-year period, global warming is disproved.
The graph below is the one most often cited to disprove global warming. It shows the change in air temperature (Celsius) from 1998 to 2012.
It is worth mentioning that 1998 was one of the hottest years on record because of an abnormally strong El Niño event. It is also worth noting that, because the climate system is highly variable, climate trends are typically assessed over periods of at least 30 years. The chart below shows the 30-year change in global mean temperatures.
And now have a look at the trend from 1900 to 2012:
While the short-term data may appear to show a plateau, the long-term data clearly paint a picture of gradual warming. Therefore, using the first graph, and only the first graph, to disprove global warming is a perfect example of misleading statistics.
To Conclude – Transparency and Data Driven Business Solutions
While statistical data clearly has the potential to be misused, it can also ethically drive market value in the digital world. Big data can give digital-age businesses a roadmap to efficiency, transparency and, eventually, profitability. Advanced technology solutions can enhance statistical data models and give digital-age businesses a leg up on their competition.
Whether for market intelligence, customer experience or business reporting, the future of data is now. Take care to apply data responsibly, ethically and visually, and watch your transparent corporate identity grow.