Nobody likes feeling manipulated, yet people are duped by data visualizations every day. From political issues to sports statistics to the recent report you received on the ROI of your company blog, the internet and business reports are flooded with examples of misleading data visualization.
The best way to guard against misinformation is to arm yourself with the analytical and evaluative skills that expose oversimplified and malicious data visualizations. In this feature we present some of the most common data misrepresentations, together with tips on how to avoid them when presenting your own data.
Learn how to spot the common tricks used to manipulate data and how to avoid these pitfalls in your own data visualizations.
Infamous for its overuse in politics, the truncated y-axis is a classic way to visually mislead. Take a look at the graph above, comparing people with jobs to people on welfare. At first glance, the graph suggests that people on welfare outnumber people with jobs four to one. Numbers don’t lie, however, and when analyzed, they reveal much less sensational facts than the visualization suggests.
This type of misinformation occurs when the graph’s producers ignore convention and manipulate the y-axis. The conventional way of organizing the y-axis is to start at 0 and go up to the highest data point in your set. When the origin of the y-axis is not set at zero, small differences look exaggerated and play more on people’s prejudices than on their rationality. Notice in the graph below, originally shared by Gizmodo, how much larger the differences look when the y-axis is truncated.
Create your own data visualizations with a zero-baseline y-axis, and watch out for truncated axes in the charts you read. Sometimes these distortions are introduced on purpose to mislead readers, but other times they are simply the result of not realizing how a non-zero baseline can skew the picture.
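To get a feel for how strongly a truncated baseline exaggerates a difference, you can compare the heights of the bars as they are actually drawn. The following is a minimal sketch: the function name bar_height_ratio and the two values are made up for illustration and do not come from the charts discussed here.

```python
def bar_height_ratio(smaller, larger, baseline=0.0):
    """Ratio of the drawn bar heights when the y-axis starts at `baseline`."""
    if baseline >= smaller:
        raise ValueError("baseline must lie below the smaller value")
    return (larger - baseline) / (smaller - baseline)


# Hypothetical values, chosen only for illustration.
group_a, group_b = 100, 108

honest = bar_height_ratio(group_a, group_b, baseline=0)
truncated = bar_height_ratio(group_a, group_b, baseline=98)

print(f"Axis starting at 0:  bar B is drawn {honest:.2f}x as tall as bar A")
print(f"Axis starting at 98: bar B is drawn {truncated:.2f}x as tall as bar A")
```

With these assumed numbers, an 8% difference is drawn as a fivefold difference once the axis starts at 98, which is the kind of gap between the data and the picture to look for.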
Why lie when you can just omit? By omitting certain data points, you can easily create trends that don’t actually exist, while real features of the data go unnoticed. That’s because omitted data takes its context with it. Leaving out variables can affect how you interpret the data and what conclusions you draw from it. So whenever you’re examining a variable and its relationships, carefully consider the context in which that variable exists and deliberately seek out other variables that could affect the one you’re studying.
As an example of what happens when you omit some data, whether because you want to create a misleading visualization on purpose or simply want to make your work easier, take a look at the scatter plot below. With some data points left out, a chart that would normally be filled with dips and spikes looks much smoother and more stable. See these graphs originally published by Cogent Legal.
When only every second year is plotted instead of every year, the graph appears to show a steady increase, while the real data is more volatile. Companies can take advantage of this by omitting years with significant changes in sales to make their earnings look constant and predictable, masking the true volatility of the market. When evaluating data visualizations, make sure you have access to all of the underlying data.
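The effect of plotting only every second year can be reproduced with a few lines of code. This is a rough sketch with invented sales figures (they are not the Cogent Legal data), and the helper year_over_year_changes is a hypothetical name used only here.

```python
def year_over_year_changes(values):
    """Difference between each value and the one before it."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Invented yearly sales figures, for illustration only.
years = list(range(2010, 2019))
sales = [100, 140, 110, 150, 120, 160, 130, 170, 140]

# Keep only every second year, as a misleading chart might.
sampled_years = years[::2]
sampled_sales = sales[::2]

full_changes = year_over_year_changes(sales)
sampled_changes = year_over_year_changes(sampled_sales)

print("All years:        ", full_changes)
print("Every second year:", sampled_changes, "for", sampled_years)
```

In the full series the changes swing between gains and losses, while the sampled series shows nothing but small, steady gains. Comparing the two lists is a quick way to check whether a smooth-looking chart is hiding volatility.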
Any high school math class should have covered that correlation doesn’t imply causation. But looking at the headlines of the most popular internet articles (“Does X cause Y?”), it is easy to forget. Mistaking correlation for causation is becoming more and more common in big data analyses. Data scientists find statistical patterns in data and sometimes care more about correlation than about causation, because correlation is simply easier to establish. For more on how big data is affecting the confusion of correlation with causation, don’t miss the blog post Big data: are we making a big mistake?
Here’s our favorite visualization that mistakes correlation for causation; for more, check out Buzzfeed’s The 10 Most Bizarre Correlations.
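Part of the reason correlation is so easy to find is that any two quantities that both happen to grow over time will correlate strongly, whether or not one has anything to do with the other. The sketch below computes a Pearson correlation coefficient by hand for two invented series; the data and the function name pearson are assumptions made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Two invented series that both simply rise over six periods.
series_a = [10, 12, 15, 17, 20, 22]
series_b = [3, 5, 4, 7, 8, 9]

r = pearson(series_a, series_b)
print(f"Pearson r = {r:.2f}")  # about 0.94 for these made-up numbers
```

A coefficient this high says only that the two series move together. It says nothing about whether one drives the other, or whether both are driven by something else, such as the passage of time.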
How to Avoid The Pitfalls of Misleading Data
If you want your data to tell the whole truth and nothing but the truth, implement these practices to make sure you avoid misleading data visualization.
By following the standard conventions for charts, you can avoid misleading your reader. If your high school math teacher would mark you down on an exam for the way you represent data, think twice.
Start your y-axis at 0 to avoid making small differences look large. The exception to this practice is when small differences are genuinely significant. In global climate change data, a global average temperature increase of even 1 degree can have dire consequences. A small increase in temperature is therefore important and needs to be highlighted. However, when small changes don’t translate into a big impact, start your y-axis at 0.
Following convention goes for pie charts as well. People are conditioned to read a pie chart as representing 100% of the data; don’t mislead them by showing only a slice of it.
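One simple safeguard for pie charts is to check that the slices add up to 100% before plotting, and to make any remainder visible instead of silently dropping it. The following is a minimal sketch; the function complete_pie, the "Other" label, and the product shares are all hypothetical.

```python
def complete_pie(shares, other_label="Other"):
    """Return a copy of `shares` (in percent) that adds up to exactly 100."""
    total = sum(shares.values())
    if total > 100:
        raise ValueError("slices add up to more than 100%")
    completed = dict(shares)
    if total < 100:
        # Show the missing share explicitly instead of hiding it.
        completed[other_label] = completed.get(other_label, 0) + (100 - total)
    return completed


# Invented shares that cover only part of the whole.
partial = {"Product A": 40, "Product B": 25}
full = complete_pie(partial)
print(full)
```

Plotting the completed dictionary keeps the visual promise of a pie chart: the reader sees the whole, including the part that the highlighted slices do not cover.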
Make Your Data Visualization Clear and Easy to Understand
This one should be a no-brainer, but it is often neglected in a world inundated with flashy and intricate graphs. Below are two graphs mapping out the same data. As you can see, the second graph makes trends much easier to identify and is more aesthetically pleasing. Don’t let sloppy visuals undermine the credibility of your hard-earned data.
Visualization guru Edward Tufte explains, “excellence in statistical graphics consists of complex ideas communicated with clarity, precision and efficiency”. Make this your mantra every time you sit down to create data visualizations.
Give up on PowerPoint
PowerPoint is a tool of the past. Now dashboards are in. The new dashboard software tools work with real-time data and allow you to turn numbers into visuals with only a few clicks. Building the same chart for PowerPoint, on the other hand, means running SQL queries, exporting the results, creating the charts, and then importing them onto slides, all of which can take a long time. Extra bells and whistles (colorful headlines, text, images) make the process take even longer. With easy-to-use SQL query builders, drag-and-drop interfaces, and metric builders, data visualization tools can produce results in mere minutes. The less time you spend creating data visualizations, the more time you have to understand the data.
Think for Yourself, Question Authority
With your new skills for sniffing out faulty data and misleading information, make sure to review each data visualization with a skeptical eye before you present it. More importantly, don’t let your own visualizations become infected with these cheap tricks. Build a healthy dashboard culture in your company so that every piece of data and every visualization is scrutinized before it goes public.