I am highly suspicious of line graphs displaying the number zero (0).
In fact, I am so suspicious of them that when I see one, I hear myself asking (in a most dubious tone of voice), “Is the value shown here truly zero (cross your heart and hope to die), or is data missing from the underlying data-set you’re using?”
My skepticism has a very real cause: people sometimes mistakenly enter a value of zero in a spreadsheet or data table to stand for missing data. While many like to debate the point – “Don’t real numbers have to have value? Does nothingness actually count as a value?” – zero is a real number, and it reports a real result.
For example: zero patients responded “Agree” to a survey question; zero patients fell this month; zero new cases of pertussis were observed in the U.S. this year; zero payments were received on outstanding bills today. Knowing this, when I have been assured that the true result is zero, I am fine with a display like the following, and I cease my interrogation.
If, however, data is missing for a particular period, I know there really is a gap that must be displayed as such. That is, I want to see a line graph like the one below, which makes it clear that June data is completely missing: there are no results in the data-set to report.
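For readers who build charts in code rather than in a spreadsheet, the same distinction can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not how any particular charting tool works: the function name `split_at_gaps` and the monthly figures are hypothetical. Missing data is stored as `None`, a true zero stays a number, and the series is split into separate runs so that no line is drawn across the gap.

```python
def split_at_gaps(labels, values):
    """Split a series into runs of consecutive known points.

    None means "missing"; 0 is a real measurement and stays in its run.
    Each run would be drawn as its own line, leaving a visible gap between runs.
    """
    segments = []
    current = []
    for label, value in zip(labels, values):
        if value is None:
            # A missing value ends the current run instead of being plotted.
            if current:
                segments.append(current)
                current = []
        else:
            current.append((label, value))
    if current:
        segments.append(current)
    return segments


# Hypothetical data: May is a true zero, June is missing.
months = ["Apr", "May", "Jun", "Jul", "Aug"]
falls = [3, 0, None, 2, 1]

segments = split_at_gaps(months, falls)
print(segments)
# [[('Apr', 3), ('May', 0)], [('Jul', 2), ('Aug', 1)]]
```

The zero for May stays on the line, while June produces a break between the two runs.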
Oh, but if it were always this straightforward, I just might be out of a job.
Many software applications – such as Excel in my examples – have been specifically designed to let users display this data correctly. But sadly – okay, infuriatingly – that is not the case with all of them.
Imagine my dismay when I recently learned from a colleague that in the application she was using, the software simply linked points in time on a line chart, completely obscuring the fact that data was missing. Instead of showing a gap for absent data, the software creates a display something like this:
Oh, my aching head! As you can see, this display makes it appear as if there were known results for 2013, when in fact there are none. I later discovered that this flaw is a known problem that the vendor has yet to correct. (See Steve Few’s post about it here.)
What do you do if you encounter this issue?
Certainly, you can write to the vendor – but that won’t solve your immediate problem. You can (and should) also flag the flaw in your report, drawing attention to the gap it hides and stating that the results for the time period (the year 2013 in the example above) are missing. Make that alert stand out, too! Something like
NO DATA AVAILABLE FOR 2013 – SOFTWARE AUTOMATICALLY LINKS YEARS
should be clear enough.
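If the report is generated by a script, the warning text can be derived from the data so that it is never forgotten. Here is a minimal Python sketch, assuming missing results are stored as `None`; the function name `missing_data_note` and the yearly figures are hypothetical.

```python
def missing_data_note(periods, values):
    """Return an all-caps warning naming every period with no data.

    Returns an empty string when nothing is missing. A value of 0 is a
    real result and does not trigger the warning.
    """
    missing = [str(period) for period, value in zip(periods, values) if value is None]
    if not missing:
        return ""
    return "NO DATA AVAILABLE FOR " + ", ".join(missing)


# Hypothetical data: 2013 has no results at all.
years = [2010, 2011, 2012, 2013, 2014]
results = [12, 15, 11, None, 14]

note = missing_data_note(years, results)
print(note)
# NO DATA AVAILABLE FOR 2013
```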
You could also delete 2013 from your data-set altogether, which would result in a graph that looks like this:
This solution introduces a new problem, however: viewers will not easily pick up on the fact (if they do so at all) that 2013 is missing.
They will see a series and assume it is consistent, losing sight of the broken sequence at the end of the chart, where the years jump from 2012 to 2014. You will therefore – again – need to highlight this omission as you did with the warning above. And how annoying is that? Very.
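A quick check can catch this kind of silent jump before a chart goes out. The Python sketch below is one possible approach, assuming the axis labels are whole years; the function name `find_skipped_years` is hypothetical.

```python
def find_skipped_years(years):
    """Return the years absent from an otherwise consecutive run of years."""
    if not years:
        return []
    present = set(years)
    # Walk the full span from first to last year and keep what is not there.
    return [year for year in range(min(years), max(years) + 1) if year not in present]


# The axis after 2013 has been deleted from the data-set.
axis_years = [2010, 2011, 2012, 2014]

skipped = find_skipped_years(axis_years)
print(skipped)
# [2013]
```

Any year the check returns needs its own highlighted warning on the chart.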
The power of well-designed data visualization lies in how beautifully and simply lines, points, boxes, and bars can make the story in our data easy to see and understand.
The operative expression here, though, is “well-designed.” Charts and graphs are of use only when they are built with good design, following the best practices of data visualization. Unfortunately, not everyone has gotten the memo about how it all works – and Liquid Paper is no longer an option.