Can’t See the Forest for the Treemaps

I’m sure you’ve heard the expression “Can’t see the forest for the trees.” It describes someone too involved in the details of a problem or situation to grasp it as a whole; they have lost sight of the overall goals, the “big picture.”

Thinking about this challenge in the context of data visualization, I can recall many displays created by analysts so enamored of the software they’re using and what it can do that they miss the larger objective: creating visualizations that show the story in the data simply, clearly, and compellingly.

And yes, not so coincidentally, these are sometimes displays of data using a visualization technique called Treemaps (forest, trees, Treemaps — get it?).

It’s important and extraordinarily helpful to understand the genesis of a visualization technique to ensure we are using it correctly. Why was a technique conceived? What problem was it designed to help us solve?

In the case of Treemaps, it was during the 1990s that Ben Shneiderman of the University of Maryland devised a new technique for displaying hierarchies within a constrained space or, more simply, for visualizing hierarchical data sets too large to be displayed effectively in a bar graph.

Here’s an example from Steve Few’s book Now You See It, which displays hierarchical stock market data using Shneiderman’s Treemap technique.

Look closely, and you will see the different levels of data being displayed:

  • Level 1 The whole visualization represents the entire stock market.
  • Level 2 Next, different stock market sectors (financial, healthcare, etc.) are displayed and labeled in the secondary level of rectangles.
  • Level 3 Inside each of these rectangles, the smallest rectangles represent individual stocks within each sector.

Additional information is encoded by:

  • Making the size of each rectangle representative of the size of the respective sectors and stocks being displayed.
  • Using different saturations of the colors blue and red to encode both the current price of the stock and its change in price since the previous day (blue for gains, red for losses).
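The encoding described above can be sketched in a few lines of Python. This is a minimal illustration of the "slice-and-dice" layout, the simplest Treemap layout, in which each level of the hierarchy splits its parent rectangle in proportion to size, alternating direction at each level. The `Node` class, the `layout` and `color_for` functions, and the sector and stock names and sizes are all made up for illustration; they are not taken from the example in the book.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    size: float = 0.0                      # only meaningful for leaves (e.g. a stock)
    children: List["Node"] = field(default_factory=list)

    def total(self) -> float:
        # A parent's size is the sum of its children's sizes.
        if not self.children:
            return self.size
        return sum(child.total() for child in self.children)


def layout(node, x, y, w, h, depth=0, out=None):
    """Slice-and-dice: give each child a strip of its parent's rectangle,
    proportional to its size, alternating split direction by depth."""
    if out is None:
        out = []
    out.append((node.name, depth, x, y, w, h))
    total = node.total()
    if not node.children or total == 0:
        return out
    offset = 0.0
    for child in node.children:
        share = child.total() / total
        if depth % 2 == 0:                 # even depth: split along the width
            layout(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:                              # odd depth: split along the height
            layout(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out


def color_for(change, max_abs_change):
    """Blue for gains, red for losses; saturation (0..1) grows with magnitude."""
    hue = "blue" if change >= 0 else "red"
    saturation = min(abs(change) / max_abs_change, 1.0) if max_abs_change else 0.0
    return hue, saturation


# Hypothetical three-level hierarchy: market -> sectors -> stocks.
market = Node("Market", children=[
    Node("Sector A", children=[Node("Stock A1", 30), Node("Stock A2", 10)]),
    Node("Sector B", children=[Node("Stock B1", 60)]),
])

rects = layout(market, 0, 0, 100, 50)
for name, depth, x, y, w, h in rects:
    print("  " * depth + f"{name}: x={x:.1f} y={y:.1f} w={w:.1f} h={h:.1f}")
```

Every rectangle's area is proportional to its size, and every stock sits inside its sector's rectangle, which is exactly what makes the technique suited to hierarchies and unnecessary for a flat list of categories.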

In the example above, we can see the entire stock market and the relative size of the different sectors (Technology, Consumer Goods, Healthcare). We can also see (for example) that the Financial sector is the largest in the stock market, and that Citigroup’s stock represents a large portion of this sector.

The light blue color conveys that Citigroup had gains, but not as much as some others, which are displayed in a darker blue. By comparison, we can see the relative size of the Technology sector (bottom row) and that Microsoft represents the largest portion of that sector. Its red color shows that it had losses.

Treemaps are a relatively complex visualization technique designed to solve the challenge of displaying complex categories and sub-categories (hierarchies) of data. The mistake I see most often, however, is that the health and healthcare data being displayed in a Treemap don’t present that challenge: they aren’t hierarchical at all.

For example, I’ve seen simple categories of data, like the rates of adults diagnosed with diabetes in the top 25 states, displayed in a Treemap when a bar chart would be more appropriate.

At first glance, a Treemap for such basic data may look cool, but the novelty wears off very quickly because we can’t easily see how the states rank or compare rates between them, and we can’t add further contextual data of interest.

Now consider the same data in a simple bar graph where we can rank the data, compare each state’s diabetes rate, and label everything directly. We can also include additional contextual information, such as the average for the entire country, using a vertical line overlaid on the bars.
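To make the point concrete, here is a rough sketch of that bar graph as plain text: bars sorted from highest to lowest, each labeled directly with its value, and a `|` marking a reference value on every row. The `ranked_bars` function is my own illustration, and the state names and rates are invented placeholders, not real diabetes figures; the reference line here is simply the mean of the placeholder values standing in for a national average.

```python
def ranked_bars(rates, reference, width=40):
    """Return one text line per item, sorted high to low.
    '#' characters form the bar; '|' marks the reference value."""
    top = max(max(rates.values()), reference)
    ref_col = round(reference / top * width)
    label_w = max(len(name) for name in rates)
    lines = []
    for name, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
        bar = "#" * round(rate / top * width)
        cells = list(bar.ljust(width + 1))   # pad so the marker always fits
        cells[ref_col] = "|"                 # overlay the reference line
        lines.append(f"{name.ljust(label_w)} {''.join(cells).rstrip()} {rate:.1f}")
    return lines


# Placeholder data for illustration only (not real rates).
rates = {"State A": 9.5, "State B": 12.0, "State C": 10.5, "State D": 8.0}
reference = sum(rates.values()) / len(rates)  # stand-in for a national average

lines = ranked_bars(rates, reference)
print("\n".join(lines))
```

Even in this crude form, the ranking is obvious, the rates can be compared by bar length, and it is immediately clear which states fall above or below the reference line.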

Here’s the bottom line: if we focus only on the “trees” of the different functionalities and seemingly cool visualizations that many new software applications allow us to create easily — without understanding why they were conceived, and what problems they’re designed to solve — we more often than not will miss the bigger objective of creating clear, accurate, compelling views of crucial data.

However, if we commit to understanding the underlying structure of our data and the best visualization to convey the meaning buried in it, we will be able to see both the forest and the trees.
