Thursday, 22 January 2015

Making graphs of tables

Seeing tables as graphs


We often put tables into papers by reflex. Making them is dull work, I suspect because of the sense that no-one reads them. And there’s a very good reason for that: tables are a very good reference resource, but they are lousy communicators.

Tables: lousy communicators


Here is a table of hair and eye colour:

. use "Hair and eye colour.dta"
(Hair and Eye Colour, Caithness, from Tocher (1908))

. tabulate eye_colour hair_colour [fweight = freq]

           |                      Hair colour
Eye colour |      Fair        Red     Medium       Dark      Black |     Total
-----------+-------------------------------------------------------+----------
      Blue |       326         38        241        110          3 |       718 
     Light |       688        116        584        188          4 |     1,580 
    Medium |       343         84        909        412         26 |     1,774 
      Dark |        98         48        403        681         85 |     1,315 
-----------+-------------------------------------------------------+----------
     Total |     1,455        286      2,137      1,391        118 |     5,387 

You have to be pretty determined to make any sense of the table. To do so, you have to digest 20 numbers, most of them three-digit. That is pretty much guaranteed to be beyond the working memory capacity of the average human.

And no, percentages don’t help much:


. tabulate eye_colour hair_colour [fweight = freq], column nofreq 

           |                      Hair colour
Eye colour |      Fair        Red     Medium       Dark      Black |     Total
-----------+-------------------------------------------------------+----------
      Blue |     22.41      13.29      11.28       7.91       2.54 |     13.33 
     Light |     47.29      40.56      27.33      13.52       3.39 |     29.33 
    Medium |     23.57      29.37      42.54      29.62      22.03 |     32.93 
      Dark |      6.74      16.78      18.86      48.96      72.03 |     24.41 
-----------+-------------------------------------------------------+----------
     Total |    100.00     100.00     100.00     100.00     100.00 |    100.00 
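
If you do not have the .dta file, the frequencies in the first table are enough to rebuild a dataset of the same shape. What follows is only a sketch: the variable names and counts come from the commands and table above, but the numeric codes and the label names eye and hair are assumptions, since the coding used in the original file is not shown.

```stata
* Rebuild the table as one row per cell, with the cell count in freq.
* The codes 1-4 and 1-5 and the label names eye/hair are assumptions.
clear
input byte eye_colour byte hair_colour int freq
1 1 326
1 2  38
1 3 241
1 4 110
1 5   3
2 1 688
2 2 116
2 3 584
2 4 188
2 5   4
3 1 343
3 2  84
3 3 909
3 4 412
3 5  26
4 1  98
4 2  48
4 3 403
4 4 681
4 5  85
end

label define eye  1 "Blue" 2 "Light" 3 "Medium" 4 "Dark"
label define hair 1 "Fair" 2 "Red" 3 "Medium" 4 "Dark" 5 "Black"
label values eye_colour eye
label values hair_colour hair
label variable eye_colour  "Eye colour"
label variable hair_colour "Hair colour"

* Check against the first table: the grand total should be 5,387.
tabulate eye_colour hair_colour [fweight = freq]
```

Numbering the categories from light to dark keeps the rows and columns in the same order as the tables above.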


Stacked bar charts

Here, instead, is what happens when we graph the data:

* Stacked bars: eye colour as a percentage within each hair colour
catplot eye_colour hair_colour [fw=freq], name(catplot, replace) ///
    asyvars stack percent(hair_colour) legend(rows(1) stack)



The stacked bar chart shows the dark-to-light trend running from top left to bottom right. It shows the breakdown of eye colour within each hair colour, but tells us nothing about the distribution of hair colour itself, because every bar is scaled to 100%.

The stacked bar chart was drawn with Nick Cox’s catplot command, which you can install from the SSC archive:

. ssc install catplot
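
The stacked bar chart above scales every bar to 100%, so the distribution of hair colour itself has to be looked at separately. As a rough sketch using only built-in commands (the graph name hair_margin and the axis title are illustrative choices, not part of the original analysis), a one-way table and a plain bar chart of the summed frequencies would do it:

```stata
* One-way frequency table of hair colour, weighted by the cell counts
tabulate hair_colour [fweight = freq]

* Horizontal bars whose lengths are the total count for each hair colour
graph hbar (sum) freq, over(hair_colour) ytitle("Frequency") ///
    name(hair_margin, replace)
```

This gives the second half of the picture, but as a separate display; the reader still has to combine the two in their head.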


Spineplots (mosaic plots)

Spine plots (also called mosaic plots) are a very effective way of visualising tables. You have probably met stacked bar charts, but you may not have heard of spine plots.
A spine plot shows both the distribution of hair colour and the distribution of eye colour within each hair colour:

spineplot  eye_colour hair_colour [fw=freq], percent





The hair colours are shown as columns whose widths are proportional to their frequencies, so we can see that red hair and black hair are much rarer in this population (Scotland, early 20th century) than fair, medium and dark. And the relationship with eye colour is now very evident: the colour changes from bottom left (fair hair, light or blue eyes) to top right (dark or black hair, dark eyes).

Do you need a graph rather than a table?

The tables above contain the relationship but they don’t show it. And even if you are determined to find it, there are simply too many numbers in the table for any normal person to hold them all in working memory and make sense of the pattern. 

The spine plot, on the other hand, shows the relationship with little work needed on the part of the reader. It doesn’t record the exact percentages. If you simply need to record the exact percentages for reference, then a table is better, but if you want to communicate a pattern, then there’s no question: the graph wins hands-down.

The spine plot was drawn with Nick Cox’s spineplot command, which you can install from the SSC archive:

. ssc install spineplot