Seeing tables as graphs
We often put tables into papers by reflex. Making them is dull work because, I suspect, we sense that no one reads them. And there's a very good reason for this: while tables are a very good resource, they are lousy communicators.
Tables: lousy communicators
Here is a table of hair and eye colour:
. use "Hair and eye colour.dta"
(Hair and Eye Colour, Caithness, from Tocher (1908))
. tabulate eye_colour hair_colour [fweight = freq]
           |                      Hair colour
Eye colour |      Fair        Red     Medium       Dark      Black |     Total
-----------+-------------------------------------------------------+----------
      Blue |       326         38        241        110          3 |       718
     Light |       688        116        584        188          4 |     1,580
    Medium |       343         84        909        412         26 |     1,774
      Dark |        98         48        403        681         85 |     1,315
-----------+-------------------------------------------------------+----------
     Total |     1,455        286      2,137      1,391        118 |     5,387
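If you don't have the .dta file to hand, the 20 cell counts in the table are enough to rebuild the data. Here is a minimal sketch. The variable names match the commands above, but the numeric codes (1 to 4 for eye colour, 1 to 5 for hair colour) and the value-label names eye and hair are assumptions made for illustration, not necessarily how the original file is coded.

```stata
* Rebuild the hair/eye colour table as one row per cell plus a frequency.
* ASSUMPTION: numeric codes and the label names "eye" and "hair" are
* illustrative; the original .dta file may be coded differently.
clear
input byte eye_colour byte hair_colour int freq
1 1 326
1 2  38
1 3 241
1 4 110
1 5   3
2 1 688
2 2 116
2 3 584
2 4 188
2 5   4
3 1 343
3 2  84
3 3 909
3 4 412
3 5  26
4 1  98
4 2  48
4 3 403
4 4 681
4 5  85
end

* Attach readable labels so the output matches the table headings
label define eye  1 "Blue" 2 "Light" 3 "Medium" 4 "Dark"
label define hair 1 "Fair" 2 "Red" 3 "Medium" 4 "Dark" 5 "Black"
label values eye_colour eye
label values hair_colour hair
label variable eye_colour "Eye colour"
label variable hair_colour "Hair colour"

* freq is used as a frequency weight, exactly as in the commands above
tabulate eye_colour hair_colour [fweight = freq]
```

The final tabulate should reproduce the table above, with a grand total of 5,387.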
You have to be pretty determined to make any sense of the table. Doing so means somehow digesting 20 numbers, most of them three digits long, and that is pretty much guaranteed to be beyond the working memory capacity of the average human.
And no, percentages don’t help much:
. tabulate eye_colour hair_colour [fweight = freq], column nofreq
           |                      Hair colour
Eye colour |      Fair        Red     Medium       Dark      Black |     Total
-----------+-------------------------------------------------------+----------
      Blue |     22.41      13.29      11.28       7.91       2.54 |     13.33
     Light |     47.29      40.56      27.33      13.52       3.39 |     29.33
    Medium |     23.57      29.37      42.54      29.62      22.03 |     32.93
      Dark |      6.74      16.78      18.86      48.96      72.03 |     24.41
-----------+-------------------------------------------------------+----------
     Total |    100.00     100.00     100.00     100.00     100.00 |    100.00
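Each entry in this second table is a cell count divided by the total for its hair colour column. As a quick check, here are two cells worked by hand from the counts in the first table, using nothing beyond Stata's built-in display command:

```stata
* Column percentage = 100 * cell count / column total

* Blue eyes among fair-haired people: 326 of 1,455
display %5.2f 100*326/1455

* Dark eyes among black-haired people: 85 of 118
display %5.2f 100*85/118
```

These give 22.41 and 72.03, the first and last hair colour entries on the Blue and Dark rows respectively.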
Stacked bar charts
Here, instead, is what happens when we graph the data:
* Stacked bars: eye colour as a percentage within each hair colour
catplot eye_colour hair_colour [fw=freq], name(catplot,replace) ///
    asyvars stack percent(hair_colour) legend(rows(1) stack)
The stacked bar chart shows the dark-to-light trend running from top left to bottom right. It gives the breakdown of eye colour within each hair colour, but tells us nothing about how common each hair colour is.
The chart is drawn with Nick Cox's catplot command, which you can install from the SSC archive:
. ssc install catplot
Spineplots (mosaic plots)
Spineplots (also called mosaic plots) are a very effective way of visualising tables. Unlike stacked bar charts, they may be unfamiliar to you.
A spineplot shows both the distribution of hair colour and the distribution of eye colour within each hair colour:
spineplot eye_colour hair_colour [fw=freq], percent
The hair colours are shown as columns, and we can see that red hair and black hair are much rarer in this population (Scotland, early 20th century) than fair, medium and dark. And the relationship with eye colour is now very evident – the colour changes from bottom left (fair hair, light or blue eyes) to the top right (dark or black hair, dark eyes).
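The column widths in a spineplot are proportional to the column totals, which is how the plot carries the distribution of hair colour. Here is a small sketch that turns the hair colour totals from the first table into the shares those widths represent. The local macro names are purely illustrative.

```stata
* Share of the sample in each hair colour = 100 * column total / grand total
* ASSUMPTION: macro names (names, totals, name, n) are illustrative only.
local names  Fair Red Medium Dark Black
local totals 1455 286 2137 1391 118

forvalues i = 1/5 {
    * Pick the i-th label and the i-th column total
    local name : word `i' of `names'
    local n    : word `i' of `totals'
    display "`name'" _col(10) %5.1f 100*`n'/5387
}
```

Fair, medium and dark hair come out at 27.0%, 39.7% and 25.8% of the sample, against 5.3% for red and 2.2% for black, which is why those two columns are so narrow.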
Do you need a graph rather than a table?
The tables above contain the relationship but they don’t show it. And even if you are determined to find it, there are simply too many numbers in the table for any normal person to hold them all in working memory and make sense of the pattern.
The spineplot, on the other hand, shows the relationship with little work needed on the part of the reader. It doesn't record the exact percentages. If you simply need to record the exact percentages for reference, then a table is better, but if you want to communicate a pattern, then there's no question: the graph wins hands-down.
The plot is drawn with Nick Cox's spineplot command, which you can install from the SSC archive:
. ssc install spineplot