No, your eyes are not deceiving you; the title of the blog has changed slightly, from “How to Make Truly Terrible Graphs” to “How to Make Truly Terrible Tables.” This reflects the fact that it is possible to screw up (am I allowed to say that? Let’s make it “have things go amiss”) in areas other than graph-making. So, in the next few blogs, we’ll turn our attention to making tables for papers and presentations. (As a woodworker, I’ve screwed up making other types of tables, but that discussion will have to wait for a different forum.) The second part of the title may also raise some eyebrows; how can you be too accurate? After all, the need for accuracy has been drummed into our heads since we were scientists-in-training, learning the rules of the game at our supervisor’s knee. Whether we’re using an extremely expensive piece of lab equipment or designing a new paper-and-pencil scale, the mantra is the same: reduce the error in order to improve the reliability of our measurements and increase the accuracy. So in presenting our results in a table, how can we be “too accurate?”

As a matter of fact, it’s actually quite easy; all we have to do is ignore the imprecision inherent in any measurement and just keep printing out all of those numbers to the right of the decimal point. For starters, let’s take a look at Table 1, presenting some basic demographic information for a group in a study.

Table 1

Demographic Information

| Variable | Group 1 | Group 2 |
| --- | --- | --- |
| Number of males/females | 6/4 | 5/5 |
| Age in years, mean (SD) | 38.25 (10.05) | 37.60 (9.90) |
| Education in years, mean (SD) | 13.45 (4.20) | 12.90 (4.15) |

Starting off with Age, we report that it’s 38.25 years for the 10 people in Group 1. If we determined age by asking the people how old they were at their last birthday, then on average, there’ll be an error of about 180 days. For example, at my last birthday, I was 73 years old, but I’m actually 73 years, 8 months, and 12 days old on the day that I’m writing this. (For those who want to send cards or presents, my actual birth date is 12 November; my mailing address is available on request.) We can improve matters by asking people how old they are as of their nearest birthday, but that still leaves an average error of “only” 90 days. Now, just what does that ‘5’ in the second decimal place represent? It’s 1/100th of a year, or 3.65 days. Given the degree of inaccuracy in how we measured age to begin with, can we really justify this degree of accuracy in reporting the results, especially given that there are only 10 people in the group? If just one person in the group were replaced with another who is one year older, the *first* decimal place would change from 2 to 3 – a shift of slightly more than one month. To claim that we know an average participant’s age to within four days does violence to the data.
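A few lines of Python make the arithmetic concrete. The raw ages below are invented (the blog reports only the group summaries), but the point about spurious decimal places holds for any 10 ages:

```python
# Hypothetical raw ages for a group of 10 (invented for illustration;
# the original post reports only the mean and SD).
ages = [22, 28, 31, 35, 38, 40, 42, 45, 49, 52]

mean_age = sum(ages) / len(ages)
print(f"Mean age: {mean_age:.2f}")          # two decimals implies ~3.65-day precision

# The second decimal place of a year is only 3.65 days:
print(f"0.01 year = {0.01 * 365:.2f} days")

# Swap in one participant who is a year older: the mean moves by a full
# 0.1 year (about 36.5 days), swamping that second decimal place.
ages_shifted = ages[:]
ages_shifted[0] += 1
shift = sum(ages_shifted) / len(ages_shifted) - mean_age
print(f"Mean shifts by {shift:.2f} years = {shift * 365:.1f} days")
```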

In fact, that overestimation of the precision of the data pales in comparison to our estimate of the participants’ education. Because the school year is about 200 days long (and they often seemed like very long days), then the last decimal place represents two days in class. Do you really think the data can support this degree of accuracy? I thought not.

If you think that these examples are fairly extreme, then (in the words of TV pitch men), “But wait – there’s more!” I just checked a Web site for the population of Brazil, and the number it reported was 206,769,143. Seriously? Even if that’s based on some equation taking into account the estimated birth and death rates, let’s examine where the numbers came from. There first had to be a census to establish the baseline, and that data-gathering was likely spread out over many weeks or months, covering not only major cities but also remote villages buried deep in the Amazonian forest. During that time, some people were dying and others being born. But let’s not forget the words of Sir Josiah Stamp (1880-1941), a statistician and former Director of the Bank of England: “The government are very keen on amassing statistics. They collect them, add them, raise them to the nth power, take the cube root and prepare wonderful diagrams. But you must never forget that every one of these figures comes in the first instance from the *chowy dar* [village watchman in India], who just puts down what he damn pleases.” On top of that, the birth rate is 14.46 per 1,000 population per year, or slightly over 340 new souls *per hour*! (The figure for deaths is about 154 per hour.) So, that final “143” in the population estimate is wrong within an hour of being written down. It would be far more “accurate” to say that the population is 206.8 million and leave it at that, indicating that the estimate is really just that – an estimate.
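The back-of-the-envelope arithmetic is easy to verify. The population figure and birth rate below are the ones quoted above:

```python
# Figures quoted in the text: Brazil's reported population and its
# annual birth rate per 1,000 population.
population = 206_769_143
births_per_1000_per_year = 14.46

births_per_year = population / 1000 * births_per_1000_per_year
births_per_hour = births_per_year / (365 * 24)

# Slightly over 340 births per hour, as the text claims -- so the last
# three digits of the population figure are stale within the hour.
print(f"{births_per_hour:.0f} births per hour")
```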

So remember, too much accuracy in a table is inaccurate.


Statistics Commentary Series: Commentary #9—Sample Size Made Easy (Power a Bit Less So) JOURNAL OF CLINICAL PSYCHOPHARMACOLOGY · MARCH 2015 · DOI: 10.1097/JCP.0000000000000297 · http://www.researchgate.net/publication/273463222

The reason I like this analogy so much is that magnification is intuitively clear to just about anyone who has ever stood far away from something, then moved in for a closer look. We know perfectly well that what we are looking at isn’t changing, but by changing position so that we can gather more information, we become more sure of what we are seeing. By gathering more data (having a larger sample size), we can be more sure* (have better statistical significance) of what we are seeing.

*Remember, p-values, which are what most people mean when they refer to statistical significance, *only* tell you how often a difference this large would turn up if the two treatments were truly identical (a false positive), so the words “can be more sure” are on purpose *not* “can know.”



This page talks about the problem with the p-level being the be-all and end-all of way too many scientific studies. You might be aware that there is discussion about this in the scientific literature. (I think statisticians deserve prizes for still trying to get the rest of us to pay attention.) The problem is that the ONLY thing the p-level tells you is how likely a result like yours would be if the null hypothesis were true – in other words, how often rejecting the null hypothesis would be the wrong thing to do. That’s why small is good: you want the probability of a false positive to be as small as possible. And that is all it does. Nothing else. Take a moment, if you will, to consider all the other ways you could be wrong: a false negative, the wrong question, the wrong control, and so on. Most importantly, it does not tell you whether your result is significant in any *scientifically meaningful* way.

This is what Paul Ellis is talking about when he uses the vocabulary “substantive significance” and the link above goes right to the heart of the matter: researchers are confusing statistical significance with substantive significance, and journals are letting them get away with it. In my ideal world every result comes with, at least, the standard deviation, and the four things noted below: alpha, beta, sample size, and effect size, that last being accompanied by some sentences describing why that effect size was chosen.

You may be lucky enough to have as big a sample size as you want, but you still must use your brain to decide what matters, to design an experiment that actually answers your question, and to run appropriate controls so that you can make interesting comparisons. A large sample size may allow you to find very small differences, but if the differences are that small, do they matter? They very well might, but you must think that through; no statistic can do that for you.

There is, however, a numerical representation of that “minimum important difference” called the effect size. The best way to plan an experiment is to decide *in advance* on the effect size that you, with your brain, think is important enough to be worth detecting, *and* decide in advance how low you want the probability of a false positive to be (alpha, or, p-level), *and* decide how low you want the probability of a false negative to be (beta), *and* decide on the most information-packed way to measure, which includes deciding on the appropriate statistical test to use, *THEN* calculate the sample size and stick to it.
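As a sketch of that planning step, here is the standard normal-approximation formula for the sample size of a two-group comparison of means. The effect size, alpha, and beta below are illustrative choices, not recommendations, and a real study would use an exact power calculation (or software) rather than this approximation:

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float, beta: float) -> int:
    """Approximate sample size per group for a two-sided, two-sample
    comparison of means; effect_size is Cohen's d (the difference in
    means divided by the common standard deviation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # controls the false-positive rate
    z_beta = z.inv_cdf(1 - beta)        # controls the false-negative rate
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


# Illustrative: a medium effect (d = 0.5), alpha = .05, beta = .20
# (i.e., power = .80) calls for about 63 people per group.
print(n_per_group(0.5, 0.05, 0.20))
```

Note that every input is a decision you make *before* collecting data; the formula only turns those decisions into a number.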

To learn more about effect size, go to effectsizefaq.com and read all about it.

David L. Streiner, special guest contributor and co-author of excellent statistics texts

In my last blog, we learned the first step in making truly terrible graphs, by confusing the role of a visual with that of a table. But that barely scratches the surface of how graphing packages can allow us to totally screw up. This blog will examine another widely misused technique: pie charts.

Pie charts are the essence of simplicity. If you have a nominal variable with a number of response options, such as “What is the dullest subject you ever studied in college?”, the size of each slice is proportional to the number of people endorsing each alternative, as in the figure below:

It’s obvious that Economics ranks highest on the list and Psychology at the bottom. (The fact that I’m a psychologist did not influence these fictitious data in the slightest.) That was easy. So what’s not to like about this; isn’t it as American as, say, apple pie? Well, actually no. To begin with, its origins are Scottish, not American, with William Playfair’s 1801 book, *The Statistical Breviary*. Even if we ignore Spence and Wainer’s (2001) description of him as an “engineer, political economist and scoundrel,” Playfair has much to answer for. The first problem is the cognitive load imposed on the viewers. They have to first find a color on the chart, move over to the legend to see what course it corresponds to, then move on to the next color, and so on. Not too much of a problem when there are only two or three segments, but the task gets more and more difficult as the number of response options increases. Also, with a greater number of segments, the colors are often progressively harder to discriminate from one another. Again, these problems are exacerbated if the chart is thrown up on a screen for only a brief period of time.

“That’s easy enough to fix up,” I hear you say, “we’ll just put the labels next to each slice. Problem solved!” Well, not really. That may work if we have a small number of slices, but what will happen when there are a larger number of responses? We can see what happens in the next figure.

Getting pretty messy, isn’t it? Some of the legends are inside the pie and some are outside. But this is the least of our problems.

Which slice is larger, Biology or Philosophy? It’s not that easy to tell. The problem is that we’re not very good at judging angles. We tend to underestimate acute angles and overestimate obtuse ones. Moreover, the amount of distortion is different, depending on whether the slice is oriented vertically or horizontally (Robbins, 2005). OK, we can solve that – just include the numbers along with the course names.

Did that help? Not really. Now the chart is even more crowded and we’ve violated the lesson from the first blog – don’t try to turn a figure into a table. A far better way to display these data would be a bar chart:

This shows the relationships among the courses much more clearly, with little mental effort required by the viewer. Note two other things. First, to make the graph even easier to read, the courses have been put in rank order. Second, because the labels are relatively long, the graph has been turned on its side. The course titles never could have fit along the X-axis without a tremendous amount of clutter, or they would have had to be written vertically. By putting the graph on its side, the viewers don’t have to lie on their sides to read them.
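For those who like to tinker, the two fixes just described – rank-ordering the categories and running the bars horizontally – can be sketched in a few lines of Python with a crude text “bar chart.” The course counts are invented (the blog’s data are fictitious anyway):

```python
# Invented counts of students naming each course the dullest they took.
boring = {"Economics": 32, "Statistics": 24, "Biology": 14,
          "Philosophy": 13, "Art History": 10, "Psychology": 7}

# Rank-order the categories, largest first, then draw the "bars"
# horizontally so the long course names stay readable.
ranked = sorted(boring.items(), key=lambda kv: kv[1], reverse=True)
for course, n in ranked:
    print(f"{course:>12} | {'#' * n} ({n})")
```

Even in this crude form, the ordering makes the Biology-versus-Philosophy comparison instant, which is exactly what the pie chart could not do.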

This reinforces one motto of statisticians: “The only time a pie chart is appropriate is at a bakers’ convention.”

To echo the cry of TV sales people, “But wait … There’s more!” We can screw up even more royally. Quoting the master maven of graphing, Edward Tufte (2001), “the only thing worse than one pie chart is lots of them.” Let’s say we broke down the data by gender, with a pie chart for females on the left and males on the right:

It’s easy enough to see that females find economics more boring than do males, because both slices start at 12 o’clock. But what about art history or sociology? In order to compare men and women, you have to look at a slice, mentally move it over to the other pie, rotate it until the starting edges line up, try to keep the angle constant, and see if the trailing edges line up. This is not just difficult – it’s nearly impossible. Yet again, bar charts would greatly clarify the picture. (By the way, the proportions finding art history and sociology boring are exactly the same.)

So lesson two – to really screw up visual presentations, use a pie chart. Even better, use two or more of them.

References

Robbins, N. (2005). *Creating More Effective Graphs*. New York: Wiley.

Spence, I., & Wainer, H. (2001). William Playfair. In: C. C. Heyde & E. Seneta (Eds.). *Statisticians of the Centuries*, pp. 105–110. New York: Springer.

Tufte, E. (2001). *The Visual Display of Quantitative Information* (2nd ed.). Cheshire, CT: Graphics Press.

(thanks to Dr. David Streiner for alerting us to this page)

David L. Streiner, special guest contributor and co-author of excellent statistics texts

In 1968, when I was writing up my doctoral thesis, I needed to make some graphs showing how the different groups changed over time under various conditions. There were no computer programs to draw graphs (indeed, there were no such things as desk-top computers back then), so I had to draw the lines by hand, using special pens and ink, and the symbols and letters were added by rubbing them off special sheets of transfer paper. It took an entire day or more to make a single graph, and few people had the ability to do them (I had the advantage of training in engineering, and having spent five summers working as a draftsman). Consequently, there were relatively few graphs in journals, and those which did appear were simple black and white line charts or bar graphs. Researchers had very little ability to screw things up.

Nowadays, every computer comes equipped with at least one, and often two, graphing packages, and they allow the user to add a host of special effects – being able to make the graphs look three-dimensional, to have pie charts with segments highlighted by separating them from the rest of the pie, or to use bars of different shapes and colors. Even more options are available if the graphs will be used during a live presentation: you can use many different fonts and colors; text can fly in and out from any direction; and you can add logos from your university, your research unit, and the funding agency at the bottom of every slide. This is in addition to pictures of leaves or keys or some other totally irrelevant (but cutesy) graphic running down the left side. In other words, users are able to screw up graphs in ways that were previously unimaginable.

Unfortunately, few people know how to take full advantage of these features in order to draw truly terrible graphs. Over the next few months, I hope to remedy this parlous situation in this blog and teach you how you, too, can make graphs as bad as those that grace the pages of many daily newspapers and popular magazines. As an added bonus, I will also show you how to make tables that are unnecessarily dense, obscure, and confusing.

The first lesson in bad graphing, and the focus of this blog, is to fail to differentiate between the purpose of a graph and that of a table. Take a look at the graph below. It shows the expenditure per acute hospital bed in seven regions of a province in Canada. Now imagine you’re sitting in a darkened auditorium and this is on the screen for 30 seconds. So look at it for a while and then close your eyes. Now tell me: What was the average for the entire province? What was the expenditure in region C? Which region had the highest expenditure? (Actually, if you can read these questions, you’re cheating, because your eyes must have been open to do so.)

I’m willing to bet that if you didn’t peek, you’d have trouble answering the first two questions, but may be able to answer the third. It’s simply impossible to remember all those numbers, except perhaps that they’re somewhere in the range of $60,000 to $80,000, and a lot easier to pick up the fact that Region B is the highest. This illustrates the major difference between a table and a graph – the former is better for presenting *numbers* and the latter for showing *relationships*. You may be able to get away with a graph such as this one on the printed page, where the reader has the luxury of staring at it as long as he or she wants, but it would be a disaster if it were shown during a talk. There’s just too much information for the viewer to absorb in just 30 seconds or so. If you want the audience to come away with the message that there are large differences among the regions (and that’s all they will remember one hour later), then kill the numbers.

In fact (and jumping ahead a bit), we can make the message even stronger. Because Region is a nominal variable, the order doesn’t matter, so let’s make the audience’s task easier by rank ordering the regions, and we get a graph like this one:

Now the message comes through loud and clear – there are large differences among the regions, where B is the clear winner and E gets shafted. So the take-home messages are: (1) be clear what you want to communicate, (2) use tables to show numbers, (3) graphs should be used to show relationships, and (4) do everything possible to make it easy for the audience.

I found this link to be a very helpful description of Bayes’ theorem.

