It is called the Wason* 2-4-6 Task (I’ve seen it referred to as the 2-4-8 test). It is the best exercise I’ve ever seen for demonstrating the perils of confirmation bias. It also stimulates great conversations about the importance of controls, the careful examination of assumptions, the importance of negative results, and, the biggie, how critical it is to attempt to DISPROVE your hypotheses, not prove them. When I’ve done it with colleagues as well as students, it has also stimulated discussions about experimental design, different kinds of creativity, and how having multiple hypotheses can help prevent falling dangerously in love with one.

There are many versions on the web; I like this site:

https://explorable.com/confirmation-bias

It has a very nice explanation and a charming video. If you can, stop it before he gives the answer (at 2’55″) – see if you can guess the rule.

I cannot recommend this exercise more highly. I do it with every new student who crosses my path, as well as friends and family (I am such a nerd). Everyone, without exception, thinks it is a fun and intriguing experience. And forevermore, you can help students realize when they are thinking in a biased way just by saying “2-4-6”, so it also provides a handy tool for reinforcing the ideas.
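For readers who like to see the logic spelled out, here is a minimal sketch of why only attempts to disprove a hypothesis teach you anything. The function names and the test triples are my own assumptions, and `hidden_rule` is a deliberately made-up stand-in so that the real answer is not spoiled.

```python
def my_hypothesis(triple):
    """The guesser's pet hypothesis: each number is 2 more than the last."""
    a, b, c = triple
    return b - a == 2 and c - b == 2


def hidden_rule(triple):
    """Made-up stand-in for the experimenter's rule (NOT the real answer)."""
    return all(x > 0 for x in triple)


def informative(triple):
    """A test is informative only when the experimenter's verdict
    differs from what the pet hypothesis predicted."""
    return my_hypothesis(triple) != hidden_rule(triple)


# Triples chosen to CONFIRM the pet hypothesis.
confirming_tests = [(8, 10, 12), (20, 22, 24), (100, 102, 104)]
# Triples chosen to BREAK the pet hypothesis.
disconfirming_tests = [(6, 4, 2), (1, 2, 3), (5, 5, 5)]

for label, tests in [("confirming", confirming_tests),
                     ("disconfirming", disconfirming_tests)]:
    for t in tests:
        print(label, t, "informative" if informative(t) else "tells you nothing new")
```

Every confirming triple gets a "yes" that the guesser already expected, so the pet hypothesis survives untested. Each triple picked to break the hypothesis gets a surprising "yes", which shows the hypothesis is too narrow.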

Go forth and joyously spread the news of the Wason 2-4-6 Task!

*Peter Cathcart Wason, 1924-2003. Among many achievements, he coined the term “confirmation bias”.

Analysis of Patriots’ pressures incorrect

(Thanks to Marc Abrahams for bringing this to my attention)

where I just clicked on

This page talks about the problem with p-level being the be-all and end-all of way too many scientific studies. You might be aware that there is discussion about this in the scientific literature. (I think statisticians deserve prizes for still trying to get the rest of us to pay attention.) The problem is that the ONLY thing p-level tells you is how probable a result at least as extreme as yours would be if the null hypothesis were true. That’s why small is good: the smaller it is, the harder it is to blame your result on chance alone, and requiring it to fall below a small cutoff keeps the rate of false positives low. And that is all it does. Nothing else. Take a moment, if you will, to consider all the other ways you could be wrong: false negative, wrong question, wrong control, etc. Most importantly, it does not tell you if your result is significant in any *scientifically meaningful* way.

This is what Paul Ellis is talking about when he uses the term “substantive significance”, and the link above goes right to the heart of the matter: researchers are confusing statistical significance with substantive significance, and journals are letting them get away with it. In my ideal world every result comes with, at least, the standard deviation and the four things noted below: alpha, beta, sample size, and effect size, that last being accompanied by some sentences describing why that effect size was chosen.

You may be lucky enough to have as big a sample size as you want, but you still must use your brain to decide what matters, to design an experiment that actually answers your question, and to do appropriate controls so that you can make interesting comparisons. A large sample size may allow you to find very small differences, but if the differences are that small, do they matter? They very well might, but you must think that through; no statistic can do that for you.
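To make the point about large samples concrete, here is a rough sketch rather than a recipe. It assumes a two-sample z-test with equal group sizes and a known, equal standard deviation, and it assumes the observed difference is exactly 0.02 standard deviations; the function name and the numbers are made up for illustration.

```python
import math


def two_sided_p(effect_size, n_per_group):
    """Two-sided p-level for a two-sample z-test.

    effect_size is the observed mean difference in units of the
    (assumed known and equal) standard deviation.
    """
    z = effect_size * math.sqrt(n_per_group / 2)
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


tiny_effect = 0.02  # a difference of 2% of one standard deviation

p_small_n = two_sided_p(tiny_effect, 100)
p_large_n = two_sided_p(tiny_effect, 100_000)

print(f"n = 100 per group:     p = {p_small_n:.3g}")
print(f"n = 100,000 per group: p = {p_large_n:.3g}")
```

The effect is identical in both runs; only the sample size changes, and with it the p-level moves from unremarkable to far below any conventional cutoff. Whether a difference of 0.02 standard deviations matters is still a question for your brain.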

There is, however, a numerical representation of that “minimum important difference” called the effect size. The best way to plan an experiment is to decide *in advance* on the effect size that you, with your brain, think is important enough to be worth detecting, *and* decide in advance how low you want the probability of a false positive to be (alpha, the cutoff your p-level has to fall below), *and* decide how low you want the probability of a false negative to be (beta), *and* decide on the most information-packed way to measure, which includes deciding on the appropriate statistical test to use, *THEN* calculate the sample size and stick to it.
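As an illustration of that last step, here is a minimal sketch of one common textbook approximation for comparing two group means: n per group = 2 × ((z for alpha + z for beta) / effect size)². It assumes a two-sided test, equal group sizes, and a normal approximation; the function name and default values are my own, and your actual test may call for a different formula.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, beta=0.20):
    """Approximate sample size per group for a two-sided comparison
    of two means (normal approximation, equal group sizes).

    effect_size is the standardized difference worth detecting.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # guards against false positives
    z_beta = NormalDist().inv_cdf(1 - beta)        # guards against false negatives
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


for d in (0.8, 0.5, 0.2):
    print(f"effect size {d}: about {n_per_group(d)} per group")
```

With alpha = 0.05, beta = 0.20, and an effect size of 0.5, this works out to 63 per group. Notice how quickly the required sample grows as the effect you care about shrinks, which is exactly why the effect size has to be chosen, and justified, before the experiment starts.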

To learn more about effect size, go to effectsizefaq.com and read all about it.

David L. Streiner, special guest contributor and co-author of excellent statistics texts

In 1968, when I was writing up my doctoral thesis, I needed to make some graphs showing how the different groups changed over time under various conditions. There were no computer programs to draw graphs (indeed, there were no such things as desk-top computers back then), so I had to draw the lines by hand, using special pens and ink, and the symbols and letters were added by rubbing them off special sheets of transfer paper. It took an entire day or more to make a single graph, and few people had the ability to do them (I had the advantage of training in engineering, and having spent five summers working as a draftsman). Consequently, there were relatively few graphs in journals, and those which did appear were simple black and white line charts or bar graphs. Researchers had very little ability to screw things up.

Nowadays, every computer comes equipped with at least one, and often two, graphing packages, and they allow the user to add a host of special effects – being able to make the graphs look three-dimensional, to have pie charts with segments highlighted by separating them from the rest of the pie, or to use bars of different shapes and colors. Even more options are available if the graphs will be used during a live presentation: you can use many different fonts and colors; text can fly in and out from any direction; and you can add logos from your university, your research unit, and the funding agency at the bottom of every slide. This is in addition to pictures of leaves or keys or some other totally irrelevant (but cutesy) graphic running down the left side. In other words, users are able to screw up graphs in ways that were previously unimaginable.

Unfortunately, few people know how to take full advantage of these features in order to draw truly terrible graphs. Over the next few months, I hope to remedy this parlous situation in this blog and teach you how you, too, can make graphs as bad as those that grace the pages of many daily newspapers and popular magazines. As an added bonus, I will also show you how to make tables that are unnecessarily dense, obscure, and confusing.

The first lesson in bad graphing, and the focus of this blog, is to fail to differentiate between the purpose of a graph and that of a table. Take a look at the graph below. It shows the expenditure per acute hospital bed in seven regions of a province in Canada. Now imagine you’re sitting in a darkened auditorium and this is on the screen for 30 seconds. So look at it for a while and then close your eyes. Now tell me: What was the average for the entire province? What was the expenditure in region C? Which region had the highest expenditure? (Actually, if you can read these questions, you’re cheating, because your eyes must have been open to do so.)

I’m willing to bet that if you didn’t peek, you’d have trouble answering the first two questions, but may be able to answer the third. It’s simply impossible to remember all those numbers, except perhaps that they’re somewhere in the range of $60,000 to $80,000, and a lot easier to pick up the fact that Region B is the highest. This illustrates the major difference between a table and a graph – the former is better for presenting *numbers* and the latter for showing *relationships*. You may be able to get away with a graph such as this one on the printed page, where the reader has the luxury of staring at it as long as he or she wants, but it would be a disaster if it were shown during a talk. There’s just too much information for the viewer to absorb in just 30 seconds or so. If you want the audience to come away with the message that there are large differences among the regions (and that’s all they will remember one hour later), then kill the numbers.

In fact (and jumping ahead a bit), we can make the message even stronger. Because Region is a nominal variable, the order doesn’t matter, so let’s make the audience’s task easier by rank ordering the regions, and we get a graph like this one:

Now the message comes through loud and clear – there are large differences among the regions, where B is the clear winner and E gets shafted. So the take-home messages are: (1) be clear what you want to communicate, (2) use tables to show numbers, (3) use graphs to show relationships, and (4) do everything possible to make it easy for the audience.
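The rank-ordering step can be sketched in a few lines. The dollar figures below are placeholders invented for illustration (the original graph's numbers are not reproduced here); they only respect what the text says, namely seven regions, values between roughly $60,000 and $80,000, B highest and E lowest.

```python
# Placeholder expenditures per acute hospital bed, in dollars.
# These values are made up for illustration, NOT the real data.
expenditure = {
    "A": 68_000, "B": 79_000, "C": 71_000, "D": 66_000,
    "E": 61_000, "F": 73_000, "G": 64_000,
}

# Region is a nominal variable, so we are free to reorder it:
# sort from highest to lowest expenditure.
ranked = sorted(expenditure.items(), key=lambda item: item[1], reverse=True)

# Crude text bar chart: one '#' per $2,000, and no numbers on the bars.
DOLLARS_PER_MARK = 2_000
for region, dollars in ranked:
    print(f"Region {region} " + "#" * (dollars // DOLLARS_PER_MARK))
```

Even in this crude form, the sorted bars let the eye find the top and bottom regions at once, which is all a 30-second audience will take away.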

I happen to think that advertising is the root of all evil[1]. The word “smart” now applies to phones; need I say more? To sell a product requires convincing a buyer that *this* product offers something *that* product does not and you *need* that thneed[2]. Whether that something is useful or good is rarely discussed and certainly not by the salespeople. Plus we all like shiny new things. New math, anyone?

Most of the fundamentals, like multiplying fractions, using a pipet, reading a graduated cylinder, and matching your predicate to your subject, are forgotten in our excitement over ANOVAs and digital qPCR machines and telling everyone what we did. Perhaps it *should* be reasonable to assume that students got those fundamentals in high school, grade school, or in utero. Unfortunately, we make that assumption at the peril of our experiments. The wrong pH can really mess you up and it will be almost impossible to discover what went wrong or, worse, that something did go wrong. Even if your students took the classes and aced the tests, it is likely that the skills were forgotten, or deemed useless, before your student realized that Science was the best career on the planet. Plus the students, especially the A students, either don’t know they don’t know[3], or won’t admit they don’t know. Students are often embarrassed to ask how to use the tools, or they assume they don’t need to ask. So everyone thinks they know how to pipet and how to write, and here we are, publishing p-levels while leaving out the sample size, the effect size, and the power, and reviewing manuscripts that take forever to read because the writer’s meaning is so well hidden. The public doesn’t have a chance, and science writing is now a specialty that, while it is well written, often garbles some of the facts or misses the important ones. And I know good scientists who don’t know that an outlier is not just something that looks different; there is an actual calculation involved[4].
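The text does not say which outlier calculation is meant, so the sketch below assumes one common choice: the box-plot rule (Tukey's fences), which flags anything more than 1.5 interquartile ranges beyond the quartiles. The function name and the readings are invented for illustration, and other quartile conventions will give slightly different fences.

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _median, q3 = quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up repeated readings of the same quantity.
readings = [2.1, 2.3, 2.2, 2.4, 2.5, 2.3, 2.2, 4.9]
print(tukey_outliers(readings))
```

The point is that "it looks different" is replaced by a rule you can state, defend, and report alongside the data.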

My suggestion is that everyone, regardless of whether they believe it to be true, announce: “I do not write, or measure, or calculate as well as I could.” Spend time with your students actually reading the manuals of your tools before using them – even pipettes have directions. Have a journal club in which you read *The Science of Scientific Writing* by Gopen & Swan[5] and *Strong Inference* by John R. Platt[6]. Work through *Biostatistics: the Bare Essentials* or *PDQ Statistics* by Norman and Streiner[7]. That knowledge is the foundation you must have, *and maintain*, so that you can build a new paradigm with your creativity and your novel insights. And the students will really appreciate being taught stuff without having to ask. Do this for the whole field: work on your writing and your arithmetic skills. Help put Lab Math on the remaindered list.

[1] The irony of that sentence appearing on a blog that is, at least in part, advertising for my book, has not escaped my notice.

[3] See: Kruger, J. and Dunning, D. (1999) *Unskilled and unaware of it: how difficulties in recognizing one’s own incompetence lead to inflated self-assessments*. J. Pers. Soc. Psychol. 77(6):1121-34.

[4] I will take this opportunity to say how delighted I am to see box plots again!

[5] Gopen, G. and Swan, J. (1990) The Science of Scientific Writing. American Scientist. https://www.americanscientist.org/issues/pub/the-science-of-scientific-writing

[6] Platt, J.R. (1964) Strong Inference. Science. 146(3642):347-353. http://pages.cs.wisc.edu/~markhill/science64_strong_inference.pdf

[7] Norman, G.R. & Streiner, D.L. (2014) Biostatistics: The Bare Essentials 4th ed. Hamilton, Ontario, Canada. B.C. Decker Inc. –OR– Norman, G.R. & Streiner, D.L. (2003) PDQ Statistics, 3rd ed. Hamilton, Ontario, Canada. B.C. Decker Inc.


I found this link to be a very helpful description of Bayes’ theorem.
