In the 2 July 2017 issue of Nature, there is an article in the Comment section about the science of measuring:

http://www.nature.com/news/metrology-is-key-to-reproducing-results-1.22348

The moral of the article is that if we all actually measured properly, it would go a long way toward fixing our reproducibility problem. It's a good point that unfortunately has to be made over and over and over.

Then, while looking for a good image to represent Metrology in this post, I learned that the 20th of May is World Metrology Day. According to http://www.npl.co.uk/world-metrology-day/,

“World Metrology Day is celebrated by over 80 countries each year on 20 May – the anniversary of the signing of the Metre Convention back in 1875. To this day, the agreement provides the basis for a single, coherent system of measurements that are traceable to the International System of Units (SI).”

Now THAT is worth cheering!

While this may not be something you think about much, some of you may recall what happened to the Mars Climate Orbiter because of a “failed translation of English units into metric units” (it probably crashed into Mars). That’s right, not everyone used SI units, and, well, oops.

Here is my suggestion: You know how we all (are supposed to) check the batteries in our smoke detectors whenever the time changes? I think that every May 20th we should all check our units. Mark your calendars.

Image from: https://degiuli.com/2017/04/19/6-project-management-lessons-from-the-mars-climate-orbiter-failure/

]]>Check out this definition of standard (from http://www.oxforddictionaries.com/us/definition/american_english/standard): “An idea or thing used as a measure, norm, or model in comparative evaluations.” ‘Comparative evaluations’ is what I want to emphasize here – when you draw bars indicating the uncertainty in the data you collected, those bars should be comparable to everyone else’s bars. Standard error bars are not comparable and they make your audience have to do extra work to figure out what you found; how annoying! In contrast, standard deviations always mean the exact same thing! How nice for your audience!

The first step of reporting any data set (collection of measurements) is to describe the distribution of your data. To do that, you first make a frequency plot – the x-axis shows the values of your measurements, the y-axis shows the number of times you got each of those values, like in figure 1. Then, you summarize the distribution by saying where the center is and how the measurements are spread out around that center point.

Important aside: Thinking in terms of distributions will help with doing statistical analysis, too. Unless you are using non-parametric statistics, the statistics you will use tell you about distributions, not absolute numbers. As smart as they are, even statisticians cannot predict your data. So in many ways, I advise thinking about the distribution of your data as soon as you possibly can.

Figure 1 shows identical frequency plots. Note, though, the scales of the y-axes have been changed to indicate different sample sizes; nevertheless, the distributions of the data points are exactly the same. If the distributions are identical, it follows that the description of the distributions should be identical. And the standard deviations are, indeed, identical: 1.6 and 1.6.

But look what happens to the standard error because of the difference in sample size: 0.3 vs 0.03 is a difference of an order of magnitude, even though the distributions are, you may remember, identical. Standard error is not a comparable evaluation. QED.
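The arithmetic behind that difference is simply SE = SD / √n, so only the sample size changes it. Here is a quick sketch in Python (the data are simulated with made-up parameters, not the data from figure 1):

```python
import math
import random

random.seed(0)

def sd(xs):
    """Sample standard deviation (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

def se(xs):
    """Standard error of the mean: the SD shrunk by the square root of n."""
    return sd(xs) / math.sqrt(len(xs))

# Two samples drawn from the SAME distribution, differing only in size
small = [random.gauss(10, 1.6) for _ in range(30)]
big = [random.gauss(10, 1.6) for _ in range(3000)]

print(round(sd(small), 2), round(sd(big), 2))  # both near 1.6 - describes the distribution
print(round(se(small), 2), round(se(big), 2))  # shrinks as n grows - describes the sample size
```

The standard deviations hover around the true 1.6 regardless of n; the standard error of the big sample is an order of magnitude smaller, telling you nothing about the spread of the data.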

Here is a visual that shows what happens when the frequency distribution data are presented in summary form, the kind of figure you are more likely to see in a paper:

The data on the right may “look” better, but that kind of spin is frowned upon in science, since your audience will assume it describes the spread of your data, but it does not. The standard error is not standard.

I hope it is pretty clear at this point that the standard error *cannot* be a “standard” way to describe the distribution of your data. Did someone tell you it was OK or traditional to use standard error as long as you say what your sample size was? True, to a point, but is it OK to divide your uncertainty by 10 as long as you say you did it? I recommend going to that person and saying “I’m confused. You said to use the standard error, but this easy-to-understand article by well-respected biostatistician David Streiner (Maintaining standards: differences between the standard deviation and standard error, and when to use each. Can J Psychiatry. 1996 Oct;41(8):498-502; https://www.ncbi.nlm.nih.gov/pubmed/8899234) says that is wrong.” It’s a teachable moment; question authority.

The distribution of your data was what it was – don’t make it look like you are trying to hide something: share it proudly and accurately using the agreed upon standard. You surely worked hard enough to collect it. Also, to repeat myself, the international community of scientists has declared that the standard deviation is the correct way to report uncertainty; so, reporting standard error is like reporting length in cubits instead of meters, and that is just being ornery for no good reason.

So, where does that leave standard error?

There are two kinds of statistics: descriptive and inferential. Above, I’ve been pontificating about descriptive statistics – numbers that describe the distribution of the measures you actually made on your sample. *IF* your data are normally distributed, mean and standard deviation are useful summaries of what you found; because they are standard, just two numbers give your audience an interpretable summary of your data.

Inferential statistics let you make inferences about the population from which the sample came. I think it is fairly intuitive that if you measure many more individuals (that is, your sample size is bigger), your estimate of the distribution of the entire population will get better and better. One way to think about this is to look at the extremes: if your sample size is 0, you will make an absolutely terrible estimate of the mean of the population. If your sample size equals the size of the population, your estimate will be perfect. In between, the bigger your sample size, the closer to perfection you get with your estimate of the whole population. Thus, it is when you are calculating inferential statistics that you should take into account the sample size.

One useful statistic to report when discussing your inferences about the population is the confidence interval. It tells your audience the range within which you believe the mean of the population would be found. As always with statistics (“statistics means never having to say you are sure”), you also tell your audience the degree of confidence you have in those intervals. To calculate confidence intervals, divide your standard deviation by the square root of the sample size then multiply that quotient by 1.96, if you want to indicate that you are 95% confident, or 2.58 if you are 99% confident. That quotient, for some reason, got a name: the standard error. In other words, standard error is just a rest stop on the road trip towards confidence intervals: you might be tempted to stop in for little chocolate donuts and coffee, but you really don’t want to linger there or brag about having been there. Just keep moving towards your goal.
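That recipe is short enough to write out in full. Here is a minimal sketch in Python (the mean, SD, and sample size are made-up numbers for illustration):

```python
import math

def confidence_interval(mean, sd, n, z=1.96):
    """Mean +/- z * (sd / sqrt(n)).

    The quotient sd / sqrt(n) is the standard error - the rest stop.
    z = 1.96 for 95% confidence, 2.58 for 99%.
    """
    standard_error = sd / math.sqrt(n)
    half_width = z * standard_error
    return (mean - half_width, mean + half_width)

# Illustrative numbers: mean 10.0, SD 1.6, n = 28
low, high = confidence_interval(10.0, 1.6, 28)
print(f"95% CI: {low:.2f} to {high:.2f}")  # → 95% CI: 9.41 to 10.59
```

Note that the standard error appears only as an intermediate quantity on the way to the interval, which is the thing actually worth reporting.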

I will end with a rule of thumb for interpreting graphs that (annoyingly) show standard error instead of standard deviation: in your head, double them, and that will give you a reasonable estimate of the 95% confidence intervals, although it will still leave you unclear about the data the authors collected. YOU will never make your readers do that, right?

]]>

It is called the Wason* 2-4-6 Task (I’ve seen it referred to as the 2-4-8 test). It is the best exercise I’ve ever seen for demonstrating the perils of confirmation bias. It also stimulates great conversations about the importance of controls, the careful examination of assumptions, the importance of negative results, and, the biggie, how critical it is to attempt to DISPROVE your hypotheses, not prove them. When I’ve done it with colleagues as well as students, it has also stimulated discussions about experimental design, and different kinds of creativity, and how having multiple hypotheses can help prevent falling dangerously in love with one.

There are many versions on the web; I like this site:

https://explorable.com/confirmation-bias

It has a very nice explanation and a charming video. If you can, stop it before he gives the answer (at 2’55″) – see if you can guess the rule.

I cannot recommend this exercise more highly. I do it with every new student that crosses my path, as well as friends and family (I am such a nerd). Everyone, without exception, thinks it is a fun and intriguing experience. And forevermore, you can help students realize when they are thinking in a biased way just by saying “2-4-6” so it also provides a handy tool for reinforcing the ideas.

Go forth and joyously spread the news of the Wason 2-4-6 Task!

*Peter Cathcart Wason, 1923-2003. Among many achievements, he coined the term “confirmation bias”.

]]>I think there might be an error in the equation for converting RCF to rpm on page 140 of the second edition, hardcover.

Should the equation be:

*rpm* = (*RCF* / (*r* x 1.118 x 10^-**6**))^1/2

instead of 10^-5?

because the radius is measured in mm?

…

E. D.

Dear E. D.

Thank you for pointing out the issue. You are correct. The difference has to do with the units of radius.

If you look around, you will find that there is no convention for whether to report the radius of the rotor in mm or cm. Unfortunately, I didn’t make it clear that there are two versions in common use, and that they are both in the book specifically to show that. On page 139, the equation is written out correctly for mm, and it states explicitly that I mean radius in mm. On page 140, I switched to cm, with only a parenthetical comment that I had done that. I really should make that more obvious. When using cm, the exponent is −5; when using mm, it is −6.

One way to think about it is to imagine measuring the rotor in mm; now imagine measuring the exact same rotor in cm. The second measurement is going to be the first measurement divided by 10. But the RCF hasn’t changed. To take that “divide by 10” into account, therefore, you need to multiply the constant by 10, or you won’t get the same RCF. That “multiply by 10” gets folded into the constant, so the exponent becomes −5.
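You can check that the two versions give the same answer for the same rotor. A quick sketch in Python (the 85 mm radius and 3000 RPM are made-up values):

```python
def rcf_from_mm(rpm, radius_mm):
    """RCF with radius in mm: exponent is -6."""
    return 1.118e-6 * radius_mm * rpm ** 2

def rcf_from_cm(rpm, radius_cm):
    """RCF with radius in cm: exponent is -5."""
    return 1.118e-5 * radius_cm * rpm ** 2

# The same rotor, measured two ways: 85 mm = 8.5 cm
print(rcf_from_mm(3000, 85))   # ≈ 855.3 g
print(rcf_from_cm(3000, 8.5))  # same answer - the constant absorbs the factor of 10
```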

]]>Analysis of Patriot’s pressures incorrect

(Thanks to Marc Abrahams for bringing this to my attention)

]]>

My lab has a new centrifuge that I recently needed to use for the first time. Like most centrifuges, you can set the rotations per minute (RPM) and the number of minutes; my protocol said ‘spin at 125 g for 6 minutes.’ Having used many centrifuges, and having written (in Lab Math) about the indefensible* conversion from RPM to g, I expected this. However, when I went to look at the conversion chart that I expected to find taped to the lid of the centrifuge, it wasn’t there. No one had made a spreadsheet to calculate the conversion from g’s to RPMs for frequently used values. No one had measured or looked up the radius of the rotor, or indicated whether it represented the distance to the middle or the tip of the holders. At least, no one had thoughtfully posted it in an obvious place for those who were to follow. So I went in search of someone who I knew had used this centrifuge to find out if this information was kept somewhere that I didn’t know about. The person I found to ask was trained in a lab at Yale. He was told, during his training, that 1.0 RPM (the numbers have been changed to protect the innocent) would give him the correct RCF, and since our centrifuge is about the same size as the one in the Yale lab, he just uses 1.0 RPM. Always.

SERIOUSLY? The equation is: RCF = 1.118 x 10^-6 x RPM^2 x Radius [mm]

It is multiplication! Granted, getting out a ruler and measuring from the center of the rotor to the tip of the holder can be physically exhausting and is best left to the young athletic types in your lab. No argument there. But *risk your experiments rather than do multiplication?* And he learned this at Yale? This is a very smart person, a very good scientist, yet the thought of doing multiplication is so distasteful, that he relies on a number he once heard from someone he considered trustworthy.
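For anyone who would rather tape something reusable to the lid, here is a minimal sketch in Python of the equation and its inverse (the 85 mm radius below is made up; measure your own rotor):

```python
import math

def rcf(rpm, radius_mm):
    """RCF (in 'g') = 1.118e-6 x radius(mm) x RPM^2."""
    return 1.118e-6 * radius_mm * rpm ** 2

def rpm_for(target_rcf, radius_mm):
    """Invert the equation: RPM = sqrt(RCF / (1.118e-6 x radius))."""
    return math.sqrt(target_rcf / (1.118e-6 * radius_mm))

radius_mm = 85  # illustrative - replace with YOUR rotor's radius
for g in (125, 500, 1000, 5000):
    print(f"{g:>5} g -> {rpm_for(g, radius_mm):,.0f} RPM")
```

Print the table for your commonly used values, tape it to the centrifuge, and the next person won’t have to rely on a number they once heard from someone at Yale.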

What is this ‘culture of equation avoidance’ doing to our scientists? He may someday find himself with a new centrifuge, of a different size, and his 1.0 RPM could lead him to bad data. Troubleshooting will be close to impossible, and he will abandon his beautiful experiment and not get a grant.

Please help him, and scientists like him. Change the culture. Use equations until it hurts.

* I object to the use of “g” as a unit for this purpose, although I appreciate that thinking in terms of our constant companion gravity is a comfort. The correct name for the parameter in question is Relative Centrifugal Field or RCF. Without regard for reality, however, RCF is traditionally reported in units of g. RCF has dimensions of length over time squared (L T^{-2}), which is mm/minutes squared in the above equation (rotation is dimensionless). RCF is determined entirely by the rotations per minute and the radius of the rotor. On the other hand, gravitational force has units of, surprise, force, i.e. Newtons, meaning its dimensions are mass x length / time squared (M L T^{-2}). What happens to the mass when you convert to RCF? Traditionally, they’re not telling. So, writing “an RCF of 125 g” is an abomination. However, I have gotten over this and moved on. Really.

In previous blogs, I described how to make terrible graphs using some of the features of leading graphing packages, such as pie charts and 3-D graphs. But, this is unfair to users of other programs that do not offer these “enhancements” (yes, such programs do exist; in fact, I use them exclusively except when I’m preparing talks for hospital and university administrators). “How,” I hear them cry, “can we too make truly terrible graphs?” Well, do not despair; help is at hand. In this blog, I will discuss a very easy way to turn a straightforward graph into a disaster.

The vast majority of graphs have two axes – the X-axis (abscissa) along the bottom and the Y-axis (ordinate) running along the left side. There can be variants of this, such as having a secondary Y-axis on the right, or having the Y-axis cross the X-axis in the middle, but these won’t change the basic message. Also in most cases, the Y-axis starts at zero and runs up (or down, in some cases) to the maximum. Simple as this seems, it leaves a lot of room for mischief. The best way to thoroughly distort what the data show is to have a “floating Y” – starting the axis at some point other than the natural base, which in most cases is zero. For example, let’s assume that the university’s president is trying to justify his request for an (obscenely high) increase to his (already obscenely high) salary because his workload has gotten so much heavier over the past few years. To bolster his case, he presents the following graph to the board of governors:

Wow! Look at the increase. Of course we have to reward him (although we could ask why he’s still working a shorter week than mere mortals). But wait a second – the Y-axis doesn’t start at zero; it’s floating up there with a minimum of 30. What would the graph look like if it did start at zero?

That’s more like it and just as we suspected; that “increase” is barely perceptible without a microscope. By shrinking the range of the Y-axis, small differences are magnified.

You may object to this graph on esthetic grounds, that most of the graph – the area below 30 – is blank, and why waste space showing nothing? That’s a valid point. There are times when it doesn’t make sense to start at zero. In these instances, the honest thing to do is at least alert the reader to that fact by making a break on the axis, like this:

Note that we’ve made a bit of a compromise; there’s less empty real estate, but the increase appears a bit more extreme than it actually is. We’ll discuss in a bit how to determine if there’s too much of a distortion.

Lest you think that exaggerating differences by having a floating Y-axis is restricted to unscrupulous administrators (if that isn’t a redundancy), here’s a graph taken from an article purportedly showing that the risk of suicide is reduced by attending religious services (Kleiman & Liu, 2014).

For those of you who are unfamiliar with survival analysis, the left axis, “Survival function,” shows the probability of being alive after a given time for the two groups.

Again, the first reaction is Wow! Maybe we should all think of attending services a couple of times a week, if not every day, and that’ll really reduce our risk. But let’s take a closer look at the Y-axis. The bottom is not at zero, but at 0.9990. In other words, the entire range is 0.001 rather than 1.0. That “difference” between the groups is actually 0.9998 versus 0.9992 over an 18 year span. I tried plotting it with a true zero, and the lines were perfectly flat and superimposed on one another, as was the case with starting it at 0.80 and 0.90. In fact, I couldn’t see any light between them until I did:

Even here note that the axis extends only from 0.98 to 1.00. Kinda sorta makes you want to reconsider how you spend your weekends, at least insofar as preventing suicides is concerned.

So, how can you tell if a graph is misrepresenting what’s really going on? You can use the Graph Distortion Index (GDI) proposed by Beattie and Jones (1992). It’s defined as:

GDI = (percentage change depicted in the graph / percentage change in the data) – 1

In the first graph, the president’s change in time looks like a 350% increase (from 20% up the Y-axis to 90% up the axis), whereas the actual increase is 15.6%. So, plugging those numbers into the equation we get (350/15.6) – 1 = 21.4, which is more than a bit higher than the recommended maximum of 0.05.
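The calculation is a one-liner; here is a quick sketch in Python using the president’s numbers:

```python
def gdi(graph_pct_change, data_pct_change):
    """Graph Distortion Index (Beattie & Jones, 1992):
    (percent change shown in the graph / percent change in the data) - 1."""
    return graph_pct_change / data_pct_change - 1

# The president's graph: looks like a 350% increase, data show 15.6%
print(round(gdi(350, 15.6), 1))  # → 21.4, far above the recommended maximum of 0.05
```

An honest graph, where the visual impression matches the data, gives a GDI of 0.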

Remember what we said in an earlier blog: the main purpose of a graph is not to present numbers, but to allow the viewer to get an immediate visual impression of what’s going on. So don’t despair; even if you can’t make pie charts or 3-D graphs, you can still really distort the data by using a floating Y-axis.

References

Beattie, V., & Jones, M. (1992). The use and abuse of graphs in annual reports: Theoretical framework and empirical study. *Accounting and Business Research, 22*, 291–303.

Kleiman, E. M., & Liu, R. T. (2014). Prospective prediction of suicide in a nationally representative sample: Religious service attendance as a protective factor. *British Journal of Psychiatry, 204*, 262-266.

]]>

David L. Streiner, special guest contributor and co-author of excellent statistics texts

In the last two blogs, we learned the first steps in making truly terrible graphs: by confusing the role of a visual with that of a table, and by using pie charts. But that barely scratches the surface of how graphing packages can allow us to totally screw up. This blog will examine another widely used travesty: making the graph look three-dimensional. Indeed, in some (unnamed, at least for now) programs, the default option is 3-D, and you have to work hard to reset it to make the bar chart or pie chart 2-D. With so many newspapers using 3-D graphs in their pages, you may be excused for thinking that this is good practice; after all, they are ever so much sexier and eye-grabbing than the flat, 2-D types. So what if they are harder to read and distort the data; isn’t that a small price to pay for sexy and cutesy? Yes if you’re an administrator, but No if you’re a scientist.

Take a look at the first graph; can you guess what values are being plotted?

Not that easy, is it? First, do you attend to the front of the bar or the back? The front grabs our attention, but it’s actually the back that’s important. Then you have to follow the lines on the back “wall” over to the left, turn down by about 20 degrees, and read the number off the Y-axis. Not too hard here, but imagine that a bar fell in between the labeled values; even more estimation and guesswork would be required. Now, what are the four values? You’d be excused if you said 2, 4, 6 and 8, but “Gotcha!” The values are actually 2.22, 4.22, 6.22, and 8.22. The reason that the bars look smaller than their true values is seen at the floor of the graph. The bars aren’t flat against the back wall; they’re displaced somewhat in front of it. So, to get the true values, you first have to mentally project the level of the top back to the rear wall by the same amount that the bar is displaced, and then carry that line to the left and down to the axis – a totally unnecessary series of steps, prone to error at each one.

Let’s add insult to injury by plotting the same set of numbers (2, 4, 6, and 8) using two programs created by the same, still unnamed, software company. The one on the left was made with PowerPoint (whatever happened to spaces between words?) and the one on the right by Excel – same data, different look (oops, did that give away the name of the company?). I pity the poor people sitting in the audience trying to figure out the real values.

Compare that to a simple 2-D chart:

Not nearly as sexy. The only things it has going for it are that (1) it’s easy to read; (2) there’s no ambiguity; and (3) it’s accurate. Obviously, university and hospital administrators will shun 2-D graphs in favor of 3-D.

As the final step, let’s combine the worst of both worlds – 3-D pie charts.

Rank order the four segments. Most likely, you’d say A < B < C < D. That would be understandable, but it’s the second “Gotcha!” In fact, A, B, and C are the same, and D is twice as large as each of them. So why did you get it so wrong? I mentioned in the previous blog that equal angles are perceived differently, depending on whether they are oriented vertically or horizontally, and that’s part of the problem here. The other part is that tilting the pie distorts the angles even more. You can try this at home – make a 3-D pie chart with four equal segments, and then modify the angle of the tilt and see what happens. With any luck, that will be the last 3-D pie chart (or any type of 3-D chart) you will ever make.

So in conclusion, if you really want to screw up a chart (or if you’re making a presentation to administrators), use 3-D graphs, and especially 3-D pie charts. Meanwhile, real researchers will be content with only two dimensions.

]]>(thanks to Dr. David Streiner for alerting us to this page) ]]>

Let me correct something intentionally misleading in the first paragraph: the author of the article does, in fact, mention some apps that are probably using statistical methods in the background, such as an app that (I’m guessing) counts the times you do various things, like breathe and beat your heart, and from those numbers infers what phase of the sleep cycle you are in (again, I’m guessing), so that it can wake you when you are sleeping lightly. This, presumably, makes it easier to wake up and face the day. I’m all for things that “make it easier,” just as I am all for more poetry. Another example of an actual statistic: he begins the article talking about an app that calculates averages, and an average is a descriptive statistic. It’s not good for much if you separate it from its standard deviation; the author does not mention whether the standard deviation is provided by the app, which is a shame because the app under discussion is one that tracks sexual behavior. Talk about lost opportunities.

But even though I lied in the first paragraph, the average (whether mean, median, or mode is not disclosed) is not the kind of statistic that is going to have the devastating effect feared by some of the people quoted. A writer, whose writing I love, is quoted as saying: “metrics…rob individuals of the sense that they can choose their own path.” Whassat? Knowledge is the opposite of free will? Having a map robs me of the choice of where to go? The end of the quote makes me want to scream: “The surface and numbers aren’t going to hold if your child gets sick or your wife gets cancer.” When someone I love is sick, my first priority is getting them the best treatment. Finding and choosing among treatments means understanding statistics. After we all agree that we’ve given the patient the best chance of recovery, then I go to poetry and literature for solace, never forgetting to be profoundly grateful to the nerds who figured out the medicine, using statistics.

There is a difficulty though, aptly put by another person quoted in the article: “Coming up with the correct meaning is what’s hard.” My fear is that seeing all those averages makes people think they know more than they actually know. Humans aren’t very good at statistical reasoning, on average. And it is not difficult to spout accurate statistics and then make it sound as if they support a claim – pick your favorite example, advertising or political campaigns. There is a great little book called How to Lie With Statistics that will teach you how to do it. The danger is that the statisticians are definitely not winning: numbers are being abused, and we are the losers.

What’s to be done, what’s to be done. We, defined as everyone reading this plus all teachers everywhere, have to erase the false wall between poetry and statistics. I can even argue that they are quite similar: poetry is to words as statistics is to numbers. Both capture a lot of meaning using a few symbols. I won’t attempt to carry that analogy any further. But my point is that it is not either/or. Understanding numbers, even complicated ones like inferential statistics, cannot possibly diminish our need for art. A lack of understanding of numbers, however, is a dangerous state (see: Innumeracy by John Allen Paulos). Do I need to mention political rhetoric again?

We need to teach the interpretation of statistics, maybe even before we teach the statistics themselves. We need to teach the difference between description and prediction (a rhyme, not a poem). We need to teach the difference between correlation and causation, and between measurement and reality, and we need to do it well. We should probably get some help from the poets.

**“A poet must not aim to teach and advance a science as much as to show its advantages and make it loved.”** René-Richard Castel (1758–1832)