The mysteries of statistics
For many, statistics is one of those few fields of study that leaves one feeling more mystified the more one learns of it. Much of it seems thoroughly rigorous and grounded, but a gap opens up as soon as one begins applying the inferences to actual research. Statistics claims to be able to condense some extraordinarily complicated inferences into formulas that return highly precise answers. Sometimes there are sound reasons to trust your gut when things don't add up. Here is one critic's view of modern statistics:
The methods of modern statistics...are founded on a logical error. These methods are not just wrong in a minor way...They are simply and irredeemably wrong. They are logically bankrupt, with severe consequences for the world of science that depends on them.
So begins Aubrey Clayton's Bernoulli's Fallacy: Statistical Illogic and the Crisis of Modern Science (Columbia University Press, 2021). The book is written less as scholarship than as 'a piece of wartime propaganda, designed to be printed on leaflets and dropped from planes over enemy territory' (p. XV). The leaflets are for researchers whose life's work is dominated by modern statistical methods.
Perhaps you are one of them. If so, Clayton has a lot to say to you but, in the end, I think he has one big point which deserves a broader audience than his book will be able to reach. That one point is what he calls Bernoulli's fallacy. Clayton provides many examples from law, science, and games. Here I've tried to put together a short review of Bernoulli's Fallacy that doesn't overtly use any statistics or equations (though the book has plenty of numerical examples). After sharing some of the big ideas from the book I'll describe something I think is still missing from it, namely a critique of the philosophy of science standing behind the problems identified by the book.
If it's not sufficiently clear, I'll just add that Clayton clearly loves probability and science. It's an important qualification given the attacks on science and reason these days. The book is not an outsider's attack on science or probability but rather a continuation of a longstanding dispute between contending ways of understanding probability. So we should emphasize that his strongly worded critique has some major qualifications. Most importantly, many researchers who use statistics already incorporate other ways of reasoning into their work, so that the shortcomings of statistical inference are not always as grievous as they might be.
Objectivity and probability
Before getting into the substance of the book, it will help to introduce one concept that appears over and over again in debates about probability, including in Bernoulli's Fallacy.
The core question here is about objectivity—what does it mean for an analysis or its method to be objective? One answer is that an objective method is one that is both transparently stated and fixed, so that one at least knows when the rules are being applied and when not. One can apply it equally to contending ideas.1 Moreover, only if one can extract the logical structure from an argument ('objectifying' it) can that logic be subjected to scrutiny. According to one theory, probability is a part of logic and it consists of formal logical structures, a standard of sorts, that can be used to make assessments of arguments and evidence. Here, objectivity is about logic and critical analysis. This is the view that Clayton favors.
There are other meanings to the word 'objective'. One of them refers to things that exist independently of our thoughts. Trees and earth have objective existence, as do symmetry and light; beauty does not. According to the single most influential theory in statistics, probabilities have objective existence in just this sense. The probability of an event is its long-run rate of occurrence. This is known as the frequency theory of probability and its greatest defenders include Venn, von Mises, and Fisher. Probability is said to be a very scientific term which need not resemble the meaning of 'probability' in common parlance, law, and so forth.
Defenders of the frequency theory ardently object to the claim that probability theory is 'mere' logic2. They have been known to doubt whether there could even be a logic of induction. The frequency theory is a cordon that is supposed to keep inductive logic out of probability theory. The rub is that none of these beliefs ever diminished the will of (frequency-theory) statisticians to tell scientists how to make inductive inferences. If anything, the 'objective' theory seems only to have strengthened their confidence and lent their inductive practices a false air of objectivity (in the second sense, not the logical sense).
If probability is objective like logic then it can be used to probe ideas, to construct arguments, and to explode them. If probabilities are objective like stones then it would seem pointless to argue with them. In that sense the 'objective' frequency theory of probability is soporific, dampening critical thought.
Barbarians with pencils
The soporific power of 'objective' probabilities was, on Clayton's telling, part of an effort to promote the ideology of eugenics. The eugenics movement was mostly British and American men of the professional class arguing that men of their class and nation deserve special privileges over others due to their innate qualities, and that some ambiguously defined 'races' or 'racial stocks' are best exterminated. Francis Galton, Karl Pearson, Ronald Fisher, and others in the eugenics movement preferred to portray their social ideology as objective fact, and their statistical methods were central to doing so.
Consider Pearson's study of students at the Jews' Free School in East London. He and his coauthor Margaret Moul examined 600 children, Clayton says, 'to see if it would be appropriate for the British government to prejudicially deny entry to Jews', who Pearson feared 'will develop into a parasitic race' (145). Examining table after table of averages and correlations of variables such as vision, home cleanliness, and skull shapes, they write:
we must admit that...we have not reached close correlations...But in breaking what we believe to be new ground we have come across indications that such correlations probably exist [between vision and either eye color or head shape]...there is far more hope of showing vision as a function of anthropometric characters than a product of environment. In other words, it is a question of race, rather than of immediate surroundings. (264)
They found 'indications' that some correlations may exist. Not impressive, but enough in their view to support spiteful political action.
Clayton rightly points to Pearson's flexibility when it comes to determining desirable traits, readily altering his position to accommodate his antisemitism, as when saving money became a vice only after he found that Jews did it. See Stephen Jay Gould's The Mismeasure of Man for a more intensive study of the eugenics literature.
The eugenicists' beliefs were unshakable. After the catastrophe of Nazism—the ultimate test of eugenics—Fisher felt this way (p. 158):
I have no doubt also that the [Nazi] party sincerely wished to benefit the German racial stock, especially by the elimination of manifest defectives, such as those deficient mentally, and I do not doubt that von Verschuer gave, as I should have done, his support to such a movement.
He wrote this in defense of the Nazi Otmar Freiherr von Verschuer who, Clayton reminds us, 'used data collected by [his mentee Josef] Mengele in his Auschwitz experiments' (158).
Free Sally Clark!
This history of eugenics and statistical methods is important and intriguing but not central to Clayton's argument. It should be said that Fisher's rotten beliefs do not themselves discredit his research, though his ethical idiocy certainly is reason to closely scrutinize his work and way of thinking. This part of the book may even consume more pages than the argument warrants because the central thesis of Bernoulli's Fallacy is that the problem with (orthodox) statistics stems from a single logical fallacy which has been repeated over and over again. Clayton attributes the mistake to Jacob Bernoulli (1655–1705). The problem can be understood without any statistical training.
Here's a very serious and real example of this logic at work (97-101): Sally Clark of Manchester, England gave birth to two baby boys, one in 1996 and a second in 1997. Sally related that both babies died under similar circumstances: essentially, the baby lost consciousness and stopped breathing. This is known as sudden infant death syndrome (SIDS). In the second case, the baby showed signs of trauma and, according to Sally and her husband, this was due to attempts to resuscitate the boy by themselves or paramedics.
The reasoning that sent Sally to prison for double murder goes like this:
The probability that two boys of the same mother would die of SIDS is extremely small. Therefore the probability that Sally is innocent of murdering them is also very small.
Indeed, SIDS is rare and so it would seem to be extremely unlikely for a mother to lose two babies to it. The jury was convinced that this reasoning was objective, mathematically sound evidence which amounted nearly to proof that Sally could not really be innocent. Sally was sentenced to life in prison for murdering her infants.
What is the logical fallacy here? The statistical reasoning provides an answer to the wrong question. The statistical question was presented this way:
What is the probability that both of Sally's two boys would die of SIDS?
That is known as a sampling probability or 'likelihood'. It may be an interesting question but the jury had to answer this question:
Did Sally kill her children?
The logical fallacy consists of the claim that to answer the second question, one need only answer the first question. It is a case of false substitution.
The usual way of reasoning one's way through a problem, and what probability theory actually tells us we must do, is to weigh all of the available evidence. We do have to think about sampling probabilities (and there is more to that than we will mention) but we also have to ask the following question:
Setting aside the untimely death of her boys, what is the probability that Sally would kill her own children?
Put differently, what grounds does the prosecutor have to propose that Sally would even consider committing such an act? That a seemingly normal, loving mother like Sally would murder her babies is ridiculously improbable. The good prosecutor wants a motive. They want character witnesses that could puncture the image of Sally as a normal mother and thereby remove a source of reasonable doubt. To some this is known as the 'prior probability'.
So we have two kinds of evidence. First, the 'sampling probabilities' or 'likelihoods' which are like the relative explanatory power of the competing theories presented by prosecution and defense. Then we have the plausibility or prior probability of the murder. If these two forms of evidence push in opposite directions then they can balance or cancel each other out. This last point was obscured during Sally's prosecution.
Here is how Clayton puts it:
Two children dying in infancy by whatever means is already an extremely unlikely event...The whole landscape of our probability assignments needs to change to reflect the fact that, by necessity, we are dealing with an extremely rare circumstance. And the prior probability we should reasonably assign to the proposition "Sally Clark murdered her two children," determined before considering the evidence, is itself extremely low because double homicide within a family is also incredibly rare! (99)
So a second crucial fact which could have informed the prior probability is the rate at which mothers are found to murder their children. If SIDS is rare, then mother murderers are even more rare. The jury should have been shown a comparison of these two rates. They would have seen that the two not only balance one another but even favor Sally's innocence.
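Though this review avoids equations, the balancing act between likelihood and prior can be made concrete with a small sketch of Bayes' rule. The rates below are hypothetical round numbers of my own choosing for illustration; they are not figures from the book or from the actual trial:

```python
# Bayes'-rule sketch of the Sally Clark reasoning.
# All rates here are hypothetical round numbers chosen for illustration;
# they are not the figures from the book or the actual court case.

p_two_sids_if_innocent = 1 / 1_000_000   # assumed chance of two SIDS deaths
p_two_deaths_if_guilty = 1.0             # both deaths are certain under the murder theory
prior_guilty = 1 / 10_000_000            # assumed prior: double infanticide is rarer still
prior_innocent = 1 - prior_guilty

# Posterior probability of guilt, given that both babies died:
numerator = p_two_deaths_if_guilty * prior_guilty
denominator = numerator + p_two_sids_if_innocent * prior_innocent
posterior_guilty = numerator / denominator

print(f"posterior probability of guilt: {posterior_guilty:.3f}")
```

Even with the likelihood of double SIDS set at one in a million, the even smaller prior pulls the posterior probability of guilt down to roughly 0.09 under these made-up numbers. The two rare events largely cancel, which is exactly the comparison the jury never saw.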
Sally Clark was tragically imprisoned for three years before the courts reversed this mistake and began revisiting similar cases as well.
P-values: precisely wrong
Bernoulli's Fallacy is at the heart of what Clayton calls 'the crisis of modern science', by which he means the failure of an embarrassing number of peer-reviewed studies (in certain fields) to survive attempts at replication. Failure to replicate results is an issue for those fields that rely heavily on the statistician's method of 'null-hypothesis significance testing' (NHST) and are not balanced by a very strong role for explanatory theory or models of mechanisms, including psychology, neuroscience, genetics, medicine, and others. In other words, you could say the problem is that plausibility or prior probability is not seriously coming into play.
NHST is a case of Bernoulli's fallacy. For example, 'Sally is innocent' would be the prosecutor's null hypothesis; if we deduce that the observations would be sufficiently unlikely if the null hypothesis were true then we would reject the null hypothesis. I don't doubt that the damage done by this fallacy is countered somewhat by other scientific practices. For example, medical research does use NHST but new medicines go through multiple stages of evaluation on multiple criteria, typically to include knowledge of the mechanisms that explain the drug's success.
Clayton does a nice job of recounting some of the severe criticisms of NHST that were voiced before it became dogma. Dr. Joseph Berkson of the Mayo Clinic, for one, complained in 1942,
There is no logical warrant for considering an event known to occur in a given hypothesis, even if infrequently, as disproving the hypothesis. (Berkson, cited at 241)
P-values provide a formal, quantifiable way to commit Bernoulli's fallacy. They appear in journal articles next to estimates of all sorts of quantities, like differences between various groups or effect sizes for some treatment. A small p-value (< .01) means that a value equal to or greater than the actual estimate would (supposedly) be unlikely to arise if the quantity being estimated were, in reality, equal to zero (e.g., if the treatment has no effect).
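The gap between 'the data would be unlikely under the null' and 'the null is unlikely' can be seen in a short simulation. The setup below is my own illustration, not an example from the book: suppose only 10% of tested hypotheses describe real effects, tests detect real effects 80% of the time, and the significance threshold is .05.

```python
# Simulation: a 'significant' result is not proof that the null is unlikely.
# The rates below (share of real effects, power, threshold) are assumed
# for illustration; they are not drawn from the book.
import random

random.seed(0)
n_experiments = 100_000
prior_real, power, alpha = 0.10, 0.80, 0.05

false_pos = true_pos = 0
for _ in range(n_experiments):
    real_effect = random.random() < prior_real
    if real_effect:
        significant = random.random() < power   # real effects detected 80% of the time
    else:
        significant = random.random() < alpha   # false alarms at the threshold rate
    if significant:
        if real_effect:
            true_pos += 1
        else:
            false_pos += 1

# Among all 'significant' findings, the share where the null was actually true:
false_discovery_rate = false_pos / (false_pos + true_pos)
print(f"share of significant results that are false: {false_discovery_rate:.2f}")
```

Under these assumed rates, roughly a third of 'significant' results are false positives, far from the 5% a reader might infer from the threshold alone. The missing ingredient is again the prior: how common real effects are in the field being studied.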
Clayton provides a fun review of one of the more scandalous papers from the replication crisis. Some psychologists tested for extrasensory perception (yes, ESP!) and obtained 'significant' results (small enough p-values). The study is especially troublesome because the authors followed all of the standard statistical procedures. The 'right' methods gave ridiculous results, which only called into question the methods themselves.
Alternative to what?
When it comes to fixing the problems of statistics, Clayton has a number of suggestions, all of which are worth considering. The main line of argument is to abandon the frequency theory of probability. 'The better, more complete interpretation of probability', he writes, 'is that it measures the plausibility of a proposition given some assumed information' (281). This means making use of Bayes' theorem, which, in Clayton's view, places a premium on two things (which he helpfully repeats throughout the text): (1) you have to formulate clear alternative explanations and (2) you have to assess their prior probabilities.
Something that is missing from Bernoulli's Fallacy, in my view, is a discussion of the philosophy of science that stands behind the frequency theory. Without addressing those basic ideas about what makes good science good, one risks repeating the same errors. In fact, Bayesian analysis has already been incorporated into modern statistics without bringing about any substantive change in scientific practice. The statistical models may be a little better, but how they fit into the broader research enterprise has not changed.
Statistics is quite thoroughly dominated by the school of thought known as empiricism or (its more recent offshoot) positivism. This is especially obvious if you read the 'causal inference' literature in statistics, where some very radical ideas about causality have advanced with almost no debate to speak of (e.g., that causality has practically nothing to do with the mechanisms that make things happen). The point of the empiricist philosophy championed by the likes of Fisher and von Mises was to banish 'speculative' concepts from science so that it can rest securely on observational evidence alone.
In some respect Bernoulli's Fallacy reminds me of Adam Becker's What is real? The unfinished quest for the meaning of quantum physics (Basic Books, 2018). Becker provides an engaging critique of positivism, showing how a seemingly obscure (and, among philosophers, dead) philosophy of science continues to hold sway in theoretical physics. In Bernoulli's Fallacy, we see all the bogus effects of positivism but never the underlying philosophy.
It would be interesting to read Clayton's book together with, say, Susan Haack's Defending science—within reason: Between scientism and cynicism (2011) or Nancy Cartwright and Jeremy Hardie's Evidence-based policy: A practical guide to doing it better (2012). They talk about non-statistical ways of reasoning that are indispensable to science.
Another interesting reference point would be George Pólya's Mathematics and plausible reasoning, which shows how probability theory can be highly relevant even when one does not apply quantitative analysis. He provides all sorts of carefully theorized examples that somewhat resemble our discussion of Sally Clark's court case. Pólya built the bridge we need to get from probability theory to the rest of science and philosophy.
In the end, the kind of logical probability that Clayton advocates is important, but if it seeks to change how science is done then it must shed its current 'Bayesian'/positivist image and learn to play well with other theories.
General references
Aubrey Clayton (2021). Bernoulli's Fallacy: Statistical Illogic and the Crisis of Modern Science. Columbia University Press.
Connor Donegan (2025). 'Probability and the philosophies of science: A realist view'. SocArxiv preprint. https://osf.io/preprints/socarxiv/k3nf5_v2
James Franklin (2001). 'Resurrecting Logical Probability'. Erkenntnis 55: 277–305. https://philarchive.org/rec/FRARLP
Rom Harre (1972). The Philosophies of Science. Oxford University Press.
Susan Haack (2011). Defending Science - within reason: Between Scientism and Cynicism. Prometheus.
George Pólya (1954). Mathematics and Plausible Reasoning. Princeton University Press, 2 volumes.
Notes
This meaning of 'objectivity' is based on some ideas found in Harold Jeffreys' Theory of Probability. His emphasis was on 1) knowing when we are following certain rules and when we are not, and 2) being able to apply the same rules to different ideas, so that the ideas I favor can be assessed using the same standards as the ideas I do not favor.
They want you to think about games of dice or cards or other gambling machines. You'll notice that probability distributions, once you learn them, can be very good at predicting average outcomes. In fact, there is an incredibly wide range of situations which show aggregate results that superficially resemble a probability distribution. The frequentist sees this and says, 'it must not be logic because logic lives in your head; but this here is an objective fact'. The response to this argument is that the same is true of all logic. Deductive logic accords with all experience but it is logic nonetheless. That is the nature of (sound) logic and the subject of a different kind of discussion. The way we feel about the consistency of probability theory with relevant experience is due, first, to the correctness of that logic and, second, to how readily we dismiss all those cases where results don't resemble expectations.