Some Thoughts on Knowledge and the Logic of Conspiracy

Douglas P. McManaman
Reproduced with Permission

Conspiracy theories, along with the anomalies that allegedly support them, are certainly interesting things to think about. I have pursued some of them, but after a time I began to suspect a problem with the logic behind what I will refer to as conspiratorial thinking. It is difficult to pinpoint the problem, because it is indeed the case that people do team up to commit crimes; that is, they conspire to murder, defraud, lie and deceive, etc. In other words, not all criminals act alone. It seems to me, however, that there is something fundamentally different about “conspiratorial thinking”: a habit of suspicion that disposes a person to readily believe, and even construct, conspiracy theories on the basis of what seems to me tenuous evidence. Although I do believe there are psychological and cultural factors at the root of this phenomenon, I would like to focus on what seems to me to lie behind a good deal of conspiratorial logic.

Firstly, conspiratorial thinking seems to have as a cognitive backdrop an “absolutist” habit of mind, a dogmatic posture that pays little attention to the implications of our cognitive limits. The dogmatist often lacks a healthy skepticism towards what appears to him to be true and certain. To defend this dogmatism, the absolutist zeroes in exclusively on truth’s immutable and eternal nature, which is why conspiratorial thinking is typically found among fundamentalists of all stripes, e.g., very traditionalist Catholics, Protestant Evangelical fundamentalists, Islamic fundamentalists, political/ideological fundamentalists on the left and right, etc.

It is indeed the case that “truth is eternal and immutable”; however, a great deal is overlooked here. Truth must be considered from two angles: the definitional and the criteriological. The definition of truth is indeed the “adequation between what is in the mind and what is”, and so if “what is” is unchanging and what is in my mind corresponds precisely to that, then it follows that truth is unchanging. “What is”, however, very often changes, and so what was true at one time is often no longer true. (Indeed, my contention about change is, if true, unchanging.) On the side of the subject, however, the immutability of truth depends in large part on the level of abstraction on which one is thinking. The higher the level of abstraction, the less one’s insights are subject to change. For example, mathematics takes place on a rather high level of generality: the mind abstracts from sensible matter and motion (i.e., wood, steel, etc.) in order to consider quantity alone, either discrete (arithmetic) or continuous (geometry), without reference to this or that particular quantity of matter. The result is that mathematical truths do not change as scientific contentions do; the latter are tentative and often discarded when new and more plausible data introduce inconsistencies with old data, causing serious rends in the fabric of our earlier scientific estimates. Although mathematics does not change in that respect, it certainly develops, which is a kind of change nonetheless.

The critical point, however, is that the criterion for truth is not the same as its definition. The absolutist habit of thinking tends to overlook this distinction. The inductive nature of knowledge acquisition, most evident in the sciences and in the study of history, is such that for the most part we don’t know for certain whether we are in possession of “the truth”. That is why there is a history of science, a history of philosophy, a history of theology, of psychology, of economics, etc. An important problem in the theory of knowledge is determining the criteria for what is in fact “true”; what we typically have at any one time is “the truth as I see it”. “As I see it” might very well be false without my knowing it, or it might be true without my being entirely aware of the fact. What often happens is that new information causes me to discard what I had believed up to that point in favor of a more plausible estimate, at least until newer information either brings me back to my former position (if what I held previously was true after all) or causes me to formulate a new and better estimate: the maximally plausible, consistent, coherent, simple, and functionally efficient estimate. In other words, all we may be able to achieve in the end is a “realistic certainty”. In short, truth (knowledge) is difficult to achieve.

The world, or “what is”, is made up of myriad complex layers. This is so much the case that two people rarely see the world in the same way. For example, I can walk through a mall with my daughter and we will have two completely different experiences of that same reality, if not on a very general level, certainly on the more concrete level of particulars. The same is true with respect to an entire city: I don’t see New York quite the same way she does. One day a woman, a stranger to me, came to the door to purchase a gift certificate from my wife. My daughter knew that this woman belonged to our parish, that on Sundays she carried a Louis Vuitton purse, wore high-end shoes of a particular make, high-end bracelets, etc. I had no idea who the woman was. The point is that we pay attention to things that interest us, and our interests orient us within this world, positioning us in a particular direction, and this direction has epistemic implications. The reason the taxonomy of the sciences becomes increasingly complex as time goes on is that different people ask different questions, which in turn are rooted in the different problems they are interested in solving. Chemistry, for example, has developed over time into various kinds: inorganic, organic, analytical, physical, and biochemistry; but chemistry also divides into various branches: synthetic organic chemistry, environmental chemistry, forensic chemistry, geochemistry, polymer chemistry, theoretical chemistry, industrial chemistry, thermochemistry, pharmaceutical chemistry, nuclear chemistry, and materials chemistry. The same can be said of biology; some of the less familiar parts of this particular taxonomy include astrobiology, biogeography, biophysics, histology, chronobiology, gerontology, epidemiology, paleobiology, paleobotany, paleopathology, bacteriology, mycology, virology, cellular neuroscience, cognitive neuroscience, neuropsychology, systems biology, nematology, immunology, etc. And of course, physics, psychology, economics, and so many other areas of knowledge all exhibit that same taxonomic complexity, which gives us a peek at the hidden complexities of this single reality, opened up by different questions, which in turn are rooted in different interests in solving different problems.

To pose a question is “to pose”, which is to position oneself in a particular direction (as in posing for a photo). A particular direction, however, marks out a very limited trajectory. To head north is not to head south, nor is it to head east or west, northeast or southwest, etc., and what I discover down a particular avenue is often very different from what I would have discovered had I chosen another. An avenue of inquiry works much the same way: what one discovers down one avenue of inquiry is typically very different from what one discovers on a different avenue of inquiry. What I see when I pick up a bottle of hand sanitizer is very different from, and much simpler than, what my friend sees, who has studied chemistry all his life; and what I see when I look around the York Regional Forest is very different from and far less rich than what the botanist sees: a living world of rich diversity opens up before his gaze.

The point is that the world is unimaginably complex and human knowing is profoundly limited. Once these two points are firmly grasped, the implications are important; for one, it becomes much more difficult to adopt an overconfident and dogmatic posture on what it is we think we “know”. Moreover, it is simply not the case that “what you see is all there is” (WYSIATI). Most people believe it is, which is why many speak with inordinate confidence on a number of issues belonging to areas of knowledge in which they have very little experience and training, such as medicine, statistics, ethics, theology, metaphysics, economics, politics, etc. It takes a lifetime to become relatively proficient in a very limited area of interest; take a few steps outside of that limited circle and we will, if we have just a modicum of honesty and humility, feel lost and bewildered. But such humility is rare today; hence the absolutism that characterizes a great deal of contemporary discourse, especially the discourse of those of a fundamentalist bent (Catholic, Protestant, Muslim, and political fundamentalists, ideologues in particular).

Fundamentalism, I would argue, tends to dispose a person to conspiratorial thinking. The logic that underpins this kind of thinking has puzzled me over the years, and it has been my impression that a conspiratorial disposition has its roots in a statistical fallacy, among many other factors. I am referring to a common inductive error, an error in conditional and probabilistic reasoning, namely, the fallacy of the transposed conditional.[1]

Inductive reasoning begins not on the level of ideas, but on an empirical level. It begins with evidence, and it seeks the reason for that evidence. The direction is the reverse of what takes place in deductive reasoning, which begins with a universal premise and a fact in evidence and draws out a conclusion implicitly contained in both premises. For example: 1) the sum of the angles of any triangle equals 180 degrees; 2) this is an isosceles, a species of triangle; 3) we deduce that the sum of its angles equals 180 degrees. Inductive reasoning, on the other hand, does not have the luxury of a universal premise; it begins rather with facts in evidence (e.g., a person is lying dead on the floor) and proceeds to explain the fact (he died of a heart attack, or suffered a brain aneurysm, or he was strangled, etc.). The reasoning is, from a deductive point of view, invalid, because it takes the following form: If p, then q; q; therefore, p. Although it is true that “if John suffered a brain aneurysm while making a sandwich, then he will be lying dead on the kitchen floor”, it does not follow that because John is lying dead on the kitchen floor, he suffered a brain aneurysm. He might very well have suffered a heart attack, or he may have been poisoned, etc. There are a number of possible hypotheses that can, theoretically speaking, account for the evidence, not all of them equally probable or plausible. That is why each hypothesis must be tested, which generally speaking amounts to gathering as much salient information as possible in order to determine how coherently and consistently each hypothesis fits with the available data. Some hypotheses will be discarded as simply inconsistent with the available data; others will survive the test, but not all of them will enjoy maximal plausibility. And so the conclusion of an inductive argument is at best reasonably probable (or maximally plausible).
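To see the invalidity concretely, here is a minimal sketch in Python; the hypotheses and the evidence string are invented for illustration. Several hypotheses each imply the very same evidence, so the evidence alone cannot single out any one of them:

```python
# Minimal sketch (invented data): several hypotheses each imply the same
# evidence, so "If p, then q; q; therefore, p" cannot identify the cause.
implies = {
    "brain aneurysm": "lying dead on the kitchen floor",
    "heart attack":   "lying dead on the kitchen floor",
    "poisoning":      "lying dead on the kitchen floor",
}

evidence = "lying dead on the kitchen floor"

# Every hypothesis whose consequent matches the evidence survives;
# the evidence by itself cannot decide among them.
consistent = [h for h, q in implies.items() if q == evidence]
print(consistent)  # ['brain aneurysm', 'heart attack', 'poisoning']
```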

It is this probability factor that now takes us a bit further into probabilistic reasoning. The transposed conditional is a fallacy that typically occurs in the context of hypothesis significance testing.[2] As an example of hypothesis testing, consider that a person comes down with a cold and decides to take a new cold medicine; the result is he recovers quickly. The data now available is that there was a quick recovery; the hypothesis is that his recovery was due to the cold medicine. But such a hypothesis needs to be tested, for his recovery could have been due to any number of factors. The null hypothesis (H₀), in this case, is that the cold medicine was ineffective. Contrast this with the alternative hypothesis (H₁), which is that the cold medicine works and was the reason for his quick recovery.

What significance testing does is measure the probability of a piece of evidence, given a particular hypothesis. For example, given that the cold medicine is ineffective, what is the probability that our experiment will yield results that are rare and unexpected, such as an unusually quick recovery (well into the tail of the null hypothesis distribution of the bell curve)? The probability of specific data, given the null hypothesis, can be written thus: P(D|H₀). Note, however, that this is not the same as the probability of the hypothesis given the observed data, P(H₀|D). The fallacy of the transposed conditional confuses the two, equating them, so to speak: P(D|H₀) = P(H₀|D).
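A small numeric sketch may make the distinction vivid. The counts below are invented purely for illustration; they show how P(D|H₀) can be very small while P(H₀|D) comes out quite different:

```python
# Invented counts of cold sufferers, split by hypothesis (medicine
# ineffective vs. effective) and by datum (quick recovery or not).
quick_given_ineffective = 5    # quick recovery though the medicine did nothing
slow_given_ineffective = 95
quick_given_effective = 40     # quick recovery and the medicine worked
slow_given_effective = 10

# P(D|H0): how probable is a quick recovery if the medicine is ineffective?
p_D_given_H0 = quick_given_ineffective / (quick_given_ineffective + slow_given_ineffective)

# P(H0|D): among quick recoveries, how probable is an ineffective medicine?
p_H0_given_D = quick_given_ineffective / (quick_given_ineffective + quick_given_effective)

print(round(p_D_given_H0, 3))  # 0.05  -> the datum is rare under the null
print(round(p_H0_given_D, 3))  # 0.111 -> yet the null is not 5% probable given the datum
```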

Allow me to provide a non-conspiratorial example of this confusion. Given that there is a racoon (H) in my backyard, what is the probability that it is grey and has four legs (D)? The probability is very high (> 99%). Hence, P(D|H) > 99%. However, it does not follow that there is a greater than 99% probability that the animal in my backyard is a racoon, given that it is grey and has four legs. P(H|D) is not necessarily > 99%. In fact, the probability is much less than 99%, for it is far more probable that the animal in my backyard is a squirrel: there are over 2000 squirrels for every square kilometer in this area, compared to 20 racoons per square kilometer.
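A rough calculation along these lines shows how the base rates swamp a very high likelihood. The population densities are the ones just given; the probability that a squirrel is grey and four-legged is my assumption, supplied only to complete the example:

```python
# Base rates from the text; the squirrel likelihood is an assumed figure.
racoons_per_km2 = 20
squirrels_per_km2 = 2000

p_D_given_racoon = 0.99     # grey and four-legged, given a racoon (from the text)
p_D_given_squirrel = 0.70   # assumption: most local squirrels are grey

# Posterior probability that the animal is a racoon, given the data.
num = racoons_per_km2 * p_D_given_racoon
den = num + squirrels_per_km2 * p_D_given_squirrel
print(round(num / den, 4))  # ~0.0139: nowhere near 99%, despite P(D|H) > 99%
```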

To determine the probability of the hypothesis (whether the animal is a squirrel or a racoon) requires a different probability calculation, namely Bayes’ Theorem. This theorem measures the probability of a hypothesis given the data, P(H|D), against the probability of alternative hypotheses (H₁, H₂, H₃, … Hₙ). This is not what happens in significance testing, which does not test directly for the hypothesis. The formula for Bayes’ Theorem is the following:

P(H₀|D) = P(H₀)P(D|H₀) / [P(H₀)P(D|H₀) + P(H₁)P(D|H₁)]

This theorem employs the base rate probability P(H), which describes the proportion of a population that exhibits some characteristic (e.g., 0.01 for racoons), and multiplies it by the likelihood, which is the probability of the data (D) given the null hypothesis. This product is then divided by that same product plus the base rate probability of the alternative hypothesis (0.99 for squirrels) multiplied by its likelihood, which is the probability of the data given the alternative hypothesis.
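Here is a minimal sketch of this two-hypothesis form of the theorem; the function and variable names are mine, and the inputs reuse the backyard figures (with the squirrel likelihood again an assumption):

```python
def posterior_h0(prior_h0: float, like_h0: float,
                 prior_h1: float, like_h1: float) -> float:
    """P(H0|D) = P(H0)P(D|H0) / [P(H0)P(D|H0) + P(H1)P(D|H1)]."""
    num = prior_h0 * like_h0
    return num / (num + prior_h1 * like_h1)

# Base rates from the text (0.01 racoons, 0.99 squirrels) and the same
# likelihoods as above (the squirrel likelihood is assumed).
print(round(posterior_h0(0.01, 0.99, 0.99, 0.70), 4))  # ~0.0141
```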

Let’s consider a scenario using imaginary data. The probability that you are going to be sick (vomit) given that you have food poisoning, P(Dₛ|H), is rather high, but it is certainly not the same as the probability that you have food poisoning given that you are sick, P(H|Dₛ). It is almost certain that if you get food poisoning, you will be sick, but it is far from certain that you got food poisoning given that you are sick (you might have the flu that is going around). Let’s fabricate some data for the sake of illustration:

P(H₀) = the base rate probability that one does not have food poisoning: 0.98 (most people do not get food poisoning when they dine out).

P(H₁) = the base rate probability that one has food poisoning: 0.02 (very rare).

The following are the likelihoods under the alternative and null hypotheses:

P(Dₛ|H₁) = .97 (the probability that you will be sick given that you have food poisoning).

P(Dₛ|H₀) = .03 (the probability that you will be sick given that you do not have food poisoning).

Since the probability that you will be sick, given that you do not have food poisoning, is so low, are we justified in concluding that the probability that you have food poisoning given that you are sick is correspondingly high? Are we justified, on the basis of the results, in rejecting the hypothesis that you do not have food poisoning (i.e., concluding that you do have food poisoning)? The significance test assumes we are so justified; however, until we test directly for the hypothesis, we cannot conclusively reject any hypothesis. Consider the following calculation using Bayes’ Theorem, which tests for the hypothesis:

P(H₀|Dₛ) = P(H₀)P(Dₛ|H₀) / [P(H₀)P(Dₛ|H₀) + P(H₁)P(Dₛ|H₁)]

P(H₀|Dₛ) = (.98 × .03) / [(.98 × .03) + (.02 × .97)] ≈ 60%

According to Bayes’ Theorem, there is a 60% probability that you do not have food poisoning, given that you were sick. There is a significant difference between 3% and 60%. Employing a null hypothesis significance test might, if we are not careful, tempt us to reject the null hypothesis outright and conclude that one has food poisoning on the basis of a .03 significance level.[3] However, employing Bayes’ Theorem, which tests directly for the hypothesis, the probability that you do not have food poisoning given that you are sick is 60%. Hence, it is more probable than not that you do not have food poisoning.
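The arithmetic can be verified in a few lines, using the fabricated figures above:

```python
# Base rates and likelihoods exactly as fabricated in the text.
p_h0, p_h1 = 0.98, 0.02                      # no food poisoning / food poisoning
p_ds_given_h0, p_ds_given_h1 = 0.03, 0.97    # probability of being sick under each

num = p_h0 * p_ds_given_h0
p_h0_given_ds = num / (num + p_h1 * p_ds_given_h1)
print(round(p_h0_given_ds, 3))  # 0.602 -> about 60%, as stated
```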

How does this relate to conspiracy theories? It seems to me that conspiracy theorists calculate likelihoods on the basis of a given assumption, namely, a conspiracy. In itself, there is nothing wrong with that, for it is an essential part of the investigative process. But it seems the story ends there; that, I would argue, is what distinguishes a conspiracy theorist from an investigator. Just as a significance test should not be the end of the investigative process but the beginning, so too picking out anomalies should not mark the end of the investigative process but the beginning. The probability of each hypothesis must be tested against all other alternative hypotheses, and it seems to me that this is not what conspiracy theorists do. Essentially, the conspiracy argument (or conspiratorial thinking) amounts to an identification of P(D|H) with P(H|D). Given a conspiracy, what is the probability that we will see certain anomalies, unanswered questions, evidence consistent with a conspiracy, a frame-up, etc.? Typically, a high probability. Inevitably, such anomalies are discovered. It does not, however, follow that the probability of a conspiracy is high, given these anomalies, unanswered questions, and pieces of evidence that are consistent with a conspiracy. We can also ask the following: given the status quo (the null hypothesis, i.e., there is no conspiracy), what is the probability that we will see certain unusual pieces of data? For the sake of argument, let us acknowledge a low probability. Again, it does not follow that there is a low probability for the null hypothesis (that there is no conspiracy), and thus a correspondingly high probability that there is one.

Furthermore, given a conspiracy, the probability of a motive, means, and opportunity is certain; but the probability of a conspiracy, given that a group of people has motive, means, and opportunity, is not at all equally high among alternative hypotheses. Countless people had a motive to kill Kennedy, for example, and the means to accomplish it, as well as the opportunity. Those very same people also had reasons not to attempt to murder a president, that is, incentives not to act (i.e., the anxiety of eventually becoming the object of an investigation, the likelihood of getting caught and punished, etc.), which have to be weighed against the motive to carry it out. Mysterious deaths of supposed key witnesses who supposedly have information about the conspirators seem to fall under the umbrella of this fallacy as well. One might grant that it is reasonably probable that key witnesses with valuable information will be murdered (and made to look like suicides), given that there was a conspiracy and cover-up. But the probability that there was a conspiracy and cover-up, given that key witnesses died or committed suicide, is not necessarily high at all. One must test the hypothesis bearing upon each death against the alternative hypothesis that there was no conspiracy and that the witness’s death can be adequately explained.

But the most important point I wish to make with respect to Bayesian reasoning is that as evidence accumulates, the posterior probabilities change. Suddenly an alternative hypothesis can jump from a very low to a rather high probability. For example, looking at three possible hypotheses and a total of three pieces of incriminating evidence (D), we now calculate the following:

P(H₀|D₃ & D₂ & D₁) = the probability of the null hypothesis given all three pieces of data.

P(H₁|D₃ & D₂ & D₁) = the probability of the first alternative hypothesis given all three pieces of data.

P(H₂|D₃ & D₂ & D₁) = the probability of the second alternative hypothesis given all three pieces of data.

Our Bayesian calculus will now look like this: 

P(H₀|D₃ & D₂ & D₁) = P(H₀|D₂ & D₁)P(D₃|H₀ & D₂ & D₁) / [P(H₀|D₂ & D₁)P(D₃|H₀ & D₂ & D₁) + P(H₁|D₂ & D₁)P(D₃|H₁ & D₂ & D₁) + P(H₂|D₂ & D₁)P(D₃|H₂ & D₂ & D₁)]

P(H₁|D₃ & D₂ & D₁) = P(H₁|D₂ & D₁)P(D₃|H₁ & D₂ & D₁) / [P(H₁|D₂ & D₁)P(D₃|H₁ & D₂ & D₁) + P(H₀|D₂ & D₁)P(D₃|H₀ & D₂ & D₁) + P(H₂|D₂ & D₁)P(D₃|H₂ & D₂ & D₁)]

P(H₂|D₃ & D₂ & D₁) = P(H₂|D₂ & D₁)P(D₃|H₂ & D₂ & D₁) / [P(H₂|D₂ & D₁)P(D₃|H₂ & D₂ & D₁) + P(H₁|D₂ & D₁)P(D₃|H₁ & D₂ & D₁) + P(H₀|D₂ & D₁)P(D₃|H₀ & D₂ & D₁)]
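The calculus is easier to see as a small function: the posteriors computed from the data seen so far serve as priors for the next datum, which re-weights each hypothesis by its likelihood. The function is my sketch, not the author's notation:

```python
def update(priors: list[float], likelihoods: list[float]) -> list[float]:
    """Return normalized posteriors P(Hi|D) from priors P(Hi)
    and likelihoods P(D|Hi), one entry per hypothesis."""
    weighted = [p * q for p, q in zip(priors, likelihoods)]
    total = sum(weighted)
    return [w / total for w in weighted]
```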

Again, employing a non-conspiratorial example to illustrate the point, the following are our priors:

Given that there is an animal in my backyard that is grey, what is the probability that it is a racoon (H₀), against the probability that it is a squirrel (H₁) and the probability that it is a cat (H₂)? The probability that it is a squirrel [P(H₁|D₁)] is .9928; that it is a racoon [P(H₀|D₁)] is .00759; the probability that it is a cat [P(H₂|D₁)] is .00088. Hence, there is a 99% probability that the animal is a squirrel. The posterior probabilities are barely altered after the addition of a new piece of data, namely that the animal is four-legged (D₂). But when a third piece of data is discovered, namely that the animal is 35 pounds, the posterior probabilities change significantly. No longer is it probable that the animal is a squirrel; it is far more probable that it is a racoon.

P(H₀|D₂ & D₁) = the probability that it is a racoon, given that it is four-legged and grey. New base rate = .00758.

P(D₃|H₀ & D₂ & D₁) = the probability that it is 35 pounds, given that it is a racoon, four-legged, and grey = .95.

P(H₁|D₂ & D₁) = the probability that it is a squirrel, given that it is four-legged and grey = .9916.

P(D₃|H₁ & D₂ & D₁) = the probability that it is 35 pounds, given that it is a squirrel, four-legged, and grey = .002.

P(H₂|D₂ & D₁) = the probability that it is a cat, given that it is four-legged and grey = .00088.

P(D₃|H₂ & D₂ & D₁) = the probability that it is 35 pounds, given that it is a cat, four-legged, and grey = .002.

The probability that it is a squirrel has gone from 99% to roughly 21.6% (improbable), and the probability that it is a racoon jumps from under 1% to 78% (the probability that it is a cat drops to .00019).

What made the difference was the likelihood under the null hypothesis, as against the likelihoods under the others: given that the animal is a grey four-legged squirrel, what is the probability that it is 35 pounds [P(D₃|H₁ & D₂ & D₁)]? The answer is less than half a percent. However, given that the animal is a grey four-legged racoon, what is the probability that it is 35 pounds [P(D₃|H₀ & D₂ & D₁)]? I gave that likelihood a 95% probability, which is certainly reasonable (a large, full-grown adult male racoon can weigh around 35 pounds). As more data consistent with the hypothesis of a male racoon accumulates, the posterior probability only increases.
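For the skeptical reader, the figures above can be reproduced directly, using the author's priors and likelihoods as given:

```python
# Priors after D1 and D2 (grey, four-legged): racoon, squirrel, cat.
priors = [0.00758, 0.9916, 0.00088]
# Likelihood of D3 (the animal is 35 pounds) under each hypothesis.
likelihoods = [0.95, 0.002, 0.002]

weighted = [p * q for p, q in zip(priors, likelihoods)]
total = sum(weighted)
posteriors = [round(w / total, 4) for w in weighted]
print(posteriors)  # [0.7839, 0.2159, 0.0002] -> racoon ~78%, squirrel ~21.6%
```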

In sum, it is very important not to stop at likelihoods [P(D|H)]. In the investigation of crimes, there are all sorts of unexplained anomalies, questions that need answers, missing documents, apparent inconsistencies, unusual connections between people, etc., that can quite easily be consistent with the hypothesis of a conspiracy or frame-up, as in JFK assassination literature. A likelihood, however, is very far from a clear demonstration of a hypothesis of conspiracy, or its component parts, such as the fabrication of documents, destruction of evidence, planting of evidence, intimidation of witnesses, etc. The hypothesis must be tested directly, given the sum total of available evidence, against alternative hypotheses, given the sum total of the same evidence:

P(H₀|D₅₀ & D₄₉ & D₄₈ … D₁)

The conclusions of inductive arguments, such as plausibility estimates made on the basis of sets containing data of varying degrees of plausibility, or probability calculations such as we have looked at here, are always risky, for they are, unlike the conclusions of deductive arguments, underdetermined.[4] Aware though we are of the role that information deficiency plays in these calculations, it is nonetheless possible to achieve reasonable probability given the sum total of the best available evidence.

Notes

[1] Pawel Kalinowski, Fiona Fidler, and Geoff Cumming (2008). “Overcoming the Inverse Probability Fallacy: A Comparison of Two Teaching Interventions”. Methodology 4(4): 152–158. DOI: 10.1027/1614-2241.4.4.152.

[2] Deirdre N. McCloskey and Stephen T. Ziliak (2009). “The Unreasonable Ineffectiveness of Fisherian ‘Tests’ in Biology, and Especially in Medicine”. Biological Theory 4(1): 44–53, at 45–46.

[3] See Gerd Gigerenzer (2004). “Mindless statistics”. The Journal of Socio-Economics 33: 587–606, at 594–595. DOI: 10.1016/j.socec.2004.09.033.

[4] What is the criterion for superior research? More detail? More words? More complexity? One element in the set of criteria for reasonable warrant is functional simplicity and economy. Often a case is simple, but a conspiratorial mind renders it three or four times more complex, which seems to be the norm. That is no criterion for “better research”. We don’t “know” what the truth is. A theory might come across as profound, well researched, etc., but it might be nothing more than a vastly complex synthesis of sheer nonsense. Some people love the conspiracy angle, and so anything that supports it, especially what appears to be a very complex synthesis of investigative work, is deemed astounding, fascinating, mind-blowing, etc. We can judge a superior essay because we already know what it means to write. We can judge a superior math performance. But how do we judge three possible theories on the origin of the universe, or a number of theories on the assassination of Kennedy? The only way is through the parameters of cognitive systematicity: maximal consistency, maximal plausibility, coherence, overall efficiency, functional simplicity, economy, and elegance. It seems to me, however, that this is just what conspiracy theories lack, in particular JFK conspiracy theories; for they are not maximally consistent, and they are certainly not functionally simple and economical.
