Bayes' Arrows

Interview by Richard Marshall.

Clark Glymour worked in the 1970s on traditional issues in the philosophy of science, especially formal accounts of the confirmation of scientific theories. In this same period he worked on philosophically interesting global properties of models of general relativity. In the 1980s, in collaboration with John Earman, he worked on historical topics in late 19th and early 20th century psychiatry and physics, especially on the genesis and testing of the special and general theories of relativity. In the same period he became interested in the possibility of automated procedures for finding causal explanations in the social sciences. In collaboration with his students Kevin Kelly, Richard Scheines and Peter Spirtes, he developed automated heuristic procedures for respecification of linear latent variable models. In the 1990s Scheines, Spirtes and Glymour developed the causal interpretation of Bayes nets and outlined a program of research: to find feasible search algorithms, to characterize indistinguishability, and to generate algorithms for prediction from interventions on partially characterized causal structures. His current research applies previous work on causal Bayes nets and formal learning theory to a variety of topics. Here he discusses different kinds of uses of probabilities in science, causality, Hume and Bayes, why thinking causality is a fiction isn't even wrong, causal Bayes nets, the social sciences' poor record of making inferences, free will, why Aristotle's approach to philosophy bests Plato's and why there's not enough of that approach in contemporary philosophy at the moment, Laplacian demons, why in general scientists are right to criticise contemporary philosophy on the grounds that it doesn't do anything, and the threats that Bayesians will avert. This'll wake you up...

3:AM: What made you become a philosopher?

Clark Glymour: When I was sixteen, after reading the Origin of Species, I decided I wanted to know everything, or at least to know what could not be known. As a freshman at the University of Montana I sat in on a one-night-a-week adult course on the history of philosophy taught by the late Cynthia Schuster, who had been Hans Reichenbach’s doctoral student. My fate was decided. I had to hide my interest from my father, who expected me to become an attorney.

3:AM: First, looking at science generally, would you say the use of probabilities is one of the biggest changes in science over the last century or so? Could you sketch for us the landscape as it looks to you now, how it developed and how you’d characterise the explanatory virtues of probability?

CG: There are two kinds of uses of probability in science. One is that probability claims may be intrinsic to a theory, as in statistical mechanics or quantum theory; the other is the probability used in the assessment of theories, as in most of applied statistics. In the first role, probability claims are built into whatever explanation a theory provides; in the second role, they have no such function. For technical reasons, the division is not quite so sharp as I have stated it. In many forms of data assessment, the theory itself must specify a probability distribution for the data. Those specifications are usually ancillary to the “substantive” claims of a theory; for example, in the social sciences they are typically about the probability distribution of unobserved “disturbance” or “noise” variables that are themselves usually of no substantive interest. This contrasts, for example, with certain classes of theories in psychology, and of course in quantum theory, where the relations among the variables of interest are specified to be probabilistic.

It is often forgotten but should be emphasized that some of the foundational theories in scientific history were not probabilistic in either of the ways I have just described. Among many others, Newtonian dynamics and Darwin’s theory of evolution are but two examples of a-probabilistic theories and theory assessments. Probability entered theory assessment early in the 19th century, I think beginning with Legendre’s (1808, I think) appendix on estimating the orbits of comets by least squares, although I believe Gauss claimed credit, as he did for much else. In the 18th century probability had a role in speculative theories of human abilities, but its first intrinsic role in physical theories seems to have been in the kinetic theory of gases in the 19th century. By the 20th century, probability was increasingly (and now almost universally) required in data assessment.

3:AM: Does this mean that really causality is no longer scientific and that what science will look at instead is probabilities connecting distinct events and so forth? Do causality and probability come apart necessarily, or can they be unified?

CG: Phooey! Try to plan getting out of a room by computing the probability that you try to turn the doorknob conditional on the doorknob turning… versus… computing the probability that the knob will turn given that you try to turn the knob. The conditional probabilities are different. Causality makes the difference, and it is why, when planning to get out of a room, we use the second conditional probability and not the first. For planning actions and policy interventions, probability is useless without causality. Once upon a time yellowed fingers were highly correlated with lung cancer later in life. The surgeon general recommended against smoking; he did not recommend that people wear gloves to prevent yellowed fingers.
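The yellowed-fingers example can be made concrete with a small simulation. What follows is a minimal sketch, not anything from Glymour's own work: a hypothetical toy model in which smoking causes both yellowed fingers and lung cancer, with all probabilities invented for illustration. Conditioning on yellowed fingers raises the cancer rate; setting finger colour by intervention (the "gloves" policy) leaves it at the population rate.

```python
import random

# Hypothetical toy model (all numbers are made up for illustration):
#   Smoking -> YellowFingers, Smoking -> Cancer; no arrow YellowFingers -> Cancer.
P_SMOKE = 0.3
P_YELLOW = {True: 0.8, False: 0.05}   # P(yellow fingers | smokes?)
P_CANCER = {True: 0.15, False: 0.01}  # P(cancer | smokes?)


def simulate(n, seed, set_yellow=None):
    """Return n (smokes, yellow, cancer) samples.

    If set_yellow is None, fingers yellow naturally (observational data).
    Otherwise finger colour is fixed by intervention, cutting its
    dependence on smoking while leaving the cancer mechanism untouched.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        smokes = rng.random() < P_SMOKE
        if set_yellow is None:
            yellow = rng.random() < P_YELLOW[smokes]
        else:
            yellow = set_yellow
        cancer = rng.random() < P_CANCER[smokes]
        rows.append((smokes, yellow, cancer))
    return rows


def cancer_rate(rows, yellow):
    """Fraction with cancer among rows whose finger colour equals `yellow`."""
    selected = [cancer for _, y, cancer in rows if y == yellow]
    return sum(selected) / len(selected)


observed = simulate(100_000, seed=0)
print("P(cancer | yellow)        ", round(cancer_rate(observed, True), 3))
print("P(cancer | not yellow)    ", round(cancer_rate(observed, False), 3))

# Intervention: force fingers yellow in one population, clean in another.
forced_yellow = simulate(100_000, seed=1, set_yellow=True)
forced_clean = simulate(100_000, seed=2, set_yellow=False)
print("P(cancer | do(yellow))    ", round(cancer_rate(forced_yellow, True), 3))
print("P(cancer | do(not yellow))", round(cancer_rate(forced_clean, False), 3))
```

With these invented numbers the observational rates come out near 0.13 and 0.02, while both interventional rates sit near the base rate of 0.052 (0.3 × 0.15 + 0.7 × 0.01): in this toy world, gloves would do nothing.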

3:AM: Hume has probability as a measure of opinion, of credence or degrees of belief, and put like that they’re not statements about nature itself. Bayes had probability as a norm of belief, which again is not about empirical beliefs about nature. Has science then become a science of nothing? And is this OK, or is it less than what we expect from science, and less than what scientists tell us we’re getting?

CG: It is not clear that Bayes took probability as a norm of belief as modern subjective Bayesians do, or merely a matter of opinion as Hume says. Bayes has only a brief comment in his essay on probabilities, which seems to suggest that a situation and observation of outcomes logically necessitate what we now call “posterior probabilities.” No contemporary Bayesian I know holds such a view. Bayes’ remark is so brief that one cannot be sure whether he thought the logical necessity was conditional on the prior probabilities he used and the data, or whether he thought the prior probabilities were themselves logically necessary, or what. Whether logically necessary relations between the facts that make sentences true are part of nature or a matter of convention is an issue, so far as I know, that Bayes did not address.

Bayesian statistics is two things: a useful technology and a bundle of mythology. A Bayesian data analyst almost never, and I mean almost never, inquires as to her degrees of belief: she makes mathematically convenient and not absurd assumptions and goes on. She tests the resilience of the outcomes she obtains by varying those assumptions: the prior probabilities, the penalties in a model score, etc. Essentially, her “prior probabilities” are just a measure to guide search through a space of alternative possible values for parameters in a model or models. The measure is adaptive, in the sense that it alters (by Bayes’ rule) as data are acquired. It is subjective, in the sense that there is no best adaptive measure for guiding search, but there are better and worse adaptive measures. Generally, the measures are nobody’s degrees of belief.
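A minimal sketch of the practice described above, assuming the simplest textbook case: a Beta prior on a single success rate with binomial data. The priors and counts are invented for illustration. The analyst does not ask which prior she believes; she runs several convenient ones and checks how much the answer moves. The sequential `update` function shows the "adaptive" part: the measure is revised by Bayes' rule one observation at a time and ends where the batch formula does.

```python
# Beta-Binomial conjugacy: a Beta(a, b) prior on a success rate, after observing
# k successes in n trials, becomes a Beta(a + k, b + n - k) posterior.

PRIORS = {  # hypothetical "convenient, not absurd" choices
    "uniform Beta(1, 1)": (1, 1),
    "centered Beta(10, 10)": (10, 10),
    "skeptical Beta(2, 8)": (2, 8),
}


def posterior_mean(a, b, successes, trials):
    """Mean of the Beta posterior, (a + k) / (a + b + n)."""
    return (a + successes) / (a + b + trials)


def update(a, b, outcome):
    """One step of Bayes' rule for a single 0/1 observation."""
    return (a + 1, b) if outcome else (a, b + 1)


def sensitivity(successes, trials):
    """Posterior means under each prior, plus their spread (max - min)."""
    means = {name: posterior_mean(a, b, successes, trials)
             for name, (a, b) in PRIORS.items()}
    return means, max(means.values()) - min(means.values())


for k, n in [(37, 100), (3700, 10_000)]:
    means, spread = sensitivity(k, n)
    rounded = {name: round(m, 3) for name, m in means.items()}
    print(n, rounded, "spread", round(spread, 4))

# Sequential updating reaches the same posterior as the batch formula.
a, b = PRIORS["uniform Beta(1, 1)"]
for outcome in [1] * 37 + [0] * 63:
    a, b = update(a, b, outcome)
print((a, b))  # (38, 64), i.e. Beta(1 + 37, 1 + 63)
```

With 100 trials the three posterior means spread over roughly 0.037; with 10,000 trials at the same observed proportion the spread falls below 0.001, which is the sense in which the choice of prior stops mattering in this toy case.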

The facts we are really talking about when we talk of probabilities in science are estimates of “large” but finite sample frequencies. “Large” is of course vague. Frequentist textbooks often fudge their introductions this way. Then they go on to give a mathematical theory of “probabilities” that are estimated from finite sample frequencies. The mathematics hides the vagueness of the fundamental notion.
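One standard way textbooks make "large" precise, offered here only as an illustration, rests on the idealizing assumption that the observations are independent draws from one fixed distribution. Hoeffding's inequality then says that for n such draws, with sample frequency p_hat of an event whose probability is p, P(|p_hat - p| >= eps) <= 2 exp(-2 n eps^2). Solving for n gives a sample size that guarantees accuracy eps with confidence 1 - delta. The sketch below computes it; the tolerances are arbitrary.

```python
import math


def hoeffding_sample_size(eps, delta):
    """Smallest n with 2 * exp(-2 * n * eps**2) <= delta.

    Under i.i.d. sampling this guarantees that a sample frequency is within
    eps of the underlying probability with probability at least 1 - delta.
    """
    if not (0 < eps < 1 and 0 < delta < 1):
        raise ValueError("eps and delta must lie in (0, 1)")
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))


for eps in (0.1, 0.05, 0.01):
    print(f"eps={eps}: n >= {hoeffding_sample_size(eps, delta=0.05)}")
```

Halving eps roughly quadruples n, and the guarantee is only as good as the i.i.d. assumption behind it.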

Probability in its mathematical form is about nothing, and so about any domain you may want it to be about. The same is true of logic, and of causality. Those three notions are the intangible bedrocks of science, and of rationality.

3:AM: Is your position an instrumentalist view of science and causation? Is causality just a useful fiction? There seems to be something wrong with mere instrumentalism and fictionalism precisely because, when we talk about lead exposure effects on children or smoking and so forth, we want to say the causal connection is real, don’t we?

CG: Anyone who seriously thought causation is a fiction, a social creation of some kind unlike the everyday facts of the world… such a person would be paralyzed, without reason for planning any one action rather than another. To get out of my office, shall I open the doorknob or wait for the doorknob to open? If I move my legs will I find myself at the door? If I move to an apartment with thin walls, will I hear my neighbors, and they me? I don’t care so much whether people say broccoli tastes good; it makes a bad taste in my mouth. An ad hominem: people who say causality is a fiction are not doing much thinking.

3:AM: The story of causal Bayes nets isn’t the complete story about causal relations in science, but it is a big part of the story. For the uninitiated, could you try and explain what this approach is: what’s a Bayes net, a graphical causal model, and what’s an intervention in this context, and are interventions necessary?

CG: Causes are relations between events: one event is a cause of another, or not. In science, causes are usually regarded as general, repeatable relations among variable quantities or properties: if one variable changes values will it, in all circumstances of a specifiable kind, produce changes in other variable quantities? A trivial example: if lobsters are boiled, do their carapaces change color (not trivial for the lobsters)?

Graphical causal models represent distinct variables as points and a direct causal relation between two variables as a directed line between one point and the other. By itself, this is just a convenient picture (of a kind people often produce spontaneously when asked to depict causal relations). When appropriately combined with the theory of probability, however, it is much more: an effective guide to predicting the effects of interventions that deliberately and directly change one or more of the variables (and hence, through the effects of the variables directly intervened upon, change other variables that they influence), and a framework that allows systematic investigations of the circumstances in which causal relations can be discovered.
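To see what combining the graph with probability buys, here is a minimal sketch of prediction from a known graph (not of the search methods discussed later). It assumes a hypothetical binary chain A → B → C with invented probabilities. The joint distribution factorizes as P(A)·P(B | A)·P(C | B); an intervention on B deletes B's own factor and fixes its value (the "graph surgery" of the causal Bayes net literature), so the change propagates to B's descendants but not back to its causes.

```python
from itertools import product

# Hypothetical binary network A -> B -> C; all probabilities are invented.
ORDER = ["A", "B", "C"]                     # a topological order of the graph
PARENTS = {"A": [], "B": ["A"], "C": ["B"]}
P_ONE = {                                   # P(variable = 1 | parent values)
    "A": {(): 0.4},
    "B": {(0,): 0.1, (1,): 0.7},
    "C": {(0,): 0.2, (1,): 0.9},
}


def joint(assign, do=None):
    """Probability of a full 0/1 assignment under the factorization.

    Variables in `do` are set by intervention: their own factor is dropped,
    and assignments that disagree with the forced value get probability 0.
    """
    do = do or {}
    p = 1.0
    for var in ORDER:
        if var in do:
            if assign[var] != do[var]:
                return 0.0
            continue
        p1 = P_ONE[var][tuple(assign[u] for u in PARENTS[var])]
        p *= p1 if assign[var] == 1 else 1 - p1
    return p


def prob(query, given=None, do=None):
    """P(query | given) in the original or intervened model, by enumeration."""
    given = given or {}
    num = den = 0.0
    for values in product([0, 1], repeat=len(ORDER)):
        assign = dict(zip(ORDER, values))
        if any(assign[k] != v for k, v in given.items()):
            continue
        p = joint(assign, do)
        den += p
        if all(assign[k] == v for k, v in query.items()):
            num += p
    return num / den


print("P(A=1 | B=1)    ", round(prob({"A": 1}, given={"B": 1}), 3))
print("P(A=1 | do(B=1))", round(prob({"A": 1}, do={"B": 1}), 3))
print("P(C=1 | B=1)    ", round(prob({"C": 1}, given={"B": 1}), 3))
print("P(C=1 | do(B=1))", round(prob({"C": 1}, do={"B": 1}), 3))
```

With these numbers, observing B = 1 raises the probability of A from 0.4 to about 0.82, because seeing an effect is evidence about its cause, whereas setting B = 1 leaves A at 0.4; for the downstream variable C, observing and intervening both give 0.9.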

3:AM: When asking the question about whether there can be mental causes, why did you ask ‘Why is a brain like the planet?’ and what’s the answer to both?

CG: Damned if I remember.

3:AM: How can your approach to causation help us when entering debates about whether violence on TV causes violent behaviour, for example? Do you think philosophical debates about such issues require an approach like yours, and do you think that, in principle anyway, your approach can handle any causal situation?

CG: I wrote a paper about that, jointly with two of my children: Glymour, B., Glymour, C., & Glymour, M. (2008). Watching social science: The debate about the effects of exposure to televised violence on aggressive behavior. American Behavioral Scientist, 51(8), 1231-1259. The methods we used are, in some cases such as this one, a guard against rash and ill-founded inferences that do not take account of the possibilities of unrecorded common causes or selection bias, or that are driven by strong opinions as to what the answers should be (and so only consider a handful of possible alternative explanations of the data, or use methods that favor the investigator’s opinion). In some cases, with good data, new search methods can give positive information about what causes what, and what does not, and the methods require an explicit separation of what the user assumes from what the methods infer. In the example at hand, the problem was that the data we had were not very good, and much of the important data we wanted to use would not be shared by the investigator who acquired it. Such people should not be allowed to publish.

I take “my approach” to be automated, data driven search for causal relations, without complete experimental data. Developments of methods for trying to do that are now pursued by hundreds of investigators on four continents. I had a hand in starting it, but it is scarcely my property.
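The core idea of such search can be shown in miniature. The sketch below is a toy version of one step used by constraint-based algorithms such as PC, the step that orients colliders; it is not a full implementation of PC or of any published algorithm. It assumes linear relations with Gaussian noise and invented coefficients, replaces a proper statistical test with a fixed threshold on (partial) correlations, and ignores the unrecorded common causes that more elaborate algorithms are designed to handle. In a chain X → Y → Z, X and Z are correlated but become independent given Y; in a collider X → Y ← Z, they are independent but become dependent given Y. That asymmetry lets data alone fix some arrow directions.

```python
import math
import random


def pearson(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def partial_corr(x, z, y):
    """Correlation of x and z after linearly adjusting both for y."""
    rxz, rxy, rzy = pearson(x, z), pearson(x, y), pearson(z, y)
    return (rxz - rxy * rzy) / math.sqrt((1 - rxy ** 2) * (1 - rzy ** 2))


def classify_middle(x, y, z, threshold=0.05):
    """Given adjacencies X - Y - Z (X, Z not adjacent), is Y a collider?

    The fixed threshold is a stand-in for a real independence test.
    """
    dependent = abs(pearson(x, z)) >= threshold
    dependent_given_y = abs(partial_corr(x, z, y)) >= threshold
    if not dependent and dependent_given_y:
        return "collider: X -> Y <- Z"
    if dependent and not dependent_given_y:
        return "non-collider: chain or fork, direction not determined"
    return "undecided"


def chain_data(n, seed):
    """Hypothetical linear chain X -> Y -> Z with unit Gaussian noise."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [xi + rng.gauss(0, 1) for xi in x]
    z = [yi + rng.gauss(0, 1) for yi in y]
    return x, y, z


def collider_data(n, seed):
    """Hypothetical linear collider X -> Y <- Z with unit Gaussian noise."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    z = [rng.gauss(0, 1) for _ in range(n)]
    y = [xi + zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
    return x, y, z


print(classify_middle(*chain_data(20_000, seed=0)))
print(classify_middle(*collider_data(20_000, seed=0)))
```

The non-collider verdict is deliberately noncommittal: X → Y → Z, X ← Y ← Z and X ← Y → Z imply the same independencies, one instance of the indistinguishability that search methods have to characterize.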

There are kinds of causal circumstances we don’t really have a good handle on. One is continuous causal relations represented by differential equations or partial differential equations. An example was developed by Harold Jeffreys (who later became a famous statistician) early in the history of general relativity, and there is some more recent work in physics, but the area is undeveloped. Another is agent-to-agent causation, as in social networks.

3:AM: You make an argument for thinking that free will, even if it is an illusion, is required for learning. Can you say why you argue for this? How do you link the conscious sense of autonomous agency in ourselves and in others to learning causal effects? (And do you say it is an illusion?)

CG: If one did not tacitly believe that one’s actions influenced immediate outcomes one would have a tough time learning what to do, because recognizing that some events are interventions is a powerful guide to discovering causal relations. Suppose for example one thought that something else, some third thing, produced both one’s actions and the outcomes by independent processes (I suppose Leibniz had a view something like that, but I doubt Leibniz often believed what he wrote.) Then there would be no point to planning or intending or deciding or exploring to learn: stuff, including one’s own actions, would on one’s view just happen; perhaps there would be something to learn about in what environments the stuff we call our actions and their consequences would occur.
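A minimal sketch of why recognizing interventions carries information that passive observation does not, under invented assumptions. Three hypothetical linear-Gaussian models (X causes Y; Y causes X; a third variable Z, the "third thing", causes both) are tuned so that all three produce the same observational joint distribution of X and Y: unit variances and correlation 0.8. No amount of passive data separates them. Setting X by intervention does, at least partly: only when X causes Y does Y's mean move.

```python
import math
import random

A = math.sqrt(0.8)      # loading of the common cause, chosen so corr(X, Y) = 0.8
NOISE = math.sqrt(0.2)  # noise s.d. in the common-cause model, keeping Var = 1
MODELS = ("x_causes_y", "y_causes_x", "common_cause")


def sample(model, n, seed, do_x=None):
    """Draw n (x, y) pairs; if do_x is given, X is set to it by intervention,
    severing X from its own causes and leaving other mechanisms unchanged."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        if model == "x_causes_y":
            x = rng.gauss(0, 1) if do_x is None else do_x
            y = 0.8 * x + rng.gauss(0, 0.6)
        elif model == "y_causes_x":
            y = rng.gauss(0, 1)
            x = 0.8 * y + rng.gauss(0, 0.6) if do_x is None else do_x
        elif model == "common_cause":
            z = rng.gauss(0, 1)
            x = A * z + rng.gauss(0, NOISE) if do_x is None else do_x
            y = A * z + rng.gauss(0, NOISE)
        else:
            raise ValueError(f"unknown model: {model}")
        pairs.append((x, y))
    return pairs


def pearson(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


for model in MODELS:
    xs, ys = zip(*sample(model, 20_000, seed=1))
    ys_do = [y for _, y in sample(model, 20_000, seed=2, do_x=2.0)]
    print(f"{model:13} corr={pearson(xs, ys):.2f}  "
          f"mean Y under do(X=2)={sum(ys_do) / len(ys_do):.2f}")
```

Expect correlations near 0.80 in every row and interventional means near 1.6, 0 and 0 respectively. Intervening on X singles out X → Y but, on its own, does not separate Y → X from the common cause; intervening on Y as well would.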

If by “free will” one means the traditional Christian/Cartesian notion that the physics of my environment and bodily state just before I decide to do something does not determine what I will do, or does not determine the finest possible probability for what I will do, then I do not believe in free will. The links between biology, consciousness and our sense of autonomy are, I think, the hardest question in science, and I have no idea what they are.

3:AM: You make an interesting division within philosophy between Euclidean style philosophy and Socratic, finding both useful but in very different ways. Philosophers like McGinn and Giere would say that only Socratic style philosophy is of value. Can you say what you meant by the division and why, despite thinking that the Socratic theories are almost always wrong and have no interesting consequences, and the Euclidean theories also are almost always wrong but have interesting consequences, you still think both are valuable? I suppose I’m wondering why you don’t think Euclidean approaches are the only way to go for philosophy?

CG: By their fruits ye shall know them. Compare Plato and Aristotle, superficially. Plato made no effective contributions to how to acquire true belief. Plato had analyses and counterexamples (the Meno) and a huge metaphysical discourse; we still don’t know necessary and sufficient conditions for virtue, the subject of the Meno. Aristotle had axioms for logic, a logic that was pretty much the best anyone could do for 2300 years. He had a schema for conducting inquiry (albeit not a terribly good one, but it wasn’t bested until the 17th century). Euclid was not a contemporary of Plato or Aristotle, but he systematized the fragments of geometry then current. The result was a theory that could be systematically investigated mathematically, applied in a multitude of contexts, and that constituted a stalking horse for alternative theories that have proved better empirically. Euclid has no formal definition of “point” that plays any role in his mathematical geometry. Just imagine if instead the history of geometry consisted of analyses of necessary and sufficient conditions for something to be a point, which is McGinn’s ideal of philosophy. Or look at Newton’s Principia, axiomatic if anything is. Many of the ideas Newton put together were in the air in the early 17th century. Combining them into the three laws, and relating the variables in specific conditions to observed quantities, created modern physics. McGinn would have preferred a litany of necessary and sufficient conditions for “quantity of matter”, which Newton does not define. (Of course there is an interesting question of how, given the theory, “quantity of matter” can be estimated, on whatever extra assumptions.) Or look at von Neumann’s foundations of quantum theory, the best work in philosophy of physics in the 20th century (Oliver Schulte’s work on particle physics is the second best), which put to rest the disputes over what were taken to be different theories. Or look at Frank Ramsey’s brief formulation of behavioral decision theory, derivatives and developments of which still run a good deal of economics, or Hilbert and Bernays’ axiomatization of first-order logic, which was a critical step towards computation theory, or David Lewis’ and others’ axiomatizations of the logic of counterfactuals, which run through contemporary philosophy and much, much else. Socratic thinking has no comparable fruits.

I don’t claim that the back and forth of Socratic philosophy is entirely useless. The canon of proposals and counterexamples is, or could be, a caution for superficial definitions and neglect of subtleties. Philosophers are good at subtleties.

McGinn and Giere and their like can botanize the world of thought into “philosophy” and “not philosophy” and corral the mostly fruitless into the former, as they wish. Universities make those sorts of separations an invitation to triviality and sometimes to outright stupidity. I think the basic motivation is pretty simple: most philosophers can’t do any mathematics, certainly not original mathematics; they are trained not to know statistics or computation. They treasure a playground with rewards for the skills they have, and want to make sure the playground is well guarded. Sometimes that reaches to ignorance as a policy, as in Tim Scanlon’s recent book, Being Realistic about Reasons, which insists (argues would be too complimentary) that ethics and metaethics need take no account of any other subject in the intellectual universe (save possibly logic) and that the theories of that enterprise cannot be assessed or assailed from any other discipline; no new knowledge about the human condition can or should touch the metaethicists’ arguments. John Rawls opened that trail in A Theory of Justice, using the “veil of ignorance” as a conceit to pick and choose what facts of the human condition can be used in deciding the political constitution of society so as to get his conclusion, much like a prosecutor working a grand jury.

3:AM: In developmental psychology, you’re not so interested in how children come to generate explicit causal explanations; rather, you’re interested in how they come to be able to predict and control their environment. Is this an example of the Euclidean approach, and why do you think a historical perspective is helpful?

CG: Well, sort of. Alison Gopnik and I were looking for a coherent mathematical framework (causal Bayes nets, as she put it) within which we could describe qualitative features of children’s acquisition of causal competence. So were others, such as Patricia Cheng (for adults), with whom I also collaborated (viz., The Mind’s Arrows). It generated a lot of work in cognitive psychology and developmental psychology, but I haven’t followed results in recent years.

3:AM: You’ve worked on learning algorithms for forecasting rare events such as forest fires. If this approach was something that a Laplacian demon had access to, would even a single one-off event be forecastable? And sticking with the Laplacian set-up, would a probabilistic approach be able to predict everything that ever happens in the universe?

CG: Same answer as before. If the universe were deterministic, and there were an intelligence that had all the initial conditions and was not limited by computational possibility, then every succeeding event would be predictable in a relativistic space-time that allows a Cauchy surface (never mind: think no causal loops). But the world is not deterministic, so we think, and prediction requires computation, and not everything is computable, not even in physics. Pour-El and Richards have a little book that gives examples of uncomputable physical processes. (Pour-El, M. B., & Richards, J. I. (2017). Computability in analysis and physics (Vol. 1). Cambridge University Press.)

3:AM: Several prominent scientists, including the late Stephen Hawking, ask: if philosophical questions are so vague or general that we don’t know how to conduct experiments or systematic observations to find their answers, what does philosophy do that can be of any value? Maybe in the past it was creative and was the basis of science, but that was then: why do philosophy now? How do you answer them?

CG: The trouble with physicists who denigrate philosophy is that they read the wrong philosophers, which, sad to say, is most philosophers. Had they read Peter Spirtes (CMU), or Jiji Zhang (Lingnan, Hong Kong), or Frederick Eberhardt (Caltech), or Oliver Schulte (Simon Fraser), or Teddy Seidenfeld (CMU), or Scott Weinstein (Penn), they might have had a different opinion. Looking back to the last century, philosophers (e.g., Bertrand Russell) made major advances in logic, created the basics of behavioral decision theory (Ramsey), co-created computational learning theory (Putnam), and created the causal interpretation of Bayes nets and the first correct search algorithms for them (Spirtes, Glymour and Scheines). (Yes, I know, some of your readers may say that Judea Pearl did this last bit, but no, he didn’t. Like Simon Peter, he thrice denied that directed acyclic graphs have a causal interpretation, and only seems to have changed his mind on learning of Peter Spirtes’ discoveries. Not that Pearl did not do wonderful, crucial work. He did, and continues to.) One of my colleagues, Steve Awodey, made a central contribution to the creation of a new branch of mathematics, homotopy type theory.

The reason a handful of philosophers were able to make these contributions is relatively simple: they were well-prepared and in academic or financial circumstances that enabled them to think outside of disciplinary boxes and develop novel ideas in sufficient detail to make an impact, or in Ramsey’s case, lucky enough to have a later figure really develop the fundamental idea. It is a rare university department that allows for such thinkers.

Statistically, the physicist critics are pretty near correct. Philosophy of science is a dead-letter subject filled with commentary book reports on real scientific work, banal methodological remarks (e.g., scientists of a time don’t always think of true alternatives to the theories they do think of; scientists sometimes have to think at multiple “levels”), and “mathematical philosophy”, some of which is very interesting but none of which is of practical scientific relevance. I once was interviewed for a job at UCLA. Pearl was invited to dinner with me and with some of my potential colleagues. Pearl managed to compliment me and insult the others with one question: “Why don’t the rest of you guys do anything?” In the context of your question, Pearl’s was a very good question.

Here is my answer to Pearl’s question: demographics and history have killed philosophy of science. The Logical Empiricists, European émigrés just before and after World War II, had almost no interest in methodology, did not engage much in the developments in statistics or computation, and basically gave philosophy of science a reconstructive turn, the heritage of their neo-Kantianism. They educated two generations of American philosophers interested in science. By the 1980s computer science and statistics increasingly took over methodology, and (at least in computer science) began to address some of the issues that motivated me a generation earlier to study history and philosophy of science. After that, someone with my interests would have to be either very ambitious or foolhardy or not really smart to study philosophy rather than statistics and machine learning. Born too early, I was.

3:AM: As a take-home, could you say whether you think your Bayesian approach will be the key to AI being successful? Can the mind in principle be modelled, should it be, and what do you see as the future of AI in the near term? Are there philosophical issues raised by AI that you think need to be attended to that so far haven’t been considered much?

CG: Insofar as we want intelligent machines to learn and act in novel circumstances and make and report new discoveries, yes, graphical causal models and algorithms for using and learning them will have an important role. (I wrote an essay about that, “Angela Android among the Asteroids”; see Ford, K. M., Glymour, C., & Hayes, P. (1994). Android epistemology. AAAI Press.) The main but not exclusive focus of machine learning has been on recognition and forecasting without interventions, but truly autonomous machines will need to do much more than that. I hope they are all nicer, wiser people than some of us humans are, but I am not optimistic.

The problem of “artificial intelligence” is that it irresistibly generalizes convenience and reliability and it localizes power terribly. Nations will build autonomous weapons, and every kind of weapon ever invented has been put to use. Good recognition of individuals by face and gait and locale would make autonomous war drones feasible; I expect, but do not know, they are feasible for the United States today, and are only not deployed because of political considerations of various kinds. Chinese authorities will have the power to know without intrusion every move a resident makes.

But AI may not be our biggest problem by far. There are about 180,000 base pairs in the smallpox virus, within reach of many sophisticated DNA research laboratories. The polio virus has already been synthesized; can smallpox be far behind? Or Ebola transmitted by air? Androids will not be infected.

3:AM: And finally, are there five books, other than your own, that you can recommend to take us further into your philosophical world?

CG:

Darwin, C. The Origin of Species.

Pearl, J., Glymour, M., & Jewell, N. P. (2016). Causal Inference in Statistics: A Primer. John Wiley & Sons.

Gopnik, A., & Schulz, L. (Eds.). (2007). Causal Learning: Psychology, Philosophy, and Computation. Oxford University Press.

Russell, B. (1959). Wisdom of the West. Crescent Books.

Pinker, S. (2012). The Better Angels of Our Nature. Penguin Books.

ABOUT THE INTERVIEWER

Richard Marshall is still biding his time.

