Bayesianism when learning isn't straightforward
Occasionally, you come across a paper you like so much, you feel the need to evangelise about it. So it is with ‘Bayes is Back’, a recent paper by Alex Meehan and Snow Zhang in Philosophical Review that I’ve been studying as I write a new book on the value of information and the epistemology of inquiry. And what better venue to evangelise than a blogpost. So, in the coming sections, I want to spell out the bare bones of their argument. This is no substitute for reading the paper, which is extremely rich and rewarding, but I hope it might spur some to look into it further.
Even among enthusiasts for the Bayesian approach to epistemology, it might not have been obvious that the theory was on the ropes. Yet it did indeed face a significant threat. Not a threat to the overall approach, I should say, but certainly a concern about some of the central details. But now it’s back, with those concerns hopefully assuaged by the intriguing argument of Meehan and Zhang’s paper.

What is Bayesianism?
Let me start with a brief overview of the theory whose fortunes this paper aims to revive. It consists of three claims: one concerning the representation of opinion; the other two concerning the norms that govern opinion so represented.
The first claim says that we can represent an individual’s opinions by a credence function, which takes each proposition about which they have an opinion and returns a number between 0 and 1, which measures the strength of their conviction in it. More formally, we say there is a set Ω of possible states of the world, or ways the world might be. We then say that our individual assigns credences to propositions, represented as sets of these possible worlds. So say F is the set of all propositions so represented. Then we represent our individual by a credence function p : F → [0, 1], which takes a proposition X in F, and returns a number p(X) at least 0 and at most 1 that gives the individual’s credence in X.
The second claim says that an individual’s credence function should be a probability function, that is, it should satisfy the probability axioms: it should assign credence 1 to any necessary truth, credence 0 to any necessary falsehood, and the credences it assigns to two propositions that are inconsistent with one another should sum to give the credence it assigns to their disjunction (this is Probabilism). More formally, (i) p(Ω) = 1, (ii) p(∅) = 0, and (iii) if there is no world in Ω at which X and Y are both true, then p(X or Y) = p(X) + p(Y).
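For concreteness, here is a small Python sketch of what Probabilism demands in a finite case. The encoding (worlds as strings, propositions as frozensets, the two-world example) is my own choice for illustration, not anything from the paper.

```python
from itertools import chain, combinations

def powerset(omega):
    """All propositions over a finite set of worlds."""
    return [frozenset(s) for s in chain.from_iterable(
        combinations(sorted(omega), k) for k in range(len(omega) + 1))]

def is_probabilistic(p, omega, tol=1e-9):
    """Check the three axioms for a credence function p defined on
    every proposition (frozenset of worlds) over omega."""
    props = powerset(omega)
    necessary_truth = abs(p[frozenset(omega)] - 1) < tol       # p(Omega) = 1
    necessary_falsehood = abs(p[frozenset()]) < tol            # p(empty) = 0
    additivity = all(abs(p[X | Y] - (p[X] + p[Y])) < tol       # finite additivity
                     for X in props for Y in props if not (X & Y))
    return necessary_truth and necessary_falsehood and additivity

omega = {"w1", "w2"}
p = {frozenset(): 0.0, frozenset({"w1"}): 0.3,
     frozenset({"w2"}): 0.7, frozenset(omega): 1.0}
print(is_probabilistic(p, omega))  # True
```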
The third says that an individual’s current credence function should be the result of conditioning their credence function at an earlier time on their total evidence up to the present—that is, if E is the proposition that gives their current total evidence, and X is a proposition, then their current credence in X should be the proportion of the credence they once gave to E that they also at that time gave to X, providing they gave some positive credence to E (this is Conditionalization). More formally, if p is their current credence function, p’ is their earlier credence function, and p’(E) > 0, then p(X) = p’(X|E) = p’(X & E)/p’(E).
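And here is Conditionalization in the same toy style; the particular worlds and the uniform prior are again just illustrative assumptions of mine.

```python
# A toy implementation of Conditionalization over a finite set of worlds.
OMEGA = {"red_square", "red_circle", "blue_square", "blue_circle"}
prior = {w: 0.25 for w in OMEGA}  # a uniform prior, just for illustration

def conditionalize(p, E):
    """Return p(- | E) as a mass function over worlds; defined only if p(E) > 0."""
    pE = sum(p[w] for w in E)
    if pE == 0:
        raise ValueError("cannot conditionalize on a credence-zero proposition")
    return {w: (p[w] / pE if w in E else 0.0) for w in p}

# Learning that the handkerchief is red:
red = frozenset({"red_square", "red_circle"})
posterior = conditionalize(prior, red)
print(posterior["red_square"])  # 0.5: half the credence once given to Red
```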
Bayesianism on the ropes
Probabilism and Conditionalization are both normative claims. What justifies them? There are two central arguments for each: betting arguments and accuracy arguments. I’ll set the betting arguments to one side and focus on the accuracy arguments, since that’s what Meehan and Zhang do—we can in fact extend their reasoning to give an argument along the lines of the betting arguments, but that would take us too far afield.
The accuracy arguments for Probabilism and Conditionalization
The accuracy arguments begin with the assumption that, just as we prefer a true belief to a false belief, we prefer more accurate credences to less accurate ones, where a credence in a true proposition is more accurate the higher it is and a credence in a false proposition is more accurate the lower it is. But of course this assumes we have a measure of the accuracy of credences; this is a function A that takes a credence function p and a state of the world ω and returns a number A(p, ω) that measures the accuracy of p at ω. The accuracy arguments don’t usually assume there is just a single correct measure of the accuracy of credences. Instead, they lay down two conditions that a function must satisfy if it is to measure the accuracy of credences.
First, it must be continuous. That is, for each ω in Ω, if the sequence of credence functions p1, p2, … converges on p, then A(p1, ω), A(p2, ω), … converges on A(p, ω).
Second, it must be strictly proper, which means that, by its lights, any probabilistic credence function expects itself to be the most accurate credence function: for any alternative credence function, it expects that alternative to be less accurate than it expects itself to be. That is, for any probabilistic p and any q ≠ p,

Σ_{ω in Ω} p(ω)A(p, ω) > Σ_{ω in Ω} p(ω)A(q, ω).
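If you’d like to see strict propriety in action, here’s a quick numerical check of my own, using the Brier score (defined in a footnote below): with two worlds and a probabilistic credence function fixed by its credence t in the first world, a probabilistic p’s expectation of the accuracy of t is maximized exactly at t = p.

```python
# A numerical check (my own illustration) that the Brier score is strictly
# proper. Two worlds; a probabilistic credence function is fixed by its
# credence t in {w1}: it gives 0 to the empty set, 1 - t to {w2}, 1 to Omega.

def brier_accuracy(t, w1_true):
    truths = [0, 1, 0, 1] if w1_true else [0, 0, 1, 1]  # empty, {w1}, {w2}, Omega
    creds = [0, t, 1 - t, 1]
    return -sum((v - c) ** 2 for v, c in zip(truths, creds))

p = 0.3  # the fixed probabilistic credence in {w1}

def p_expects(t):
    """p's expectation of the accuracy of the credence function fixed by t."""
    return p * brier_accuracy(t, True) + (1 - p) * brier_accuracy(t, False)

best = max((i / 1000 for i in range(1001)), key=p_expects)
print(best)  # 0.3: p expects itself to be the most accurate
```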
I won’t go into the justifications for these assumptions; let’s just grant them.
The Accuracy Dominance Argument for Probabilism is then based on the following mathematical result:1
(I) If your credence function is not probabilistic, there is a probabilistic alternative that is guaranteed to be more accurate than your credence function. That is, if p is not probabilistic, there is q that is probabilistic such that, for all ω in Ω,

A(q, ω) > A(p, ω).
(II) If your credence function is probabilistic, there is no such alternative. Indeed, if p is probabilistic, and q ≠ p, then there is ω in Ω such that

A(p, ω) > A(q, ω).
So, if you fail to satisfy Probabilism, you leave free accuracy on the table.
The Expected Accuracy Argument for Conditionalization, on the other hand, runs as follows:2 Suppose you are about to receive some evidence, but you don’t know what it is. We can represent your uncertainty about what evidence you’ll receive by an evidence function E: this takes each possible state of the world ω and tells you the strongest proposition E(ω) you’ll learn from E at ω. For instance, perhaps you don’t know whether the handkerchief in my pocket is red or blue, nor whether it is round or square. I am about to reveal a little corner of it, which will teach you its colour, but not its shape. Then the evidence function through which you’ll acquire evidence takes the world at which it’s a red square to the proposition that it’s red, and it takes the world at which it’s a red circle to the same proposition; and it takes each world at which it’s blue to the proposition that it’s blue. So you don’t know exactly what you’ll learn, because you don’t know what state of the world you inhabit; but, for any state of the world, you know what you’ll learn at that state.
Now, let me enumerate a couple of important properties an evidence function might have:
E is factive if, for each possible state of the world ω, the proposition E(ω) that you learn at ω is true at ω. The evidence function that represents me revealing a corner of the handkerchief is factive—if the handkerchief is red at ω, then E(ω) = Red, and if it’s blue, then E(ω) = Blue.
E is partitional if the set of propositions it might teach you forms a partition—that is {E(ω) | ω in Ω} is a mutually exclusive and exhaustive set of propositions. The evidence function that represents me revealing a corner of the handkerchief is partitional—after all, {E(ω) | ω in Ω} = {Red, Blue}, which is a partition.
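Both properties are easy to state in code. Here’s a sketch using my own encoding of the red/blue handkerchief example:

```python
# Factivity and partitionality checks (my toy encoding of the example above).
OMEGA = {"red_square", "red_circle", "blue_square", "blue_circle"}
RED = frozenset({"red_square", "red_circle"})
BLUE = frozenset({"blue_square", "blue_circle"})

# Revealing a corner teaches you the colour but not the shape:
E = {w: (RED if w in RED else BLUE) for w in OMEGA}

def is_factive(E):
    """The proposition you learn at w must be true at w."""
    return all(w in E[w] for w in E)

def is_partitional(E):
    """The possible outputs must be mutually exclusive and exhaustive."""
    cells = set(E.values())
    disjoint = all(X == Y or not (X & Y) for X in cells for Y in cells)
    exhaustive = frozenset().union(*cells) == set(E)
    return disjoint and exhaustive

print(is_factive(E), is_partitional(E))  # True True
```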
Now let an updating rule R be a function that takes each possible state of the world ω and returns a credence function R(ω). Then, given an evidence function E, we say an updating rule R is available for E if, whenever E(ω) = E(ω’), R(ω) = R(ω’). That is, R doesn’t update differently at states of the world at which you receive the same evidence. So, for instance, since you learn the same proposition at the world at which the handkerchief is red and round as you learn at the world at which it’s red and square, an available updating rule must take both of those worlds to the same credence function.
Now, we take the accuracy of an updating rule R at a world ω to be the accuracy, at that world, of the credence function R(ω) it maps that world to. That is, A(R, ω) = A(R(ω), ω). Finally, given a prior credence function p, we say that R is a conditionalizing rule for p and E if, whenever p(E(ω)) > 0, R(ω)(-) = p(- | E(ω)).
Then the Expected Accuracy Argument for Conditionalization turns on the following mathematical fact:
If E is factive and partitional, p is a prior credence function, R and R’ are available updating rules for E, and R is a conditionalizing rule for p and E, then

Σ_{ω in Ω} p(ω)A(R(ω), ω) ≥ Σ_{ω in Ω} p(ω)A(R’(ω), ω).
And, what’s more, if there is ω with p(ω) > 0 and R(ω)(-) = p(- | E(ω)) ≠ R’(ω)(-), then

Σ_{ω in Ω} p(ω)A(R(ω), ω) > Σ_{ω in Ω} p(ω)A(R’(ω), ω).
So a conditionalizing rule is at least as accurate as any other rule in expectation, and it’s more accurate in expectation if it disagrees with that rule at any world that receives positive credence.
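To see some numbers in a small case, here’s a toy computation of my own for the handkerchief evidence function: the expected Brier accuracy of the conditionalizing rule, against a rival available rule that simply keeps the prior.

```python
# Expected accuracy of updating rules (my toy numbers, Brier score).
from itertools import chain, combinations

OMEGA = ["red_square", "red_circle", "blue_square", "blue_circle"]
RED = frozenset(OMEGA[:2])
BLUE = frozenset(OMEGA[2:])
E = {w: (RED if w in RED else BLUE) for w in OMEGA}  # factive and partitional
prior = {w: 0.25 for w in OMEGA}

# F: every proposition over OMEGA
F = [frozenset(s) for s in chain.from_iterable(
    combinations(OMEGA, k) for k in range(len(OMEGA) + 1))]

def conditionalize(p, X):
    pX = sum(p[w] for w in X)
    return {w: (p[w] / pX if w in X else 0.0) for w in p}

def brier(q, w):
    """Brier accuracy at w of the mass function q."""
    return -sum(((1 if w in X else 0) - sum(q[v] for v in X)) ** 2 for X in F)

def expected_accuracy(rule):
    """The prior's expectation of the accuracy of an updating rule."""
    return sum(prior[w] * brier(rule(w), w) for w in OMEGA)

cond_rule = lambda w: conditionalize(prior, E[w])  # the conditionalizing rule
stick_rule = lambda w: prior                       # an available rival

print(expected_accuracy(cond_rule), expected_accuracy(stick_rule))  # -2.0 -3.0
```

The conditionalizing rule comes out ahead, just as the theorem says.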
Objections to the Accuracy Argument for Conditionalization
Now, you might naturally object that this is not an argument for updating as Conditionalization requires; instead, it shows at most that, if you plan how you’ll update in a way that can be represented by an available updating rule, or if you have a disposition to update in a particular way that can be represented by an available updating rule, then that updating rule should be a conditionalizing one for your prior and the evidence function through which you know you’ll receive evidence. I’m inclined to agree with this concern, but let’s put that worry to one side, for there is another that will occupy us here.
The worry is that this argument only works if the evidence function through which you’ll acquire evidence is factive and partitional. Now, you might think that evidence must be factive; it’s not evidence if it’s not true. But even if we grant that, what reason have we to think our evidence must be partitional? Indeed, there are prominent externalist accounts on which it is not necessarily partitional.
Suppose I have a different handkerchief in another pocket. You know this one is pink, but you don’t know which shade. In particular, you know it’s coral, amaranth, rose, or fandango, but not which. Currently, you spread your credences equally over all four possibilities. I’m about to show it to you, but unfortunately your powers of discernment are not perfect. If the handkerchief is coral or rose, you’ll learn it’s coral, rose, or amaranth, and nothing stronger; if it’s amaranth or fandango, you’ll learn it’s rose, amaranth, or fandango, and nothing stronger. So the evidence function is factive, but it’s not partitional: the proposition you learn if it’s coral or rose is different from the proposition you learn if it’s amaranth or fandango, but they are not mutually exclusive—both are true at the worlds at which the handkerchief is rose or amaranth. And so the theorem from above doesn’t tell us we should plan to conditionalize if faced with this evidence function.
Indeed a theorem due to Miriam Schoenfield tells us we shouldn’t plan to conditionalize. She shows that the available updating plans that maximize expected accuracy in these cases are not those that require you to condition on your evidence, but those that require you to condition on the fact you received the evidence you did. In the case of the handkerchief, if it is coral or rose, and you learn that it’s coral, rose, or amaranth, then you should update on the fact you learned it was coral, rose, or amaranth, and since you learn that only at the worlds at which it is coral or rose, you should update on the fact it’s coral or rose. So, Schoenfield’s theorem suggests, Bayesianism is false—you should not update as Conditionalization says you should. Bayes is on the ropes.
Here’s Schoenfield’s theorem in more formal detail. Given an evidence function E, define the evidence function E* as follows: given a world ω, E*(ω) is the proposition that you learned E(ω) via E; that is, it is the proposition that is true at all worlds at which your evidence from E is the same as it is at world ω; that is, E*(ω) = {ω’ in Ω | E(ω’) = E(ω)}.
Given an evidence function E, a prior p, and an updating plan R, we say R is a Schoenfield plan for p and E if, whenever p(E*(ω)) > 0, R(ω)(-) = p(- | E*(ω)); that is, R is a Schoenfield plan for p and E just in case it is a conditionalizing plan for p and E*.
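In code, passing from E to E* is just a matter of grouping worlds by the evidence they deliver; here’s a sketch with my encoding of the pink handkerchief case:

```python
# Computing E* from E (my encoding of the pink handkerchief example).
CRA = frozenset({"coral", "rose", "amaranth"})
RAF = frozenset({"rose", "amaranth", "fandango"})

# Imperfect colour discrimination: factive, but not partitional.
E = {"coral": CRA, "rose": CRA, "amaranth": RAF, "fandango": RAF}

def star(E):
    """E*(w): the worlds at which E delivers the same evidence as at w."""
    return {w: frozenset(v for v in E if E[v] == E[w]) for w in E}

E_star = star(E)
print(sorted(E_star["coral"]))     # ['coral', 'rose']
print(sorted(E_star["amaranth"]))  # ['amaranth', 'fandango']
# Note that E* is partitional even though E is not.
```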
Then Schoenfield’s theorem says:3
If p is a prior credence function, R and R’ are available updating rules for E, and R is a Schoenfield rule for p and E, then

Σ_{ω in Ω} p(ω)A(R(ω), ω) ≥ Σ_{ω in Ω} p(ω)A(R’(ω), ω).
And, what’s more, if there is ω with p(ω) > 0 and R(ω)(-) = p(- | E*(ω)) ≠ R’(ω)(-), then

Σ_{ω in Ω} p(ω)A(R(ω), ω) > Σ_{ω in Ω} p(ω)A(R’(ω), ω).
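Here’s a toy computation of my own that bears this out in the pink handkerchief case: measuring accuracy with the Brier score, the Schoenfield rule has higher expected accuracy than the conditionalizing rule.

```python
# Schoenfield rule vs conditionalizing rule (my toy numbers, Brier score).
from itertools import chain, combinations

OMEGA = ["coral", "amaranth", "rose", "fandango"]
CRA = frozenset({"coral", "rose", "amaranth"})
RAF = frozenset({"rose", "amaranth", "fandango"})
E = {"coral": CRA, "rose": CRA, "amaranth": RAF, "fandango": RAF}
E_star = {w: frozenset(v for v in E if E[v] == E[w]) for w in E}
prior = {w: 0.25 for w in OMEGA}

F = [frozenset(s) for s in chain.from_iterable(
    combinations(OMEGA, k) for k in range(len(OMEGA) + 1))]

def conditionalize(p, X):
    pX = sum(p[w] for w in X)
    return {w: (p[w] / pX if w in X else 0.0) for w in p}

def brier(q, w):
    return -sum(((1 if w in X else 0) - sum(q[v] for v in X)) ** 2 for X in F)

def expected_accuracy(rule):
    return sum(prior[w] * brier(rule(w), w) for w in OMEGA)

cond = lambda w: conditionalize(prior, E[w])              # condition on E(w)
schoenfield = lambda w: conditionalize(prior, E_star[w])  # condition on E*(w)

print(expected_accuracy(schoenfield) > expected_accuracy(cond))  # True
```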
Now you might respond: Nice work, if you can get it, but Schoenfield updating plans are simply not available to us! After all, you might contend, in order to carry out a Schoenfield plan, we must combine the proposition we learn with the evidence function through which we learn it, and thereby identify the proposition that is true at exactly those worlds at which we’d learn that evidence through that evidence function. But we may well not know what our evidence is! Indeed, in the handkerchief case, that plausibly happens. In the coral world, you learn that the handkerchief is coral, rose, or amaranth; but if it’s amaranth, you learn it’s rose, amaranth, or fandango. That is, your evidence is not positively introspectible: it leaves open a world in which the evidence you get is different from the evidence you actually got.
This is all true, and yet with Dan Greco and Dmitri Gallow as notable exceptions, proponents of this view do tend to say that you are able to conditionalize on whatever evidence you receive, even when you receive it through a non-partitional evidence function. Indeed, they think that’s what you should do; that’s what they propose as an alternative to Schoenfield updating. But if we have some mechanism that can take in a proposition and update your credences by conditionalizing on it, why can’t we have a mechanism that takes in that proposition and combines it with the evidence function to give the proposition true at exactly those worlds at which the evidence function teaches you the proposition? If you are aware enough of what you learned to conditionalize on it, why are you not aware enough to Schoenfield conditionalize on it?4
Bayes is Back
I’ll leave this debate here, but let’s suppose that Schoenfield updating is really not available to us, though Bayes updating is. Is there any good justification for Bayesian updating over other rules when your evidence function is not partitional? It is this question that Alex Meehan and Snow Zhang seek to answer positively in ‘Bayes is Back’. So let me now present their intriguing argument.
The first crucial move is to say that your response to new evidence should be sensitive only to the proposition you learn and not to the inquiry through which you learn it. So, if there are two different evidence functions, and two states of the world, and I learn the same proposition from the first evidence function at the first state of the world as I learn from the second evidence function at the second state of the world, I should update in the same way. To represent this condition, Meehan and Zhang generalise the notion of an updating rule. Where an updating rule takes only a state of the world and returns a posterior credence function, a global updating rule f takes an evidence function E and a state of the world ω and returns a posterior credence function f(ω, E)(-). They then impose a condition they call global evidential constancy: for any two evidence functions and any two states of the world, if the first evidence function teaches the same proposition at the first state of the world as the second teaches at the second state of the world, your global updating rule should take the first evidence function and state to the same posterior to which it takes the second evidence function and state. That is, if f is a global updating rule, E and E’ are evidence functions, and ω and ω’ are states of the world, Meehan and Zhang demand that, whenever E(ω) = E’(ω’), f(ω, E) = f(ω’, E’).
They then argue that you should have a global updating plan that maximizes total expected accuracy across all possible factive evidence functions. That is, we judge a global updating rule as follows: take each possible factive evidence function, take your prior’s expectation of the accuracy of the posterior your global updating rule recommends when faced with that evidence function; and sum these all up. Then rationality requires you to have a global updating rule that maximizes this total expected accuracy. They then prove that, at least when your prior is regular and so assigns positive credence to each possible state of the world, the global updating rule that maximizes total expected accuracy is the conditionalizing rule for your prior, that is, the one that takes an evidence function and a state of the world and updates your prior on whatever the evidence function teaches you at that state of the world by conditionalizing on it.
Given a regular prior p, the unique global updating rule that is conditionalizing for p is f(ω, E)(-) = p(- | E(ω)). The accuracy of a global updating rule f in the situation in which you’re at world ω and facing evidence function E is A(f, ω & E) = A(f(ω, E)(-), ω).
Given a prior p and a global updating rule f, the total expected accuracy of f is

Σ_{E in Fact} Σ_{ω in Ω} p(ω)A(f, ω & E),
where Fact is the set of all factive evidence functions, that is, Fact = {E | for all ω in Ω, ω is in E(ω)}.
Then we can state Meehan and Zhang’s result as follows:5
If p is a regular prior, f and f’ are global updating rules that satisfy global evidential constancy, and f is the conditionalizing global updating rule for p, then

Σ_{E in Fact} Σ_{ω in Ω} p(ω)A(f, ω & E) ≥ Σ_{E in Fact} Σ_{ω in Ω} p(ω)A(f’, ω & E).
And, if there is a factive evidence function E in Fact and a world ω such that f(ω, E)(-) ≠ f’(ω, E)(-), then

Σ_{E in Fact} Σ_{ω in Ω} p(ω)A(f, ω & E) > Σ_{E in Fact} Σ_{ω in Ω} p(ω)A(f’, ω & E).
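To get a feel for why this holds, here’s a two-world sketch of my own. A global rule satisfying global evidential constancy is in effect a map from learnable propositions to posteriors, so we can search for the best posterior proposition by proposition; the search lands on the conditional credences, just as the theorem says. (The prior, the grid search, and the two-world space are all my illustrative choices, not the paper’s proof.)

```python
# Searching for the constancy-satisfying global rule with the highest total
# expected Brier accuracy in a two-world case (my own toy setup).
from itertools import product

W = ["w1", "w2"]
prior = {"w1": 0.3, "w2": 0.7}  # a regular prior
PROPS = [frozenset({"w1"}), frozenset({"w2"}), frozenset(W)]

# Every factive evidence function: E(w) must be true at w.
factive = [dict(zip(W, cells)) for cells in product(PROPS, repeat=2)
           if all(w in cell for w, cell in zip(W, cells))]

def brier(t, w):
    """Brier accuracy at w of the posterior with credence t in {w1}."""
    truths = [0, 1, 0, 1] if w == "w1" else [0, 0, 1, 1]  # empty,{w1},{w2},Omega
    creds = [0, t, 1 - t, 1]
    return -sum((v - c) ** 2 for v, c in zip(truths, creds))

for X in PROPS:
    # Total contribution, across all factive evidence functions, of the
    # posterior a constancy-satisfying rule assigns to the proposition X.
    def contribution(t, X=X):
        return sum(prior[w] * brier(t, w)
                   for E in factive for w in W if E[w] == X)
    best = max((i / 1000 for i in range(1001)), key=contribution)
    cond = prior["w1"] / sum(prior[w] for w in X) if "w1" in X else 0.0
    print(set(X), best, cond)  # the best posterior matches conditionalization
```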
We might see this argument as a sort of rule-consequentialist argument for Bayesian conditionalization. In many cases, you would do better by not updating in the way Bayes’ rule tells you to update; indeed, in many cases, you’d do better by sticking with your priors. But if you must use a rule, and if the rule can only be sensitive to the proposition you learn and not the evidence function through which you learn it, Bayes’ rule is in fact the best option in the sense that, on average or in aggregate, it has the highest accuracy among all rules that are sensitive only to what you learn and not to how you learn it.
It’s worth noting that we cannot hope to strengthen Meehan and Zhang’s result by dropping the requirement of factivity. For instance, if we consider only the case in which there are two possible states of the world, and we have prior credence 1/2 in each, and we measure epistemic utility using the Brier score, and we sum over all evidence functions, and not only over the factive ones, then the global updating rule that tells you to stick with your prior credences has better total expected epistemic utility than the conditionalizing rule.6 In some ways, this is not surprising: if a proposition is true and you start with credence 1/2 in it, you gain less accuracy by moving to the perfectly accurate credence 1 than you lose by moving to the perfectly inaccurate credence 0; and similarly if a proposition is false: moving from 1/2 to 0 gains you less accuracy than moving from 1/2 to 1 loses you. And so, when we include evidence functions that teach you falsehoods, the benefits of responding to much true evidence are outweighed, in aggregate, by the costs of responding to much false evidence. This is a limitation of Meehan and Zhang’s approach, but for many it won’t be much of a limitation, since they will say that evidence must be factive—you can only learn what’s true.
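The claim in footnote 6 is easy to check numerically. Here’s my own encoding of the two-world case: a uniform prior, the Brier score, and a sum over all sixteen evidence functions rather than just the four factive ones. (How to handle an evidence function that delivers a credence-zero proposition is underdetermined by the setup; my stipulation below is that the rule then keeps the prior.)

```python
# Summing expected Brier accuracy over ALL evidence functions, not just the
# factive ones (my own encoding of the two-world case in footnote 6).
from itertools import product

W = ["w1", "w2"]
prior = {"w1": 0.5, "w2": 0.5}
PROPS = [frozenset(), frozenset({"w1"}), frozenset({"w2"}), frozenset(W)]

all_E = [dict(zip(W, cells)) for cells in product(PROPS, repeat=2)]  # 16 of them

def brier(q, w):
    return -sum(((1 if w in X else 0) - sum(q[v] for v in X)) ** 2 for X in PROPS)

def conditionalize(p, X):
    pX = sum(p[w] for w in X)
    if pX == 0:
        return dict(p)  # my stipulation: keep the prior on credence-zero evidence
    return {w: (p[w] / pX if w in X else 0.0) for w in p}

def total_expected_accuracy(rule):
    return sum(prior[w] * brier(rule(E, w), w) for E in all_E for w in W)

cond = lambda E, w: conditionalize(prior, E[w])
stick = lambda E, w: prior

print(total_expected_accuracy(stick), total_expected_accuracy(cond))  # -8.0 -12.0
```

Once false evidence is in the mix, conditionalizing’s total drops below the prior-sticker’s, just as the footnote says.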
It’s also worth noting that Meehan and Zhang’s result relies on you thinking each evidence function is equally likely. Clearly, if you think it sufficiently likely you’ll face an evidence function that, in expectation, is worse for you than learning nothing, then the updating plan of sticking with your priors will be better, in expectation, than the plan to update by conditionalizing. But the result is nonetheless powerful: it shows that the negative expected effects of being exposed to certain evidence functions and conditionalizing on them are counterbalanced by the positive expected effects of doing this when exposed to other evidence functions.
In the end, then, the debate over how to update reduces to a debate about what updating mechanisms are available to us. Can we implement an updating mechanism that is sensitive to both the proposition we learn and the evidence function through which we learn it? If so, we have Schoenfield’s argument for her updating rule, but in that situation it’s not unreasonable to say that the evidence we actually receive is not the proposition we learn, but the fact that we learn that proposition, and then Schoenfield’s argument simply tells us to update on that evidence as Conditionalization tells us to. Or are we restricted to a mechanism that is sensitive only to the proposition we learn? If so, we have Meehan and Zhang’s argument for Conditionalization. Either way, Bayes is back!
The form of argument is due to Jim Joyce’s 1998 ‘A Nonpragmatic Vindication of Probabilism’, though Joyce appeals to different mathematical properties of accuracy measures.
The form of argument is due to Hilary Greaves and David Wallace’s 2006 ‘Justifying Conditionalization’.
I’ve written up a quick proof of Schoenfield’s result and also Meehan and Zhang’s in these notes.
Greco answers this as follows: in these cases, our mental life is fragmented in a particular way; the part of the mind that takes on the evidence is isolated from the part that knows the evidence function. We’re used to that sort of fragmentation in other cases, such as when I know I’ve got a dental appointment on Tuesday, and I know I’ve got a student meeting at the same time, but because I haven’t thought about both together, I haven’t realised that there’s a clash.
The Brier score measures the accuracy of a credence function p at ω as follows:

B(p, ω) = −Σ_{X in F} (ω(X) − p(X))²,
where ω(X) = 1 if ω is in X, and ω(X) = 0 if ω is not in X.



