9 Comments
May 23 · Liked by Richard Pettigrew

Is self-promotion allowed in the comments? :-) My collaborators and I have some results on elimination counterexamples to report which might be relevant.

author

Not just allowed—encouraged!!!

May 24 · Liked by Richard Pettigrew

Thank you very much, Richard. So here goes. Forgive me if it's at times a bit orthogonal to your project in the note; it pertains rather to the elimination counterexample phenomenon itself. But there's a bit in the middle that's directly relevant :-)

Anyway. When I read the original Fallis & Lewis paper, which only really talks about credal states defined on partitions*, it occurred to me that things should change dramatically if full algebras are considered. I quickly checked that indeed, if we move to the general context, the existence of elimination counterexamples (ECs) is no longer an argument against the Brier Score, since where algebras are concerned, ECs exist for all three (kinds of) measures under discussion (after we switch to their algebra-suitable variants). I put this into my 2018 book, available in open access here: https://ruj.uj.edu.pl/server/api/core/bitstreams/31f8a450-d56f-4f4a-b226-0f770dc17f6c/content (Section 6.1.3)**. It also occurred to me that the existence of ECs can be interpreted differently, e.g. as an argument against the particular update behavior; I showed (Section 6.1.1) that, where partitions are concerned, the 'uniform imaging' update rule (i.e., 'split the surplus credence evenly among the possible worlds that are left') is free from ECs under the Brier Score.
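
To make the EC phenomenon concrete, here is a tiny sketch in Python (rather than SAGE); the three-world numbers are an illustrative toy example of mine, not taken from the book or the papers. Conditionalizing on the elimination of a false world increases Brier inaccuracy at the actual world, while the uniform-imaging update, in this same case, does not:

```python
# Toy illustration (my own numbers): an elimination counterexample (EC)
# under the Brier score on a three-world partition {w1, w2, w3}, with w1 actual.

def brier(credence, actual):
    """Brier inaccuracy of a credence vector at the actual world (given by index)."""
    return sum((c - (1.0 if i == actual else 0.0)) ** 2 for i, c in enumerate(credence))

p = [0.3, 0.6, 0.1]          # prior credences over {w1, w2, w3}; w1 (index 0) is actual

# Conditionalize on eliminating the false world w3:
remaining = 1.0 - p[2]
conditionalized = [p[0] / remaining, p[1] / remaining, 0.0]

# Uniform imaging: split w3's surplus credence evenly among the remaining worlds:
imaged = [p[0] + p[2] / 2, p[1] + p[2] / 2, 0.0]

print(brier(p, 0))                # 0.86
print(brier(conditionalized, 0))  # ~0.889 -- higher than before: an EC
print(brier(imaged, 0))           # 0.845  -- lower than before: no EC here
```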

I was, however, still interested in the severity of the EC phenomenon. With Michał Tomasz Godziszewski, and later also with Zalán Gyenis, we've tinkered with some general a priori results which would show some sort of inevitability of ECs, ideally for any Bregman-divergence-based measures. We've failed (so far! haha, *sigh*). However, we've also asked ourselves the question of frequency: how pervasive is the phenomenon, really? And does the level of pervasiveness differ between the various measures? On that front we can report something we take to be interesting; here's our paper, which is currently under review: https://ujchmura-my.sharepoint.com/:b:/g/personal/leszek_wronski_uj_edu_pl/EevBDlVoTmBKtvj76mPRN04BxwuPUpO-arZhHrXS5ZWelQ?e=kflq24 (it has a pretty picture and QR codes! :-))

The data we gathered using our SAGE code (the paper links to an, uh, anonymized Git repository) suggests that -- and all this is carefully defined in the paper -- no matter which measure is used, a randomly chosen instance of conditionalization runs a significant risk of leading to an EC. We assume w_1 is the actual world. If we fix the update situation as 'exclude w_n and conditionalize', then if the number of atoms is above 7, the chance that a randomly determined credence leads to an EC is above 50%. If, instead, we ask of a randomly determined credence P 'is it possible to choose a w_i (i>1) such that P conditionalized on W \ {w_i} is an EC?', the answer is 'yes' in close to 100% of cases already when the number of atoms is 5. Now, we take this as an argument against accuracy monism (veritism), but that's just us. If you don't want to go there, we suggest that ECs are not a problem to be dealt with, to be disposed of if we do things right (correctly update our credences, correctly measure inaccuracy...). Rather: the existence of ECs is a pervasive phenomenon, and we have to learn how to live with it conceptually :-) And this perspective allows us to basically shrug off actual cases of ECs, e.g. from the history of science. We expect ECs to arise more often than not. So we are not moved by the fact that they actually *do* arise.
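
Just to give a flavour of the kind of frequency experiment this is (this is not our SAGE code, only a rough Python sketch for the Brier score on partitions; taking a 'randomly determined credence' to be a point sampled uniformly from the simplex is an assumption of the sketch):

```python
# Rough Monte Carlo sketch of the frequency question for the Brier score on partitions.
# Not the SAGE code from the paper; uniform (Dirichlet(1,...,1)) sampling on the simplex
# is assumed as the notion of a 'randomly determined credence'. w_1 (index 0) is actual.
import numpy as np

rng = np.random.default_rng(0)

def brier(c, actual):
    target = np.zeros_like(c)
    target[actual] = 1.0
    return float(np.sum((c - target) ** 2))

def ec_after_eliminating(p, i, actual=0):
    """Does conditionalizing on W \\ {w_i} (a false world) raise Brier inaccuracy at the actual world?"""
    q = p.copy()
    q[i] = 0.0
    q = q / q.sum()
    return brier(q, actual) > brier(p, actual)

def estimate(n_atoms, trials=20_000):
    fixed = exists = 0
    for _ in range(trials):
        p = rng.dirichlet(np.ones(n_atoms))              # random credence over the atoms
        fixed += ec_after_eliminating(p, n_atoms - 1)    # 'exclude w_n and conditionalize'
        exists += any(ec_after_eliminating(p, i) for i in range(1, n_atoms))
    return fixed / trials, exists / trials

for n in (3, 5, 8):
    print(n, estimate(n))   # (chance for the fixed update, chance that *some* w_i yields an EC)
```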

Now for a, uh, bibliometric remark. When writing my 2018 book I was working with Michał Tomasz Godziszewski, who, as far as I recall, reported our findings (including his investigations into the details of how the Brier Score operates on partitions) in the Fall of 2019 in Munich to Branden, and during an online talk to Lewis & Fallis. I notice that the point about the existence of ECs for all three types of inaccuracy measures where algebras are concerned is made by Lewis & Fallis in their 'Accuracy, conditionalization, and probabilism', first published in 2019. However, in the new Ergo paper as well, these authors, now joined by Branden, write 'The Brier score is the most popular measure of inaccuracy in the literature (...) but other measures also support these results (...) [w]e consider the possibility of using a different measure in Section 5', and in that Section they cite only their 2019 paper.

*Yes, it *does* mention algebras, but in a way that's not relevant for my purposes here. (I can elaborate on my understanding of the algebra subsection of the 2016 Fallis & Lewis paper if anyone's interested. But as I understand it, Fallis' & Lewis' views on this have changed, so this is not really important, I guess.)

**The print version, available under that link, contains an infuriating typo in the definition of the spherical inaccuracy measure, where I'd somehow mangled a quotation from Joyce. But the results stand :-)

author

This is great, Leszek!! I really appreciate you taking the time to set it out. I think I probably agree with you that ECs are not too worrying, but I hadn't thought of your frequency-style argument for that, and that's very helpful. Of course, I guess they'd respond that this just makes things worse for the accuracy-firster. After all, I think their argument is supposed to tell against accuracy-first approaches in general, not just particular implementations using particular scoring rules. That's why I was keen to see whether there is a scoring rule that avoids them but gets the arguments for Probabilism and Conditionalization, etc. I think the measure I define escapes your frequency arguments because it's non-additive. I don't have a general proof yet, but it seems that it gives no cases of ECs. The key is that now (thanks partly to my limited result but much more to Michael Nielsen's more general result) we have an argument for Probabilism and Conditionalization and the rest that doesn't depend on additivity. In fact, come to think of it, we should be able to extend the old log score (the non-additive one) to the cases of non-probabilistic credences in a way that avoids ECs as well. And there we do have a proof that it works.

May 24 · edited May 24

I will have to take more time to inspect your measure. Very interesting stuff.

I agree 100% that our argument actually 'makes things worse for the accuracy-firster' -- that's why we have 'a blow to accuracy monism' in the title :-) In fact, this is what surprises me about the way F&L talk about Boolean algebras in the 2016 paper. They mention them as a potential tool in the hands of someone who disputes their claims -- and they defend against it. But in fact, it seems, using Boolean algebras allows one to strengthen F&L's position! [Actually, just a single proposition added to a partition is enough, as we show in the paper; we don't need to consider full algebras.]

"we should be able to extend the old log score (the non-additive one) to the cases of non-probabilistic credences in a way that avoids ECs as well. And there we do have a proof that it works." -- could you be a little more specific? You have a proof that *what* works? I'm asking because Zalan and me have a weird argument for superconditionalization in the context of the 3-valued symmetric logic [it's not ready for anything yet, but we will certainly go back to it]

May 24 · Liked by Richard Pettigrew

What does the scoring rule look like that you are working with in the three-world case with a uniform distribution over decisions? It's been a while since I've read any of the Lewis and Fallis things, so I thought they had been using an argument that ruled out any scoring rule other than the logarithmic one, but I suspect that this isn't the logarithmic rule you are getting here.

author

Yes, this isn't the log rule, but it is non-additive. I think Lewis and Fallis have a proof that, under certain conditions like 0/1 Symmetry, there's no additive strictly proper scoring rule that avoids elimination counterexamples. And then Ben Levinstein has a more general proof (but unpublished). But in both cases, they assume additivity.

author

Actually, come to think of it, now that we have Nielsen's result about non-additive strictly proper scoring rules, we can just continuously extend the old non-additive log score to non-probabilistic credence functions in a way that retains strict propriety, and thereby avoid elimination counterexamples, while retaining the arguments for Probabilism and Conditionalization.
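
(To spell out the contrast, with the usual definitions: the non-additive log score penalizes only the credence assigned to the actual world, whereas the additive version sums a logarithmic penalty over every proposition in the algebra:

$$
\mathfrak{I}_{\log}(c, w) \;=\; -\ln c(\{w\})
\qquad\text{vs.}\qquad
\mathfrak{I}^{\mathrm{add}}_{\log}(c, w) \;=\; -\sum_{X \ni w} \ln c(X) \;-\; \sum_{X \not\ni w} \ln\bigl(1 - c(X)\bigr).
$$

The extension question above is then how to define a score that agrees with the left-hand one on probabilistic credence functions but also covers non-probabilistic ones, while keeping strict propriety.)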


This is great! Thanks so much!
