Evaluating forgettings
When Ailsa moves into her new flat, she decides the bay window in the bedroom would be a perfect spot for a writing desk. So she measures the width of the space, discovers it’s 153.54cm, and heads to the local flea market to find something suitable. By the time she gets there, she’s forgotten the precise measurement, but remembers it’s between 153cm and 154cm. In some sense, her epistemic state has deteriorated—she now knows less than she did before—and yet we don’t necessarily judge it negatively. We know our memories only afford us a limited amount of storage space, and so there’s a cost to retaining information. Ailsa is using the measurement of the window space for a very specific purpose—she doesn’t really care about the width of the bay window beyond the task of choosing a desk. And so, if the precision of her earlier information is very unlikely to make a difference to the choice she makes—it will only make a difference if she finds a desk between 153cm and 154cm wide, for only then will she need to know the precise width within that range in order to know whether it will fit—and there is a non-negligible cost to retaining it, it might well be perfectly fine for her to retain only the less precise information. On the other hand, if she retained only the information that the space is greater than 100cm, or if she was almost certain that all the desks at the flea market were between 153cm and 154cm, we would judge her forgetting, and the subsequent epistemic state, more harshly—it is pretty likely that the information she has will no longer allow her to choose a desk with any confidence, and she’ll incur the cost of returning home to measure it again.
All of this is to say: we rationally evaluate instances of forgetting. We also morally evaluate them, as Rima Basu has recently observed, but my interest here is in rational evaluation. Tim Williamson says: ‘forgetting is not irrational; it is just unfortunate’ (KAIL, 219). I think that’s wrong. Remembering incurs a cost—the storage space it uses; but it also brings a benefit—the accuracy of the credences it gives and their value as guides when you’re making decisions. And where there is a trade-off between cost and benefit, we can usually say how rationality requires us to approach that trade-off.1
There are many things connected with our doxastic states—our beliefs or our credences—that we can evaluate for rationality. The states themselves, of course—‘You are irrationally confident everything will be OK!’ The processes by which they’re formed—‘OK, you got it right this time, but wishful thinking isn’t a good way to form beliefs!’ The way we observe, attend to, or prod the world in order to gain new information about it—‘Don’t read that; it’ll only mislead you!’ In each case, I contend, we should take a teleological approach to such evaluations. That is, we should take the facts about the rationality of the state or process or action to be grounded in the value of the doxastic states connected with it—perhaps just the state itself if we’re evaluating the state; the states to which it gives rise, if we’re evaluating a process; and the states we’ll come to have after gaining the evidence and updating on it, if we’re evaluating some inquiry.
The value of credences
There are two sorts of value a doxastic state has: pragmatic and epistemic.
Its pragmatic value—which Allan Gibbard also calls its guidance value—measures how well it guides our choices.2 If I know I will use my credences to choose between two uncertain prospects A and B, and I will only use them for that purpose, then the pragmatic value of my credences at a particular state of the world is the utility of A at that state of the world if my credences would lead me to choose A, and it’s the utility of B at that state of the world if my credences would lead me to choose B. And, more generally, if I’m uncertain which decisions I’ll face, and place some probability distribution over the various possibilities, the value of my credences at a particular state of the world is my expectation of their value relative to that probability distribution. So, suppose I’m 30% sure I’ll face a choice between A and B, and 70% sure I’ll face a choice between C and D; and suppose my credences would lead me to choose A in the first choice and C in the second; then the value of my credences at a state of the world is 30% of the value of A at that state of the world plus 70% of the value of C at that world. And so on.
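To put that a little more formally (a rough sketch, with ad hoc notation): write π_k for the probability I assign to facing decision problem k, and A_k^c for the option my credences c would lead me to choose from the options in k. Then the pragmatic value of c at a state of the world w is

$$V(c, w) \;=\; \sum_k \pi_k \, u\big(A^c_k, w\big),$$

where u(X, w) is the utility of option X at w. In the example just given, this comes to 0.3 × u(A, w) + 0.7 × u(C, w).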
There’s much debate over how to measure the epistemic value of beliefs and credences, but a pretty standard claim is that, however we do measure them, the measure should be strictly proper, where this means that any probability function expects itself to be uniquely best, epistemically speaking. That is, if you have a set of probabilities as your credences, and you look at my credences, which are different, and you take the epistemic value of my credences at each possible state of the world, discount it by your probability for that state of the world, and add up these your-probability-discounted epistemic values of my credences, the result will be lower than if you did the same for your own credences.
It’s worth noting that the measures of the pragmatic value of credences described above are not necessarily strictly proper, but they are always weakly proper, where this means that any probability function expects itself to be among the best, pragmatically speaking, though it might not consider itself uniquely best; it might think some others equally good, but it will never think others better. Roughly, the others it will expect to be equally good are those that would lead to the same choices it would make when faced with the decisions it thinks it might face.
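Roughly, in symbols (again, ad hoc notation): write s(c, w) for the value of credences c at world w. Then s is strictly proper if, for any probabilistic credences p and any credences q ≠ p,

$$\sum_{w} p(w)\, s(p, w) \;>\; \sum_{w} p(w)\, s(q, w),$$

and weakly proper if the same holds with ≥ in place of >.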
So we’ve got measures of the pragmatic value of credences and measures of their epistemic value, and if we find a suitable exchange rate between them—how much epistemic value will you trade for a particular amount of pragmatic utility?—then we can combine them to give a measure of their all-things-considered value.
The value of forgetting
As I said above, on the teleological approach, facts about the rationality of a particular way of forming credences, or a particular action that ends up changing our credences, are grounded in facts about the value of those credences. Facts about pragmatic rationality are grounded in facts about pragmatic value; facts about epistemic rationality are grounded in facts about epistemic value.
What, then, does the teleological approach say about forgetting? Consider Ailsa’s friend, Iona, who is planning a hike for the following day. She has credences over three possibilities concerning tomorrow’s weather: Rain, Sunny, Very Sunny. She begins with credences 4/9, 4/9, 1/9, respectively. Then she learns from the forecast that it won’t be very sunny—she lives in Scotland, after all. So she updates so that now she has 1/2, 1/2, 0, respectively. Then, as the day progresses, she forgets what she learned. In order to evaluate this episode of forgetfulness, we need to know what happens to her credences at this point. For the sake of illustration, let’s say Iona carries around with her an ur-prior, and she sets her credences at any given time to the ur-prior conditional on her evidence at that time. If that’s what happens, then, when she forgets the evidence that took her from 4/9, 4/9, 1/9 to 1/2, 1/2, 0, she will revert to her original credences 4/9, 4/9, 1/9.
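For concreteness, both moves are just conditionalization of the ur-prior. The forecast rules out Very Sunny, so

$$P(\text{Rain} \mid \neg\text{Very Sunny}) \;=\; \frac{4/9}{4/9 + 4/9} \;=\; \frac{1}{2}, \qquad P(\text{Sunny} \mid \neg\text{Very Sunny}) \;=\; \frac{1}{2},$$

and once that piece of evidence is lost, the ur-prior conditional on what remains is just the ur-prior again: 4/9, 4/9, 1/9.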
Now, from the point of view of her pre-forgetting credences—1/2, 1/2, 0—she can evaluate those post-forgetting credences—4/9, 4/9, 1/9. Epistemically, since her epistemic values are measured by a strictly proper score, she’ll judge them worse than her current ones. So, if that’s all that’s in play, she’ll judge her forgetfulness irrational. But of course that’s not all that’s in play. There is some cost to retaining the pre-forgetting credences. And it’s quite possible that, once that is factored in, and weighed against the expected loss in epistemic value, her forgetfulness is rational.
And similarly from the pragmatic perspective. If her post-forgetting credences will lead her to choose differently from her pre-forgetting credences when faced with some of the decisions she might use them to make, she’ll judge those post-forgetting credences to be worse, pragmatically speaking, than her pre-forgetting credences, in expectation—in expectation, they’ll get her less utility. But again, those post-forgetting credences might compensate for this loss in expected pragmatic utility by avoiding whatever cost Iona incurs by retaining the information—the cost of remembering.
Of course, there is no suggestion here that someone consciously goes through this reasoning process prior to forgetting something. For one thing, as any of us who have seen the film version of the musical Cats know, consciously deciding to forget something is probably the act most likely to ensure you don’t forget it. Rather, the teleological framework allows us to assess whether forgetting is rational from your point of view—if you were to have control over doing it, would you be rational to choose it?
An example
We can see all of this at work in an example, which we’ll present rather abstractly, so that the machinery is most apparent.
So let’s suppose that Eilidh has credences in three possible states of the world, w1, w2, w3. Her initial credences are p1, p2, p3, respectively. We’ll write that credal state (p1, p2, p3).
Then she learns that the actual world is either w1 or w2; that is, she learns she’s not at world w3, and so she updates to (p1/(p1+p2), p2/(p1+p2), 0).
And then she forgets what she learned, and so reverts to (p1, p2, p3).
We want to evaluate this episode of forgetfulness from the point of view of her pre-forgetfulness credences—that is, we want to evaluate (p1, p2, p3) from the point of view of (p1/(p1+p2), p2/(p1+p2), 0)—and we want to compare that evaluation with the alternative of retaining the information—that is, we want to evaluate (p1/(p1+p2), p2/(p1+p2), 0) from the point of view of (p1/(p1+p2), p2/(p1+p2), 0).
First, let’s suppose Eilidh measures epistemic value using the so-called Brier score. The Brier score of (p1, p2, p3) at world wi is

$$\mathrm{B}\big((p_1, p_2, p_3), w_i\big) \;=\; 1 - \frac{1}{3}\Big[(v_1 - p_1)^2 + (v_2 - p_2)^2 + (v_3 - p_3)^2\Big],$$

where vj = 1 if j = i and vj = 0 otherwise. So higher is better: perfectly accurate credences score 1, and the score falls as the credences move away from the truth values at wi.
Let’s say Eilidh’s initial credences are (4/9, 4/9, 1/9). So her pre-forgetfulness credences are (1/2, 1/2, 0) and her post-forgetful ones are again (4/9, 4/9, 1/9), just like Iona’s above. Then:
the expected Brier score of (1/2, 1/2, 0) from the point of view of (1/2, 1/2, 0) is 5/6, which is about 0.833.
the expected Brier score of (4/9, 4/9, 1/9) from the point of view of (1/2, 1/2, 0) is 67/81, which is about 0.827.
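To see where the second figure comes from: at w1 (and, by symmetry, at w2), the Brier score of (4/9, 4/9, 1/9) is

$$1 - \frac{1}{3}\left[\Big(1 - \tfrac{4}{9}\Big)^2 + \Big(\tfrac{4}{9}\Big)^2 + \Big(\tfrac{1}{9}\Big)^2\right] \;=\; 1 - \frac{1}{3}\cdot\frac{42}{81} \;=\; \frac{67}{81},$$

and since (1/2, 1/2, 0) gives no weight to w3, the expectation is ½ × 67/81 + ½ × 67/81 = 67/81. The 5/6 figure for (1/2, 1/2, 0) itself comes out the same way.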
Second, suppose she’s 30% confident she’ll face the choice between uncertain prospects A and B, and 70% confident she’ll face the choice between C and D.
Then her pre-forgetfulness credences (1/2, 1/2, 0) will choose B over A and C over D. So their pragmatic value at world w3, say, is 0.3 × (-8) + 0.7 × 0 = -2.4. That’s because the utility of B at w3 is -8 and the utility of C at w3 is 0. And her post-forgetfulness credences (4/9, 4/9, 1/9) will choose A over B and C over D. So their pragmatic value at world w3 is 0.3 × 0 + 0.7 × 0 = 0, since the utility of A at w3 is 0 and, again, the utility of C at w3 is 0.
So:
the expected pragmatic value of (1/2, 1/2, 0) from the point of view of (1/2, 1/2, 0) is 0.15.
the expected pragmatic value of (4/9, 4/9, 1/9) from the point of view of (1/2, 1/2, 0) is 0.
Third, suppose the cost, in units of pragmatic utility, of retaining the information rather than forgetting it is 0.4.
Fourth, suppose the exchange rate between epistemic and pragmatic value is 1:1. So we’re indifferent between a unit of pragmatic value and a unit of epistemic value.
Then:
The all-things-considered expected value of retaining the information and keeping the pre-forgetfulness credences (1/2, 1/2, 0) from the point of view of those credences (1/2, 1/2, 0) is 5/6 + 0.15 - 0.4 = 7/12, which is about 0.583.
The all-things-considered expected value of forgetting the information and adopting the post-forgetfulness credences (4/9, 4/9, 1/9) from the point of view of the pre-forgetfulness credences (1/2, 1/2, 0) is 67/81 + 0 = 67/81, which is about 0.827.
And so, in the end, since 0.827 is greater than 0.583, it’s rational to forget.
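For those who like to check such things, here is a quick sanity check of the arithmetic in Python. The expected pragmatic values (0.15 and 0) and the retention cost (0.4) are simply plugged in from the text above, rather than computed from the underlying decision problems, since the full utility tables aren’t reproduced here.

```python
# Sanity check for the Eilidh example: is it rational for her to forget?

def brier(credences, world):
    """Brier score (higher is better): 1 minus the average squared
    distance between the credences and the truth values at `world`."""
    truth = [1.0 if i == world else 0.0 for i in range(len(credences))]
    return 1 - sum((t - c) ** 2 for t, c in zip(truth, credences)) / len(credences)

def expected_brier(evaluator, evaluated):
    """Expected Brier score of `evaluated` from the point of view of `evaluator`."""
    return sum(p * brier(evaluated, w) for w, p in enumerate(evaluator))

pre  = [1/2, 1/2, 0]      # pre-forgetfulness credences
post = [4/9, 4/9, 1/9]    # post-forgetfulness credences

# Expected epistemic value, from the pre-forgetfulness point of view.
epi_retain = expected_brier(pre, pre)    # 5/6   (about 0.833)
epi_forget = expected_brier(pre, post)   # 67/81 (about 0.827)

# Expected pragmatic value and retention cost, taken from the text above.
prag_retain, prag_forget = 0.15, 0.0
cost_of_retaining = 0.4

# Exchange rate 1:1, so epistemic and pragmatic value can simply be added.
value_retain = epi_retain + prag_retain - cost_of_retaining   # about 0.583
value_forget = epi_forget + prag_forget                       # about 0.827

print(f"retain: {value_retain:.3f}, forget: {value_forget:.3f}")
print("forgetting is rational" if value_forget > value_retain else "retaining is rational")
```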
The bigger picture
For those familiar with the value of information framework, which grows out of Janina Hosiasson’s work on the value of inquiry in the late 1920s, it will be clear that what I suggest here is simply that framework with the order reversed.3 Where that framework allows us to ask the price we would pay to obtain new evidence, this asks what price we’d pay to offload evidence we already have. That framework emphasises the cost of inquiry, whereas I’m emphasising the cost of retention.
Both are neat illustrations of the way the teleological approach to the epistemic and pragmatic rationality of credences can accommodate a lot of the insights that non-ideal epistemology wishes to incorporate. In particular, it can recognise the finitude of our storage capacities and other cognitive resources, and the subsequent opportunity costs that come with deploying them in one way rather than another—if you retain this information, you can’t store this other information; if you retain this evidence, you won’t have the resources to reason out the consequences of this other evidence; and so on.
Another point of contact with the value of information framework lies in the question of non-luminous evidence. Perhaps the neatest illustration comes from Tim Williamson’s example of the unmarked clock. Suppose that, in the next room, there is a large wheel of fortune on the wall with a single pointer. Like a very fine-grained roulette wheel, the face is divided into 1,000 equally wide segments, numbered from 1 to 1,000, and the pointer lies in exactly one of them. Before you go in and look, for each n from 1 to 1,000, you have credence 1/1,000 that the pointer lies in the segment numbered n. What if you go into the room and look? Well, because your eyesight isn’t perfect, if the pointer is in segment n, you’ll learn it’s in n-1, n, or n+1, but nothing stronger than that. You’ll then update on that evidence, so you now assign credence 1/3 to n-1, 1/3 to n, 1/3 to n+1, and 0 to all the rest. Now, of course, if you know all this, and you know your evidence or your posterior credences, you can infer that the pointer is actually at n, since that’s the only situation in which you’d have that evidence and those posteriors. But we assume you can’t access your evidence and your new credences—they are not luminous to you.
Now, suppose that, before you get the opportunity to look at the clock, you learn that, tomorrow, you’ll have to choose between:
(EVEN) a bet that pays out £2m if the pointer is on an even number and loses you £3m if it’s on an odd number;
(ODD) a bet that pays out £2m if the pointer is on an odd number and loses you £3m if it’s on an even number; and
(NEITHER) which pays out nothing either way and loses you nothing either way.
Then you’ll prefer not to be able to see the clock and receive the evidence that would give you. Why not? Because, if you look and it’s on an odd number, you’ll become 2/3 confident it’s on an even number, so you’ll take bet (EVEN) and lose £3m; if you look and it’s on an even number, you’ll become 2/3 confident it’s on an odd number, so you’ll take bet (ODD) and lose £3m. But if you just keep your uniform distribution over the 1,000 possibilities, you’ll take neither bet—you’ll opt for (NEITHER)—and you’ll lose nothing.
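The expected utilities bear this out. Suppose the pointer is in fact on an odd number n. Your evidence is then that it’s at n-1, n, or n+1, and two of those three numbers are even, so your posterior credence that the number is even is 2/3. Taking utility to be linear in money,

$$\tfrac{2}{3}\times £2\text{m} \;+\; \tfrac{1}{3}\times(-£3\text{m}) \;=\; +£\tfrac{1}{3}\text{m} \;>\; 0,$$

so (EVEN) beats (NEITHER), and (ODD) does even worse in expectation. You take (EVEN) and, since the number really is odd, you lose £3m. The mirror-image calculation applies when the number is even.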
That’s a neat illustration of a case in which you should pay not to receive evidence—indeed you’ll pay up to £3m to avoid it! And that’s of course because you think the evidence will mislead you when you use your posterior credences to face the decision you know you’ll face.
But suppose you know instead that your eyesight is extremely good: if the pointer is at n, you’ll learn it’s at n. Then the Value of Information Theorem—often attributed to I. J. Good, but in fact proved by Hosiasson thirty years earlier—says that, whatever decision you think you’ll face later, you’ll want to receive this evidence, and may even pay to do so—if you face the choice between (ODD), (EVEN), and (NEITHER), you’ll pay up to £2m to receive it!
However, things change if you think that, should you take the evidence, you’ll learn the pointer is in the segment numbered n, for whichever segment it is in, and then you’ll forget some detail, so that you end up only with the evidence that it’s at n-1, n, or n+1. Then, from the teleological point of view, you are in the same situation as you were when we imagined you had poor eyesight. That is, poor eyesight and perfect memory puts you in the same situation as good eyesight and crummy memory. And if you know you have a memory that is poor in precisely this way, you’ll not wish to receive the very precise evidence, because you know that, before you choose, you’ll lose some of that evidence.
Indeed, there are cases in which the following are all true: (i) from the point of view of your prior P, you’re rationally required to receive some evidence, if it comes for free; (ii) should you receive this evidence, your possible posteriors are P1, P2, …, Pn, where you have Pi at world wi; (iii) from the point of view of each Pi, you’re rationally required to forget a little of the evidence that led you to Pi because of the cost of retaining that evidence, and if you do forget in this way you’ll end up with the post-forgetting posterior Pi* instead; (iv) your prior P prefers moving to P1, P2, …, Pn in w1, w2, …, wn to moving to P1*, P2*, …, Pn* in w1, w2, …, wn. That is, you can be rationally required by P to learn evidence, and then, from the point of view of whichever posteriors you end up with (Pi), required to lose some of that evidence and move to an alternative posterior instead (Pi*), but also rationally required by P not to go directly to the alternatives, even when we factor in the cost of retaining the evidence. Though note this can only happen with non-luminous credences. The point is that, just as in Williamson’s unmarked clock case, there’s evidence we can receive that would give us posteriors that our priors don’t trust to make certain decisions—choose between (ODD), (EVEN), and (NEITHER)—so there’s evidence we can receive that would give us posteriors that our priors don’t trust to make decisions about whether to offload information or not.
1. Daniel Singer and his co-authors investigate the social epistemology of forgetting, rather than its individual epistemology. Like me, they take a teleological point of view. And indeed Singer’s recent book defends a thoroughgoing teleological approach in epistemology. My review is here.
2. This approach to the pragmatic value of credences begins with Frank Ramsey’s Dutch Book Argument for the probabilistic constraints on rational belief—and reflects the pragmatist approach to epistemology that Ramsey inherits from C. S. Peirce—and it’s developed through work by Mark Schervish, Allan Gibbard, and Ben Levinstein.
I'm not sure the unmarked clock example works as a case where you'll pay not to receive evidence, because it depends on the fact that you'll bet according to your credences. But if my credence is non-luminous, how do I bet on it?
I guess I can make sense of the idea of non-luminous revealed credences, where your actions show that you believe (or are uncertain) about something that you didn't know you believed. But that seems uncomfortable in this case, because as soon as I take the bet I slap my forehead and say "Crap! Since I just took this bet, I now realize that my credence is 2/3 that the number is odd, which means it must in fact be even!" Or something like that.
In any case, it seems like making this work for the unmarked clock requires something that winds up not as psychologically plausible as the original unmarked clock case? There will definitely be cases where you can know that receiving information will be bad for you because you systematically misevaluate the information, like certain Sleepy Detective cases. But then those may involve weakness of the will and/or misevaluation of so-called higher-order evidence, and it's maybe less clear that we can say there's no irrationality involved?
Also thanks for the Janina Hosiasson link, very cool! Though now I've looked at her Wikipedia entry and I'm depressed.
“the teleological framework allows us to assess whether forgetting is rational from your point of view—if you were to have control over doing it, would you be rational to choose it?”
I would put things slightly differently. The rationality of *choosing* to do something is sometimes quite different from the rationality of *being such that* you do it.
This is related to Williamson’s statement that “forgetting is not irrational; it is just unfortunate”: this is sometimes what people say about being the sort of agent that has the ability to freely choose in Newcomb problems or games of chicken, and thus two-boxes or swerves.
My view is that finite physical beings like humans don’t have the abilities that classic causal decision theorists assume, of being able to freely choose at every moment what behavior we bring about - but we do have some abilities that are incompatible with these, of being able to form habits and set up our future attention.
It would be irrational to choose to one box or to choose to swerve, but being the kind of being that one boxes or swerves is often at least partly in our control (just like being the kind of person who remembers four digit numbers or being the kind of person who remembers them to two significant figures or being the kind of person who remembers the last digit or being the kind of person who has a notepad that important numbers are written down on), and it can thus be evaluated for rationality, just as classic causal decision theorists want to evaluate the actions themselves.