Schurz on the problem of induction
Optimality Justifications, recently published by OUP, is the culmination of a research programme that Gerhard Schurz has pursued for a little over fifteen years. At its heart is a novel argument for reasoning by induction; or, more precisely, an argument that the estimates of numerical quantities formed on the basis of enumerative induction from observed cases are justified, at least when forming estimates in such a way has a track record of success in the past. So Schurz isn’t arguing that enumerative induction is always rational: there are perhaps domains of scientific study in which it has proved unreliable in the past, and applying it in the future in such domains is not justified. But there are domains in which it has proved reliable, Schurz claims, and when dealing with such a domain, continuing to use it is justified.
Of course, such a style of argument will recall part of Hume’s sceptical argument for exactly the opposite conclusion: we cannot hope to infer the future success of induction from its past success without using induction itself and thereby begging the question at issue. Schurz agrees, but he argues that, in order to justify using a method of inference to form our judgments, we needn’t show that it is reliable. We need only show that it is in some sense optimal among the various methods of inference available to us; it’s the best we can do. That, after all, is all that rationality requires: it’s doing the best you can within the bounds of the constraints imposed on you.
At this point, we recall another famous argument concerning induction, namely, Reichenbach’s anti-sceptical argument. This says that, while we cannot be sure induction is reliable, we can know a priori that, if anything is reliable, induction is. This gives a sort of dominance argument for induction: in worlds in which there is a reliable method of inference, induction is reliable; in worlds in which there is no reliable method of inference, nothing is. And so, Reichenbach claims, we are justified in using induction.
As Schurz notes, it’s long been known that Reichenbach’s argument fails: there are methods of inference that succeed in worlds in which induction fails. And indeed the so-called no free lunch theorems of Wolpert and Macready make this observation precise and greatly strengthen it, showing roughly that, in a particular formal representation of the problem, average performance over all possibilities is the same for all inference methods.
Nonetheless, Schurz argues, it’s possible to run an a priori argument inspired by Reichenbach’s at the meta-level rather than the object-level to show that a meta-method of hewing close to inference methods that have been successful in the past is optimal. And then we can appeal to the past track record of induction to argue that the optimality of this meta-method shows that hewing close to induction is justified. It’s an ingenious argument inspired by a remarkable result in computational learning theory, proved by Nicolò Cesa-Bianchi and Gábor Lugosi in their Prediction, Learning, and Games, which Schurz has adapted and extended over the fifteen years of his research programme.
To present this result and the argument Schurz builds around it, let me introduce the formal framework in which he considers the problem of induction. We represent the part of the world about which we are reasoning by an infinite string of real numbers, such as (14, 15, 16, 15, 14, 15, 16, 15, …). Perhaps this lists the average temperature in degrees for each season in a particular city; perhaps it lists the stock price of a company; perhaps it lists the number of mountain hares observed during successive months on a particular Scottish hillside. We then represent a method of inference as a function that takes an initial segment of such an infinite string and returns its prediction for the next number in that string. So our inference method might take the sequence of average seasonal temperatures observed so far and predict the next season’s on that basis.
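To make the representation concrete, here is a minimal sketch in Python. The sequence and the `repeat_last` method are toy examples of my own devising, not ones Schurz discusses; the point is only that a method is any function from an observed initial segment to a prediction of the next entry.

```python
# The world as a number sequence (truncated here, of course).
world = (14, 15, 16, 15, 14, 15, 16, 15)

def repeat_last(segment):
    """A toy inference method: predict that the next value repeats
    the most recently observed one (0 if nothing has been seen)."""
    return segment[-1] if segment else 0

# Feed the method each initial segment in turn to get its predictions
# for the second entry onwards.
predictions = [repeat_last(world[:t]) for t in range(1, len(world))]
```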
Now, Schurz supposes, there is a set of inference methods that are cognitively accessible to you—we’ll call those the original methods. Given the sequence that represents the world and given an inference method, we can generate the sequence of predictions the method makes when successively fed the initial segments of the world’s data, and we can then judge the accuracy of those predictions by measuring how close they lie to the actual facts.
As well as these original object-level methods, there are also meta-methods. These are functions that take not just an initial segment of the world’s data sequence but also the track records of the various object-level methods as given by their accuracy scores, and return a prediction of the next piece of data in the sequence. So a meta-method builds a new object-level method out of the observations so far and the predictions of the various object-level methods.
A natural sort of meta-method is what Schurz calls an accuracy-weighted method. Its prediction for the next data point is a weighted average of the predictions made by the cognitively available methods, where the weight given to a method’s new prediction is greater the better its track record of prediction in the past. And it is about this sort of method that Cesa-Bianchi and Lugosi prove a remarkable result. Regardless of the world, regardless of the available object-level methods, and for many, many (dare I say all) natural ways of measuring the accuracy of predictions, the new object-level method created by this meta-method has the following two features: (i) in the long run, the average accuracy of this method, taken over the whole sequence of predictions it makes up to a given point in time, is guaranteed to converge to the maximum average accuracy attained by the original cognitively accessible methods up to that time; and (ii) in the short run, there is a bound on how far the average accuracy of this method can lie below the maximum average accuracy attained by the cognitively accessible methods up to that point, and that bound decreases steadily to zero as time progresses.
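One concrete way of implementing such an accuracy-weighted meta-method can be sketched in Python. The specific choices here are mine, made for illustration (squared error as the inaccuracy measure, and weights exponential in cumulative loss, in the style of Cesa-Bianchi and Lugosi’s exponentially weighted average forecaster); Schurz’s own definition may differ in detail.

```python
import math

def accuracy_weighted(data, methods, eta=0.1):
    """Sketch of an accuracy-weighted meta-method: an exponentially
    weighted average forecaster, with cumulative squared error as the
    measure of each object-level method's track record."""
    losses = [0.0] * len(methods)   # cumulative squared error per method
    meta_preds = []
    for t, truth in enumerate(data):
        history = data[:t]
        guesses = [m(history) for m in methods]
        # better past track record (lower loss) means greater weight
        weights = [math.exp(-eta * loss) for loss in losses]
        total = sum(weights)
        meta_preds.append(sum(w * g for w, g in zip(weights, guesses)) / total)
        losses = [loss + (g - truth) ** 2 for loss, g in zip(losses, guesses)]
    return meta_preds

# Toy run: the running-mean method tracks a constant world perfectly
# after one observation; the constant-zero method never does. The
# meta-method's predictions quickly hew to the better method's.
world = [15.0] * 20
running_mean = lambda h: sum(h) / len(h) if h else 0.0
always_zero = lambda h: 0.0
preds = accuracy_weighted(world, [running_mean, always_zero])
```

In this run the meta-method’s early predictions are poor, but once the running-mean method’s superior track record registers, the weighted average lies essentially on top of it, which is the behaviour properties (i) and (ii) describe.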
On the basis of this result, Schurz gives what he considers a dominance argument for using an accuracy-weighted meta-method to construct your object-level method. The goal of a method, Schurz says, is its long-run predictive accuracy. The result by Cesa-Bianchi and Lugosi shows that, relative to that goal, using the object-level method created by the accuracy-weighted meta-method weakly dominates using one of the original methods. In every situation it does at least as well, matching the long-run accuracy of the best original method whenever there is one, and in some situations it does better. And for this reason, Schurz submits, it’s justified. And that means that, if induction has been the most successful predictor in the past, then to predict the future it is justified to use a weighted average of the available methods that gives a lot of weight to induction, and so will hew closely to its predictions. Thus, inferring by induction is justified.
That’s the shape of Schurz’s response to Hume’s scepticism about induction. An initial concern is one that it inherits from Reichenbach’s argument. Both Reichenbach and Schurz assume that if a method, whether object-level or meta-level, dominates, then it is justified. But that might be denied. If one is required to use a method of inference and make predictions about future cases, then a method that dominates is certainly permissible and indeed required. But it’s not clear that one is required to use a method of inference and make predictions. One might simply suspend judgment. And indeed that’s what Hume would have us do.
But let’s leave that aside. Another worry stems from the claim that high long-run average accuracy is the goal of a method of inference. Now, there’s no doubt it’s a nice thing to have, but surely a more fundamental goal is high average accuracy at each point in time. And when we look at that goal, the argument fails: one method might match another in long-run average accuracy while being worse in average accuracy at every single point in time, and in that case surely the pointwise-better method would be preferable.
So, for instance, suppose there are two methods: the first guesses 0 for every data point, regardless of what has been observed, while the second guesses 1 for every data point, regardless of what has been observed. And suppose the true data points are all 1s. Then the accuracy-weighted method will move closer and closer to the second method, but never meet it, and so its accuracy will improve over time; but at every point in time its predictions will be less accurate than the second method’s, which are perfectly accurate at each time. So, while the average accuracy of the accuracy-weighted method converges to the perfect accuracy of the second method, it is always strictly worse. And so surely the second method is to be preferred. The accuracy-weighted method does not dominate when we focus on the true primary goal of these methods.
In the previous example, we compared the accuracy-weighted method with one of the original methods in a case in which there is a best method among the originals. But we can also consider cases in which there is no best method among the originals, and in which we compare the accuracy-weighting meta-method not only with the original methods but with other meta-methods. And here again there are cases in which the accuracy-weighting method is not the best. Suppose we have two original methods: the first begins on 0 and alternates between 0 and 1, giving the predictions (0, 1, 0, 1, 0, 1, …), while the second begins on 1 and alternates in the same way, giving (1, 0, 1, 0, 1, 0, …); and suppose the actual world is (0, 0, 0, 0, 0, …). Now compare the accuracy-weighting method, which will oscillate among values strictly between 0 and 1 forever, with the meta-method that takes first from the first method (giving 0), then from the second method (giving 0 again), then from the first method, then from the second, and so on, giving the perfectly accurate prediction sequence (0, 0, 0, 0, 0, …). This alternative meta-method performs better than the accuracy-weighted meta-method. Indeed, even in the long run it will outperform that method, since it is always perfectly accurate, while the accuracy-weighted method does not converge to perfect accuracy, first moving closer to 1, then back towards 0, then closer to 1 again, and so on.
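This second counterexample can also be checked mechanically. Once more the exponential weighting of cumulative squared error is my own illustrative stand-in for an accuracy-weighting rule:

```python
import math

# The two alternating object-level methods, indexed by time t.
method1 = lambda t: t % 2          # predicts 0, 1, 0, 1, ...
method2 = lambda t: (t + 1) % 2    # predicts 1, 0, 1, 0, ...

# Alternative meta-method: copy method1 at even times, method2 at odd
# times. Against the all-0s world this is right every single time.
alternating_meta = [method1(t) if t % 2 == 0 else method2(t)
                    for t in range(10)]

# The accuracy-weighted meta-method on the same data.
losses = [0.0, 0.0]
weighted_preds = []
for t in range(10):
    guesses = [method1(t), method2(t)]
    w = [math.exp(-l) for l in losses]
    weighted_preds.append((w[0] * guesses[0] + w[1] * guesses[1])
                          / (w[0] + w[1]))
    losses = [l + g ** 2 for l, g in zip(losses, guesses)]  # truth is 0
# alternating_meta is all 0s (perfect), while every weighted prediction
# lies strictly between 0 and 1, so it is never perfectly accurate.
```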
So, I think the dominance argument for the accuracy-weighted meta-method doesn’t work. There are cases in which one of the original methods outperforms the accuracy-weighted meta-method with respect to the goal that matters, namely, average predictive accuracy up to a time, for each time. And there are cases in which an alternative meta-method outperforms the accuracy-weighted method with respect to both Schurz’s goal of long-run average accuracy and the pointwise alternative goal that matters.
There is a tempting alternative argument one might hope to make on the basis of the results Schurz states. You might think they show at least that the accuracy-weighted method is one that a risk-averse believer might wish to opt for. After all, the results show that there’s a limit on how much worse than any of the original methods the accuracy-weighted method can be. Picking a particular method from among the originals is a risky business: it might be catastrophically worse than some alternative. If you use the accuracy-weighted meta-method, you might be bested by another method, but there is a limit on the extent to which you’re so bested. So it is a risk-averse option. And perhaps, as W. K. Clifford suggested, thereby incurring the objection of William James, it is apt to be risk-averse in forming beliefs and in choosing the methods we use to form them. And even if this isn’t demanded in the way Clifford suggests, perhaps it is at least permitted, and that would be enough to justify using the accuracy-weighted meta-method.
The problem is that our second case above refutes this. In that case, where we compare the accuracy-weighted meta-method with an alternative meta-method, we see that the accuracy-weighted meta-method is in fact risky when considered among other methods of the same sort, namely, meta-methods. It can perform much worse than those.
So, in the end, I think Schurz’s justification of induction will not work, and nor will this Cliffordian fallback option.