Probability Estimation: Bayes Risk

In my previous post on probability estimation, I introduced the notion of a proper loss. This is a way of assigning penalties to probability estimates so that the average loss is minimised by guessing the true conditional probability of a positive label for each example. This minimal possible risk is called the (conditional) Bayes risk and in this post I will highlight some of its properties.

To recap briefly, we denote the loss of predicting the probability $p$ when the label is $y$ (1 for positive, 0 for negative) as $\ell(y,p)$. Then the conditional risk for $\ell$ of guessing $p$ when the label has probability $\eta$ of being positive is

$$L(\eta,p) = (1-\eta)\,\ell(0,p) + \eta\,\ell(1,p).$$
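To make the definition concrete, here is a minimal Python sketch of the conditional risk. The function names are my own, and the square loss used as the example is an assumption borrowed from later in this post; any other loss with the same signature could be passed in.

```python
def conditional_risk(loss, eta, p):
    """Conditional risk L(eta, p) = (1 - eta) * loss(0, p) + eta * loss(1, p)."""
    return (1 - eta) * loss(0, p) + eta * loss(1, p)


def square_loss(y, p):
    # Illustrative loss (assumed here): squared gap between label and estimate.
    return (y - p) ** 2


# Risk of guessing p = 0.5 when a positive label has probability eta = 0.25.
risk = conditional_risk(square_loss, eta=0.25, p=0.5)
print(risk)  # 0.25
```

The loss is passed in as a function so that the same helper works for any loss defined on a label and an estimate.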

Point-wise Bayes Risk

The smallest risk achievable under this loss when the probability of a positive label is $\eta$ is the (point-wise) Bayes risk at $\eta$, which I will denote as

$$L^*(\eta) = \min_{p \in [0,1]} L(\eta, p).$$

As argued in the previous post, a sensible loss is one that is Fisher consistent, that is, one with a risk that is minimised when $p = \eta$. Such a loss is called proper and its risk and Bayes risk are closely related. Specifically, $L^*(\eta) = L(\eta,\eta)$.

This relationship makes it trivial to compute the point-wise Bayes risk for any proper loss. For example, square loss is defined to be $\ell_{\text{sq}}(y,p) = (y-p)^2$ and so its point-wise Bayes risk is

$$L^*_{\text{sq}}(\eta) = L_{\text{sq}}(\eta,\eta) = \eta(1-\eta)^2 + (1-\eta)\eta^2 = \eta(1-\eta).$$
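As a sanity check, the closed form can be compared with a brute-force minimisation over a grid of guesses. This is only a rough numerical sketch: the grid resolution and the helper names are my own choices, not part of the argument above.

```python
def square_risk(eta, p):
    # L_sq(eta, p) = (1 - eta) * p^2 + eta * (1 - p)^2
    return (1 - eta) * p ** 2 + eta * (1 - p) ** 2


# Candidate guesses p in [0, 1]; the resolution is an arbitrary choice.
grid = [i / 1000 for i in range(1001)]


def grid_bayes_risk(eta):
    """Return the best guess on the grid and the risk it achieves."""
    best_p = min(grid, key=lambda p: square_risk(eta, p))
    return best_p, square_risk(eta, best_p)


for eta in (0.1, 0.25, 0.5, 0.9):
    best_p, risk = grid_bayes_risk(eta)
    # best_p should coincide with eta, and risk with eta * (1 - eta).
    print(eta, best_p, risk, eta * (1 - eta))
```

For these values of $\eta$ the minimising guess lands on $\eta$ itself, which is what properness of the square loss predicts.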

Log loss is $\ell_{\text{log}}(0,p) = -\log(1-p)$ and $\ell_{\text{log}}(1,p) = -\log(p)$ and so its Bayes risk is

$$L^*_{\text{log}}(\eta) = -\eta\log(\eta) - (1-\eta)\log(1-\eta).$$
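The same quantity is easy to evaluate numerically. The sketch below assumes natural logarithms and the usual convention that $0 \log 0 = 0$, neither of which is spelled out above; the function names are mine.

```python
import math


def log_risk(eta, p):
    """Conditional risk of log loss for a guess p strictly between 0 and 1."""
    return -(1 - eta) * math.log(1 - p) - eta * math.log(p)


def log_bayes_risk(eta):
    """Point-wise Bayes risk of log loss, using the convention 0 * log(0) = 0."""
    total = 0.0
    for q in (eta, 1 - eta):
        if q > 0:
            total -= q * math.log(q)
    return total


print(log_bayes_risk(0.5))                       # equals log(2), the largest value
print(log_risk(0.3, 0.3), log_bayes_risk(0.3))   # guessing p = eta attains the Bayes risk
print(log_risk(0.3, 0.4) > log_bayes_risk(0.3))  # any other guess does worse
```

With natural logarithms this is the entropy of a coin with bias $\eta$, measured in nats.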

Bayes Risk Functions are Concave

One useful property of point-wise Bayes risk functions for proper losses is that they are necessarily concave. That is, the line segment joining any two points on the graph of $L^*$ lies on or below the graph of $L^*$.

The quickest way to establish this is via a well-known result regarding concave functions: the point-wise minimum of a set of concave functions is concave.[1] Note that for any fixed $p$ the function $\eta \mapsto L(\eta,p)$ is linear (strictly speaking, affine) in $\eta$, since the terms $\ell(0,p)$ and $\ell(1,p)$ are constant. Since linear functions are concave and, by definition, $L^*$ is their point-wise minimum, we see that $L^*$ must also be concave.
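The argument can be illustrated numerically by building the lower envelope of finitely many such lines and testing midpoint concavity. This is a sketch under my own assumptions (square loss, a grid of 201 guesses, a small floating-point tolerance); it illustrates the proof rather than replacing it.

```python
def square_loss(y, p):
    # Assumed example loss; any loss on a label and an estimate would do.
    return (y - p) ** 2


def risk_line(p):
    """For a fixed guess p, return eta -> L(eta, p), a straight line in eta."""
    intercept = square_loss(0, p)
    slope = square_loss(1, p) - square_loss(0, p)
    return lambda eta: intercept + slope * eta


# One line per candidate guess p in [0, 1].
lines = [risk_line(i / 200) for i in range(201)]


def lower_envelope(eta):
    """Point-wise minimum of the lines: a grid approximation to L*(eta)."""
    return min(f(eta) for f in lines)


# Midpoint concavity: f((a + b) / 2) >= (f(a) + f(b)) / 2 for all a, b.
etas = [i / 20 for i in range(21)]
violations = 0
for a in etas:
    for b in etas:
        mid_value = lower_envelope(0.5 * (a + b))
        chord_value = 0.5 * (lower_envelope(a) + lower_envelope(b))
        if mid_value < chord_value - 1e-12:  # tolerance for rounding error
            violations += 1

print(violations)  # 0
```

Because each line is affine in $\eta$, the envelope of any collection of them is concave, so the count of violations is zero whatever loss is plugged in.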

Concave functions have many useful properties that have implications for the study of point-wise risks. Firstly, they are necessarily continuous on the interior of their domain, and secondly, if they are twice differentiable, their second derivatives are non-positive. That is, for all $\eta \in (0,1)$,

$$(L^*)''(\eta) \leq 0,$$

which also implies that their first derivatives are monotonically non-increasing.[2]
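For the two losses discussed above, the check is a short calculation. The following is a worked sketch that differentiates the closed-form Bayes risks given earlier, taking $\log$ to be the natural logarithm and $\eta \in (0,1)$ (both assumptions on my part).

```latex
% Square loss: L^*_sq(eta) = eta (1 - eta) = eta - eta^2
(L^*_{\text{sq}})'(\eta) = 1 - 2\eta,
\qquad
(L^*_{\text{sq}})''(\eta) = -2 \leq 0.

% Log loss: L^*_log(eta) = -eta log(eta) - (1 - eta) log(1 - eta)
(L^*_{\text{log}})'(\eta) = \log\frac{1-\eta}{\eta},
\qquad
(L^*_{\text{log}})''(\eta) = -\frac{1}{\eta} - \frac{1}{1-\eta} = -\frac{1}{\eta(1-\eta)} < 0.
```

In both cases the first derivative decreases as $\eta$ grows, as expected for a concave function.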

As we will see in the next post, the converse of this holds too. That is, each concave function on $[0,1]$ can be interpreted as the point-wise Bayes risk for some proper loss.


  1. See, for example, §3.2.3 of Boyd & Vandenberghe’s freely available book Convex Optimization.

  2. You can easily check this is the case for the log- and square-losses.

Mark Reid 05 March 2009 Canberra, Australia