Probability Estimation: Bayes Risk

In my previous post on probability estimation, I introduced the notion of a proper loss. This is a way of assigning penalties to probability estimates so that the average loss is minimised by guessing the true conditional probability of a positive label for each example. This minimal possible risk is called the (conditional) Bayes risk and in this post I will highlight some of its properties.

To recap briefly, we denote the loss of predicting the probability \(p\) when the label is \(y\) (1 for positive, 0 for negative) as \(\ell(y, p)\). Then the conditional risk for \(\ell\) of guessing \(p\) when \(y\) has probability \(\eta\) of being positive is \[ L(\eta,p) = (1-\eta)\,\ell(0,p) + \eta\,\ell(1,p). \]
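As a rough sketch of this definition in code, the conditional risk is just a weighted average of the two possible losses. The function names here are my own choices for illustration, and the example uses the square loss that is defined later in this post.

```python
def square_loss(y, p):
    """Square loss for a label y in {0, 1} and a predicted probability p."""
    return y * (1 - p) ** 2 + (1 - y) * p ** 2


def conditional_risk(loss, eta, p):
    """L(eta, p) = (1 - eta) * loss(0, p) + eta * loss(1, p)."""
    return (1 - eta) * loss(0, p) + eta * loss(1, p)


# Risk of guessing p = 0.5 when the true positive probability is eta = 0.3.
risk_at_half = conditional_risk(square_loss, 0.3, 0.5)
# Risk of guessing the true probability itself.
risk_at_eta = conditional_risk(square_loss, 0.3, 0.3)
print(risk_at_half, risk_at_eta)
```

For square loss, guessing \(p = \eta\) gives the smaller of the two risks, which is what we expect from a proper loss.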

Point-wise Bayes Risk

The smallest risk achievable under this loss when the probability of a positive label is \(\eta\) is the (point-wise) Bayes risk at \(\eta\), which I will denote as \[ L^*(\eta) = \min_{p \in [0,1]} L(\eta, p). \]

As argued in the previous post, a sensible loss is one that is Fisher consistent, that is, one with a risk that is minimised when \(p=\eta\). Such a loss is called proper and its risk and Bayes risk are closely related. Specifically, \(L^*(\eta) = L(\eta,\eta)\).

This relationship makes it trivial to compute the point-wise Bayes risk for any proper loss. For example, square loss is defined to be \(\ell_{\text{sq}}(y,p) = y\,(1-p)^2 + (1-y)\,p^2\) and so its point-wise Bayes risk is \[ L^*_{\text{sq}}(\eta) = L_{\text{sq}}(\eta,\eta) = \eta(1-\eta)^2 + (1-\eta)\eta^2 = \eta(1-\eta). \]

Log loss is \(\ell_{\text{log}}(y,p) = -y\log(p) - (1-y)\log(1-p)\) and so its Bayes risk is \[ L^*_{\text{log}}(\eta) = -\eta\log(\eta) - (1-\eta)\log(1-\eta). \]
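One way to sanity-check these closed forms is to minimise \(L(\eta, p)\) over a grid of values of \(p\) and compare the result with \(L(\eta,\eta)\). The following is only a numerical sketch: the grid resolution and the helper names are assumptions of mine, and the log loss is only evaluated for \(0 < \eta < 1\) and \(0 < p < 1\) to avoid \(\log(0)\).

```python
import math


def square_loss(y, p):
    return y * (1 - p) ** 2 + (1 - y) * p ** 2


def log_loss(y, p):
    # Only valid for 0 < p < 1, so the grid below stays in the interior.
    return -y * math.log(p) - (1 - y) * math.log(1 - p)


def conditional_risk(loss, eta, p):
    return (1 - eta) * loss(0, p) + eta * loss(1, p)


def grid_bayes_risk(loss, eta, steps=999):
    """Approximate min over p of L(eta, p); returns (risk, minimising p)."""
    grid = [i / (steps + 1) for i in range(1, steps + 1)]
    best_p = min(grid, key=lambda p: conditional_risk(loss, eta, p))
    return conditional_risk(loss, eta, best_p), best_p


def square_bayes_risk(eta):
    return eta * (1 - eta)


def log_bayes_risk(eta):
    # Only valid for 0 < eta < 1.
    return -eta * math.log(eta) - (1 - eta) * math.log(1 - eta)


for eta in (0.1, 0.3, 0.5, 0.8):
    sq_risk, sq_p = grid_bayes_risk(square_loss, eta)
    lg_risk, lg_p = grid_bayes_risk(log_loss, eta)
    print(eta, sq_p, sq_risk, square_bayes_risk(eta))
    print(eta, lg_p, lg_risk, log_bayes_risk(eta))
```

Because each of these values of \(\eta\) lies on the grid, the minimising \(p\) should coincide with \(\eta\) and the grid minimum should match the closed form up to floating-point error.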

Bayes Risk Functions are Concave

One useful property of point-wise Bayes risk functions for proper losses is that they are necessarily concave. That is, a line segment joining any two points on the graph of \(L^*\) lies on or below the graph.

The quickest way to establish this is via a well-known result regarding concave functions: the point-wise minimum of a set of concave functions is concave.[1] Note that for any fixed \(p\in[0,1]\) the function \(L(\eta,p)\) is linear in \(\eta\) since the terms \(\ell(1,p)\) and \(\ell(0,p)\) are constant. Since linear functions are concave and, by definition, \(L^*\) is their point-wise minimum, we see that \(L^*\) must also be concave.
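The argument can be illustrated numerically by building the lower envelope of a finite family of these lines, one per value of \(p\), and testing midpoint concavity on a grid of values of \(\eta\). This is a sketch under my own assumptions (a 199-point grid for \(p\), log loss as the example, and a small tolerance for rounding), not a proof.

```python
import math


def log_loss(y, p):
    return -y * math.log(p) - (1 - y) * math.log(1 - p)


# Interior grid of predictions p, so that log(p) and log(1 - p) are finite.
P_GRID = [i / 200 for i in range(1, 200)]


def lower_envelope(eta, loss=log_loss):
    """Minimum over the grid of the lines eta -> (1-eta)*loss(0,p) + eta*loss(1,p)."""
    return min((1 - eta) * loss(0, p) + eta * loss(1, p) for p in P_GRID)


def is_midpoint_concave(f, points, tol=1e-12):
    """Check f((a+b)/2) >= (f(a)+f(b))/2 for every pair of points, up to tol."""
    return all(
        f((a + b) / 2) >= (f(a) + f(b)) / 2 - tol
        for a in points
        for b in points
    )


etas = [i / 20 for i in range(21)]
print(is_midpoint_concave(lower_envelope, etas))
```

The envelope here is a minimum over finitely many lines, so it only approximates \(L^*\), but it is concave for exactly the reason given above. Passing a convex function such as `lambda x: x * x` to the same check makes it fail.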

Concave functions have many useful properties that have implications for the study of point-wise risks. Firstly, they are necessarily continuous on the open interval \((0,1)\), and secondly, if they are twice differentiable, their second derivatives are non-positive. That is, for all \(\eta \in (0,1)\), \[ (L^*)''(\eta) \leq 0, \]

which also implies that their first derivatives are monotonically non-increasing.[2]
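For the two losses above the check is a short calculation. For square loss, \(L^*_{\text{sq}}(\eta) = \eta - \eta^2\), so \[ (L^*_{\text{sq}})'(\eta) = 1 - 2\eta, \qquad (L^*_{\text{sq}})''(\eta) = -2. \]

For log loss, differentiating \(L^*_{\text{log}}(\eta) = -\eta\log(\eta) - (1-\eta)\log(1-\eta)\) on \((0,1)\) gives \[ (L^*_{\text{log}})'(\eta) = \log\frac{1-\eta}{\eta}, \qquad (L^*_{\text{log}})''(\eta) = -\frac{1}{\eta} - \frac{1}{1-\eta} = -\frac{1}{\eta(1-\eta)}. \]

Both second derivatives are negative for every \(\eta \in (0,1)\), and both first derivatives decrease as \(\eta\) increases.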

As we will see in the next post, the converse of this holds too. That is, each concave function on \([0,1]\) can be interpreted as the point-wise Bayes risk for some proper loss.

  1. See, for example, §3.2.3 of Boyd & Vandenberghe’s freely available book Convex Optimization.

  2. You can easily check this is the case for the log- and square-losses.

Mark Reid March 5, 2009 Canberra, Australia