Now that I have had a much needed holiday I am feeling refreshed enough to write up some of my notes on the recent ICML and COLT conferences held in Montréal this year.

I did not get to see as much of either conference as I would have liked to as I was nursing an awful cold. Fortunately, I was able to fight it off long enough to finish and give my ICML and COLT presentations. The ICML one almost didn’t happen as I had completely lost my voice the day before.

One theory paper I really liked at ICML was *PAC-Bayesian Learning of Linear Classifiers* by Pascal Germain, Alexandre Lacasse, François Laviolette and Mario Marchand. They give a general statement of the PAC-Bayes bound in terms of a convex function measuring the divergence between the true and empirical risks and, most importantly, give a remarkably simple proof using only Markov and Jensen’s inequalities.

Although I didn’t really follow the active learning aspects of *Learning from Measurements in Exponential Families* by Percy Liang, Dan Klein and Michael Jordan, I did think their introduction of *measurements* as a way of combining labels and constraints was elegant. Rather than assuming instances *X* have observed labels *Y*, they assume the labels are hidden and their values are only available implicitly through a set of aggregated measurements *M(X,Y)* over the whole data set. Various choices of the function *M* result in many existing learning problems.

Of the presentations I managed to attend at COLT, there were two that I found thought-provoking.

*Learnability and Stability in the General Learning Setting* by Shai Shalev-Shwartz, Ohad Shamir, Nathan Srebro and Karthik Sridharan examined the “general learning setting” proposed by Vapnik where, instead of trying to minimising the expected cost of a function that predicts labels for instances \(\mathbb{E}[\ell(h(X), Y)]\) we just want to find a point \(h\) in a hypothesis space \(\mathcal{H}\) that minimises \(\mathbb{E}[\ell(h, Z)]\). The relationship between hypotheses and instances in the latter case is much more arbitrary than in the usual supervised setting.

Interestingly, the usual correspondence between uniform convergence of empirical risks to the expected risk and learnability which holds for supervised learning with empirical risk minimisation (ERM) breaks down in the general setting. In particular, it turns out uniform convergence is sufficient for learning with ERM but not necessary. The authors therefore explore generalised notions of ERM and show that learnability and stability are equivalent for these more general versions.

Finally, I really liked the connections made between online learning and traditional statistical learning theory in *A Stochastic View of Optimal Regret through Minimax Duality* by Jacob Abernethy, Alekh Agarwal, Peter Bartlett and Alexander Rakhlin. In this paper the authors use the von Neumann’s minimax theorem to relate the minimax regret that is normally used in online convex optimisation (OCO) games to a supremum over a set of distributions of the expectations of a loss. Through a neat geometrical interpretation of this loss they are able to establish upper and lower bounds on the optimal regret for various online learning problems.

If you are after more highlights, I recommend having a look at John and Hal’s overviews of the conferences.

Mark Reid July 7, 2009 Canberra, Australia