It’s been about three weeks since I jotted down some notes about COLT this year while I was on the train from Paris to Lille (for ICML). I’ve finally made some time to polish them a little and put them up in continuing attempt to get back into blogging.
COLT this year was held at the Jussieu campus of the Université Pierre et Marie Curie in the 5th arrondissement of Paris. I was fortunate to be staying a short walk away at the air-conditioned Hotels Des Nations Saint Germain since the temperature was over 40ºC on the first few days. Apart from the slightly uncomfortable temperature though, this was a very hard conference to fault: the venue, talks, poster sessions, invited lectures, catering, and events were all excellent.
I’ve tried to capture some of the highlights below, as well as the parts of the full program that I saw and intend to follow up on.
The talk by Fields medalist Cédric Villani on the Synthetic Theory of Ricci Curvature was a thought-provoking and entertaining highlight of COLT — at least the parts I was able to comprehend. He started his talk by explaining the distinction between analytic and synthetic theories by way of the example of convexity. The analytic take on convexity is what Villani called a “local” and “effective” theory: a function is convex if its Hessian is positive semi-definite. It’s local because the Hessian is defined using neighbourhoods of points and effective because often one can compute the and test the Hessian. The synthetic definition of a convex function is the usual one where the value at the average of two points be no more than the average of the values at those points. This, while typically harder to establish than the analytic definition, has the advantage of being easily generalised to non-differentiable functions and leads more directly to useful inequalities.
The rest of his talk I found a little more difficult to follow but from what I recall, he sketched out several definitions of positive Ricci curvature. In two dimensions it is the “correction” to the distance between two orthonormal vectors relative to Euclidean space or the expansion of the median of triangles due to the space; in three or more, the rate of change of a volume element along a geodesic. From there he listed the way this concept connected a variety of ideas and bounds from information theory and optimal transport.
It seems a large portion of the talk was taken from notes that were based on lectures he gave this year at Tsinghua University and ETH Zürich.
Slightly more down to earth, but no less engaging were Daniel Spielman’s and Tim Roughgarden’s talks.
Dan gave an excellent and intuitive introduction to Laplacians, their properties, and connections to finding solutions of special types of linear equations. He then went onto discuss some impressive results he and others developed in solving these systems using “sparsification” of graphs associated with the Laplacian matrices. The resulting, almost linear-time algorithms will likely form the basis of many efficient techniques in machine learning, maximum flow problems, and PDEs. An overview of part of his talk can be found in these notes.
Tim gave a fascinating overview of how several ideas from learning theory, including no-regret learning and PAC-style analysis, have recently made their way into economics and game theory. One focus of the talk that caught my attention was his discussion of what he calls an “extension theorem” for price of anarchy results.
Roughly speaking, price of anarchy results measure how inefficient multiagent games become when agents behave selfishly relative to a optimal, centrally coordinated plan (e.g., routing traffic). These results were originally stated in terms of Nash equilibria which are known to be “fragile” solution concepts. More recently, more robust analyses are possible by replacing the assumption that agents play their equilibrium strategy with a much weaker assumption that they engage in repeated play that generates no-regret outcome sequences. The striking thing about Tim’s extension theorem is that he shows how equilibrium-based price of anarchy proofs for a large class of “smooth” games can be automatically transformed into proofs for the weaker, no-regret versions.
It’s rare and exciting to see this type of “meta” theorem that applies to whole classes of existing results. To top it off, there seems to be a lot of scope and interest in developing more connections like these between economics, game theory, and machine learning.
Although I wasn’t able to attend all of the regular sessions, I did see a lot of interesting stuff that I hope to catch up on now that I’m home. I think the format for the talks this year worked really well. There were a handful of carefully picked 20 minute talks and the rest of the speakers got 5 minutes to pique the interest of the audience enough to have them come to their posters.
Of the longer talks, I really enjoyed Christos Papadimitriou’s one on his work with Santosh Vempala on Cortical Learning via Prediction and Sébastien Bubeck’s very well delivered blackboard talk on The Entropic Barrier with Ronen Eldan (arXiv preprint).
From what I’ve understood of it so far, Christos and Santosh have built upon Leslie Valiant’s neuroidal model of the brain.1 They show that by introducing a new operation, caled PJOIN for “predictive join”, they are able to implement pattern recognition algorithms that do not suffer the combinatorial explosion that occurs if limited to the model’s original operations (JOIN & LINK). I’m hoping to spend some time looking at this further and alongside some interesting recent work by David Balduzzi on Cortical Prediction Markets. I’ve been thinking about networks of traders with Raf recently and think these neurologically inspired takes on networks provide an interesting perspective.
Sébastien and Ronen’s work give a very natural construction of a self-concordant barrier for convex bodies. Given a compact convex body \(\mathcal{K} \subset \mathbb{R}^n\) they define \(f\) to be the log partition function for the exponential family of densities \(p_\theta\) with natural parameters \(\theta \in \mathcal{K}\) relative to the uniform distribution. The Fenchel dual, \(f^*(x) = -H(p_{\theta(x)})\) with \(\theta(x) = \nabla f^*(x)\) is then a \((1 + o(1))n\)-self-concordant barrier for \(\mathcal{K}\). Using this elegant connection between barriers, exponential families, and duals, they are able to recover near-optimal bounds for online linear optimisation problems with bandit feedback.
Raf and I have looked at convex dual interpretations and generalisations of exponental families and some similar ideas were used to get a new perspective on fast rates in online learning in our COLT paper this year with Nishant and Bob. One thing I’d like to understand better is the connection between universal barriers and what we call “entropic duals”, which I think coincide in the case of Shannon entropy. However, we show that fast rates in online prediction with expert advice can be obtained for losses satisfying a mixability condition defined in terms any Legendre function defined on convex bodies (what we call “generalised entropies”). I’d be curious to see whether there are similar implications for OLO bandit games.
Also in the “things I’d like to understand better” bucket is the connection between our generalised Aggregating Algorithm and the results Kamal, Bob, and Xinhua presented on their characterisation of Exp-concave proper losses and its relationship to mixability. From my brief discussions with them it seems that mixability and exp-concavity are effective the same condition, it’s just that the latter is a parameterisation-dependent version of the former.
It was good to see a number of other papers that looked at proper losses/scores and property elicitation, including:
On Consistent Surrogte Risk Minimization and Property Elicitation by Arpit Agarwal and Shivani Agarwal.
Convex Risk Minimization and Conditional Probability Estimation by Matus Telgarsky Miro Dudík and Rob Schapire.
There were also a couple of other bandit-related papers that I plan to look more closely at.
Ohad Shamir had a nice paper On the Complexity of Bandit Linear Optimization that provides some new bounds on rates for bandit games. Curiously, he shows that certain innocuous modifications that have no effect in full information games (such as translation of the action space) can adversely affect guarantees in the bandit setting.
Noga Alon and co. had a follow up to some of their earlier work on graph feedback models for bandits where taking an action will reveal the rewards for neighbouring actions on a known graph. In their new paper, Online Learning with Feedback Graphs: Beyond Bandits, they neatly characterise three rates regimes — roughly \(\sqrt{T}\), \(T^{2/3}\), and \(T\) — in terms of whether the feedback graph is “strongly observable” (i.e., neighbours of each vertex \(i\) include \(i\) or all vertices except \(i\)), “weakly observable” (i.e.,if all vertices have neighbours), or “unobservable” (i.e., one or more vertices have no neighbours).
Finally, I’d be remiss not to mention the conference events, which easily lived up to the quality of the conference content. The COLT cocktail party on top of the Zamansky tower gave us a stunning view of of Paris, as did the one hosted by Criteo at their lab. The conference dinner was also extremely scenic, cruising up and down the Seine on a floating restaurant while being treated to some delicious French food and wine.
All in all, this was a fantastic COLT — easily one of the best I’ve been to. Congratulations and thanks to Peter, Elad, and Vianney for a wonderful job organising it.
A notable coincidence (at least for me) is that the one and only previous time I was in Paris in 2001 I was reading a copy of Valiant’s Circuits of the Mind that I’d picked up in New York. Strangely, I hadn’t really seen much referencing that work in the in the interim.↩