Danny Lange, the General Manager at Amazon Machine Learning, on Real-World Predictive Applications with Amazon Machine Learning; and

Bob Williamson from NICTA’s Machine Learning Group talking about Predictive Technologies and the Prediction of Technology

There are also a number of talks and tutorials by researchers and practitioners from Google, Microsoft, Big ML, NVIDIA, Upwork, Telefonica, and many others. For more details, please check out the conference schedule.

I’ll be part of a panel discussion on research challenges surrounding predictive APIs and applications with Poul Petersen, the Chief Infrastructure Officer at Big ML, and Misha Bilenko, the leader of the Algorithms team at Microsoft Azure Machine Learning.

I expect we will talk about challenges around managing privacy, large data sets, transparency, and interoperability of various systems, as well as some of the other issues that Beau Cronin raised last year in his post on challenges facing predictive APIs.

If you have any specific questions or topics you’d like us to discuss, please leave them in the comments below and I’ll see whether I can work them into the discussion and report back.

To kick things off, there will be a huge Big Data Analytics Meetup tonight with over 350 registered attendees. Four of the speakers from the PAPIs conference will be representing and talking about the predictive API offerings from Google, Microsoft, Big ML, and Amazon.

The PAPIs conference starts tomorrow (August 6th), and if any of the above looks interesting it is still possible to register for tickets.

Hope to see you there!

COLT this year was held at the Jussieu campus of the Université Pierre et Marie Curie in the 5th arrondissement of Paris. I was fortunate to be staying a short walk away at the air-conditioned Hotels Des Nations Saint Germain since the temperature was over 40ºC on the first few days. Apart from the slightly uncomfortable temperature though, this was a very hard conference to fault: the venue, talks, poster sessions, invited lectures, catering, and events were all excellent.

I’ve tried to capture some of the highlights below, as well as the parts of the full program that I saw and intend to follow up on.

The talk by Fields medalist Cédric Villani on the *Synthetic Theory of Ricci Curvature* was a thought-provoking and entertaining highlight of COLT, at least the parts I was able to comprehend. He started his talk by explaining the distinction between *analytic* and *synthetic* theories by way of the example of convexity. The analytic take on convexity is what Villani called a “local” and “effective” theory: a function is convex if its Hessian is positive semi-definite. It’s local because the Hessian is defined using neighbourhoods of points and effective because one can often compute and test the Hessian. The synthetic definition of a convex function is the usual one, where the value at the average of two points is no more than the average of the values at those points. This, while typically harder to establish than the analytic definition, has the advantage of being easily generalised to non-differentiable functions and leads more directly to useful inequalities.
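To make the two definitions concrete, here is a minimal numerical sketch of my own (it is not from the talk). The test functions `smooth` and `kinked`, the finite-difference Hessian, and the tolerances are all assumptions chosen purely for illustration.

```python
import numpy as np

def smooth(x):
    # Hypothetical smooth convex example: a sum of exponentials.
    return float(np.sum(np.exp(x)))

def kinked(x):
    # Hypothetical non-differentiable convex example: the L1 norm.
    return float(np.sum(np.abs(x)))

def numerical_hessian(func, x, eps=1e-4):
    # Central finite-difference approximation of the Hessian at x.
    n = x.size
    steps = np.eye(n) * eps
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            H[i, j] = (func(x + steps[i] + steps[j]) - func(x + steps[i] - steps[j])
                       - func(x - steps[i] + steps[j]) + func(x - steps[i] - steps[j])) / (4 * eps ** 2)
    return H

def analytic_convex_at(func, x, tol=1e-5):
    # "Local" and "effective": is the Hessian at x (numerically) positive semi-definite?
    H = numerical_hessian(func, x)
    return bool(np.linalg.eigvalsh((H + H.T) / 2).min() >= -tol)

def synthetic_convex_on(func, x, y, tol=1e-12):
    # Midpoint inequality: f((x + y) / 2) <= (f(x) + f(y)) / 2. No derivatives needed.
    return func((x + y) / 2) <= (func(x) + func(y)) / 2 + tol

rng = np.random.default_rng(0)
x, y = rng.uniform(-1, 1, size=3), rng.uniform(-1, 1, size=3)
print(analytic_convex_at(smooth, x), synthetic_convex_on(smooth, x, y))
print(synthetic_convex_on(kinked, x, y))
```

The midpoint test applies unchanged to `kinked`, where a Hessian test is not meaningful at the kinks, which is the sense in which the synthetic definition generalises more easily. Passing either check at a few sample points is of course only evidence of convexity, not a proof.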

The rest of his talk I found a little more difficult to follow but from what I recall, he sketched out several definitions of positive Ricci curvature. In two dimensions it is the “correction” to the distance between two orthonormal vectors relative to Euclidean space or the expansion of the median of triangles due to the space; in three or more, the rate of change of a volume element along a geodesic. From there he listed the way this concept connected a variety of ideas and bounds from information theory and optimal transport.

It seems a large portion of the talk was taken from notes that were based on lectures he gave this year at Tsinghua University and ETH Zürich.

Slightly more down to earth, but no less engaging were Daniel Spielman’s and Tim Roughgarden’s talks.

Dan gave an excellent and intuitive introduction to Laplacians, their properties, and connections to finding solutions of special types of linear equations. He then went on to discuss some impressive results he and others developed in solving these systems using “sparsification” of graphs associated with the Laplacian matrices. The resulting, almost linear-time algorithms will likely form the basis of many efficient techniques in machine learning, maximum flow problems, and PDEs. An overview of part of his talk can be found in these notes.

Tim gave a fascinating overview of how several ideas from learning theory, including no-regret learning and PAC-style analysis, have recently made their way into economics and game theory. One focus of the talk that caught my attention was his discussion of what he calls an “extension theorem” for price of anarchy results.

Roughly speaking, price of anarchy results measure how inefficient multiagent games become when agents behave selfishly relative to an optimal, centrally coordinated plan (*e.g.*, routing traffic). These results were originally stated in terms of Nash equilibria, which are known to be “fragile” solution concepts. More recently, more robust analyses have become possible by replacing the assumption that agents play their equilibrium strategy with a much weaker assumption that they engage in repeated play that generates no-regret outcome sequences. The striking thing about Tim’s extension theorem is that he shows how equilibrium-based price of anarchy proofs for a large class of “smooth” games can be automatically transformed into proofs for the weaker, no-regret versions.

It’s rare and exciting to see this type of “meta” theorem that applies to whole classes of existing results. To top it off, there seems to be a lot of scope and interest in developing more connections like these between economics, game theory, and machine learning.
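For readers who haven’t met the price of anarchy before, the standard textbook illustration is Pigou’s two-link routing network. The sketch below is my own and was not part of the talk; the names `total_cost` and `grid` are just illustrative. One unit of traffic chooses between a link that always costs 1 and a link whose cost equals the fraction of traffic using it.

```python
def total_cost(x):
    # x is the fraction of traffic on the variable-cost link (cost x per unit);
    # the remaining 1 - x uses the constant-cost link (cost 1 per unit).
    return x * x + (1 - x) * 1.0

# Selfish routing: the variable link never costs more than 1,
# so at equilibrium everyone takes it (x = 1).
equilibrium_cost = total_cost(1.0)

# Central planner: minimise total cost over a grid of possible splits.
grid = [i / 1000 for i in range(1001)]
optimal_x = min(grid, key=total_cost)
optimal_cost = total_cost(optimal_x)

price_of_anarchy = equilibrium_cost / optimal_cost
print(optimal_x, optimal_cost, price_of_anarchy)
```

The planner splits traffic evenly for a total cost of 3/4, whereas selfish routing costs 1, giving a price of anarchy of 4/3 for this network.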

Although I wasn’t able to attend all of the regular sessions, I did see a lot of interesting stuff that I hope to catch up on now that I’m home. I think the format for the talks this year worked really well. There were a handful of carefully picked 20 minute talks and the rest of the speakers got 5 minutes to pique the interest of the audience enough to have them come to their posters.

Of the longer talks, I really enjoyed Christos Papadimitriou’s talk on his work with Santosh Vempala on *Cortical Learning via Prediction* and Sébastien Bubeck’s very well delivered blackboard talk on *The Entropic Barrier* with Ronen Eldan (arXiv preprint).

From what I’ve understood of it so far, Christos and Santosh have built upon Leslie Valiant’s neuroidal model of the brain.^{1} They show that by introducing a new operation, called PJOIN for “predictive join”, they are able to implement pattern recognition algorithms that do not suffer the combinatorial explosion that occurs if limited to the model’s original operations (JOIN and LINK). I’m hoping to spend some time looking at this further, alongside some interesting recent work by David Balduzzi on Cortical Prediction Markets. I’ve been thinking about networks of traders with Raf recently and think these neurologically inspired takes on networks provide an interesting perspective.

Sébastien and Ronen’s work gives a very natural construction of a self-concordant barrier for convex bodies. Given a compact convex body \(\mathcal{K} \subset \mathbb{R}^n\) they define \(f\) to be the log partition function for the exponential family of densities \(p_\theta\) on \(\mathcal{K}\) with natural parameters \(\theta \in \mathbb{R}^n\), relative to the uniform distribution. The Fenchel dual, \(f^*(x) = -H(p_{\theta(x)})\) with \(\theta(x) = \nabla f^*(x)\), is then a \((1 + o(1))n\)-self-concordant barrier for \(\mathcal{K}\). Using this elegant connection between barriers, exponential families, and duals, they are able to recover near-optimal bounds for online linear optimisation problems with bandit feedback.
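To get a feel for the construction, here is a small one-dimensional sketch of my own (it is not taken from the paper). I assume \(\mathcal{K} = [-1, 1]\) and let \(\theta\) range over the real line, in which case the log partition function is \(f(\theta) = \log(2\sinh(\theta)/\theta)\) and its derivative, the mean of \(p_\theta\), is \(\coth(\theta) - 1/\theta\). The function names, the bisection bracket, and the iteration count are all hypothetical choices.

```python
import math

def log_partition(theta):
    # f(theta) = log of the integral of exp(theta * x) over K = [-1, 1],
    # written in a form that avoids overflow for large |theta|.
    a = abs(theta)
    if a < 1e-6:
        return math.log(2.0) + a * a / 6.0  # series expansion near zero
    return a + math.log1p(-math.exp(-2.0 * a)) - math.log(a)

def mean(theta):
    # f'(theta): the mean of p_theta, equal to coth(theta) - 1/theta.
    if abs(theta) < 1e-4:
        return theta / 3.0  # series expansion near zero
    return 1.0 / math.tanh(theta) - 1.0 / theta

def entropic_barrier(x, lo=-1e3, hi=1e3, iters=200):
    # f*(x) = sup_theta (theta * x - f(theta)). The maximiser solves mean(theta) = x,
    # found here by bisection since the mean is increasing in theta.
    # The bracket [-1e3, 1e3] limits this sketch to roughly |x| <= 0.99.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(mid) < x:
            lo = mid
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    return theta * x - log_partition(theta)

for x in (0.0, 0.5, 0.9, 0.99):
    print(x, entropic_barrier(x))
```

At the centre the maximising \(\theta\) is zero, \(p_\theta\) is uniform, and the barrier equals minus the entropy of the uniform density on \([-1, 1]\), that is, \(-\log 2\). The values then grow as \(x\) approaches either endpoint, which is the qualitative behaviour one wants from a barrier.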

Raf and I have looked at convex dual interpretations and generalisations of exponential families and some similar ideas were used to get a new perspective on fast rates in online learning in our COLT paper this year with Nishant and Bob. One thing I’d like to understand better is the connection between universal barriers and what we call “entropic duals”, which I think coincide in the case of Shannon entropy. However, we show that fast rates in online prediction with expert advice can be obtained for losses satisfying a mixability condition defined in terms of any Legendre function defined on convex bodies (what we call “generalised entropies”). I’d be curious to see whether there are similar implications for OLO bandit games.

Also in the “things I’d like to understand better” bucket is the connection between our generalised Aggregating Algorithm and the results Kamal, Bob, and Xinhua presented on their characterisation of Exp-concave proper losses and its relationship to mixability. From my brief discussions with them it seems that mixability and exp-concavity are effectively the same condition; the latter is just a parameterisation-dependent version of the former.

It was good to see a number of other papers that looked at proper losses/scores and property elicitation, including:

- *On Consistent Surrogate Risk Minimization and Property Elicitation* by Arpit Agarwal and Shivani Agarwal.
- *Convex Risk Minimization and Conditional Probability Estimation* by Matus Telgarsky, Miro Dudík, and Rob Schapire.

There were also a couple of other bandit-related papers that I plan to look more closely at.

Ohad Shamir had a nice paper *On the Complexity of Bandit Linear Optimization* that provides some new bounds on rates for bandit games. Curiously, he shows that certain innocuous modifications that have no effect in full information games (such as translation of the action space) can adversely affect guarantees in the bandit setting.

Noga Alon and co. had a follow-up to some of their earlier work on graph feedback models for bandits where taking an action will reveal the rewards for neighbouring actions on a known graph. In their new paper, *Online Learning with Feedback Graphs: Beyond Bandits*, they neatly characterise three rate regimes (roughly \(\sqrt{T}\), \(T^{2/3}\), and \(T\)) in terms of whether the feedback graph is “strongly observable” (*i.e.*, neighbours of each vertex \(i\) include \(i\) or all vertices except \(i\)), “weakly observable” (*i.e.*, if all vertices have neighbours), or “unobservable” (*i.e.*, one or more vertices have no neighbours).
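The three cases are easy to check mechanically. Below is a minimal sketch based on my reading of the definitions, where a vertex’s “neighbours” are taken to be the actions whose play reveals its reward. The `observers` representation and the function name are my own assumptions rather than anything from the paper.

```python
# Rough regret regimes as summarised above.
RATES = {
    "strongly observable": "sqrt(T)",
    "weakly observable": "T^(2/3)",
    "unobservable": "T",
}

def classify_feedback_graph(observers):
    """Classify a feedback graph.

    observers[i] is the set of actions whose play reveals the reward of
    action i (a hypothetical representation chosen for this sketch).
    """
    vertices = set(observers)

    def strongly_observable(i):
        # Either i reveals its own reward, or every other action reveals it.
        return i in observers[i] or (vertices - {i}) <= observers[i]

    if any(not observers[i] for i in vertices):
        return "unobservable"
    if all(strongly_observable(i) for i in vertices):
        return "strongly observable"
    return "weakly observable"

bandit = {0: {0}, 1: {1}, 2: {2}}           # each action reveals only itself
revealing = {0: {1}, 1: {0}, 2: {0}}        # no self-loops, partial coverage
blind = {0: {0}, 1: set()}                  # nothing ever reveals action 1
for graph in (bandit, revealing, blind):
    kind = classify_feedback_graph(graph)
    print(kind, RATES[kind])
```

The standard bandit setting (self-loops only) and the full-information setting (every action reveals everything) both land in the strongly observable case under this reading.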

Finally, I’d be remiss not to mention the conference events, which easily lived up to the quality of the conference content. The COLT cocktail party on top of the Zamansky tower gave us a stunning view of Paris, as did the one hosted by Criteo at their lab. The conference dinner was also extremely scenic, cruising up and down the Seine on a floating restaurant while being treated to some delicious French food and wine.

All in all, this was a fantastic COLT — easily one of the best I’ve been to. Congratulations and thanks to Peter, Elad, and Vianney for a wonderful job organising it.

A notable coincidence (at least for me) is that the one and only previous time I was in Paris, in 2001, I was reading a copy of Valiant’s *Circuits of the Mind* that I’d picked up in New York. Strangely, I hadn’t really seen much referencing that work in the interim.

I’m very pleased to announce that I’ll be the Research Chair for this year’s conference and will help look after the research track of the conference, which will be running for the first time this year. One of my jobs is to help get the word out about the Call for Proposals, which is requesting submissions for review by **March 29th, 2015**.

As described on the Call for Proposals page, the research track is looking for proposals up to 8 pages in length that discuss techniques and problems encountered by those of you that create predictive APIs and services. Topics for proposals include (but are not limited to):

- Software engineering: design patterns and best practises
- Distributed systems: scaling out services and APIs
- Machine Learning / Data Science automation
- Interoperability between services / APIs / tools etc.

This new track aims to complement the tutorials and other presentations that will focus more on describing how predictive APIs and apps are used. Because of the solid mix of academics, developers, and business people, I think this will be a great opportunity to get your work in front of a lot of people who use this sort of technology on a daily basis.

As an added incentive, PAPIs will take place just before KDD 2015, which will run from the 10th–13th of August. So if you are already planning on coming to KDD, consider coming a little earlier and attending PAPIs too.

Finally, it would be great if you could help me spread the word to anyone you think might be interested in submitting to or attending PAPIs. For example, by retweeting or otherwise sharing the following:

Calling for proposals! Real-world #machinelearning, #predictive #APIs & #apps: use cases, lessons learnt, research http://t.co/tJOfpYhwxp

— PAPIs.io (@papisdotio) February 26, 2015

Let me know if you plan to be in Sydney for the conference and would like to meet up. Hope to see you there!

I’m not entirely sure why blogging fell by the wayside in 2014. As my news feed suggests, it’s not as though there has been a lack of things to write about in the last 16 months:

two papers at MaxEnt 2013, one on generalised exponential families and the other on their conjugate priors;

a couple of journal papers, one in PAMI on hybrid losses and the other in MLJ on a new boosting technique;

co-organising two workshops, one on divergence methods for probabilistic inference at ICML and the other on transactional ML and e-commerce at NIPS;

releasing the code and demo service for the Protocols and Structures for Inference project, as well as receiving a generous Amazon AWS in Education grant to launch the demo site on AWS;

a fantastic, month-long visit at Microsoft’s New York lab;

a two week visit to Tsinghua University as part of the Australia-China Young Scientist Exchange Program;

and last, but not least, Mindika Premachandra submitted her PhD thesis on prediction markets for review.

Amongst all that, I’ve been working with a number of collaborators on some fascinating connections via convex duality between fast rates for online learning, mirror descent, risk measures, prediction markets, and graphical models. You can grab preprints of some of this stuff on the arXiv (*Risk Dynamics in Trade Networks* and *Generalized Mixability via Entropic Duality*).

I’ve been meaning to write up some overviews of this most recent work for ages so expect some posts on risk measures and entropic duals very soon.

Shortly after I joined, one of the other editors raised a question about how we are to interpret an item in the review criteria that states that reviewers should consider the “freedom of the code (lack of dependence on proprietary software)” when assessing submissions. What followed was an engaging email discussion amongst the Action Editors about how to clarify our position.

After some discussion (summarised below), we settled on the following guideline which tries to ensure MLOSS projects are as open as possible while recognising the fact that MATLAB, although “closed”, is nonetheless widely used within the machine learning community and has an open “work-alike” in the form of GNU Octave:

**Dependency on Closed Source Software**

We strongly encourage submissions that do not depend on closed source and proprietary software. Exceptions can be made for software that is widely used in a relevant part of the machine learning community and accessible to most active researchers; this should be clearly justified in the submission.

The most common case here is the question whether we will accept software written for Matlab. Given its wide use in the community, there is no strict reject policy for MATLAB submissions, but we strongly encourage submissions to strive for compatibility with Octave unless absolutely impossible.

There were a number of interesting arguments raised during the discussion, so I offered to write them up in this post for posterity and to solicit feedback from the machine learning community at large.

A couple of arguments were put forward in favour of a strict “no proprietary dependencies” policy.

Firstly, allowing proprietary dependencies may limit our ability to find reviewers for submissions, an already difficult job. Secondly, stricter policies have the benefit of being unambiguous, which would avoid future discussions about the acceptability of future submissions.

An argument made in favour of accepting projects with proprietary dependencies was that doing so may actually increase the chances of their code being forked to produce a version with no such dependencies.

Mikio Braun explored this idea further along with some broader concerns in a blog post about the role of curation and how it potentially limits collaboration.

Some of us had concerns about what exactly constitutes a proprietary dependency and came up with a number of examples that possibly fall into a grey area.

For example, how do operating systems fit into the picture? What if the software in question only compiles on Windows or OS X? These are both widely used but proprietary. Should we ensure MLOSS projects also work on Linux?

Taking a step up the development chain, what if the code base is most easily built using proprietary development tools such as Visual Studio or XCode? What if libraries such as MATLAB’s Statistics Toolbox or Intel’s MKL library are needed for performance reasons?

Things get even more subtle when we note that certain data formats (*e.g.*, for medical imaging) are proprietary. Should such software be excluded even though the algorithms might work on other data?

These sorts of considerations suggested that a very strict policy may be difficult to enforce in practice.

It is pretty clear what position Richard Stallman or other fierce free software advocates would take on the above questions: reject all of them! It is not clear that such an extreme position would necessarily suit the goals of the MLOSS track of JMLR.

Put another way, is the focus of MLOSS the “ML” or the “OSS”? The consensus seemed to be that we want to promote open source software to benefit machine learning, not the other way around.

Towards the end of the discussion, I made the argument that if we cannot be coherent we should at least be consistent and presented some data on all the accepted MLOSS submissions. Table 1 below shows the breakdown of languages used by the 50 projects that have been accepted to the JMLR track to date. I’ll note that some projects use and/or target multiple languages and that, because I only spent half an hour surveying the projects, I may have inadvertently misrepresented some (if I’ve done so, let me know).

| Language | C++ | Java | Matlab | Octave | Python | C | R |
|----------|-----|------|--------|--------|--------|---|---|
| Count | 15 | 13 | 11 | 10 | 9 | 5 | 4 |

From this we can see that MATLAB is fairly well-represented amongst the accepted MLOSS projects. I took a closer look and found that of the 11 projects that are written in (or provide bindings for) MATLAB, all but one also support GNU Octave.

I think the position we’ve adopted is realistic, consistent, and suitably aspirational. We want to encourage and promote projects that strive for openness and the positive effects it enables (*e.g.*, reproducibility and reuse) but do not want to strictly rule out submissions that require a widely used, proprietary platform such as MATLAB.

Of course, a project like MLOSS is only as strong as the community it serves, so we are keen to get feedback about this decision from people who use and create machine learning software. Feel free to leave a comment or contact one of us by email.

**Shameless Plug**: If you are working on some open source software for machine learning, I encourage you to consider submitting your work to the JMLR MLOSS track or the upcoming NIPS 2013 Workshop on Machine Learning Open Source Software (I’m on the program committee).