<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
 
 <title>Inductio Ex Machina</title>
 <link href="http://mark.reid.name/iem/atom.xml" rel="self"/>
 <link href="http://mark.reid.name/iem/"/>
 <updated>2012-05-01T15:55:33+10:00</updated>
 <id>http://mark.reid.name/iem/</id>
 <author>
   <name>Mark Reid</name>
   <email>mark@reid.name</email>
 </author>
 
 
 <entry>
   <title>Fisher Information and the Hessian of Log Likelihood</title>
   <link href="http://mark.reid.name/iem/fisher-information-and-log-likelihood.html"/>
   <updated>2012-04-04T00:00:00+10:00</updated>
   <id>id:/iem/fisher-information-and-log-likelihood</id>
   <content type="html">&lt;p&gt;I&amp;#8217;ve been taking some tentative steps into &lt;a href='http://cscs.umich.edu/~crshalizi/notabene/info-geo.html'&gt;information geometry&lt;/a&gt; lately which, like all good mathematics, involves sitting alone in a room being confused almost all the time.&lt;/p&gt;

&lt;p&gt;I was not off to a very good start when a seemingly key relationship between Fisher information and the second derivative of the log likelihood eluded me, despite being described as &amp;#8220;obvious&amp;#8221; or &amp;#8220;simple&amp;#8221; in &lt;a href='http://books.google.com.au/books?id=5-70HAAACAAJ&amp;amp;dq=watanabe+statistical+learning+theory'&gt;several&lt;/a&gt; &lt;a href='http://books.google.com.au/books/about/Methods_of_Information_Geometry.html?id=vc2FWSo7wLUC'&gt;books&lt;/a&gt;. I finally figured out the main trick and thought I&amp;#8217;d share it here in case someone else has trouble with it (e.g., me in six months).&lt;/p&gt;

&lt;h2 id='fisher_information'&gt;Fisher Information&lt;/h2&gt;

&lt;p&gt;Fisher information is a quantity associated with parametric families of probability distributions. Let &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/d6c93b9a814f0a2c6b1e866be9504d8b.png' alt='equation' style='vertical-align: -0.0ex;height: 1.55555555555556ex;' /&gt;&lt;/span&gt; be a set of outcomes and for each parameter &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/5b11e05f05dbbee9ab42e5ffa80feff6.png' alt='equation' style='vertical-align: -0.0ex;height: 1.55555555555556ex;' /&gt;&lt;/span&gt; in some set &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/6abec63bfa6c11f274e4863abd4f5126.png' alt='equation' style='vertical-align: -0.111111111111111ex;height: 1.88888888888889ex;' /&gt;&lt;/span&gt; let &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/2b6d1f91d68c6df9b2e2d4575e25cb56.png' alt='equation' style='vertical-align: -0.555555555555556ex;height: 2.33333333333333ex;' /&gt;&lt;/span&gt; be the distribution over &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/d6c93b9a814f0a2c6b1e866be9504d8b.png' alt='equation' style='vertical-align: -0.0ex;height: 1.55555555555556ex;' /&gt;&lt;/span&gt; associated with &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/5b11e05f05dbbee9ab42e5ffa80feff6.png' alt='equation' style='vertical-align: -0.0ex;height: 1.55555555555556ex;' /&gt;&lt;/span&gt;. 
The &lt;em&gt;Fisher information&lt;/em&gt; for the family &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/bb9d3ebf7cb5d127ba8ab27694ffca69.png' alt='equation' style='vertical-align: -0.555555555555556ex;height: 2.22222222222222ex;' /&gt;&lt;/span&gt; is the matrix-valued function whose entry&lt;sup id='fnref:1'&gt;&lt;a href='#fn:1' rel='footnote'&gt;1&lt;/a&gt;&lt;/sup&gt; at the &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/61351f38a577ef7bb357e23b2ae497d0.png' alt='equation' style='vertical-align: -0.0ex;height: 1.55555555555556ex;' /&gt;&lt;/span&gt;th row and &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/6f7474ea0da44dad381c0a6894aa2c0f.png' alt='equation' style='vertical-align: -0.444444444444444ex;height: 2.0ex;' /&gt;&lt;/span&gt;th column is&lt;/p&gt;
&lt;div class='maruku-equation'&gt;&lt;img class='maruku-png' src='/images/latex/a7aaee5c11521966c1d28e9fe0c54b64.png' alt='equation' style='height: 2.44444444444444ex;' /&gt;&lt;div class='maruku-eq-tex'&gt;&lt;code style='display: none'&gt;	\displaystyle
	I_{i,j}(\theta) 
	= \mathbb{E}_X
	\left[ 
		\left( D_i \log p_\theta(X) \right) \left( D_j \log p_\theta(X) \right)
	\right]

&lt;/code&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;where the expectation is over the random variable &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/d6c93b9a814f0a2c6b1e866be9504d8b.png' alt='equation' style='vertical-align: -0.0ex;height: 1.55555555555556ex;' /&gt;&lt;/span&gt; drawn from the distribution &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/e8e9ff0e49b242eed6ad4c358a03e1fb.png' alt='equation' style='vertical-align: -0.444444444444444ex;height: 1.44444444444444ex;' /&gt;&lt;/span&gt;, and &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/be6481ca1bf0352c260e8f9df3635d4a.png' alt='equation' style='vertical-align: -0.333333333333333ex;height: 1.88888888888889ex;' /&gt;&lt;/span&gt; denotes the partial derivative &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/073c3df75def66f5558ceea289aea306.png' alt='equation' style='vertical-align: -1.0ex;height: 3.0ex;' /&gt;&lt;/span&gt;. The Fisher information is always symmetric and positive semi-definite and can be seen as measuring the &amp;#8220;sensitivity&amp;#8221; of the &lt;em&gt;log likelihood&lt;/em&gt; &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/5184af2366ce1ef7e817fce8fdb16814.png' alt='equation' style='vertical-align: -0.555555555555556ex;height: 2.33333333333333ex;' /&gt;&lt;/span&gt; on the outcomes in a neighbourhood of &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/5b11e05f05dbbee9ab42e5ffa80feff6.png' alt='equation' style='vertical-align: -0.0ex;height: 1.55555555555556ex;' /&gt;&lt;/span&gt;.&lt;/p&gt;

&lt;h2 id='_and_the_hessian_of_log_likelihood'&gt;&amp;#8230; and the Hessian of log likelihood&lt;/h2&gt;

&lt;p&gt;The result that had me puzzled for some time was the &amp;#8220;obvious&amp;#8221; fact that&lt;/p&gt;
&lt;div class='maruku-equation'&gt;&lt;img class='maruku-png' src='/images/latex/f42767513aac6c7e6110c0bc284b3d7a.png' alt='equation' style='height: 2.44444444444444ex;' /&gt;&lt;div class='maruku-eq-tex'&gt;&lt;code style='display: none'&gt;	\displaystyle
	I_{i,j}(\theta) 
	= - \mathbb{E}_X 
	\left[ 
		D_{i,j} \log p_\theta(X) 
	\right]

&lt;/code&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;where &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/b34287a5e0d9592c285108230a2a7eab.png' alt='equation' style='vertical-align: -0.666666666666667ex;height: 2.22222222222222ex;' /&gt;&lt;/span&gt; denotes the second-order partial derivative &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/4153bafea6985196fac3fde157ec4cba.png' alt='equation' style='vertical-align: -1.22222222222222ex;height: 3.55555555555556ex;' /&gt;&lt;/span&gt;. What this says is that the Fisher information is closely related to the curvature of the log likelihood function, as measured by its &lt;em&gt;Hessian&lt;/em&gt; &amp;#8211; that is, the matrix of its second derivatives &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/c0ec730e015499d1d21431f9de481966.png' alt='equation' style='vertical-align: -0.888888888888889ex;height: 2.66666666666667ex;' /&gt;&lt;/span&gt;.&lt;/p&gt;

&lt;p&gt;After much head-scratching, I realised that the &amp;#8220;trick&amp;#8221; I was missing was the observation that (under some mild conditions) the second derivatives and integrals can be switched so&lt;/p&gt;
&lt;div class='maruku-equation'&gt;&lt;img class='maruku-png' src='/images/latex/fdea83b64561ff81462e1193fa332120.png' alt='equation' style='height: 4.11111111111111ex;' /&gt;&lt;div class='maruku-eq-tex'&gt;&lt;code style='display: none'&gt;	\displaystyle
	\int_X D_{i,j} p_\theta(x)\,dx 
	= D_{i,j} \int_X p_\theta(x)\,dx 
	= D_{i,j} 1
	= 0

&lt;/code&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;since each &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/e8e9ff0e49b242eed6ad4c358a03e1fb.png' alt='equation' style='vertical-align: -0.444444444444444ex;height: 1.44444444444444ex;' /&gt;&lt;/span&gt; is a distribution.&lt;/p&gt;

&lt;p&gt;With the above identity in hand, establishing the relationship between Fisher information and the Hessian of log likelihood is just an application of the chain and product rules and noting that &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/4d41954374e514ef45a476df0cef0216.png' alt='equation' style='vertical-align: -1.11111111111111ex;height: 3.33333333333333ex;' /&gt;&lt;/span&gt;. Thus,&lt;/p&gt;
&lt;div class='maruku-equation'&gt;&lt;img class='maruku-png' src='/images/latex/3ae91b4f4913a7e933da4de292da6a87.png' alt='equation' style='height: 5.66666666666667ex;' /&gt;&lt;div class='maruku-eq-tex'&gt;&lt;code style='display: none'&gt;	\displaystyle
	D_{i,j} \log p_\theta(x)
	= D_i \left( \frac{D_j p_\theta(x)}{p_\theta(x)} \right)
	= \frac{D_{i,j} p_\theta(x)}{p_\theta(x)} 
		- \frac{D_i p_\theta(x)}{p_\theta(x)} \frac{D_j p_\theta(x)}{p_\theta(x)}.

&lt;/code&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Taking expectations and using the aforementioned trick gives the result since &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/e5133e8d1d2271833c0106c9a979fdba.png' alt='equation' style='vertical-align: -1.44444444444444ex;height: 4.0ex;' /&gt;&lt;/span&gt;.&lt;/p&gt;
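
&lt;p&gt;A quick numerical sanity check of this identity can be sketched in Python. The Bernoulli family used here is an assumption made purely for illustration (it does not appear in the derivation above); for it, both expressions should agree with the closed form 1/(theta(1 - theta)). The function name is hypothetical.&lt;/p&gt;
&lt;pre&gt;
```python
def bernoulli_fisher_check(theta):
    """Return both expressions for the Fisher information of Bernoulli(theta)."""
    outcomes = [0, 1]
    # p_theta(x) for x in {0, 1}.
    prob = {0: 1.0 - theta, 1: theta}
    # First derivative of log p_theta(x) with respect to theta.
    score = {0: -1.0 / (1.0 - theta), 1: 1.0 / theta}
    # Second derivative of log p_theta(x) with respect to theta.
    second = {0: -1.0 / (1.0 - theta) ** 2, 1: -1.0 / theta ** 2}
    # Expectation of the squared first derivative (the definition).
    outer = sum(prob[x] * score[x] ** 2 for x in outcomes)
    # Negative expectation of the second derivative (the Hessian form).
    neg_hess = -sum(prob[x] * second[x] for x in outcomes)
    return outer, neg_hess

outer, neg_hess = bernoulli_fisher_check(0.3)
closed_form = 1.0 / (0.3 * 0.7)
print(outer, neg_hess, closed_form)
```
&lt;/pre&gt;
&lt;p&gt;With a single parameter the matrix is 1 by 1, so the check only exercises the diagonal case of the identity.&lt;/p&gt;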

&lt;p&gt;Everything is obvious in hindsight!&lt;/p&gt;
&lt;div class='footnotes'&gt;&lt;hr /&gt;&lt;ol&gt;&lt;li id='fn:1'&gt;
&lt;p&gt;I&amp;#8217;m going to ignore issues such as convergence, existence, etc. Just assume things are &amp;#8220;well-behaved&amp;#8221; where necessary.&lt;/p&gt;
&lt;a href='#fnref:1' rev='footnote'&gt;&amp;#8617;&lt;/a&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/div&gt;</content>
 </entry>
 
 <entry>
   <title>Prediction with Expert Advice as Online Convex Optimisation</title>
   <link href="http://mark.reid.name/iem/prediction-with-expert-advice-as-online-convex-optimisation.html"/>
   <updated>2011-09-15T00:00:00+10:00</updated>
   <id>id:/iem/prediction-with-expert-advice-as-online-convex-optimisation</id>
   <content type="html">&lt;p&gt;I have been working with &lt;a href='http://users.cecs.anu.edu.au/%7Ewilliams/'&gt;Bob Williamson&lt;/a&gt; and &lt;a href='http://www.timvanerven.nl/'&gt;Tim Van Erven&lt;/a&gt; recently to &lt;a href='http://mark.reid.name/files/pubs/colt11.pdf'&gt;better understand&lt;/a&gt; the notion of &lt;em&gt;mixability&lt;/em&gt; in what is known as the Prediction With Expert Advice (PWEA) setting for online learning. I was curious as to how this setting &lt;a href='http://rml.cecs.anu.edu.au/'&gt;relates&lt;/a&gt; to another one that is commonly studied in learning theory: &lt;a href='http://webdocs.cs.ualberta.ca/~maz/publications/ICML03.pdf'&gt;online convex optimisation&lt;/a&gt; (OCO).&lt;/p&gt;

&lt;p&gt;It is already known that PWEA is a special case of OCO (see, for example, Peter Bartlett&amp;#8217;s &lt;a href='http://www.stat.berkeley.edu/~bartlett/talks/BeijingCourse2010.html'&gt;summer school course&lt;/a&gt; or Kalai and Vempala&amp;#8217;s &lt;a href='http://people.cs.uchicago.edu/~kalai/papers/onlineopt/onlineopt.pdf'&gt;JCSS paper&lt;/a&gt;) but I wanted to work out the correspondence explicitly for myself. Since there is one of those obvious-in-hindsight tricks involved I thought it would be worth writing up and sharing it.&lt;/p&gt;

&lt;h2 id='introduction'&gt;Introduction&lt;/h2&gt;

&lt;p&gt;&lt;a href='http://onlineprediction.net/?n=Main.PredictionWithExpertAdvice'&gt;Prediction With Expert Advice&lt;/a&gt; is typically posed as a game where in each round a learner receives advice in the form of predictions from a set of experts about a future outcome and then merges these expert opinions to form its own prediction. The outcome is then revealed and the learner and all of the experts receive a penalty depending on how well their prediction fits with the revealed outcome. This penalty is determined by a fixed loss function that is known to the learner. The aim of the learner in this game is to incur an aggregate penalty over many rounds that is not much worse than the best expert.&lt;/p&gt;

&lt;p&gt;You can easily imagine playing such a game yourself: each day you check a dozen different weather forecasts then make up your own mind about the chance of rain tomorrow, e.g., you predict a 75% chance of rain. The next day it will either rain or not and imagine that you and the experts lose points depending on how bad your predictions were: predicting a 75% chance when it is sunny loses you more points than if you predicted a 20% chance of rain. The function that determines exactly how many points you lose for predicting p% chance of rain when the outcome is sunny is called the &lt;em&gt;loss function&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Mixability is a property of a loss function that characterises when learning can occur efficiently in a PWEA game, that is, when it is possible to make the difference between the total loss of the learner and that of the best expert (the &lt;em&gt;regret&lt;/em&gt;) decrease rapidly, specifically like &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/e195794595fcf535464e030f5073512c.png' alt='equation' style='vertical-align: -0.555555555555556ex;height: 2.33333333333333ex;' /&gt;&lt;/span&gt; after &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/bd3fb80ef03d2b5da97eff08edade65e.png' alt='equation' style='vertical-align: -0.0ex;height: 1.55555555555556ex;' /&gt;&lt;/span&gt; rounds.&lt;/p&gt;

&lt;p&gt;In our &lt;a href='http://mark.reid.name/files/pubs/colt11.pdf'&gt;recent COLT paper&lt;/a&gt; we were able to characterise mixability in terms of the curvature of the loss for a natural class of losses known as &lt;em&gt;&lt;a href='http://mark.reid.name/iem/proper-losses.html'&gt;proper losses&lt;/a&gt;&lt;/em&gt;. These losses are &amp;#8220;sensible&amp;#8221; in that if the true probability of an outcome is &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/1673ac8aa9b48203da9b40ba52ef26de.png' alt='equation' style='vertical-align: -0.444444444444444ex;height: 1.44444444444444ex;' /&gt;&lt;/span&gt; then the expected loss is minimised by predicting &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/1673ac8aa9b48203da9b40ba52ef26de.png' alt='equation' style='vertical-align: -0.444444444444444ex;height: 1.44444444444444ex;' /&gt;&lt;/span&gt;. This seemingly innocent requirement actually gives rise to a lot of geometric structure that has been well studied in the economics literature, and that we exploit in our paper.&lt;/p&gt;

&lt;p&gt;Online Convex Optimisation is a similar type of game to PWEA in that both are &lt;a href='http://onlineprediction.net/?n=Main.CompetitiveOn-linePrediction'&gt;competitive online prediction&lt;/a&gt; games: a learner repeatedly makes predictions and receives a penalty based on that prediction and its performance is compared to a class of simple alternatives. The main differences between PWEA and OCO are that: the learner does not have access to expert predictions and their penalties; the regret of the learner is relative to a possibly uncountable set of alternatives; and the loss functions involved are assumed to be convex.&lt;/p&gt;

&lt;p&gt;Despite these differences, it is possible to present Prediction With Expert Advice as a very special case of Online Convex Optimisation. After formalising the two games, I&amp;#8217;ll present the &amp;#8220;trick&amp;#8221; for turning the former into the latter.&lt;/p&gt;

&lt;h2 id='prediction_with_expert_advice'&gt;Prediction with Expert Advice&lt;/h2&gt;

&lt;p&gt;In the general Prediction with Expert Advice (PWEA) game a learner competes against &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/b5384a1ad4b6cdde3ac4c1560d5e1ef4.png' alt='equation' style='vertical-align: -0.0ex;height: 1.55555555555556ex;' /&gt;&lt;/span&gt; experts in a game consisting of &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/bd3fb80ef03d2b5da97eff08edade65e.png' alt='equation' style='vertical-align: -0.0ex;height: 1.55555555555556ex;' /&gt;&lt;/span&gt; rounds. In each round, each expert reveals a prediction from &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/802946733d6c88b87fa8cd15aeaa3fec.png' alt='equation' style='vertical-align: -0.0ex;height: 1.77777777777778ex;' /&gt;&lt;/span&gt;, the set of probabilities over &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/f8861bb7b295baba67534c07efa4b074.png' alt='equation' style='vertical-align: -0.0ex;height: 1.55555555555556ex;' /&gt;&lt;/span&gt; outcomes. The learner observes and combines these to form its own prediction from &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/802946733d6c88b87fa8cd15aeaa3fec.png' alt='equation' style='vertical-align: -0.0ex;height: 1.77777777777778ex;' /&gt;&lt;/span&gt;. 
The world then reveals one of &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/f8861bb7b295baba67534c07efa4b074.png' alt='equation' style='vertical-align: -0.0ex;height: 1.55555555555556ex;' /&gt;&lt;/span&gt; outcomes &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/96d67901210153ee8b9ade02612d685d.png' alt='equation' style='vertical-align: -0.555555555555556ex;height: 2.22222222222222ex;' /&gt;&lt;/span&gt; and the experts&amp;#8217; and learner&amp;#8217;s predictions are assessed via a loss function &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/c9559e610f585fdfa41d157a2396ae42.png' alt='equation' style='vertical-align: -0.111111111111111ex;height: 1.88888888888889ex;' /&gt;&lt;/span&gt; so that a prediction &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/1673ac8aa9b48203da9b40ba52ef26de.png' alt='equation' style='vertical-align: -0.444444444444444ex;height: 1.44444444444444ex;' /&gt;&lt;/span&gt; incurs a penalty &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/8b6ae183b814c73f68a5174e6f437419.png' alt='equation' style='vertical-align: -0.666666666666667ex;height: 2.44444444444444ex;' /&gt;&lt;/span&gt; when outcome &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/4d879bde964e549c6085ad630f56ad41.png' alt='equation' style='vertical-align: -0.444444444444444ex;height: 1.44444444444444ex;' /&gt;&lt;/span&gt; occurs.&lt;/p&gt;

&lt;p&gt;Expressed in a kind of pseudo-code, the game is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/b345d7bdf8df6959ab05b36740ab651d.png' alt='equation' style='vertical-align: -0.444444444444444ex;height: 2.0ex;' /&gt;&lt;/span&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Experts make predictions &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/ab134481a44e98ae86659006578f544f.png' alt='equation' style='vertical-align: -0.444444444444444ex;height: 2.22222222222222ex;' /&gt;&lt;/span&gt;&lt;/li&gt;

&lt;li&gt;Learner predicts &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/2e3894b0d684fd5fdd29cc0d8bb70758.png' alt='equation' style='vertical-align: -0.444444444444444ex;height: 2.22222222222222ex;' /&gt;&lt;/span&gt; based on expert predictions&lt;/li&gt;

&lt;li&gt;World reveals outcome &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/d4bfb929e97e8812678d989f7b937772.png' alt='equation' style='vertical-align: -0.555555555555556ex;height: 2.33333333333333ex;' /&gt;&lt;/span&gt;&lt;/li&gt;

&lt;li&gt;Experts incur penalties &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/0eb6d6e28e7acc5c05f6880cb2891988.png' alt='equation' style='vertical-align: -0.666666666666667ex;height: 2.44444444444444ex;' /&gt;&lt;/span&gt; and the learner incurs &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/4f543328b8e01dea1703f6638ba1ad3d.png' alt='equation' style='vertical-align: -0.666666666666667ex;height: 2.44444444444444ex;' /&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;

&lt;p&gt;The aim of the learner in this game is to minimise its total loss &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/6fc5b8aa3522d475d7a7f3e05e817a82.png' alt='equation' style='vertical-align: -0.666666666666667ex;height: 2.44444444444444ex;' /&gt;&lt;/span&gt; relative to the smallest expert loss &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/4fcc21b5bb6893c1d7785b32f3803d1f.png' alt='equation' style='vertical-align: -0.666666666666667ex;height: 2.44444444444444ex;' /&gt;&lt;/span&gt;. The difference &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/cd52f06b1456e81e7a1491a58b5659b3.png' alt='equation' style='vertical-align: -0.555555555555556ex;height: 2.33333333333333ex;' /&gt;&lt;/span&gt; is called the &lt;em&gt;regret&lt;/em&gt;.&lt;/p&gt;
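
&lt;p&gt;The game and its regret can be sketched in Python as follows. Several things here are assumptions made for illustration and are not specified above: the loss is taken to be log loss, the learner merges expert predictions with a weighted mixture whose weights are rescaled by the probability each expert gave to the revealed outcome, and the expert predictions and outcomes are random made-up data. The names &lt;code&gt;log_loss&lt;/code&gt; and &lt;code&gt;play_pwea&lt;/code&gt; are hypothetical.&lt;/p&gt;
&lt;pre&gt;
```python
import math
import random

def log_loss(p, y):
    """Penalty for predicting distribution p when outcome y occurs."""
    return -math.log(p[y])

def play_pwea(expert_preds, outcomes):
    """Play the game; expert_preds[t][n] is expert n's distribution in round t."""
    num_experts = len(expert_preds[0])
    weights = [1.0 / num_experts] * num_experts
    learner_loss = 0.0
    expert_loss = [0.0] * num_experts
    for preds, y in zip(expert_preds, outcomes):
        # Step 2: the learner merges the expert predictions into its own.
        num_outcomes = len(preds[0])
        p = [sum(w * q[k] for w, q in zip(weights, preds))
             for k in range(num_outcomes)]
        # Step 4: once y is revealed, everyone incurs a penalty.
        learner_loss += log_loss(p, y)
        for n, q in enumerate(preds):
            expert_loss[n] += log_loss(q, y)
        # Assumed update: favour experts that put more mass on y.
        weights = [w * q[y] for w, q in zip(weights, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return learner_loss, expert_loss

random.seed(0)
T, N = 50, 3
expert_preds = []
for _ in range(T):
    row = []
    for _ in range(N):
        r = random.uniform(0.05, 0.95)
        row.append([1.0 - r, r])
    expert_preds.append(row)
outcomes = [random.randrange(2) for _ in range(T)]

learner_loss, expert_loss = play_pwea(expert_preds, outcomes)
regret = learner_loss - min(expert_loss)
print(regret, math.log(N))
```
&lt;/pre&gt;
&lt;p&gt;For this particular loss and merging rule the regret never exceeds the logarithm of the number of experts, however many rounds are played.&lt;/p&gt;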

&lt;h2 id='online_convex_optimisation'&gt;Online Convex Optimisation&lt;/h2&gt;

&lt;p&gt;Online Convex Optimisation (OCO) is similar in that a sequential game is played over &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/bd3fb80ef03d2b5da97eff08edade65e.png' alt='equation' style='vertical-align: -0.0ex;height: 1.55555555555556ex;' /&gt;&lt;/span&gt; rounds where a learner makes a prediction from some convex set &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/8d23c83965f08ae14e9d6bda71592397.png' alt='equation' style='vertical-align: -0.111111111111111ex;height: 1.88888888888889ex;' /&gt;&lt;/span&gt;. However, as mentioned above, the OCO game is simpler in that there are no (explicit) experts and more general in that the finite number of outcomes that the world can reveal is replaced by an arbitrary set of convex functions &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/88596759fe8fe399aa737c5570db315a.png' alt='equation' style='vertical-align: -0.0ex;height: 1.55555555555556ex;' /&gt;&lt;/span&gt;. The function &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/e0e3bb3e630020121a88d820081b7c7f.png' alt='equation' style='vertical-align: -0.444444444444444ex;height: 2.0ex;' /&gt;&lt;/span&gt; is chosen by the world and used to assign a penalty &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/3a88ea4bbc1ec9d2df4b23c4f94ba52d.png' alt='equation' style='vertical-align: -0.555555555555556ex;height: 2.33333333333333ex;' /&gt;&lt;/span&gt; to the learner.&lt;/p&gt;

&lt;p&gt;Expressed in pseudo-code, OCO is the following game:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/b345d7bdf8df6959ab05b36740ab651d.png' alt='equation' style='vertical-align: -0.444444444444444ex;height: 2.0ex;' /&gt;&lt;/span&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Learner predicts &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/29e7ceb81e962bbca229474a9c0f35db.png' alt='equation' style='vertical-align: -0.111111111111111ex;height: 1.88888888888889ex;' /&gt;&lt;/span&gt;&lt;/li&gt;

&lt;li&gt;World reveals a convex function &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/128c8db8e8628eab4a5c5a64c6f8c537.png' alt='equation' style='vertical-align: -0.444444444444444ex;height: 2.22222222222222ex;' /&gt;&lt;/span&gt;&lt;/li&gt;

&lt;li&gt;Learner incurs penalty &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/6c381007f48a409a023a74da652d00d4.png' alt='equation' style='vertical-align: -0.555555555555556ex;height: 2.33333333333333ex;' /&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;

&lt;p&gt;The learner&amp;#8217;s aim here is to minimise the regret relative to the best single prediction &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/740cfec77aa1288a84d0b8f29b03f57c.png' alt='equation' style='vertical-align: -0.111111111111111ex;height: 1.66666666666667ex;' /&gt;&lt;/span&gt; in hindsight. That is, the learner wants to minimise the difference between &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/57c4bbddc1c0d4b9aa74fdcef84bb7b2.png' alt='equation' style='vertical-align: -0.555555555555556ex;height: 2.33333333333333ex;' /&gt;&lt;/span&gt; and &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/63b93595d2b81f7f8eac97830f859fce.png' alt='equation' style='vertical-align: -0.555555555555556ex;height: 2.33333333333333ex;' /&gt;&lt;/span&gt;. Once again, the difference &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/fb14f64cc4d1ed76d19ac80381412de8.png' alt='equation' style='vertical-align: -0.555555555555556ex;height: 2.33333333333333ex;' /&gt;&lt;/span&gt; is called the &lt;em&gt;regret&lt;/em&gt;.&lt;/p&gt;

&lt;h2 id='pwea_is_a_special_case_of_oco'&gt;PWEA is a special case of OCO&lt;/h2&gt;

&lt;p&gt;We can show that PWEA is a special case of OCO by defining an OCO game that mimics the PWEA game.&lt;/p&gt;

&lt;p&gt;The main trick is to define the set of functions &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/88596759fe8fe399aa737c5570db315a.png' alt='equation' style='vertical-align: -0.0ex;height: 1.55555555555556ex;' /&gt;&lt;/span&gt; for OCO so that step 1 in the PWEA game (where the experts reveal their predictions) can be simulated. Specifically, if for each &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/ea5a3ab7ba6d22f1f2721e8c9f820bee.png' alt='equation' style='vertical-align: -0.555555555555556ex;height: 2.22222222222222ex;' /&gt;&lt;/span&gt; in the PWEA game expert &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/ef2f2ccff656de8167a58ce0bfb17f0b.png' alt='equation' style='vertical-align: -0.0ex;height: 1.55555555555556ex;' /&gt;&lt;/span&gt; makes prediction &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/e7bafef7ef49e0e2f73968bc27044531.png' alt='equation' style='vertical-align: -0.444444444444444ex;height: 2.22222222222222ex;' /&gt;&lt;/span&gt; and the outcome is &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/a2b8c10942a930881e72582777a18710.png' alt='equation' style='vertical-align: -0.444444444444444ex;height: 2.22222222222222ex;' /&gt;&lt;/span&gt;, we define an OCO game via a sequence of linear (and thus convex) functions &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/4a5edf3dde0d980b446a843a1b9b8ab9.png' alt='equation' style='vertical-align: -0.444444444444444ex;height: 2.22222222222222ex;' /&gt;&lt;/span&gt;. 
These are defined so that &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/ee885959fd776de2b68224b766e2556d.png' alt='equation' style='vertical-align: -0.666666666666667ex;height: 2.44444444444444ex;' /&gt;&lt;/span&gt; where &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/aa99059097bcccafc362199f47df105f.png' alt='equation' style='vertical-align: -0.0ex;height: 1.77777777777778ex;' /&gt;&lt;/span&gt; are the vertices of &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/d3663f5445a0c641f0f789987416b13f.png' alt='equation' style='vertical-align: -0.0ex;height: 1.77777777777778ex;' /&gt;&lt;/span&gt; and are linearly extended to all &lt;em&gt;mixtures&lt;/em&gt; of &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/b5384a1ad4b6cdde3ac4c1560d5e1ef4.png' alt='equation' style='vertical-align: -0.0ex;height: 1.55555555555556ex;' /&gt;&lt;/span&gt; experts, denoted &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/b82e4aaab041803ca1ef80d83e48bd8a.png' alt='equation' style='vertical-align: -0.111111111111111ex;height: 1.88888888888889ex;' /&gt;&lt;/span&gt;, by defining &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/4886c9f7a8574aaf6bbfd953ee890945.png' alt='equation' style='vertical-align: -0.555555555555556ex;height: 2.33333333333333ex;' /&gt;&lt;/span&gt;.&lt;/p&gt;

&lt;p&gt;This construction means that the learner in the OCO game can always mimic the performance of a single, fixed expert &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/ef2f2ccff656de8167a58ce0bfb17f0b.png' alt='equation' style='vertical-align: -0.0ex;height: 1.55555555555556ex;' /&gt;&lt;/span&gt; in the PWEA game by constantly playing &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/aa99059097bcccafc362199f47df105f.png' alt='equation' style='vertical-align: -0.0ex;height: 1.77777777777778ex;' /&gt;&lt;/span&gt;. In some sense, this is how step 1 of the PWEA game is recovered in the OCO game.&lt;/p&gt;

&lt;p&gt;Now consider what happens when we minimise the total loss for this OCO game. This involves finding a mixture &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/8228ef60556cdf93632cfd66bb446e50.png' alt='equation' style='vertical-align: -0.111111111111111ex;height: 1.88888888888889ex;' /&gt;&lt;/span&gt; such that &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/2321dcd4089bfbf1a1cf6ca4410687b0.png' alt='equation' style='vertical-align: -0.555555555555556ex;height: 2.33333333333333ex;' /&gt;&lt;/span&gt; is minimised. Since &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/1bebd537c36c9ab114e2dc6727f04203.png' alt='equation' style='vertical-align: -0.666666666666667ex;height: 2.44444444444444ex;' /&gt;&lt;/span&gt; we see that &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/97997b70d5b85bf867f4dea4b6f72688.png' alt='equation' style='vertical-align: -0.666666666666667ex;height: 2.44444444444444ex;' /&gt;&lt;/span&gt; where &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/fcbb63ea64a1011ad4e14f0d7048ad06.png' alt='equation' style='vertical-align: -0.555555555555556ex;height: 2.33333333333333ex;' /&gt;&lt;/span&gt; is the total loss for expert &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/ef2f2ccff656de8167a58ce0bfb17f0b.png' alt='equation' style='vertical-align: -0.0ex;height: 1.55555555555556ex;' /&gt;&lt;/span&gt; in the PWEA game. 
This weighted sum is clearly minimised by choosing the mixture &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/8228ef60556cdf93632cfd66bb446e50.png' alt='equation' style='vertical-align: -0.111111111111111ex;height: 1.88888888888889ex;' /&gt;&lt;/span&gt; that puts all its mass on the single expert &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/ef2f2ccff656de8167a58ce0bfb17f0b.png' alt='equation' style='vertical-align: -0.0ex;height: 1.55555555555556ex;' /&gt;&lt;/span&gt; corresponding to the smallest &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/fcbb63ea64a1011ad4e14f0d7048ad06.png' alt='equation' style='vertical-align: -0.555555555555556ex;height: 2.33333333333333ex;' /&gt;&lt;/span&gt; term. Furthermore, for that choice of &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/94c54bcc566da7ce486fd45e2cd0c654.png' alt='equation' style='vertical-align: -0.0ex;height: 1.77777777777778ex;' /&gt;&lt;/span&gt; we have &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/a4fd1a340729705344fc40e0f840be1e.png' alt='equation' style='vertical-align: -0.555555555555556ex;height: 2.33333333333333ex;' /&gt;&lt;/span&gt; and so &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/08bfc9936a9e41082d8883b522ed8513.png' alt='equation' style='vertical-align: -0.555555555555556ex;height: 2.33333333333333ex;' /&gt;&lt;/span&gt;.&lt;/p&gt;
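
&lt;p&gt;The construction can be checked on a toy instance. The sketch below assumes log loss and a small set of made-up expert predictions and outcomes, none of which come from the argument above; &lt;code&gt;make_linear_loss&lt;/code&gt; and &lt;code&gt;total_loss&lt;/code&gt; are hypothetical names. It builds the linear functions from the expert penalties, evaluates the total loss at the vertices (the individual experts) and at one mixture, and shows that the mixture total is the corresponding weighted sum of expert totals.&lt;/p&gt;
&lt;pre&gt;
```python
import math

def log_loss(p, y):
    """Penalty for predicting distribution p when outcome y occurs."""
    return -math.log(p[y])

def make_linear_loss(preds, y):
    """Return f with f(w) = sum over experts n of w[n] * loss(preds[n], y)."""
    penalties = [log_loss(q, y) for q in preds]
    def f(w):
        return sum(w_n * l_n for w_n, l_n in zip(w, penalties))
    return f

# Three rounds, two experts, two outcomes (made-up numbers).
expert_preds = [
    [[0.8, 0.2], [0.5, 0.5]],
    [[0.6, 0.4], [0.3, 0.7]],
    [[0.9, 0.1], [0.4, 0.6]],
]
outcomes = [0, 1, 1]
fs = [make_linear_loss(preds, y) for preds, y in zip(expert_preds, outcomes)]

def total_loss(w):
    """Total OCO loss of constantly playing the mixture w."""
    return sum(f(w) for f in fs)

# Playing a vertex reproduces the total loss of a single expert.
vertices = [[1.0, 0.0], [0.0, 1.0]]
expert_totals = [total_loss(e) for e in vertices]

mixture = [0.25, 0.75]
mixture_total = total_loss(mixture)
weighted_sum = sum(w * l for w, l in zip(mixture, expert_totals))
print(expert_totals, mixture_total, weighted_sum)
```
&lt;/pre&gt;
&lt;p&gt;Because the total is a weighted average of the expert totals, no mixture can beat the best vertex, which is the point of the argument.&lt;/p&gt;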

&lt;p&gt;The above argument shows that any PWEA game can be presented as an OCO game and that the best single expert in the PWEA game corresponds to the best single prediction in the corresponding OCO game.&lt;/p&gt;
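
&lt;p&gt;As a rough numerical sanity check of the argument above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the four cumulative expert losses are made-up numbers, and the names expert_totals and mixture_total_loss are hypothetical. It only shows that, for a total loss that is linear in the mixture, no randomly sampled mixture beats the vertex that puts all its mass on the best expert.&lt;/p&gt;

```python
import numpy as np

# Hypothetical cumulative losses for four experts (made-up numbers).
expert_totals = np.array([3.2, 1.7, 2.9, 4.1])


def mixture_total_loss(mu, totals):
    """Total loss of a fixed mixture mu: the weighted sum of the expert totals."""
    return float(np.dot(mu, totals))


# The vertex of the simplex that puts all its mass on the best expert.
best = int(np.argmin(expert_totals))
vertex = np.zeros(len(expert_totals))
vertex[best] = 1.0

# Sample random mixtures from the simplex and keep the smallest loss seen.
rng = np.random.default_rng(0)
samples = rng.dirichlet(np.ones(len(expert_totals)), size=1000)
best_sampled = min(mixture_total_loss(mu, expert_totals) for mu in samples)

print(best, mixture_total_loss(vertex, expert_totals), best_sampled)
```

&lt;p&gt;Sampling is of course not a proof. It is just a quick way to see that the weighted sum never drops below the smallest expert total.&lt;/p&gt;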

&lt;h2 id='regret_bounds'&gt;Regret Bounds&lt;/h2&gt;

&lt;p&gt;Since the minimal total loss is the same in the PWEA and OCO games, we can compare the regrets for both games by just considering the total loss for each. If a learner playing the OCO game predicts &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/40038ab59a228b337c3ce24b69fb4b44.png' alt='equation' style='vertical-align: -0.0ex;height: 1.77777777777778ex;' /&gt;&lt;/span&gt; at round &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/7ca2c7b5d8aded9e019a267b8ca1b94d.png' alt='equation' style='vertical-align: -0.0ex;height: 1.44444444444444ex;' /&gt;&lt;/span&gt; the loss it incurs is &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/7ead01d3b59ab23a04ee176eeb2015d7.png' alt='equation' style='vertical-align: -0.666666666666667ex;height: 2.44444444444444ex;' /&gt;&lt;/span&gt;. If all of the partial losses &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/b6dba15fbaa03a37977846a5ba638e8b.png' alt='equation' style='vertical-align: -0.666666666666667ex;height: 2.22222222222222ex;' /&gt;&lt;/span&gt; are &lt;em&gt;convex&lt;/em&gt; then we see that predicting &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/f41fdf18da177a9fb2f77e922df92fb5.png' alt='equation' style='vertical-align: -0.555555555555556ex;height: 2.33333333333333ex;' /&gt;&lt;/span&gt; in the PWEA game will incur a penalty &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/7a4e8b51866ebde777c1f63a990892ed.png' alt='equation' style='vertical-align: -0.666666666666667ex;height: 2.44444444444444ex;' /&gt;&lt;/span&gt; in that round.&lt;/p&gt;

&lt;p&gt;Therefore, any regret bound that holds for OCO will also hold for an OCO-simulated PWEA game with convex losses since the OCO regret dominates the PWEA regret achieved by just playing convex combinations of the expert predictions. For a recent summary of lower and upper bounds for various types of online optimisation games, I point the reader to the &lt;a href='http://colt2008.cs.helsinki.fi/papers/111-Abernethy.pdf'&gt;COLT 2008 paper&lt;/a&gt; by &lt;a href='http://www.cs.berkeley.edu/~jake/'&gt;Jake Abernethy&lt;/a&gt; and co-authors.&lt;/p&gt;

&lt;p&gt;What happens if the PWEA losses &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/b6dba15fbaa03a37977846a5ba638e8b.png' alt='equation' style='vertical-align: -0.666666666666667ex;height: 2.22222222222222ex;' /&gt;&lt;/span&gt; are not convex? The same reduction argument can be run only if for every mixture &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/8228ef60556cdf93632cfd66bb446e50.png' alt='equation' style='vertical-align: -0.111111111111111ex;height: 1.88888888888889ex;' /&gt;&lt;/span&gt; there exists a prediction &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/6894d9330e4a02afc83ccf4f6bdb9a7f.png' alt='equation' style='vertical-align: -0.555555555555556ex;height: 2.33333333333333ex;' /&gt;&lt;/span&gt; such that for all outcomes &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/4d879bde964e549c6085ad630f56ad41.png' alt='equation' style='vertical-align: -0.444444444444444ex;height: 1.44444444444444ex;' /&gt;&lt;/span&gt; we have &lt;span class='maruku-inline'&gt;&lt;img class='maruku-png' src='/images/latex/2bacbe46f6afb8a2ea31246e8c867baa.png' alt='equation' style='vertical-align: -0.666666666666667ex;height: 2.44444444444444ex;' /&gt;&lt;/span&gt;. This is similar to the condition required of the &lt;a href='http://onlineprediction.net/?n=Main.SubstitutionFunction'&gt;substitution function&lt;/a&gt; needed in the &lt;a href='http://onlineprediction.net/?n=Main.WeakAggregatingAlgorithm'&gt;Weak Aggregating Algorithm&lt;/a&gt;, so I suspect this condition is related to mixability but will leave the details for another time.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>NIPS 2011 Workshop on Relations Between Machine Learning Problems</title>
   <link href="http://mark.reid.name/iem/NIPS-workshop-relations-between-ml-problems.html"/>
   <updated>2011-09-02T00:00:00+10:00</updated>
   <id>id:/iem/NIPS-workshop-relations-between-ml-problems</id>
   <content type="html">&lt;p&gt;There is now a bewildering array of inference problems that techniques from machine learning can address: classification, regression, density estimation, clustering, ranking, recommendation, feature selection, hypothesis testing, and more. Multiply this by some of the modes in which learning can occur (batch vs. online, active vs. passive, partial vs. full feedback, inductive vs. transductive, semi-/un-/supervised) and there is a veritable zoo of problems out there.&lt;/p&gt;

&lt;p&gt;As part of a continuing effort to make sense of it all, &lt;a href='http://users.cecs.anu.edu.au/~williams/'&gt;Bob Williamson&lt;/a&gt;, &lt;a href='http://hunch.net/~jl/'&gt;John Langford&lt;/a&gt;, &lt;a href='http://www.informatik.uni-hamburg.de/ML/contents/people/luxburg/'&gt;Ulrike von Luxburg&lt;/a&gt;, &lt;a href='http://www.cs.ucla.edu/~jenn/'&gt;Jennifer Wortman Vaughan&lt;/a&gt;, and I are running a workshop at NIPS this year titled &lt;a href='http://rml.cecs.anu.edu.au'&gt;Relations Between Machine Learning Problems–an approach to unify the field&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This workshop will focus on relations between machine learning problems. The idea is that by better understanding how different machine learning problems relate to each other, we will be able to better understand the field as a whole.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;At this stage we plan to have &lt;a href='http://homepages.cwi.nl/~pdg/'&gt;Peter Grünwald&lt;/a&gt; and &lt;a href='http://homepages.inf.ed.ac.uk/amos/'&gt;Amos Storkey&lt;/a&gt; give invited talks, and to hold a panel discussion on &amp;#8220;How to build a map of all of machine learning&amp;#8221;.&lt;/p&gt;

&lt;p&gt;&lt;a href='http://rml.anu.edu.au/Call%20for%20Submissions.html'&gt;Submissions&lt;/a&gt; are due by the end of September.&lt;/p&gt;

&lt;p&gt;If you need any further encouragement to join us, I just note that the NIPS workshops are in Spain&amp;#8217;s &lt;a href='http://en.wikipedia.org/wiki/Sierra_Nevada_(Spain)'&gt;Sierra Nevada&lt;/a&gt; this year.&lt;/p&gt;

&lt;p&gt;Hope to see you there!&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>Artificial AI v2.0</title>
   <link href="http://mark.reid.name/iem/artificial-ai-v2.html"/>
   <updated>2011-08-12T00:00:00+10:00</updated>
   <id>id:/iem/artificial-ai-v2</id>
   <content type="html">&lt;p&gt;The &lt;a href='/iem/artificial-ai.html'&gt;first Artificial AI&lt;/a&gt; that my &lt;a href='http://twitter.com/#!/JVLamond'&gt;co-developer&lt;/a&gt; and I started almost three years ago &amp;#8212; project Ada &amp;#8212; is coming along nicely. Her (non-computer) vision, (non-machine) learning, NLP, and planning systems are coming along nicely.&lt;/p&gt;

&lt;p&gt;Here&amp;#8217;s a recent screenshot&lt;sup id='fnref:1'&gt;&lt;a href='#fn:1' rel='footnote'&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;dl class='figure'&gt;
&lt;dt&gt;&lt;img src='/images/ada-2011.png' alt='Ada' /&gt;&lt;/dt&gt;

&lt;dd&gt;Ada (AAI v1.0)&lt;/dd&gt;
&lt;/dl&gt;

&lt;p&gt;Encouraged by the success of the first version of the project we decided to take the plunge and start on version 2.&lt;/p&gt;

&lt;p&gt;We call this version &amp;#8220;Edith Valerie Reid&amp;#8221;.&lt;/p&gt;

&lt;dl class='figure'&gt;
&lt;dt&gt;&lt;img src='/images/edith-2011.jpg' alt='Edith' /&gt;&lt;/dt&gt;

&lt;dd&gt;Edith (AAI v2.0)&lt;/dd&gt;
&lt;/dl&gt;

&lt;p&gt;Her initial release was on the 26th of July and so far everything is going really well. As with the first version, there are some issues to resolve, such as the sleep-synchronisation routines, but these are very minor and, from what I&amp;#8217;ve heard, common to this type of project.&lt;/p&gt;

&lt;p&gt;All in all, I&amp;#8217;m declaring the version two release a resounding success.&lt;/p&gt;
&lt;div class='footnotes'&gt;&lt;hr /&gt;&lt;ol&gt;&lt;li id='fn:1'&gt;
&lt;p&gt;Both photos are courtesy of &lt;a href='http://users.cecs.anu.edu.au/~williams/'&gt;Bob Williamson&lt;/a&gt;.&lt;/p&gt;
&lt;a href='#fnref:1' rev='footnote'&gt;&amp;#8617;&lt;/a&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/div&gt;</content>
 </entry>
 
 <entry>
   <title>ML Discuss for ICML 2011</title>
   <link href="http://mark.reid.name/iem/mldiscuss-for-2011.html"/>
   <updated>2011-06-20T00:00:00+10:00</updated>
   <id>id:/iem/mldiscuss-for-2011</id>
   <content type="html">&lt;p&gt;&lt;a href='http://www.icml-2011.org/'&gt;ICML 2011&lt;/a&gt; is just around the corner so, with the help of some of the &lt;a href='http://www.icml-2011.org/organization.php'&gt;conference organisers&lt;/a&gt;, I recently updated the &lt;a href='http://mldiscuss.appspot.com'&gt;ML Discuss site&lt;/a&gt; I set up last year&lt;sup id='fnref:1'&gt;&lt;a href='#fn:1' rel='footnote'&gt;1&lt;/a&gt;&lt;/sup&gt; to allow everyone to &lt;a href='http://mldiscuss.appspot.com/venue/ICML/2011/'&gt;comment on accepted papers at ICML 2011&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you are an &lt;a href='http://mldiscuss.appspot.com/author/'&gt;author&lt;/a&gt; of a paper in ICML 2011, I urge you to find your papers and subscribe to their comment feeds via email or RSS. Instructions can be found on the &lt;a href='http://mldiscuss.appspot.com/'&gt;main page&lt;/a&gt;. Of course, you can also subscribe to comments for papers you are interested in even if you are not an author.&lt;/p&gt;

&lt;p&gt;Last year at &lt;a href='http://mldiscuss.appspot.com/venue/ICML/2010/'&gt;ICML 2010&lt;/a&gt; most of the online discussion happened during the conference itself. I expect that will also be the case this year but I would encourage people—even those, like me, who are not attending ICML 2011—to start commenting now. If you are after somewhere to start I would recommend checking out the Best and Distinguished Papers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href='http://mldiscuss.appspot.com/venue/ICML/2011/article/599/'&gt;Computational Rationalization: The Inverse Equilibrium Problem&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href='http://mldiscuss.appspot.com/venue/ICML/2011/article/542/'&gt;Submodular meets Spectral: Greedy Algorithms for Subset Selection, Sparse Approximation and Dictionary Selection&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href='http://mldiscuss.appspot.com/venue/ICML/2011/article/456/'&gt;Variational Heteroscedastic Gaussian Process Regression&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href='http://mldiscuss.appspot.com/venue/ICML/2011/article/480/'&gt;Minimum Probability Flow Learning&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href='http://mldiscuss.appspot.com/venue/ICML/2011/article/235/'&gt;Approximate Dynamic Programming for Storage Problems&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href='http://mldiscuss.appspot.com/venue/ICML/2011/article/333/'&gt;Predicting Legislative Roll Calls from Text&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href='http://mldiscuss.appspot.com/venue/ICML/2011/article/125/'&gt;Parsing Natural Scenes and Natural Language with Recursive Neural Networks&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Looking ahead, I spent some time making sure the current version of ML Discuss can handle multiple conferences and separately track the top and recent comments for each. If you are a conference organiser for another machine learning-related conference and would like to have it discussed at ML Discuss, drop me an &lt;a href='mailto:mark@reid.name'&gt;email&lt;/a&gt; and let me know. All that I require is a text file containing all the accepted papers, along with their authors and abstracts.&lt;/p&gt;

&lt;p&gt;Happy commenting!&lt;/p&gt;
&lt;div class='footnotes'&gt;&lt;hr /&gt;&lt;ol&gt;&lt;li id='fn:1'&gt;
&lt;p&gt;I&amp;#8217;ve been running ICML discussion sites since &lt;a href='http://conflate.net/icml/'&gt;2008&lt;/a&gt; but built a custom system for ICML 2010. I hope to port the older discussion sites to the new system when I find some time.&lt;/p&gt;
&lt;a href='#fnref:1' rev='footnote'&gt;&amp;#8617;&lt;/a&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/div&gt;</content>
 </entry>
 
 
</feed>
