I delivered a short, six-hour version of my Information Theory course at Tsinghua University from Nov. 28th to Dec. 2nd, 2014.
Lecture 1: Overview & Introduction
Lecture 2: Key results & inequalities
Lecture 3: Some applications to machine learning
The material in lectures 1 and 2 is taken directly from the longer version of my Information Theory course. The primary reference for almost all of that material is MacKay’s Information Theory, Inference, and Learning Algorithms and, to a lesser extent, Cover & Thomas’s Elements of Information Theory.
Lecture 3 draws on a variety of material, including:
Information, Divergence, and Risk (Reid & Williamson, 2011) and references therein for the geometric take on Bayes risk curves.
Cover & Thomas’s Elements of Information Theory for the treatment of Fano’s inequality.
A Game of Prediction with Expert Advice (Vovk, 1998) on mixability (the mixability condition itself is stated briefly after this list).
Cesa-Bianchi & Lugosi’s excellent Prediction, Learning, and Games book on online learning for the proof of the mixability result.
Entropic Duality and Generalised Mixability (Reid, Frongillo, Williamson, Mehta, 2014) for rewriting the mixability condition in terms of an optimisation.
Convex Foundations for Generalized MaxEnt Models (Frongillo & Reid, 2013) for the section on expressing exponential families in terms of convex conjugates.
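For those who haven’t met mixability before, here is the standard finite-expert form of the condition (the references above may state it slightly differently): a loss $\ell$ is $\eta$-mixable for some $\eta > 0$ if, for every collection of predictions $a_1, \dots, a_N$ and every distribution $\pi = (\pi_1, \dots, \pi_N)$ over them, there is a single prediction $a$ satisfying

$$
\ell(a, y) \;\le\; -\frac{1}{\eta} \ln \sum_{i=1}^{N} \pi_i \, e^{-\eta\, \ell(a_i, y)}
\qquad \text{for all outcomes } y.
$$

This is precisely the property that lets Vovk’s aggregating algorithm guarantee a regret of at most $(\ln N)/\eta$ against $N$ experts.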
There are also a number of papers worth exploring on the information bottleneck method, which I briefly mentioned at the end of the last lecture.