I delivered a short, six hour version of my Information Theory course at Tsinghua University from Nov. 28th to Dec. 2nd, 2014.

Lecture 1: Overview & Introduction

Lecture 2: Key results & inequalities

Lecture 3: Some applications to machine learning

The material in lectures 1 and 2 is drawn directly from the longer version of my Information Theory course. The primary reference for almost all of that material is MacKay’s *Information Theory, Inference, and Learning Algorithms* and, to a lesser extent, Cover & Thomas’s *Elements of Information Theory*.
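As a concrete taste of the quantities those lectures revolve around, here is a minimal Python sketch (my own illustration, not code from the course) of Shannon entropy and the KL divergence, the two definitions everything else builds on:

```python
import math

def entropy(p):
    """Shannon entropy H(p), in bits, of a discrete distribution p."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def kl_divergence(p, q):
    """Relative entropy D(p || q), in bits.

    Assumes q_i > 0 wherever p_i > 0 (otherwise the divergence is infinite).
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# A fair coin carries exactly one bit of uncertainty.
print(entropy([0.5, 0.5]))  # → 1.0

# Gibbs' inequality: D(p || q) >= 0, with equality iff p == q.
print(kl_divergence([0.9, 0.1], [0.5, 0.5]))
```

Checking small examples like these against Gibbs' inequality is a useful sanity test when first working through the key inequalities in lecture 2.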

Lecture 3 draws on a variety of material, including:

- *Information, Divergence, and Risk* (Reid & Williamson, 2011), and references therein, for the geometric take on Bayes risk curves.
- Cover & Thomas’s *Elements of Information Theory* for the treatment of Fano’s inequality.
- *A Game of Prediction with Expert Advice* (Vovk, 1998) on mixability.
- Cesa-Bianchi & Lugosi’s excellent book on online learning, *Prediction, Learning, and Games*, for the proof of the mixability result.
- *Entropic Duality and Generalised Mixability* (Reid, Frongillo, Williamson, Mehta, 2014) for rewriting the mixability condition in terms of an optimisation.
- *Convex Foundations for Generalized MaxEnt Models* (Frongillo & Reid, 2013) for the section on expressing exponential families in terms of convex conjugates.

There are also a number of papers worth exploring on the information bottleneck method, which I briefly mentioned at the end of the last lecture.