Prediction Services

I have been thinking about learning and prediction as services for some time now. Like all good ideas, they tend to be thought of independently by several people when their time is ripe. Therefore I was not completely surprised when I heard the news yesterday that Google has released a new RESTful prediction API.

As a couple of other bloggers (John, Panos) have already noted, this is very exciting as it has the potential of making statistical inference a commodity and putting machine learning tools in the hands of everyday developers.

Using the API

The details are a little scant as the API is not yet open to the public but, as the FAQ and sample code explain, it appears to work as follows (a rough code sketch of the workflow appears after the list):

  1. A data set in CSV format is uploaded to Google storage. This can contain up to 100 million rows of text or numeric features. Each row can be associated with one of up to several hundred classes.

  2. The URL obtained after uploading the data set is then POSTed to a second URL, /prediction/v1/train/DATA_ID, to start Google’s learning algorithm (all URLs are relative to https://googleapis.com). It is not clear which algorithms are used behind the scenes for this step, but the home page says the API will automatically choose from a variety of techniques.

  3. The training occurs asynchronously and its progress can be queried by issuing a GET to /prediction/v1/query/DATA_ID. Once training is completed, this query will return a cross-validated estimate of the learned model’s accuracy.

  4. To make a new prediction with the trained model, a POST request containing the data to classify is sent to the /prediction/v1/query/DATA_ID URL and a label prediction is returned.
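To make those four steps concrete, here is a minimal Python sketch of the train, poll and predict calls using the requests library. Since the API is not yet open, the request bodies, response field names and authentication header below are my guesses based on the FAQ rather than the documented protocol, and the initial upload to Google Storage is assumed to have already happened.

    # Rough sketch of steps 2-4 above. The payload formats, response fields and
    # auth header are placeholders: the API is not yet public, so none of these
    # details are confirmed.
    import time
    import requests

    BASE = "https://googleapis.com"             # base URL given above
    DATA_ID = "mybucket/mydata.csv"             # hypothetical Google Storage object name
    AUTH = {"Authorization": "GoogleLogin auth=YOUR_TOKEN"}  # placeholder credentials

    # Step 2: ask the service to start training on the uploaded CSV.
    requests.post(f"{BASE}/prediction/v1/train/{DATA_ID}", headers=AUTH)

    # Step 3: poll until training finishes; the response is assumed to include a
    # cross-validated accuracy estimate once the model is ready.
    while True:
        status = requests.get(f"{BASE}/prediction/v1/query/{DATA_ID}", headers=AUTH).json()
        if status.get("trainingComplete"):      # hypothetical field name
            print("estimated accuracy:", status.get("accuracy"))
            break
        time.sleep(30)

    # Step 4: send a new instance to classify and read back the predicted label.
    instance = {"input": {"text": "the service was quick and friendly"}}  # assumed body format
    response = requests.post(f"{BASE}/prediction/v1/query/{DATA_ID}",
                             headers=AUTH, json=instance).json()
    print("predicted label:", response.get("label"))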

Although this is a relatively simple API and, at present, only deals with classification, I believe it has the potential to cover a large proportion of most web developers’ prediction needs (e.g., text classification, sentiment analysis, click-through analysis) as well as several scientific applications.
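To make the first of those examples concrete, the training data for a simple sentiment classifier would just be a CSV file pairing a class label with the text to learn from, along these lines (the column ordering and quoting conventions are my assumption, not the documented format):

    "positive","fast delivery and great customer service"
    "positive","exactly what I was looking for"
    "negative","arrived broken and support never replied"
    "negative","two weeks late and the wrong colour"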

The Google Prediction API is not the first to offer inferential services over the web, but I do think Google is the first to focus on building reusable predictors and to do so with a clean API design.

Some other projects offering prediction services include:

Of course, there are also many machine learning toolkits, such as Weka, Orange, Elefant and Rattle, that provide implementations of algorithms, but these do not offer them as services.

Future Predictions

Over the last few years we’ve seen a dramatic increase in the amount of data being generated and made available over the web (e.g., Freebase, DBpedia, Data.gov, Netflix, and protein databases). Thanks to services from Google, Amazon and others, there has also been a large-scale commodification of computational power and storage.

There are a handful of companies at present, Flightcaster for example, that have realised there is immense opportunity at the intersection of these developments for applying large-scale machine learning. Hopefully, the Google Prediction API and other services like it will provide the spark for an explosion of new and creative approaches to distilling knowledge from raw information.

I will be watching how this all unfolds with great interest.

Mark Reid May 21, 2010 Canberra, Australia