No Comments

Avrim Blum and Tom Mitchell Win ICML/COLT 10-Year Best Paper Award

News

More and more conferences are giving “test of time” awards, to recognize research publications from ten or more years ago that have proven to be truly seminal. The ICML (International Conference on Machine Learning) and COLT (Annual Conference on Learning Theory) conferences, two of the major symposia in the areas of machine learning and learning theory, are the latest example. They have jointly started the ICML/COLT 10-Year Best Paper Award. And great news — Avrim Blum and Tom Mitchell have won the inaugural award, for their seminal paper, “Combining labeled and unlabeled data with co-training,” which was published in COLT 1998

In giving the award, the ICML co-chairs, Andrew McCallum and Sam Roweis, state that the “paper is wonderfully creative, and has been widely influential. We feel very fortunate and gratified to be able to use this paper as a precedent-setting first example of this 10-Year Award in our community.”

I asked Avrim and Tom for a few words, to explain what they think is important about this paper. Avrim and Tom did this work in the Computer Science Department. Tom is now the Fredkin Professor of AI and Machine Learning, and Chair of the Machine Learning Department, which is perhaps the premier academic department for machine learning research. Tom’s reflection on this paper gives a great sense of the technical contribution:

The paper is about a simple but counter-intuitive idea: Suppose you are interested in training a classifier given labeled training examples. For instance, in this paper we considered training a program to classify web pages into categories such as “course home page” and “faculty home page.” The obvious way to train is to collect some labeled training examples — web pages that are labeled as “course” or “faculty.” In this paper we asked the question “can we improve the accuracy of the learned classifier by using unlabeled web pages as well?” At first it might seem unlikely that unlabeled web pages would be helpful, because there is really no way for the learner to know which belongs to which category. However, in this case (and a fairly broad class of other classifier learning problems) one can show that unlabeled data is indeed useful for improving the accuracy of the learned classifier. The reason is that this problem has a nice structure: the features used to describe each web page can be partitioned into two sets, X1 and X2. One (X1) is the collection of words on the web page. The second is the collection of words on hyperlinks that point to the web page. In the paper we introduced the idea of co-training, in which one trains two distinct classifiers — one using feature set X1 to describe the web pages, and the other using feature set X2. In our particular algorithm, we trained the two classifiers first using labeled data. The classifier using X1 was then applied to unlabeled data to find cases that it classified wit high confidence, and those were then used to “co-train” the classifier for X2.

In essence, the key insight here is that for any unlabeled web page, the label assigned by the two classifiers must be the same. The two classifiers, given a new unlabeled example, can both predict “course” or they can both predict “faculty,” but they cannot disagree. Therefore, each unlabeled example can be used as a constraint on the joint choice of parameters defining the two classifiers. The paper goes on to develop a formal model of this problem setting showing that if the feature sets X1 and X2 are both sufficient to predict the class, and if they are conditionally independent given the class, then unlabeled data will be very useful (in a specific technical sense) in improving classification accuracy.

Since the paper was published, others have extended both the theory and the set of algorithms for using a combination of labeled and unlabeled data in such co-training problem settings. And as a practical matter, people have found there are many problems which approximately exhibit this structure, from classifying objects in videos, to email classification, to named entity recognition.

To this, Avrim adds:

My sense is the paper had two main impacts. One is that it has turned out there are many domains where you do indeed have two (or more) different kinds of information about your data and you can use them in this way. E.g., your eyes help train your ears and vice-versa. Or even just two cameras with different filters on them (in some cases one is more confident than the other, and they can help train each other). The other impact has been more broadly in saying that there are interesting and perhaps unexpected ways of using unlabeled data, and helping to open up the whole direction of “semi-supervised” learning (using both labeled and unlabeled data).

This is truly a research contribution that is irresistible in its theoretical depth and elegance, blended with impact on real-world practical problems. In my view represents the best type of computing research. Congratulations, Avrim and Tom!

Peter Lee @ July 15, 2008

Leave a comment

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <code> <em> <i> <strike> <strong>