Mugizi Robert Rwebangira

Techniques for Exploiting Unlabeled Data

Degree Type: Ph.D. in Computer Science
Advisor(s): Avrim Blum, John Lafferty
Graduated: December 2008

Keywords: Semi-supervised, regression, unlabeled data, similarity

Abstract

In many machine learning application domains obtaining labeled data is expensive but obtaining unlabeled data is much cheaper. For this reason there has been growing interest in algorithms that are able to take advantage of unlabeled data. In this thesis we develop several methods for taking advantage of unlabeled data in classification and regression tasks.

Specific contributions include:

A method for improving the performance of the graph mincut algorithm of Blum and Chawla [12] by taking randomized mincuts. We give theoretical motivation for this approach and we present empirical results showing that randomized mincut tends to outperform the original graph mincut algorithm, especially when the number of labeled examples is very small.
An algorithm for semi-supervised regression based on manifold regularization using local linear estimators. This is the first extension of local linear regression to the semi-supervised setting. In this thesis we present experimental results on both synthetic and real data and show that this method tends to perform better than methods which only utilize the labeled data.
An investigation of practical techniques for using the Winnow algorithm (which is not directly kernelizable) together with kernel functions and general similarity functions via unlabeled data. We expect such techniques to be particularly useful when we have a large feature space as well as additional similarity measures that we would like to use together with the original features. This method is also suited to situations where the best performing measure of similarity does not satisfy the properties of a kernel. We present some experiments on real and synthetic data to support this approach.

Thesis Committee

Avrim Blum (Co-Chair)
John Lafferty (Co-Chair)
William Cohen
Xiaojin (Jerry) Zhu (Wisconsin)

Peter Lee, Head, Computer Science Department
Randy Bryant, Dean, School of Computer Science

Thesis Document

CMU-CS-08-164.pdf (889.9 KB) (114 pages)

At a Glance

Academic Offerings

Admissions

Directory Submenu

People

Explore the Field

Mugizi Robert Rwebangira

Techniques for Exploiting Unlabeled Data

Abstract

Thesis Committee

Thesis Document

At a Glance

Academic Offerings

Admissions

Directory Submenu

People

Explore the Field

What can we help you find?

Mugizi Robert Rwebangira

Techniques for Exploiting Unlabeled Data

Abstract

Thesis Committee

Thesis Document