Understanding and analyzing the effectiveness of uncertainty sampling

Placeholder Show Content

Abstract/Contents

Abstract
Active learning techniques attempt to reduce the amount of data required to learn a classifier by leveraging adaptivity. In particular, an algorithm iteratively selects and labels points from an unlabeled pool of data points. Over the history of active learning, many algorithms have been developed, though one heuristic algorithm, uncertainty sampling, stands out by its popularity, effectiveness, simplicity, and intuitiveness. Despite this, uncertainty sampling has known failure modes and lacks the theoretical underpinnings of some other algorithms such as those based on disagreement. Here, we present a few analyses of uncertainty sampling. First, we find that uncertainty sampling iterations implicitly optimizes the (generally non-convex) zero-one loss, explaining how uncertainty sampling can achieve lower error than labeling the entire unlabeled pool and highlighting the importance of a good initialization. Second, for logistic regression, we show that the extent to which uncertainty sampling outperforms random sampling is inversely proportional to the asymptotic error, both theoretically and empirically. Finally, we use the previous insights to show uncertainty sampling works very well on a particular NLP task due to extreme label imbalance. Taken together, these results provide a sturdier foundation for understanding and using uncertainty sampling.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2021; ©2021
Publication date 2021; 2021
Issuance monographic
Language English

Creators/Contributors

Author Mussmann, Stephen Oscar
Degree supervisor Liang, Percy
Thesis advisor Liang, Percy
Thesis advisor Ré, Christopher
Thesis advisor Sadigh, Dorsa
Degree committee member Ré, Christopher
Degree committee member Sadigh, Dorsa
Associated with Stanford University, Computer Science Department

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Stephen Mussmann.
Note Submitted to the Computer Science Department.
Thesis Thesis Ph.D. Stanford University 2021.
Location https://purl.stanford.edu/gw920rq6947

Access conditions

Copyright
© 2021 by Stephen Oscar Mussmann
License
This work is licensed under a Creative Commons Attribution 3.0 Unported license (CC BY).

Also listed in

Loading usage metrics...