Understanding and analyzing the effectiveness of uncertainty sampling
Abstract/Contents
- Abstract
- Active learning techniques attempt to reduce the amount of data required to learn a classifier by leveraging adaptivity. In particular, an algorithm iteratively selects and labels points from an unlabeled pool of data points. Over the history of active learning, many algorithms have been developed, though one heuristic algorithm, uncertainty sampling, stands out for its popularity, effectiveness, simplicity, and intuitiveness. Despite this, uncertainty sampling has known failure modes and lacks the theoretical underpinnings of some other algorithms such as those based on disagreement. Here, we present a few analyses of uncertainty sampling. First, we find that uncertainty sampling iterations implicitly optimize the (generally non-convex) zero-one loss, explaining how uncertainty sampling can achieve lower error than labeling the entire unlabeled pool and highlighting the importance of a good initialization. Second, for logistic regression, we show that the extent to which uncertainty sampling outperforms random sampling is inversely proportional to the asymptotic error, both theoretically and empirically. Finally, we use the previous insights to show uncertainty sampling works very well on a particular NLP task due to extreme label imbalance. Taken together, these results provide a sturdier foundation for understanding and using uncertainty sampling.
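
The selection loop the abstract describes can be illustrated with a minimal sketch. This is not the thesis's code: it assumes a binary logistic regression model fit by plain gradient descent, a synthetic two-dimensional pool, and a query rule that picks the unlabeled point whose predicted probability is closest to 0.5. All names here (`fit_logistic`, `uncertainty_sampling`, `oracle`, `seed_idx`, `budget`) are hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    # w holds one weight per feature followed by a bias term.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def fit_logistic(X, y, steps=200, lr=0.5):
    # Batch gradient descent on the mean logistic loss (illustrative settings).
    d = len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(steps):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            for j in range(d):
                grad[j] += err * xi[j]
            grad[d] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def uncertainty_sampling(pool_X, oracle, seed_idx, budget):
    # Start from a small labeled seed set, then repeatedly refit the model
    # and query the label of the point it is least certain about.
    labels = {i: oracle(i) for i in seed_idx}
    queried = list(seed_idx)
    for _ in range(budget):
        w = fit_logistic([pool_X[i] for i in queried],
                         [labels[i] for i in queried])
        unlabeled = [i for i in range(len(pool_X)) if i not in labels]
        if not unlabeled:
            break
        # Most uncertain point: predicted probability closest to 0.5.
        pick = min(unlabeled,
                   key=lambda i: abs(predict_proba(w, pool_X[i]) - 0.5))
        labels[pick] = oracle(pick)
        queried.append(pick)
    w = fit_logistic([pool_X[i] for i in queried],
                     [labels[i] for i in queried])
    return w, queried


# Synthetic pool (an assumption for the demo): label is 1 when x0 + x1 > 0.
rng = random.Random(0)
pool = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
true_labels = [1 if x[0] + x[1] > 0 else 0 for x in pool]

w, queried = uncertainty_sampling(
    pool, lambda i: true_labels[i], seed_idx=[0, 1, 2, 3], budget=20
)
```

In this sketch the seed set stands in for the initialization that the abstract says matters: a different `seed_idx` can steer the queries toward a different region of the pool.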
Description
Type of resource | text |
---|---|
Form | electronic resource; remote; computer; online resource |
Extent | 1 online resource. |
Place | California |
Place | [Stanford, California] |
Publisher | [Stanford University] |
Copyright date | 2021; ©2021 |
Publication date | 2021 |
Issuance | monographic |
Language | English |
Creators/Contributors
Author | Mussmann, Stephen Oscar |
---|---|
Degree supervisor | Liang, Percy |
Thesis advisor | Liang, Percy |
Thesis advisor | Ré, Christopher |
Thesis advisor | Sadigh, Dorsa |
Degree committee member | Ré, Christopher |
Degree committee member | Sadigh, Dorsa |
Associated with | Stanford University, Computer Science Department |
Subjects
Genre | Theses |
---|---|
Genre | Text |
Bibliographic information
Statement of responsibility | Stephen Mussmann. |
---|---|
Note | Submitted to the Computer Science Department. |
Thesis | Thesis Ph.D. Stanford University 2021. |
Location | https://purl.stanford.edu/gw920rq6947 |
Access conditions
- Copyright
- © 2021 by Stephen Oscar Mussmann
- License
- This work is licensed under a Creative Commons Attribution 3.0 Unported license (CC BY).