Understanding and analyzing the effectiveness of uncertainty sampling

Abstract/Contents

Abstract: Active learning techniques attempt to reduce the amount of data required to learn a classifier by leveraging adaptivity. In particular, an algorithm iteratively selects and labels points from an unlabeled pool of data points. Over the history of active learning, many algorithms have been developed, though one heuristic algorithm, uncertainty sampling, stands out for its popularity, effectiveness, simplicity, and intuitiveness. Despite this, uncertainty sampling has known failure modes and lacks the theoretical underpinnings of some other algorithms, such as those based on disagreement. Here, we present a few analyses of uncertainty sampling. First, we find that uncertainty sampling iterations implicitly optimize the (generally non-convex) zero-one loss, which explains how uncertainty sampling can achieve lower error than labeling the entire unlabeled pool and highlights the importance of a good initialization. Second, for logistic regression, we show both theoretically and empirically that the extent to which uncertainty sampling outperforms random sampling is inversely proportional to the asymptotic error. Finally, we use these insights to show that uncertainty sampling works very well on a particular NLP task because of extreme label imbalance. Taken together, these results provide a sturdier foundation for understanding and using uncertainty sampling.
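
For readers unfamiliar with the procedure the abstract describes, the following is a minimal sketch of pool-based uncertainty sampling with a logistic regression model. It is an illustration only and not the thesis's implementation: the function names (`fit_logistic`, `predict_proba`, `uncertainty_sampling`), the 2-D inputs, the gradient-descent settings, and the `seed_size` and `budget` parameters are all assumptions made for this example.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, steps=200, lr=0.5):
    """Fit (w, b) for 2-D inputs by plain gradient descent on the log loss.

    Hypothetical helper for this sketch; any logistic regression fitter
    could be substituted.
    """
    w = [0.0, 0.0]
    b = 0.0
    n = len(xs)
    for _ in range(steps):
        gw = [0.0, 0.0]
        gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
            gw[0] += err * x[0]
            gw[1] += err * x[1]
            gb += err
        w[0] -= lr * gw[0] / n
        w[1] -= lr * gw[1] / n
        b -= lr * gb / n
    return w, b


def predict_proba(model, x):
    # Predicted probability that x belongs to class 1.
    w, b = model
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)


def uncertainty_sampling(pool, oracle, seed_size=10, budget=40, rng=None):
    """Label `budget` points from `pool`, querying the most uncertain ones.

    `oracle` is an assumed labeling function that returns 0 or 1.
    Returns the final model and the indices of the labeled points.
    """
    rng = rng or random.Random(0)
    unlabeled = list(range(len(pool)))
    rng.shuffle(unlabeled)

    # Initialization: label a small random seed set.
    labeled = [unlabeled.pop() for _ in range(min(seed_size, len(unlabeled)))]
    labels = {i: oracle(pool[i]) for i in labeled}
    model = fit_logistic([pool[i] for i in labeled], [labels[i] for i in labeled])

    while len(labeled) < budget and unlabeled:
        # Query the point whose predicted probability is closest to 0.5.
        pick = min(unlabeled, key=lambda i: abs(predict_proba(model, pool[i]) - 0.5))
        unlabeled.remove(pick)
        labeled.append(pick)
        labels[pick] = oracle(pool[pick])
        # Refit on all labels collected so far.
        model = fit_logistic([pool[i] for i in labeled], [labels[i] for i in labeled])

    return model, labeled
```

A call might look like `uncertainty_sampling(pool, oracle, seed_size=10, budget=40)`, where `pool` is a list of `(x0, x1)` tuples and `oracle` stands in for a human annotator. The random seed set plays the role of the initialization that the abstract identifies as important.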

Type of resource	text
Form	electronic resource; remote; computer; online resource
Extent	1 online resource.
Place	California
Place	[Stanford, California]
Publisher	[Stanford University]
Copyright date	©2021
Publication date	2021
Issuance	monographic
Language	English

Author	Mussmann, Stephen Oscar
Degree supervisor	Liang, Percy
Thesis advisor	Liang, Percy
Thesis advisor	Ré, Christopher
Thesis advisor	Sadigh, Dorsa
Degree committee member	Ré, Christopher
Degree committee member	Sadigh, Dorsa
Associated with	Stanford University, Computer Science Department

Genre	Theses
Genre	Text

Statement of responsibility	Stephen Mussmann.
Note	Submitted to the Computer Science Department.
Thesis	Thesis Ph.D. Stanford University 2021.
Location	https://purl.stanford.edu/gw920rq6947

License: This work is licensed under a Creative Commons Attribution 3.0 Unported license (CC BY).
