Weak supervision from high-level abstractions
Abstract/Contents
- Abstract
- The interfaces for interacting with machine learning models are changing. Consider, for example, that while computers run on 1s and 0s, that is no longer the level of abstraction we use to program most computers. Instead, we use higher-level abstractions such as assembly language, high-level languages, or declarative languages to more efficiently convert our objectives into code. Similarly, most machine learning models are trained with "1s and 0s" (individually labeled examples), but we need not limit ourselves to interacting with them at this low level. Instead, we can use higher-level abstractions to more efficiently convert our domain knowledge into the inputs our models require. In this work, we show that weak supervision from high-level abstractions can be used to train high-performance machine learning models. At each of three levels of abstraction, we describe a system we built to enable such interaction. We begin with Snorkel, which elevates label generation from a manual process to a programmatic one. With this system, domain experts encode their knowledge in potentially noisy and correlated black-box functions called labeling functions. These functions can then be automatically denoised and applied to unlabeled data to create large training sets quickly. Next, with Fonduer, we enable an abstraction one step higher, where advanced primitives defined over multiple modalities (visual, textual, structural, and tabular) allow users to programmatically supervise over richly formatted data (e.g., PDFs with tables and formatting). Finally, in BabbleLabble, we show that we can even utilize supervision given in the form of natural language explanations, maintaining the benefits of programmatic supervision while removing the burden of writing code.
For all of these systems, we demonstrate their effectiveness with empirical results and present real-world use cases where they have enabled rapid development of machine learning applications, including in biomedicine, commerce, and defense.
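The idea of labeling functions described in the abstract can be illustrated with a minimal sketch. It does not use the Snorkel library or its API: the function names, keyword heuristics, example sentences, and label constants below are all hypothetical, and a simple majority vote stands in for the automatic denoising step, which in the thesis accounts for the noise and correlations among functions rather than merely counting votes.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical label constants for a binary task; ABSTAIN means "no opinion".
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

LabelingFunction = Callable[[str], int]


def lf_contains_causes(text: str) -> int:
    """Heuristic: the word 'causes' suggests a positive example."""
    return POSITIVE if "causes" in text.lower() else ABSTAIN


def lf_contains_negation(text: str) -> int:
    """Heuristic: explicit negation suggests a negative example."""
    lowered = text.lower()
    if "does not" in lowered or "no evidence" in lowered:
        return NEGATIVE
    return ABSTAIN


def lf_contains_induced(text: str) -> int:
    """Heuristic: the word 'induced' suggests a positive example."""
    return POSITIVE if "induced" in text.lower() else ABSTAIN


def majority_vote(votes: List[int]) -> int:
    """Combine noisy votes; abstain when there are no votes or a tie.

    This is a simplified stand-in for denoising, not the thesis's method.
    """
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


def label_corpus(texts: List[str], lfs: List[LabelingFunction]) -> List[int]:
    """Apply every labeling function to every text and combine the votes."""
    return [majority_vote([lf(text) for lf in lfs]) for text in texts]


LFS = [lf_contains_causes, lf_contains_negation, lf_contains_induced]

# Made-up unlabeled sentences, for illustration only.
UNLABELED = [
    "Aspirin causes stomach irritation in some patients.",
    "There is no evidence that the drug affects sleep.",
    "The rash was drug-induced and causes discomfort.",
    "The drug does not help, but it causes nausea.",
    "Patients were enrolled in March.",
]

if __name__ == "__main__":
    for text, label in zip(UNLABELED, label_corpus(UNLABELED, LFS)):
        print(label, text)
```

In this sketch, a sentence on which the functions disagree evenly, or on which none of them fires, is left unlabeled rather than guessed at. Only the sentences that receive a label would go into the training set.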
Description
Type of resource | text |
---|---|
Form | electronic resource; remote; computer; online resource |
Extent | 1 online resource. |
Place | California |
Place | [Stanford, California] |
Publisher | [Stanford University] |
Copyright date | 2019; ©2019 |
Publication date | 2019 |
Issuance | monographic |
Language | English |
Creators/Contributors
Author | Hancock, Braden Jay | |
---|---|---|
Degree supervisor | Ré, Christopher | |
Thesis advisor | Ré, Christopher | |
Thesis advisor | Jurafsky, Dan, 1962- | |
Thesis advisor | Liang, Percy | |
Degree committee member | Jurafsky, Dan, 1962- | |
Degree committee member | Liang, Percy | |
Associated with | Stanford University, Computer Science Department. |
Subjects
Genre | Theses |
---|---|
Genre | Text |
Bibliographic information
Statement of responsibility | Braden Jay Hancock. |
---|---|
Note | Submitted to the Computer Science Department. |
Thesis | Thesis Ph.D. Stanford University 2019. |
Location | electronic resource |
Access conditions
- Copyright
- © 2019 by Braden Jay Hancock
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).