Provenance in data-oriented workflows
Abstract/Contents
- Abstract
- Data-processing tasks are commonly managed using data-oriented workflows, in which input data sets are processed by a graph of transformations to produce output data. In data-oriented workflows, it can be useful to track data provenance (also sometimes called lineage), which describes where data came from and how it has been manipulated and combined. We begin by giving a new general definition of provenance, introducing the notions of correctness, precision, and minimality. We then: (1) Describe a wrapper-based approach for capturing provenance in workflows in which all transformations are either map or reduce functions; (2) Describe a provenance-based approach for selectively refreshing one or more elements in the output data, i.e., computing the latest values of particular output elements based on modified input data; (3) Show how logical provenance, i.e., provenance information stored at the transformation level, can often capture precise provenance relationships in a compact fashion; (4) Describe our prototype system called Panda (for Provenance And Data) that supports refresh in data-oriented workflows, as well as debugging and drill-down using logical provenance. Overall, our work provides a comprehensive foundation, set of algorithms, and prototype system for provenance in data-oriented workflows.
Description
Type of resource | text |
---|---|
Form | electronic; electronic resource; remote |
Extent | 1 online resource. |
Publication date | 2012 |
Issuance | monographic |
Language | English |
Creators/Contributors
Associated with | Ikeda, Robert Michael |
---|---|
Associated with | Stanford University, Computer Science Department |
Primary advisor | Widom, Jennifer |
Thesis advisor | Widom, Jennifer |
Thesis advisor | Das Sarma, Anish |
Thesis advisor | Garcia-Molina, Hector |
Advisor | Das Sarma, Anish |
Advisor | Garcia-Molina, Hector |
Subjects
Genre | Theses |
---|
Bibliographic information
Statement of responsibility | Robert Ikeda. |
---|---|
Note | Submitted to the Department of Computer Science. |
Thesis | Thesis (Ph.D.)--Stanford University, 2012. |
Location | electronic resource |
Access conditions
- Copyright
- © 2012 by Robert Michael Ikeda
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).