Provenance-Based Debugging and Drill-Down in Data-Oriented Workflows

Robert Ikeda, Junsang Cho, Charlie Fang, Semih Salihoglu, Satoshi Torikai, Jennifer Widom

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

14 Scopus citations

Abstract

Panda (for Provenance and Data) is a system that supports the creation and execution of data-oriented workflows, with automatic provenance generation and built-in provenance tracing operations. Workflows in Panda are arbitrary acyclic graphs containing both relational (SQL) processing nodes and opaque processing nodes programmed in Python. For both types of nodes, Panda generates logical provenance (provenance information stored at the processing-node level) and uses it to support record-level backward and forward tracing operations. In our demonstration, we use Panda to integrate, process, and analyze actual education data from multiple sources. We specifically demonstrate how Panda's provenance generation and tracing capabilities help with workflow debugging and with drilling down on specific results of interest. © 2012 IEEE.
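
The idea of record-level backward and forward tracing can be illustrated with a minimal Python sketch. This is not Panda's API or implementation: the `Node` class, the `trace_backward` and `trace_forward` helpers, and the sample student records are all hypothetical, and the workflow is simplified to a linear chain (a special case of an acyclic graph) in which each node stores, per output record, the ids of the input records that produced it.

```python
from collections import defaultdict


class Node:
    """Hypothetical processing node that stores provenance at the node level."""

    def __init__(self, name, func):
        self.name = name
        # func(records) returns a list of (output_value, contributing_input_ids)
        self.func = func
        self.backward = {}                # output id -> set of input ids
        self.forward = defaultdict(set)   # input id  -> set of output ids

    def run(self, records):
        """Process {id: value} records and record provenance for each output."""
        outputs = {}
        for i, (value, sources) in enumerate(self.func(records)):
            out_id = f"{self.name}:{i}"
            outputs[out_id] = value
            self.backward[out_id] = set(sources)
            for src in sources:
                self.forward[src].add(out_id)
        return outputs


def trace_backward(nodes, record_ids):
    """Follow provenance from final outputs back to the original inputs."""
    ids = set(record_ids)
    for node in reversed(nodes):
        ids = set().union(*(node.backward.get(r, set()) for r in ids))
    return ids


def trace_forward(nodes, record_ids):
    """Follow provenance from original inputs to the outputs they affect."""
    ids = set(record_ids)
    for node in nodes:
        ids = set().union(*(node.forward.get(r, set()) for r in ids))
    return ids


def keep_passing(records):
    # One output per passing student; each depends on exactly one input.
    return [(v, [k]) for k, v in records.items() if v[1] >= 50]


def average_score(records):
    # A single aggregate output that depends on every input record.
    scores = [v[1] for v in records.values()]
    return [(sum(scores) / len(scores), list(records))]


# Made-up sample data, for illustration only.
students = {"s1": ("Ann", 90), "s2": ("Bob", 40), "s3": ("Cy", 85)}
nodes = [Node("filter", keep_passing), Node("avg", average_score)]

data = students
for node in nodes:
    data = node.run(data)

result = data                                   # {"avg:0": 87.5}
contributors = trace_backward(nodes, ["avg:0"])  # {"s1", "s3"}
affected_by_s2 = trace_forward(nodes, ["s2"])    # set(): s2 was filtered out
affected_by_s1 = trace_forward(nodes, ["s1"])    # {"avg:0"}
```

In this sketch, backward tracing from the average identifies which students contributed to it (drill-down), while forward tracing from a suspect input shows which results it influenced, which is the kind of question that arises when debugging a workflow.
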
Original language: English (US)
Title of host publication: 2012 IEEE 28th International Conference on Data Engineering
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 1249-1252
Number of pages: 4
ISBN (Print): 9780769547473
State: Published - Apr 2012
Externally published: Yes
