Data analytics : integration and privacy

Abstract/Contents

Abstract
Data analytics has become an extremely important and challenging problem in disciplines like computer science, biology, medicine, finance, and homeland security. As massive amounts of data are available for analysis, scalable integration techniques become important. At the same time, new privacy issues arise where one's sensitive information can easily be inferred from the large amounts of data. In this thesis, we first cover the problem of entity resolution (ER), which identifies database records that refer to the same real-world entity. The recent explosion of data has now made ER a challenging problem in a wide range of applications. We propose scalable ER techniques and new ER functionalities that have not been studied in the past. We also view ER as a black-box operation and provide general techniques that can be used across applications. Next, we introduce the problem of managing information leakage, where one must try to prevent important bits of information from being resolved by ER, to guard against loss of data privacy. As more of our sensitive data gets exposed to a variety of merchants, health care providers, employers, social sites and so on, there is a higher chance that an adversary can "connect the dots" and piece together our information, leading to even more loss of privacy. We propose a measure for quantifying information leakage and use "disinformation" as a tool for containing information leakage.

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2012
Issuance monographic
Language English

Creators/Contributors

Associated with Whang, Steven Euijong
Associated with Stanford University, Computer Science Department
Primary advisor Garcia-Molina, Hector
Thesis advisor Garcia-Molina, Hector
Thesis advisor Leskovec, Jurij
Thesis advisor Widom, Jennifer

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Steven E. Whang.
Note Submitted to the Department of Computer Science.
Thesis Thesis (Ph.D.)--Stanford University, 2012.
Location electronic resource

Access conditions

Copyright
© 2012 by Steven Euijong Whang
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).
