Data analytics : integration and privacy
- Data analytics has become an extremely important and challenging problem in disciplines like computer science, biology, medicine, finance, and homeland security. As massive amounts of data are available for analysis, scalable integration techniques become important. At the same time, new privacy issues arise where one's sensitive information can easily be inferred from the large amounts of data. In this thesis, we first cover the problem of entity resolution (ER), which identifies database records that refer to the same real-world entity. The recent explosion of data has now made ER a challenging problem in a wide range of applications. We propose scalable ER techniques and new ER functionalities that have not been studied in the past. We also view ER as a black-box operation and provide general techniques that can be used across applications. Next, we introduce the problem of managing information leakage, where one must try to prevent important bits of information from being resolved by ER, to guard against loss of data privacy. As more of our sensitive data gets exposed to a variety of merchants, health care providers, employers, social sites and so on, there is a higher chance that an adversary can "connect the dots" and piece together our information, leading to even more loss of privacy. We propose a measure for quantifying information leakage and use "disinformation" as a tool for containing information leakage.
Type of resource
electronic; electronic resource; remote
1 online resource.
Whang, Steven Euijong
Stanford University, Computer Science Department
Statement of responsibility: Steven E. Whang.
Submitted to the Department of Computer Science.
Thesis (Ph.D.)--Stanford University, 2012.
- © 2012 by Steven Euijong Whang
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).