Protecting privacy when mining and sharing user data
- In today's digital age, online services such as search engines and social networks collect large amounts of data about their users and their users' online activities. Large-scale mining and sharing of this data has been a key driver of innovation and improvement in the quality of these services, but has also raised major user privacy concerns. This thesis aims to help companies find ways to mine and share user data for the purpose of furthering innovation while protecting their users' privacy, and to motivate and help them reason about the privacy-utility trade-offs using a rigorous, quantifiable definition of privacy. To achieve this, we explore examples of privacy violations, propose privacy-preserving algorithms, and analyze the trade-offs between utility and privacy for several concrete algorithmic problems in the search and social network domains.

We propose and execute two novel privacy attacks on the advertising system of a social network that lead to breaches of user privacy. The attacks infer private information about users by taking advantage of the advertising system's use of users' private profile data, the powerful microtargeting capabilities provided by the system, and the detailed ad campaign performance reports provided to advertisers. The proposed attacks build a case for the need to reason about data sharing and mining practices using a rigorous definition of privacy, elucidate the privacy and utility trade-offs that may arise in advertising systems that allow fine-grained targeting based on user profile and activity characteristics, and have contributed to changes in the social network's advertising system aimed at raising the barriers to practical execution of such attacks in the future.

We propose a practical algorithm for sharing a subset of user search data, consisting of queries and clicks, in a provably privacy-preserving manner. The algorithm protects privacy by limiting the amount of each user's data used and by non-deterministically discarding infrequent elements in the data, with the specific parameters of the algorithm determined by the desired privacy guarantees. The proposed algorithm, and the insights gained from its analysis, offer a systematic and practical approach to sharing counts of user actions while satisfying a rigorous privacy definition, and can be applied to improve privacy in applications that rely on mining and sharing user search data.

We then present a quantitative analysis of privacy-utility trade-offs in the social recommendations and social data sharing domains using formal models of privacy and utility. For social recommendations, we present a lower bound on the minimum loss in privacy for link-based recommendation algorithms achieving good utility. For social data sharing, we present a theoretical and experimental analysis of the relationship between the visibility of connections in the social network and the difficulty for a competing service of obtaining knowledge of a large fraction of the connections in the network. The methods of analysis introduced and the harsh trade-offs identified can be useful for guiding privacy-conscious development of social products and algorithms, and give a refined understanding of the privacy-utility trade-offs.

Few topics today arouse as much heated discussion as issues of user privacy. This thesis focuses on making practical and constructive strides towards understanding and providing tools for achieving a viable balance between two seemingly opposing needs -- user data-driven innovation and privacy.
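The limit-and-threshold approach described in the abstract — cap each user's contribution, then non-deterministically drop infrequent elements — can be sketched as follows. This is a minimal illustration, not the thesis's exact algorithm: the parameter names (`d`, `epsilon`, `threshold`), the use of Laplace-distributed noise, and the specific parameter values are assumptions made for the example.

```python
import random
from collections import Counter

def private_query_counts(user_logs, d=2, epsilon=1.0, threshold=20.0):
    """Sketch of a limit-and-threshold release of query counts.

    user_logs: mapping user_id -> list of that user's queries.
    d, epsilon, threshold are illustrative; in practice they would be
    derived from the desired privacy guarantee.
    """
    counts = Counter()
    for queries in user_logs.values():
        # Limit the amount of each user's data used: at most d
        # distinct queries per user contribute to the counts.
        for q in list(dict.fromkeys(queries))[:d]:
            counts[q] += 1

    released = {}
    for q, c in counts.items():
        # Add Laplace noise (difference of two exponentials) with
        # scale d/epsilon, so one user's presence barely shifts a count.
        noisy = c + random.expovariate(epsilon / d) - random.expovariate(epsilon / d)
        # Non-deterministically throw away infrequent elements: only
        # queries whose noisy count clears the threshold are released.
        if noisy > threshold:
            released[q] = noisy
    return released
```

A frequent query (asked by many users) almost certainly survives the threshold, while a query asked by only one or two users is almost certainly suppressed, which is what makes the release privacy-preserving.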
|Type of resource
|electronic; electronic resource; remote
|1 online resource.
|Stanford University, Computer Science Department
|Statement of responsibility
|Submitted to the Department of Computer Science.
|Ph.D. Stanford University 2012
- © 2012 by Aleksandra Korolova