Machine learning and social science for fair and private data products

Abstract/Contents

Abstract
Technological innovations are fundamentally transforming information flow and social behavior. Digital platforms are increasingly embedded in established social systems, highlighting the importance of developing fair and private machine learning approaches to managing these platforms. As social networks and social media data are increasingly leveraged for business and policy decisions, this thesis discusses work to develop methods for analyzing data privacy in attribute-rich network data and to promote fair and equitable digital systems. Chapter 2 discusses work on privacy in digital systems. Privacy attacks in networked systems commonly leverage homophilous interactions, a pattern often summarized as "birds of a feather flock together". While homophily describes a bias in attribute preferences for similar others, it gives limited attention to variability. We observe that attribute preferences can exhibit variation beyond what can be explained by models of homophily, and that this excess variation can induce a similarity among friends-of-friends on a network without requiring any similarity among friends. These findings offer an alternative perspective on network structure and attributes in general and prediction in particular, complicating the already difficult task of protecting privacy on social networks. Chapter 3 then examines node attribute prediction tasks more generally and provides a framework for distinguishing different types of prediction tasks as within-network, across-network, or across-layer tasks. This work highlights that methods aimed at across-network tasks are in fact evaluated on across-layer problems and have limited performance on across-network problems. Chapter 4 assesses novel uses of social media data to target regulatory enforcement efforts. This work provides a more cautionary perspective on the leading case that Yelp reviews are useful for targeting food safety inspections. It highlights that prior results are sensitive to "extreme imbalanced sampling": extreme because the dataset was restricted from roughly 13k inspections to a sample of only 612 inspections with only extremely high or low inspection scores, and imbalanced because it did not account for class imbalance in the population. Finally, Chapter 5 studies the challenge of class imbalance more generally, comparing more standard approaches like Synthetic Minority Oversampling Technique to more modern approaches like Weighted Random Forest and Balanced Random Forest. Taken together, this thesis highlights the importance of understanding how new technologies intersect with existing consumer behavior and demonstrates the unique role of machine learning and social science research in the design and management of data products. As technology continues to promote instantaneous connectivity, there is a pressing need for analytical tools that facilitate secure and equal-opportunity platforms.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2019; ©2019
Publication date 2019; 2019
Issuance monographic
Language English

Creators/Contributors

Author Altenburger, Kristen Marie
Degree supervisor Ugander, Johan
Thesis advisor Ugander, Johan
Thesis advisor Ho, Daniel
Thesis advisor Johari, Ramesh, 1976-
Degree committee member Ho, Daniel
Degree committee member Johari, Ramesh, 1976-
Associated with Stanford University, Department of Management Science and Engineering.

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Kristen M. Altenburger
Note Submitted to the Department of Management Science and Engineering
Thesis Thesis Ph.D. Stanford University 2019
Location electronic resource

Access conditions

Copyright
© 2019 by Kristen Marie Altenburger
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).
