Entity resolution and tracking on social networks

Abstract/Contents

Abstract
In this thesis we study two aspects of the Entity Resolution (ER) problem. The goal of ER is to identify and merge records that refer to the same underlying entity. The recent rise in adoption of social networks (Facebook, Google+, Twitter, and others) introduces new twists to the traditional ER problem: crowdsourcing and limited information. We first study a hybrid human-machine approach to solving ER problems. Machine learning models can predict the probability that a pair of records refers to the same entity, but machines make mistakes. Humans can help verify whether such pairs match, and social systems like Facebook allow users to help resolve entities on their platforms. We propose hybrid human-machine strategies with theoretical guarantees that leverage transitivity relations (e.g., a = c can be inferred given a = b and b = c). Next, we study ER with limited information. Social systems impose limits on API calls that constrain access to their full social graphs. We focus on resolving a single node g from one social graph G against a second social graph T: we want to find the best match for g in T by dynamically probing T through a public API, within the number of API calls that these social systems allow. We propose two ER strategies that are designed for limited information and can be adapted to different API limits. Finally, we study the problem of updating social graph snapshots under limited information. Effective social network ER requires up-to-date snapshots, so, within the API-call limits that social systems impose, we seek to update a snapshot efficiently: we want to avoid re-crawling all of the nodes and to minimize the number of API calls. We propose novel snapshot update strategies that are designed for limited information and can be adapted to different levels of staleness.
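The transitivity idea in the abstract can be illustrated with a small sketch. This is not the thesis's strategy: it assumes a hypothetical `ask_human(a, b)` callback standing in for a crowd worker, hypothetical names `TransitiveOracle` and `resolve`, and a simple ordering heuristic (ask about the most probable pairs first) chosen only for illustration. Confirmed matches are merged with a union-find structure, and a confirmed non-match between two clusters answers every cross-cluster pair without a new question (if a = b and b ≠ c, then a ≠ c).

```python
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, List, Optional, Set, Tuple


class TransitiveOracle:
    """Hypothetical helper: answers match questions implied by earlier answers.

    Confirmed matches are merged with union-find; confirmed non-matches are
    stored as pairs of cluster roots, so one "no" covers every cross pair.
    """

    def __init__(self, records: Iterable[Hashable]) -> None:
        self.parent: Dict[Hashable, Hashable] = {r: r for r in records}
        self.different: Set[FrozenSet[Hashable]] = set()

    def find(self, r: Hashable) -> Hashable:
        # Walk to the cluster root, halving the path as we go.
        while self.parent[r] != r:
            self.parent[r] = self.parent[self.parent[r]]
            r = self.parent[r]
        return r

    def infer(self, a: Hashable, b: Hashable) -> Optional[bool]:
        """True/False if transitivity decides the pair, None if a human must."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return True  # a = x = ... = b
        if frozenset((ra, rb)) in self.different:
            return False  # a = x, x != y, y = b implies a != b
        return None

    def record_match(self, a: Hashable, b: Hashable) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        self.parent[rb] = ra
        # Non-match constraints on the absorbed root now apply to the new root.
        self.different = {
            frozenset(ra if x == rb else x for x in pair) for pair in self.different
        }

    def record_non_match(self, a: Hashable, b: Hashable) -> None:
        self.different.add(frozenset((self.find(a), self.find(b))))


def resolve(
    records: Iterable[Hashable],
    scored_pairs: List[Tuple[Hashable, Hashable, float]],
    ask_human: Callable[[Hashable, Hashable], bool],
) -> Tuple[int, TransitiveOracle]:
    """Ask about pairs in descending machine probability, skipping inferable ones."""
    oracle = TransitiveOracle(records)
    questions = 0
    for a, b, _prob in sorted(scored_pairs, key=lambda t: t[2], reverse=True):
        if oracle.infer(a, b) is not None:
            continue  # answer already implied; no human question needed
        questions += 1
        if ask_human(a, b):
            oracle.record_match(a, b)
        else:
            oracle.record_non_match(a, b)
    return questions, oracle


if __name__ == "__main__":
    # Toy ground truth: a, b, c are one entity; d is another.
    entity = {"a": 1, "b": 1, "c": 1, "d": 2}
    pairs = [("a", "b", 0.9), ("b", "c", 0.8), ("a", "c", 0.7),
             ("c", "d", 0.3), ("a", "d", 0.2), ("b", "d", 0.1)]
    questions, _ = resolve(entity, pairs, lambda x, y: entity[x] == entity[y])
    print(questions)  # 3 of the 6 pairs need a human answer
```

In this toy run, a = c follows from a = b and b = c, and the single answer c ≠ d settles a ≠ d and b ≠ d, so only half of the pairs reach a human. The ordering heuristic and the probabilities are illustrative; the strategies proposed in the thesis may order and select questions differently.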

Description

Type of resource: text
Form: electronic; electronic resource; remote
Extent: 1 online resource.
Publication date: 2016
Issuance: monographic
Language: English

Creators/Contributors

Associated with: Vesdapunt, Norases
Associated with: Stanford University, Department of Computer Science.
Primary advisor: Garcia-Molina, Hector
Thesis advisor: Garcia-Molina, Hector
Thesis advisor: Paepcke, Andreas
Thesis advisor: Ré, Christopher
Advisor: Paepcke, Andreas
Advisor: Ré, Christopher

Subjects

Genre: Theses

Bibliographic information

Statement of responsibility: Norases Vesdapunt.
Note: Submitted to the Department of Computer Science.
Thesis: Thesis (Ph.D.)--Stanford University, 2016.
Location: electronic resource

Access conditions

Copyright
© 2016 by Norases Vesdapunt
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
