Correlation clustering at scale
Abstract/Contents
- Abstract
- Correlation clustering is a useful method in machine learning and data analysis that aims to partition data into groups based on the notion of pairwise similarity. Given a collection of objects and a labeling of every pair of objects as either "similar" or "dissimilar", the goal is to find a partitioning of the objects such that similar pairs tend to be clustered together while dissimilar pairs tend not to. Correlation clustering has become desirable in many applications ranging from locating boundaries in digital images to detecting communities in social networks. In many of these applications, the order of magnitude of the input significantly exceeds the resources available to the algorithm such as time and space, which has led to increasing demand for designing efficient algorithms at large scale. In this thesis, we study the problem of correlation clustering in models of efficient computation with large-scale data and propose new algorithms with much improved guarantees. We first present a (3+epsilon)-approximate parallel algorithm for minimizing disagreements with labels that runs in O(1/epsilon) rounds. The approximation ratio almost matches a natural target of 3 for combinatorial algorithms. Then, we give a single-pass 5-approximate streaming algorithm and show that the approximation analysis is tight. Both algorithms are simple to implement and have implications in various models including massively parallel computation, streaming, distributed local model, local computation algorithms, and fully dynamic algorithms.
Description
Type of resource | text |
---|---|
Form | electronic resource; remote; computer; online resource |
Extent | 1 online resource. |
Place | California |
Place | [Stanford, California] |
Publisher | [Stanford University] |
Copyright date | 2023; ©2023 |
Publication date | 2023; 2023 |
Issuance | monographic |
Language | English |
Creators/Contributors
Author | Ma, Weiyun |
---|---|
Degree supervisor | Charikar, Moses |
Thesis advisor | Charikar, Moses |
Thesis advisor | Ahmadipouranari, Nima |
Thesis advisor | Tan, Li-Yang |
Degree committee member | Ahmadipouranari, Nima |
Degree committee member | Tan, Li-Yang |
Associated with | Stanford University, School of Engineering |
Associated with | Stanford University, Computer Science Department |
Subjects
Genre | Theses |
---|---|
Genre | Text |
Bibliographic information
Statement of responsibility | Weiyun Ma. |
---|---|
Note | Submitted to the Computer Science Department. |
Thesis | Thesis Ph.D. Stanford University 2023. |
Location | https://purl.stanford.edu/cc668dr3145 |
Access conditions
- Copyright
- © 2023 by Weiyun Ma
- License
- This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
Also listed in
Loading usage metrics...