Correlation clustering at scale

Placeholder Show Content

Abstract/Contents

Abstract
Correlation clustering is a useful method in machine learning and data analysis that aims to partition data into groups based on the notion of pairwise similarity. Given a collection of objects and a labeling of every pair of objects as either "similar" or "dissimilar", the goal is to find a partitioning of the objects such that similar pairs tend to be clustered together while dissimilar pairs tend not to. Correlation clustering has become desirable in many applications ranging from locating boundaries in digital images to detecting communities in social networks. In many of these applications, the order of magnitude of the input significantly exceeds the resources available to the algorithm such as time and space, which has led to increasing demand for designing efficient algorithms at large scale. In this thesis, we study the problem of correlation clustering in models of efficient computation with large-scale data and propose new algorithms with much improved guarantees. We first present a (3+epsilon)-approximate parallel algorithm for minimizing disagreements with labels that runs in O(1/epsilon) rounds. The approximation ratio almost matches a natural target of 3 for combinatorial algorithms. Then, we give a single-pass 5-approximate streaming algorithm and show that the approximation analysis is tight. Both algorithms are simple to implement and have implications in various models including massively parallel computation, streaming, distributed local model, local computation algorithms, and fully dynamic algorithms.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2023; ©2023
Publication date 2023; 2023
Issuance monographic
Language English

Creators/Contributors

Author Ma, Weiyun
Degree supervisor Charikar, Moses
Thesis advisor Charikar, Moses
Thesis advisor Ahmadipouranari, Nima
Thesis advisor Tan, Li-Yang
Degree committee member Ahmadipouranari, Nima
Degree committee member Tan, Li-Yang
Associated with Stanford University, School of Engineering
Associated with Stanford University, Computer Science Department

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Weiyun Ma.
Note Submitted to the Computer Science Department.
Thesis Thesis Ph.D. Stanford University 2023.
Location https://purl.stanford.edu/cc668dr3145

Access conditions

Copyright
© 2023 by Weiyun Ma
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).

Also listed in

Loading usage metrics...