Randomized algorithms for large-scale strongly over-determined linear regression problems

Placeholder Show Content

Abstract/Contents

Abstract
In the era of big data, distributed systems built on top of clusters of commodity hardware provide cheap and reliable storage and scalable data processing. With cheap storage, instead of storing only currently relevant data, most people choose to store data as much as possible, expecting that its value can be extracted later. In this way, exabytes (10^18) of data are being created on a daily basis. However, extracting value from big data requires scalable implementation of advanced analytical algorithms beyond simple data processing, e.g., regression analysis and optimization. Many traditional methods are designed to minimize floating-point operations, which is the dominant cost of in-memory computation on a single machine. In a distributed environment, load balancing and communication including disk and network I/O can easily dominate computation. These factors greatly increase the complexity and challenge the way of thinking in the design of distributed algorithms. Randomized methods for big data analysis have received a great deal of attention in recent years because they are generally faster and simpler to implement than traditional methods, it is easier to distribute the work, and they sometimes have theoretically provable performance. In this work, we are most interested in random projection and random sampling algorithms for l_2 regression and its robust alternative, l_1 regression, with strongly rectangular data. Random projection and random sampling are used to create preconditioned systems that are easier to solve or sub-sampled problems that provide relative-error approximations. Our main result shows that in near input-sparsity time and only a few passes through the data we can obtain a good approximate solution, with high probability. Our theory holds for general p in [1, 2], and thus we formulate our results in l_p.

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2014
Issuance monographic
Language English

Creators/Contributors

Associated with Meng, Xiangrui
Associated with Stanford University, Institute for Computational and Mathematical Engineering.
Primary advisor Saunders, Michael
Thesis advisor Saunders, Michael
Thesis advisor Gerritsen, Margot (Margot G.)
Thesis advisor Mahoney, Michael
Advisor Gerritsen, Margot (Margot G.)
Advisor Mahoney, Michael

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Xiangrui Meng.
Note Submitted to the Institute for Computational and Mathematical Engineering.
Thesis Thesis (Ph.D.)--Stanford University, 2014.
Location electronic resource

Access conditions

Copyright
© 2014 by Xiangrui Meng
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).

Also listed in

Loading usage metrics...