Data-sparse algorithms and mathematical theory for large-scale machine learning problems
- This dissertation presents scalable algorithms for high-dimensional, large-scale datasets in machine learning applications. The ability to generate data at the scale of millions and even billions of records has grown rapidly, posing computational challenges for most machine learning algorithms. I propose fast kernel-matrix-based algorithms that avoid intensive kernel matrix operations, and neural-network-based algorithms that efficiently learn feature interactions. My contributions include: 1) a structured low-rank approximation method, the Block Basis Factorization (BBF), that reduces the training time and memory of kernel methods from quadratic to linear and achieves better accuracy than state-of-the-art kernel approximation algorithms; 2) mathematical theory for the ranks of RBF kernel matrices generated from high-dimensional datasets; 3) a parallel black-box fast multipole method (FMM) software library, PBBFMM3D, that evaluates particle interactions in 3D; and 4) a neural network, the Deep & Cross Network (DCN), for web-scale data prediction that requires neither exhaustive feature search nor manual feature engineering, and that efficiently learns bounded-degree feature interactions combined with complex deep representations.

Chapter 2 presents BBF, which accelerates kernel methods by factorizing an n-by-n kernel matrix into a sparse representation with O(n) nonzero entries, compared to the O(n^2) entries of the original matrix. By identifying the low-rank properties of certain blocks, BBF extends the applicability of low-rank approximation methods to cases where traditional global low-rank approximations are inefficient. Leveraging tools from numerical linear algebra and randomized algorithms, the factorization can be constructed in O(n) time while remaining accurate and stable. Our empirical results demonstrate its stability and its advantage over state-of-the-art kernel approximation algorithms.

Chapter 3 presents a theoretical analysis of the rank of RBF kernel matrices. Our three main results are as follows. First, we study the kernel rank, which for a fixed precision grows algebraically with the data dimension in the worst case, where the power depends on the desired accuracy. Second, we derive precise error bounds for the low-rank approximation in the L_infty norm in terms of the function smoothness and the domain diameters. Third, we analyze a group pattern in the magnitudes of the singular values of the RBF kernel matrix, and explain this pattern by a grouping of the expansion terms in the kernel's low-rank representation. Empirical results verify the theory.

Chapter 4 presents PBBFMM3D, a parallel implementation of the fast multipole method (FMM) for evaluating pairwise particle interactions (matrix-vector products) in three dimensions. PBBFMM3D applies to all non-oscillatory smooth kernel functions and requires only kernel evaluations at data points. It has O(N) complexity, as opposed to the O(N^2) complexity of a direct computation. We discuss several algorithmic improvements and performance optimizations, such as shared-memory parallelism using OpenMP. We present convergence and scalability results, as well as applications including particle potential evaluations, which frequently occur in PDE-related simulations, and covariance matrix computations, which are essential components of parameter estimation techniques such as Kriging and Kalman filtering.

Chapter 5 presents DCN, which is designed for datasets with a combination of dense and sparse features and enables automatic and efficient feature learning. Feature engineering is key to the success of prediction models; however, the process often requires manual effort or exhaustive search. DCN combines a deep neural network, which learns complex but implicit feature interactions, with a novel cross network that is more efficient at learning certain explicit bounded-degree feature interactions. Our experimental results demonstrate its superiority over state-of-the-art algorithms on a click-through-rate prediction dataset and a dense classification dataset, in terms of both model accuracy and memory usage.
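The cross network of Chapter 5 stacks layers of the form x_{l+1} = x_0 (x_l^T w_l) + b_l + x_l, so the depth bounds the polynomial degree of the explicit interactions while each layer adds only O(d) parameters. A minimal numpy sketch of this layer, with illustrative dimensions and random weights rather than a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                        # illustrative embedding dimension
x0 = rng.standard_normal(d)  # stacked dense + embedded sparse features

def cross_layer(x0, xl, w, b):
    # x0 * (xl @ w) is a rank-one interaction term, so the layer costs
    # O(d) parameters (w, b) instead of the O(d^2) of a full outer product.
    # The residual term xl preserves all lower-degree interactions.
    return x0 * (xl @ w) + b + xl

xl = x0
for _ in range(3):  # 3 cross layers capture interactions up to degree 4
    w = rng.standard_normal(d)
    b = rng.standard_normal(d)
    xl = cross_layer(x0, xl, w, b)

print(xl.shape)  # (8,)
```

In the full model this cross network runs in parallel with a deep network, and their outputs are concatenated before the final prediction layer.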
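The structured low-rank format that BBF (Chapter 2) produces, K ~ U C U^T with a block-diagonal U, can be illustrated with a small numpy sketch. This is not the dissertation's O(n) construction: here the full kernel matrix is formed and then compressed, purely to show the storage structure, and the clustering, rank, and RBF width are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, r = 400, 4, 6                 # points, clusters, rank per cluster
X = rng.standard_normal((n, 2))

def rbf(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Crude clustering: sort by the first coordinate and split evenly.
clusters = np.array_split(np.argsort(X[:, 0]), k)
K = rbf(X, X)

# One basis per cluster, here taken from the SVD of its full block row of K.
bases = []
for idx in clusters:
    U, _, _ = np.linalg.svd(K[idx], full_matrices=False)
    bases.append(U[:, :r])

# Inner blocks C_ij = U_i^T K_ij U_j, each only r-by-r.
C = [[bases[i].T @ K[np.ix_(clusters[i], clusters[j])] @ bases[j]
      for j in range(k)] for i in range(k)]

# Reconstruct and measure the relative Frobenius error.
K_approx = np.zeros_like(K)
for i in range(k):
    for j in range(k):
        K_approx[np.ix_(clusters[i], clusters[j])] = (
            bases[i] @ C[i][j] @ bases[j].T)
err = np.linalg.norm(K - K_approx) / np.linalg.norm(K)
print(f"relative error: {err:.2e}")
```

Storing the bases costs n*r entries and the inner blocks cost k^2 r^2, i.e. O(n) for fixed k and r, versus the n^2 entries of K itself.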
|Type of resource
|electronic resource; remote; computer; online resource
|1 online resource.
|Mahoney, Michael J
|Degree committee member
|Stanford University, Institute for Computational and Mathematical Engineering.
|Statement of responsibility
|Submitted to the Institute for Computational and Mathematical Engineering.
|Thesis (Ph.D.)--Stanford University, 2018.
- © 2018 by Ruoxi Wang
- This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).