Code Supplement for "Multidimensional Scaling of Noisy High Dimensional Data"
Abstract/Contents
- Abstract
- Multidimensional Scaling (MDS) is a classical technique for embedding data in low dimensions, still in widespread use today. Originally introduced in the 1950s, MDS was not designed with high-dimensional data in mind; while it remains popular with data analysis practitioners, it needs to be adapted to the high-dimensional data regime. In this paper we study MDS in a modern setting, specifically high dimensions and ambient measurement noise. We show that, as the ambient noise level increases, MDS suffers a sharp breakdown that depends on the data dimension and noise level, and we derive an explicit formula for this breakdown point in the case of white noise. We then introduce MDS+, an extremely simple variant of MDS, which applies a carefully derived shrinkage nonlinearity to the eigenvalues of the MDS similarity matrix. Under a loss function measuring the embedding quality, MDS+ is the unique asymptotically optimal shrinkage function. We prove that MDS+ offers improved embedding, sometimes significantly so, compared with classical MDS. Furthermore, MDS+ does not require external estimates of the embedding dimension (a famous difficulty in classical MDS), as it calculates the optimal dimension into which the data should be embedded.
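For orientation, the following is a minimal sketch of classical MDS in Python with NumPy, showing where an eigenvalue shrinkage function would enter the pipeline. It is not the code distributed in this supplement. The names `classical_mds`, `pairwise_sq_dists`, and the `shrink` argument are hypothetical, and the MDS+ shrinkage nonlinearity itself is not reproduced here; it is derived in the related publication.

```python
import numpy as np


def pairwise_sq_dists(points):
    """Return the n-by-n matrix of squared Euclidean distances between rows."""
    diffs = points[:, None, :] - points[None, :, :]
    return (diffs ** 2).sum(axis=-1)


def classical_mds(sq_dists, dim, shrink=None):
    """Embed n points in `dim` dimensions from their squared distances.

    `shrink` is a hypothetical hook: any function mapping the array of
    eigenvalues to an array of the same shape. Leaving it as None gives
    plain classical MDS.
    """
    n = sq_dists.shape[0]
    # Double centering: S = -1/2 * H D H, where H = I - (1/n) * ones.
    H = np.eye(n) - np.ones((n, n)) / n
    S = -0.5 * H @ sq_dists @ H
    # S is symmetric, so eigh applies; sort eigenvalues in descending order.
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    if shrink is not None:
        eigvals = shrink(eigvals)
    # Clip small negative values (round-off or noise) before the square root.
    top = np.clip(eigvals[:dim], 0.0, None)
    return eigvecs[:, :dim] * np.sqrt(top)


# Usage on noiseless 2-D data: the embedding preserves pairwise distances.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
D = pairwise_sq_dists(X)
Y = classical_mds(D, dim=2)
```

In this sketch the embedding dimension `dim` is supplied by the caller, which is exactly the external estimate that, according to the abstract, MDS+ makes unnecessary.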
Description
Type of resource | software, multimedia |
---|---|
Date created | January 2018 |
Creators/Contributors
Author | Peterfreund, Erez |
---|---|
Author | Gavish, Matan |
Subjects
Subject | MDS |
---|---|
Subject | Multidimensional Scaling |
Subject | optimal shrinkage |
Subject | optimal threshold |
Subject | low-dimensional embedding |
Subject | noisy data |
Bibliographic information
Related Publication | http://arxiv.org/abs/1801.10229 |
---|---|
Location | https://purl.stanford.edu/kh576pt3021 |
Access conditions
- Use and reproduction
- User agrees that, where applicable, content will not be used to identify or to otherwise infringe the privacy or confidentiality rights of individuals. Content distributed via the Stanford Digital Repository may be subject to additional license and use restrictions applied by the depositor.
- License
- This work is licensed under a Creative Commons Attribution 3.0 Unported license (CC BY).
Preferred citation
- Preferred Citation
- Peterfreund, Erez, Code Supplement for "Multidimensional Scaling of Noisy High Dimensional Data", https://purl.stanford.edu/kh576pt3021
Collection
Stanford Research Data
Contact information
- Contact
- gavish@stanford.edu