Visualizing and modeling joint behavior of categorical variables with a large number of levels

Abstract/Contents

Abstract
Large social, internet, or biological networks frequently have a long-tailed degree distribution. Often there are two long-tail phenomena: in bipartite networks, the two entity types may each have a power-law degree distribution, and in directed networks, both in-degree and out-degree may have power-law distributions. We present a graphical display to visualize the affinities between the entities. The display also exposes some interpretable anomalies in network data. Our graphic is based on ordering the entities by size. We show that, in a Zipf-Poisson model, the largest entities, which account for the majority of the data, are correctly ordered with probability near one. A saturation model produces head-to-tail affinities similar to those seen in the data, though a bipartite preferential attachment model gives a better fit to the marginal distributions. Sharp bounds are obtained on the behavior of the margins in both of these cases. Extensions of these models are also considered. As our graphical display has a close connection to copulas, we explore the utility of parametric and nonparametric copula models for our data, and we present two new ways of generating smooth, valid copula density estimates. We also develop a new class of discrete-choice models in which the choice set grows with time. By exploiting the structure that arises when the covariate vector takes a special form, we are able to fit a rich model via maximum likelihood to a data set with over fifty million observations. Interesting issues arise when attempting to simulate from the fitted model, leading to a proposal for how to sample from discrete distributions that have a very large number of possible outcomes and probabilities that evolve over time.

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Copyright date 2011
Publication date 2010, c2011; 2010
Issuance monographic
Language English

Creators/Contributors

Associated with Dyer, Justin S
Associated with Stanford University, Department of Statistics
Primary advisor Owen, Art B
Thesis advisor Owen, Art B
Thesis advisor Cover, T. M, 1938-2012
Thesis advisor Walther, Guenther
Advisor Cover, T. M, 1938-2012
Advisor Walther, Guenther

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Justin S Dyer.
Note Submitted to the Department of Statistics.
Thesis Thesis (Ph.D.)--Stanford University, 2011.
Location electronic resource

Access conditions

Copyright
© 2011 by Justin S Dyer
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
