Reproducible Aggregation of Sample-Split Statistics

Abstract/Contents

Abstract
Statistical inference is often simplified by sample-splitting. This simplification comes at the cost of the introduction of randomness that is not native to the data. We propose a simple procedure for sequentially aggregating statistics constructed with multiple splits of the same sample. The user specifies a bound and a nominal error rate. If the procedure is implemented twice on the same data, the nominal error rate approximates the chance that the results differ by more than the bound. We provide a non-asymptotic analysis of the accuracy of the nominal error rate and illustrate the application of the procedure to several widely applied statistical methods.
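
The abstract's idea can be illustrated with a minimal Python sketch: statistics from repeated random splits are averaged one at a time until two independent runs on the same data would be expected to differ by more than the bound with probability of roughly the nominal error rate. This is not the procedure from the report. The stopping rule (a normal approximation to the difference between two runs), the names `aggregate_splits` and `half_sample_mean`, and the parameters `min_splits` and `max_splits` are all assumptions made for illustration.

```python
import random
import statistics
from statistics import NormalDist


def aggregate_splits(data, split_statistic, bound, error_rate,
                     seed=None, min_splits=10, max_splits=10_000):
    """Average split statistics sequentially until a stopping rule is met.

    Assumed stopping rule (illustrative, not taken from the report): two
    independent runs that each average k splits differ by a quantity with
    variance about 2 * s^2 / k, where s^2 is the across-split variance
    conditional on the data. Stop once the normal-approximation quantile
    of that difference falls within `bound`.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - error_rate / 2)  # two-sided normal quantile
    values = []
    while len(values) < max_splits:
        values.append(split_statistic(data, rng))
        k = len(values)
        if k < min_splits:
            continue  # too few splits for a usable variance estimate
        s2 = statistics.variance(values)  # sample variance across splits
        if z * (2 * s2 / k) ** 0.5 <= bound:
            break
    return statistics.fmean(values), len(values)


def half_sample_mean(data, rng):
    """Hypothetical split statistic: the mean of a random half of the data."""
    half = rng.sample(data, len(data) // 2)
    return statistics.fmean(half)


if __name__ == "__main__":
    data = [float(i) for i in range(100)]
    for seed in (1, 2):
        estimate, n_splits = aggregate_splits(
            data, half_sample_mean, bound=0.5, error_rate=0.05, seed=seed
        )
        print(f"seed={seed}: estimate={estimate:.3f} after {n_splits} splits")
```

Running the script with two different seeds mimics implementing the procedure twice on the same data. Under the assumed normal approximation, the two printed estimates should differ by more than `bound` in only about an `error_rate` fraction of such repetitions.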

Description

Type of resource: text
Publication date: November 30, 2023

Creators/Contributors

Author: Ritzwoller, D.M.
Author: Romano, J.P.

Subjects

Subject: sample-splitting
Subject: cross-validation
Subject: replicability
Subject: exchangeable pairs
Subject: stability
Genre: Text
Genre: Technical report

Access conditions

Use and reproduction
User agrees that, where applicable, content will not be used to identify or to otherwise infringe the privacy or confidentiality rights of individuals. Content distributed via the Stanford Digital Repository may be subject to additional license and use restrictions applied by the depositor.
License
This work is licensed under a Creative Commons Attribution Non Commercial No Derivatives 4.0 International license (CC BY-NC-ND).

Preferred citation
Ritzwoller, D. and Romano, J. (2023). Reproducible Aggregation of Sample-Split Statistics. Department of Statistics Technical Report, Stanford University. Available from the Stanford Digital Repository at https://purl.stanford.edu/jt589nw1637. https://doi.org/10.25740/jt589nw1637.

Collection

Statistics Department Technical Reports
