US Youths Reporting LGB Identity: 2019 State-Level Population Estimation with Complex Missingness


Abstract/Contents

Abstract
Lesbian, gay, and bisexual (LGB) youths are at higher risk of experiencing violence and developing mental health disorders. Accurately estimating the proportion of high school students identifying as LGB in each US state is crucial for disease rate calculations, school interventions, and resource allocation decisions. One essential measure for estimating the size of the LGB population is self-reported sexual identity. The Youth Risk Behavior Survey (YRBS) collects data on health-related behaviors among high school students in most US states, including self-reported sexual identity. However, missing data make it challenging to estimate state-level LGB population sizes accurately. To address this issue, we studied the complex missingness and, assuming a missing-not-at-random (MNAR) mechanism, proposed using the Heckman selection model to impute missing values of self-reported sexual identity. The Heckman selection model requires an exclusion restriction: an instrumental variable that is included in the selection equation but excluded from the outcome equation. We therefore proposed a framework for identifying valid and strong instruments, tailored specifically to datasets with binary outcomes and a large number of categorical independent variables. To meet the exclusion restriction, we developed 14 strategies to construct candidate instruments and/or transform the data for all independent variables. None of the strategies yielded enough valid and strong instruments for most states. We compared the strategies by the number of states that had valid and strong instrumental variables with which to fit the Heckman selection model and generate imputations.
Even for the strategy that performed best on these criteria, model performance was relatively poor: the estimates of the correlation coefficient between the error terms of the two stages of the Heckman selection model were not statistically significant (p-value of 1). We discussed limitations of both the data and the methods and suggested that further work is needed. Overall, accurately estimating the proportion of LGB adolescents in each state remains a critical challenge.
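
To make the setup described in the abstract more concrete, the following is a minimal simulation sketch, not the thesis's code or data. It generates a binary outcome whose response is governed by a selection equation with a hypothetical instrument `z` (present only in the selection equation) and error terms with correlation `rho`. All coefficients, variable names, and sample sizes are illustrative assumptions. It only shows why a complete-case proportion can be biased under MNAR; it does not fit a Heckman model.

```python
import math
import random


def simulate(rho, n=20000, seed=0):
    """Simulate a binary outcome that is observed only for responders.

    Returns (true_share, complete_case_share, response_rate).
    All coefficients below are made up for illustration.
    """
    rng = random.Random(seed)
    total_y = 0      # outcome count over everyone (unobservable in practice)
    observed_y = 0   # outcome count among responders
    observed_n = 0   # number of responders
    for _ in range(n):
        x = rng.gauss(0, 1)  # covariate entering both equations
        z = rng.gauss(0, 1)  # hypothetical instrument: selection equation only
        u = rng.gauss(0, 1)  # selection-equation error
        # Outcome-equation error constructed so that corr(u, e) = rho.
        e = rho * u + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        responded = (0.5 + 0.3 * x + 0.8 * z + u) > 0  # selection equation
        y = (-1.0 + 0.4 * x + e) > 0                   # outcome equation
        total_y += y
        if responded:
            observed_n += 1
            observed_y += y
    return total_y / n, observed_y / observed_n, observed_n / n


if __name__ == "__main__":
    for rho in (0.0, 0.8):
        true_share, cc_share, rate = simulate(rho)
        print(f"rho={rho}: true={true_share:.3f} "
              f"complete-case={cc_share:.3f} response rate={rate:.3f}")
```

In this sketch, a large `rho` makes the complete-case share drift away from the true share, while with `rho = 0` the remaining gap comes only from the shared covariate `x`. The correlation between the two error terms is the quantity whose estimate the abstract reports as not statistically significant.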

Description

Type of resource text
Publication date June 5, 2023; June 2, 2023

Creators/Contributors

Author Zhao, Jiayi
Degree granting institution Stanford University
Department Department of Health Policy
Funder CDC DASH
Thesis advisor Rose, Sherri
Research team head Salomon, Joshua A.
Researcher Jahagirdar, Deepa

Subjects

Subject Imputation
Subject MNAR
Subject Heckman
Subject LGB
Subject YRBS
Genre Text
Genre Thesis

Bibliographic information

Access conditions

Use and reproduction
User agrees that, where applicable, content will not be used to identify or to otherwise infringe the privacy or confidentiality rights of individuals. Content distributed via the Stanford Digital Repository may be subject to additional license and use restrictions applied by the depositor.
License
This work is licensed under a Creative Commons Attribution 4.0 International license (CC BY).

Preferred citation

Zhao, J. (2023). US Youths Reporting LGB Identity: 2019 State-Level Population Estimation with Complex Missingness. Stanford Digital Repository. Available at https://purl.stanford.edu/jg930km8025. https://doi.org/10.25740/jg930km8025.

Collection

Contact information
