Multimodal, Self-Supervised Deep Learning-Based Estimation of Symptoms and Severity of Depression and Anxiety


Abstract/Contents

Abstract
Depression and anxiety are common mental health disorders, affecting an estimated 264 million and 284 million people worldwide, respectively. Although effective treatments for depression exist, healthcare providers in the United States diagnose only 47% of cases, and barriers to effective care include the lack of accurate, feasible screening and diagnostic tools. Current screening and diagnosis rely on self-reported patient surveys, which are highly subjective, and on in-person clinical interviews to assess symptom severity, which are resource-intensive given the shortage of clinicians. In this work, we collect data from 110 patients before their regularly scheduled visits, either in person or virtual, to the Stanford Family Medicine clinic. We collect ground-truth PHQ-9 and GAD-7 scores, clinically validated instruments for screening depression and anxiety, and record video, audio, and text transcriptions of patient responses to 11 questions. We train unimodal deep learning models that use the audio, video, or text transcriptions of these conversations to detect the presence (binary classification) and severity (four-class classification) of depression and anxiety, as well as multimodal models that use all three modalities. The multimodal deep learning models detect the presence and severity of depression and anxiety with high balanced accuracy: 93.8% and 60.4% for depression, and 100% and 62.5% for anxiety. Additionally, we utilize a self-supervised, multimodal pre-training scheme that jointly pre-trains audio, video, and text autoencoders so that their representations converge across the three modalities, followed by decoders that classify each patient into score buckets. Finally, we investigate model interpretability using saliency mapping to uncover the specific behavioral markers, such as facial regions and emotionally valent words, that our models rely on when estimating depression or anxiety. Saliency mapping also identifies which of the 11 questions provide the strongest signal to our text-based models. These results carry both technical and clinical significance. Technically, we suggest ways of learning strong representations in small-data regimes by leveraging the semantic alignment of data across several modalities. Clinically, we demonstrate the potential of an objective, automated screening tool for depression and anxiety that takes an average of three minutes per person to administer during casual conversation in clinical settings. Such a tool could be incorporated into existing clinical workflows with little overhead and facilitate low-cost, universal access to mental health care.
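The record itself contains no implementation details, so the following PyTorch sketch is only an illustration of the kind of joint pre-training the abstract describes: three modality-specific autoencoders trained with a shared reconstruction objective plus a penalty that pulls their latent codes toward each other. Every name, layer size, feature dimension, and the choice of a mean-squared alignment penalty here is an assumption for illustration, not the thesis's actual method.

```python
# Minimal sketch, assuming per-modality feature vectors and an L2
# alignment penalty; none of these choices are taken from the thesis.
import torch
import torch.nn as nn


class ModalityAutoencoder(nn.Module):
    """Encode one modality's features into a shared-size latent and
    reconstruct them; the latent is what gets aligned across modalities."""

    def __init__(self, in_dim: int, latent_dim: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, latent_dim)
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, in_dim)
        )

    def forward(self, x: torch.Tensor):
        z = self.encoder(x)
        return z, self.decoder(z)


def pretrain_step(models, batch, optimizer, align_weight: float = 1.0):
    """One self-supervised step: reconstruct each modality and pull the
    audio/video/text latents for the same patients toward each other."""
    mse = nn.MSELoss()
    latents, recon_loss = [], 0.0
    for name, model in models.items():
        z, x_hat = model(batch[name])
        latents.append(z)
        recon_loss = recon_loss + mse(x_hat, batch[name])
    # Pairwise L2 alignment between the three latents -- an assumed
    # stand-in for whatever convergence objective the thesis used.
    align_loss = sum(
        mse(latents[i], latents[j])
        for i in range(len(latents))
        for j in range(i + 1, len(latents))
    )
    loss = recon_loss + align_weight * align_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    # Hypothetical per-modality feature dimensions for one patient response.
    dims = {"audio": 512, "video": 1024, "text": 768}
    models = nn.ModuleDict({m: ModalityAutoencoder(d) for m, d in dims.items()})
    optimizer = torch.optim.Adam(models.parameters(), lr=1e-4)
    batch = {m: torch.randn(8, d) for m, d in dims.items()}  # fake batch of 8
    print(pretrain_step(models, batch, optimizer))
```

After pre-training of this kind, the frozen or fine-tuned encoders would feed classification heads that map patients into the PHQ-9/GAD-7 score buckets; a contrastive objective could replace the L2 penalty here, but the simpler penalty keeps the sketch self-contained.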

Description

Type of resource text
Publication date November 23, 2022
Date created June 2022

Creators/Contributors

Author Srivathsa, Neha
Advisor Adeli, Ehsan
Advisor Li, Fei-Fei
Degree granting institution Stanford University
Department Department of Computer Science

Subjects

Subject Machine learning
Subject Deep learning (Machine learning)
Subject Depression, Mental > Diagnosis
Subject Anxiety disorders
Genre Text
Genre Thesis

Bibliographic information

Access conditions

Use and reproduction
User agrees that, where applicable, content will not be used to identify or to otherwise infringe the privacy or confidentiality rights of individuals. Content distributed via the Stanford Digital Repository may be subject to additional license and use restrictions applied by the depositor.
License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0).

Preferred citation

Srivathsa, N., Stanford University, and Department of Computer Science (2022). Multimodal, Self-Supervised Deep Learning-Based Estimation of Symptoms and Severity of Depression and Anxiety. Stanford Digital Repository. Available at https://purl.stanford.edu/vp366kq1885. https://doi.org/10.25740/vp366kq1885.

Collection

Undergraduate Theses, School of Engineering
