The pragmatics of image description generation

Abstract/Contents

Abstract
Finding true words that describe an image seems easy; the communicative challenge lies in finding the right words. Describing an image to make it accessible to someone who can't see it also carries considerable social responsibility: the information we choose to put into words fundamentally determines what our interlocutor will learn and what they might miss. Recent advances in computational vision-language models bring an opportunity to build systems that can describe the visual world (e.g., images) for us, and do so instantly. This holds special promise for advancing nonvisual accessibility for people who are blind or have low vision, since their access to visual content online often depends on alt text descriptions that are rarely present. In this dissertation, I argue that the development of vision-language systems that serve social needs such as accessibility must be centered on pragmatic communicative principles. This applies to the datasets we construct, the way we design and train our models, and how we evaluate them. The opportunities are vast, but now is the time when we can start to center computational modeling progress on the intricacies and complexities that make our communication so rich and effective.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2023; ©2023
Publication date 2023
Issuance monographic
Language English

Creators/Contributors

Author Kreiss, Elisa
Degree supervisor Potts, Christopher
Thesis advisor Potts, Christopher
Thesis advisor Goodman, Noah
Thesis advisor Jurafsky, Dan
Degree committee member Goodman, Noah
Degree committee member Jurafsky, Dan
Associated with Stanford University, School of Humanities and Sciences
Associated with Stanford University, Department of Linguistics

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Elisa Kreiss.
Note Submitted to the Department of Linguistics.
Thesis Thesis Ph.D. Stanford University 2023.
Location https://purl.stanford.edu/th574rc1780

Access conditions

Copyright
© 2023 by Elisa Kreiss
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
