Text to 3D scene generation

Abstract/Contents

Abstract
The ability to form a visual interpretation of the world from natural language is pivotal to human communication. Similarly, from a computational perspective, mapping descriptions of scenes to 3D geometric representations is useful in many areas such as robotics, interior design, and even education. Text to 3D scene generation is a task which addresses this problem space: a user provides natural language as input and the output is a plausible 3D scene interpretation. This is a challenging domain connecting NLP and computer graphics. The few existing systems for generating 3D scenes from text are severely restricted in scope and robustness. The key challenge, and the focus of this dissertation, is incorporating the prior knowledge that is essential for successfully generating 3D scenes from highly under-specified natural scene descriptions. Prior systems do not leverage such priors and therefore require explicit, verbose language.

This dissertation formalizes and decomposes the problem of text to 3D scene generation, and describes the implementation of a new text to scene framework that enables the incorporation of priors learned from data. I propose viewing the problem as extracting a set of explicit constraints from input descriptions, combining them with learned common-sense priors to infer implicit constraints, and then selecting objects and positioning them to satisfy the constraints and generate plausible scenes. To capture the basic semantics of a scene, I define the scene template representation, which consists of the objects, their attributes, and the relations between them. A given scene template can be used to generate many matching scenes whose plausibility can be scored. I then define two subtasks: scene template parsing, where templates are parsed from natural language, and scene inference, where templates are expanded with additional objects and spatial constraints.

From the expanded scene templates, my system grounds object references by selecting appropriate 3D models, and then computationally arranges the selected objects to satisfy spatial constraints and maximize plausibility. I then demonstrate how to extend the text to scene system to allow iterative refinement of the generated scenes using natural language commands to add, remove, replace, and manipulate objects.

In building the text to scene framework presented here, I learn a set of common-sense priors using datasets of 3D models and scenes, and evaluate their impact on the quality of generated 3D scenes. From the scene data, I collect several sets of priors: (1) object occurrence priors to determine what other objects should be present, (2) support and relative position priors to determine where objects are placed, and (3) attachment priors to determine how objects are attached. In addition, I collect a new dataset of 3D scenes paired with textual descriptions and use it to learn how to ground spatial relation language and object descriptions. I provide this dataset to the community and perform an empirical evaluation of the system's output against manually designed scenes and simpler rule-based baselines. Using a perceptual evaluation study, I show that the system can generate high quality 3D scenes given natural language input. This initial step in connecting language with 3D geometry opens up many areas of research for bridging the gap between language, semantics, and geometry.
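The abstract centers on two ideas: the scene template representation (objects, attributes, relations) and common-sense priors learned from scene data. As a minimal illustrative sketch only, not the dissertation's actual implementation, a scene template might be represented as follows; all class and field names here are hypothetical:

```python
# A minimal sketch (not the dissertation's implementation) of the scene
# template representation described in the abstract: objects, their
# attributes, and spatial relations between them. All names are hypothetical.
from dataclasses import dataclass, field


@dataclass
class SceneObject:
    index: int                                      # unique id in the template
    category: str                                   # e.g. "desk", "lamp"
    attributes: dict = field(default_factory=dict)  # e.g. {"color": "red"}


@dataclass
class Relation:
    predicate: str   # spatial relation, e.g. "on", "next_to"
    subject: int     # index of the located object
    reference: int   # index of the reference object


@dataclass
class SceneTemplate:
    objects: list
    relations: list


# "There is a red lamp on the desk" might parse, roughly, to:
template = SceneTemplate(
    objects=[SceneObject(0, "desk"),
             SceneObject(1, "lamp", {"color": "red"})],
    relations=[Relation("on", subject=1, reference=0)],
)
# Scene inference would then expand the template with implicit objects
# (a room, a floor) and support constraints learned from scene data.
```

Similarly, the object occurrence priors mentioned in the abstract can be illustrated as conditional probabilities estimated from co-occurrence counts over a scene corpus; the tiny corpus below is fabricated purely for illustration:

```python
# A toy sketch of object occurrence priors: estimate P(obj | given) by
# counting how often the two objects appear together in scenes.
from collections import Counter
from itertools import permutations

scenes = [
    {"desk", "lamp", "chair"},
    {"desk", "chair", "monitor"},
    {"bed", "lamp", "nightstand"},
]

counts = Counter()   # counts[x] = number of scenes containing x
cooccur = Counter()  # cooccur[(x, y)] = number of scenes containing both
for scene in scenes:
    counts.update(scene)
    cooccur.update(permutations(scene, 2))


def occurrence_prior(obj, given):
    """P(obj present | given present), estimated by counting."""
    return cooccur[(given, obj)] / counts[given]


print(occurrence_prior("chair", given="desk"))  # 1.0 in this toy corpus
```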

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2015
Issuance monographic
Language English

Creators/Contributors

Associated with Chang, Angel Xuan
Associated with Stanford University, Department of Computer Science.
Primary advisor Manning, Christopher D.
Thesis advisor Hanrahan, P. M. (Patrick Matthew)
Thesis advisor Liang, Percy

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Angel Xuan Chang.
Note Submitted to the Department of Computer Science.
Thesis Thesis (Ph.D.)--Stanford University, 2015.
Location electronic resource

Access conditions

Copyright
© 2015 by Angel Xuan Chang
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
