Designing syntactic representations for NLP : an empirical investigation

Silveira, Natalia G; Stanford University, Department of Linguistics.

Abstract/Contents

Abstract: This dissertation is a study of the use of linguistic structure in Natural Language Processing (NLP) applications. Specifically, it investigates how different ways of packaging syntactic information have consequences for goals such as representing linguistic properties, training statistical parsers, and sourcing features for information extraction. The focus of these investigations is the design of Universal Dependencies (UD), a multilingual syntactic representation for NLP. Chapter 2 discusses the theoretical foundations of UD and its relations to other frameworks for the study of syntax. This discussion shows specific design decisions that characterize UD, and the principles motivating those decisions. The rationale for headedness criteria and type distinctions in UD is introduced there. Chapter 3 studies how choices of headedness in dependency representations have consequences for parsing and crosslinguistic parallelism. UD strongly prefers lexical heads in dependency trees, and this chapter presents quantitative results supporting this preference for its impact on parallelism. However, that design can be suboptimal for parsing, and in some languages parsing accuracy can be improved by using a parser-internal representation that favors function words as heads. Chapter 4 presents the first detailed linguistic analysis of UD-represented data, taking four Romance languages as a case study. UD's conciseness and orientation to surface syntax allow for a simple and straightforward analysis of Romance SE constructions, which are very difficult to unify in generative syntax. On the other hand, complex predicates require us to choose between representing syntactic or semantic properties. The Romance case also shows why maximizing the crosslinguistic uniformity of the distinction between function and content words requires a small amount of semantic information, in addition to syntactic cues.
Chapter 5 investigates the actual usage of UD in a pipeline, with an extrinsic evaluation that compares UD to minimally transformed versions of it. The main takeaway is methodological: it is very difficult to obtain consistent improvements across data sets by manipulating the dependency representation. The most consistent result obtained was an improvement in performance when using a version of UD that is restructured and relabeled to have shorter predicate-argument paths. The results and analyses presented in this work show that the main (and perhaps only) reason to use a lexical-head design is to support crosslinguistic parallelism. However, that is only possible if function words are defined uniformly across languages, and doing so satisfactorily requires the use of criteria outside syntax. Moreover, the complexity of the results shows that a single design cannot necessarily serve every purpose equally well. Knowing this, one of the most useful things that designers can do is provide a discussion of the properties of their representation for users, empowering them to make transformations such as the many examples illustrated in this dissertation. A deep understanding of syntactic representations creates flexibility for users to exploit their properties in the way that is most suitable for a particular task and data set. This dissertation creates such a deep understanding of UD, thereby, hopefully, enabling users to use it in the way that is most suitable for them.

Description

Type of resource	text
Form	electronic; electronic resource; remote
Extent	1 online resource.
Publication date	2016
Issuance	monographic
Language	English

Creators/Contributors

Associated with	Silveira, Natalia G
Associated with	Stanford University, Department of Linguistics.
Primary advisor	Manning, Christopher D
Thesis advisor	Manning, Christopher D
Thesis advisor	Jurafsky, Dan, 1962-
Thesis advisor	Potts, Christopher, 1977-
Thesis advisor	De Marneffe, Marie-Catherine

Subjects

Genre	Theses

Bibliographic information

Statement of responsibility	Natalia G. Silveira.
Note	Submitted to the Department of Linguistics.
Thesis	Thesis (Ph.D.)--Stanford University, 2016.
Location	electronic resource

Access conditions

License: This work is licensed under a Creative Commons Attribution 3.0 Unported license (CC BY).
