Designing syntactic representations for NLP : an empirical investigation
Abstract/Contents
- Abstract
- This dissertation is a study of the use of linguistic structure in Natural Language Processing (NLP) applications. Specifically, it investigates how different ways of packaging syntactic information have consequences for goals such as representing linguistic properties, training statistical parsers, and sourcing features for information extraction. The focus of these investigations is the design of Universal Dependencies (UD), a multilingual syntactic representation for NLP. Chapter 2 discusses the theoretical foundations of UD and its relations to other frameworks for the study of syntax. This discussion shows specific design decisions that characterize UD, and the principles motivating those decisions. The rationale for headedness criteria and type distinctions in UD is introduced there. Chapter 3 studies how choices of headedness in dependency representations have consequences for parsing and crosslinguistic parallelism. UD strongly prefers lexical heads in dependency trees, and this chapter presents quantitative results supporting this preference for its impact on parallelism. However, that design can be suboptimal for parsing, and in some languages parsing accuracy can be improved by using a parser-internal representation that favors function words as heads. Chapter 4 presents the first detailed linguistic analysis of UD-represented data, taking four Romance languages as a case study. UD's conciseness and orientation to surface syntax allow for a simple and straightforward analysis of Romance SE constructions, which are very difficult to unify in generative syntax. On the other hand, complex predicates require us to choose between representing syntactic and semantic properties. The Romance case also shows why maximizing the crosslinguistic uniformity of the distinction between function and content words requires a small amount of semantic information, in addition to syntactic cues.
Chapter 5 investigates the actual usage of UD in a pipeline, with an extrinsic evaluation that compares UD to minimally transformed versions of it. The main takeaway is methodological: it is very difficult to obtain consistent improvements across data sets by manipulating the dependency representation. The most consistent result obtained was an improvement in performance when using a version of UD that is restructured and relabeled to have shorter predicate-argument paths. The results and analyses presented in this work show that the main (and perhaps only) reason to use a lexical-head design is to support crosslinguistic parallelism. However, that is only possible if function words are defined uniformly across languages, and doing so satisfactorily requires the use of criteria outside syntax. Moreover, the complexity of the results shows that a single design cannot necessarily serve every purpose equally well. Knowing this, one of the most useful things that designers can do is provide a discussion of the properties of their representation for users, empowering them to make transformations such as the many examples illustrated in this dissertation. A deep understanding of syntactic representations creates flexibility for users to exploit their properties in the way that is most suitable for a particular task and data set. This dissertation creates such a deep understanding of UD, thereby, hopefully, enabling users to utilize it in the way that is most suitable for them.
Description
Type of resource | text |
---|---|
Form | electronic; electronic resource; remote |
Extent | 1 online resource. |
Publication date | 2016 |
Issuance | monographic |
Language | English |
Creators/Contributors
Associated with | Silveira, Natalia G |
---|---|
Associated with | Stanford University, Department of Linguistics. |
Primary advisor | Manning, Christopher D |
Thesis advisor | Manning, Christopher D |
Thesis advisor | Jurafsky, Dan, 1962- |
Thesis advisor | Potts, Christopher, 1977- |
Thesis advisor | De Marneffe, Marie-Catherine |
Subjects
Genre | Theses |
---|
Bibliographic information
Statement of responsibility | Natalia G. Silveira. |
---|---|
Note | Submitted to the Department of Linguistics. |
Thesis | Thesis (Ph.D.)--Stanford University, 2016. |
Location | electronic resource |
Access conditions
- Copyright
- © 2016 by Natalia Giordani Silveira
- License
- This work is licensed under a Creative Commons Attribution 3.0 Unported license (CC BY).