Arc-factored biaffine dependency parsing


Abstract/Contents

Abstract
This thesis describes a simple approach to neural arc-factored dependency parsing, building on neural network techniques that have gained considerable popularity in recent years. Dependency parsing identifies the latent syntactic and semantic relationships between words in a sentence, and it has solid foundations in linguistic theory that I describe in some detail. In this work, I introduce new classification techniques that extend the affine softmax classifier ubiquitous in machine learning, which would otherwise be inappropriate for parsing. Moreover, I demonstrate that the new biaffine classification techniques can be derived mathematically from the same principles that yield the affine softmax classifier. Related works either use an alternative to the proposed biaffine classifiers, based on feedforward neural attention, or else use an entirely different parsing algorithm, known as transition-based parsing, which originates in constituency parsing. In this work, I find evidence that the biaffine classifiers outperform the traditional attention-based classifiers, and that the arc-factored system outperforms transition-based parsers more broadly. I also demonstrate that the hyperparameter choices are optimal or near-optimal, with significant deviations leading to either overfitting or underfitting; consequently, any modifications to the architecture that yield better accuracy are unlikely to do so simply by compensating for poor hyperparameters. The basic system can be batched to parse large documents very quickly, and it achieves accuracy comparable to the state of the art on the most popular English benchmark. However, the original system makes a few design choices that introduce complications for other languages, namely a reliance on whole-word tokens and part-of-speech tags.
To address the first limitation, I have the system construct word representations from characters, so that the model can learn how morphology expressed through orthography reflects syntactic structure. To address the second, I minimally adapt the architecture of the parser so that it can be trained as a sequence labeler: a tagger that directly uses insights gleaned from the parser can be trained on any dependency treebank with gold part-of-speech tags. This approach achieved the highest tagging and parsing performance at the 2017 CoNLL shared task on dependency parsing, and it inspired most of the top-performing systems of the 2018 shared task. I also extend the system for multitask tagging, such that morphological features and language-specific part-of-speech tags are conditioned on the predicted coarse-grained universal tag. Finally, I modify the edge classifier to condition predictions directly on the relative location of words, so that the system can more effectively leverage linearization and distance. Both of these modifications yield statistically significant improvements in accuracy. To accommodate dependency formalisms that do not adhere to strict tree structures, I minimally adapt the parser once more to produce arbitrary dependency graphs instead of dependency trees. I again ablate the system to explore how important its different hyperparameters and components are, finding that while most of them do make a statistically significant difference, the differences are generally very small and the system is very robust. The work in this thesis not only contributes narrowly to the field of dependency parsing, but also more broadly provides tools for tasks with more complex dependencies than sequence labeling or classification.
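The biaffine arc scoring described above can be illustrated with a minimal sketch. This is not the thesis's exact formulation (the function names, dimensions, and random toy inputs here are invented for illustration): a biaffine scorer combines a bilinear term over dependent/head vector pairs with a linear term over head vectors, producing a score matrix from which each word's head can be selected in arc-factored fashion.

```python
import numpy as np

def biaffine_arc_scores(H_dep, H_head, U, w, b):
    """Score every (dependent, head) pair in a sentence.

    H_dep, H_head: (n, d) arrays of dependent- and head-role word vectors.
    U: (d, d) bilinear weight matrix capturing pairwise interactions.
    w: (d,) linear weight vector applied to head vectors (a head-only prior).
    b: scalar bias.
    Returns an (n, n) matrix whose entry (i, j) scores word j as the head
    of word i.
    """
    bilinear = H_dep @ U @ H_head.T        # (n, n) dependent-head interactions
    linear = H_head @ w                    # (n,) per-head scores
    return bilinear + linear[None, :] + b  # broadcast linear term over rows

def predict_heads(scores):
    # Greedy arc-factored decoding: each word independently picks its
    # best-scoring head (a full parser would decode a tree or graph).
    return scores.argmax(axis=1)

# Toy example with random vectors standing in for learned representations.
rng = np.random.default_rng(0)
n, d = 5, 8                                # sentence length, vector size
H = rng.normal(size=(n, d))                # shared vectors for both roles
scores = biaffine_arc_scores(H, H, rng.normal(size=(d, d)),
                             rng.normal(size=d), 0.0)
heads = predict_heads(scores)              # one head index per word
```

Because the score decomposes over individual arcs, whole batches of sentences can be scored with a single batched matrix multiplication, which is what makes the arc-factored approach fast to run over large documents.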

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2019
Publication date 2019
Issuance monographic
Language English

Creators/Contributors

Author Dozat, Timothy Allen
Degree supervisor Manning, Christopher D
Thesis advisor Manning, Christopher D
Thesis advisor Jurafsky, Dan, 1962-
Thesis advisor Kay, Martin
Degree committee member Jurafsky, Dan, 1962-
Degree committee member Kay, Martin
Associated with Stanford University, Department of Linguistics.

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Timothy Dozat.
Note Submitted to the Department of Linguistics.
Thesis Thesis (Ph.D.)--Stanford University, 2019.
Location electronic resource

Access conditions

Copyright
© 2019 by Timothy Allen Dozat
License
This work is licensed under a Creative Commons Attribution Share Alike 3.0 Unported license (CC BY-SA).
