Bilingual and cross-lingual learning of sequence models with bitext
Abstract/Contents
- Abstract
- Information extraction technologies, such as detecting the names of people and places in natural language text, are becoming ever more prevalent as the amount of unstructured text data grows exponentially. Tremendous progress has been made in the past decade in learning supervised sequence models for such tasks, and current state-of-the-art results are in the lower 90s in terms of F1 score for resource-rich languages like English and widely studied datasets such as the CoNLL newswire corpus. However, the performance of existing supervised methods lags by a significant margin when evaluated on non-English languages, new datasets, and domains other than newswire. Furthermore, for resource-poor languages, where there is often little or no annotated training data, neither supervised nor existing unsupervised methods tend to work well. This thesis describes a series of models and experiments in response to these challenges in three specific areas. Firstly, we address the problem of balancing between feature weight undertraining and overtraining in learning log-linear models. We explore two novel regularization techniques, a mixed L2/L1 norm in a product-of-experts ensemble and an adaptive regularization with feature noising, and show that they can be very effective in improving system performance. Secondly, we challenge the conventional wisdom of employing a linear architecture and a sparse, discrete feature representation for sequence labeling tasks, and closely examine the connections and tradeoffs between linear and nonlinear architectures, as well as discrete versus continuous feature representations. We show that a nonlinear architecture enjoys a significant advantage over a linear architecture when used with continuous feature vectors, but does not seem to offer benefits over traditional sparse features.
Lastly, we explore methods that leverage readily available unlabeled parallel text from translation as a rich source of constraints for learning bilingual models that transfer knowledge from English to resource-poor languages. We formalize the model as a loopy Markov Random Field, and propose a suite of approximate inference methods for decoding. Evaluated on standard test sets for five non-English languages, our semi-supervised models yield significant improvements over state-of-the-art results for all five languages. We further propose a cross-lingual projection method that is capable of learning sequence models for languages with no annotated resources at all. Our method projects model posteriors from English to the foreign side over word alignments on bitext, and handles missing and noisy labels via expectation regularization. Trained with no annotated data at all, our model attains the same accuracy as supervised models trained on thousands of labeled examples.
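The projection step described above can be sketched in a few lines: English label posteriors are carried across word alignments and averaged on the foreign side. This is a minimal illustration, not the thesis implementation; the function name and the uniform back-off for unaligned tokens are assumptions standing in for the thesis's expectation-regularization treatment of missing labels.

```python
import numpy as np

def project_posteriors(en_posteriors, alignments, n_foreign):
    """Project English label posteriors onto foreign tokens via word alignments.

    en_posteriors: (n_english, n_labels) array of per-token label distributions
    alignments:    list of (english_index, foreign_index) alignment pairs
    n_foreign:     number of tokens in the foreign sentence
    """
    n_labels = en_posteriors.shape[1]
    projected = np.zeros((n_foreign, n_labels))
    counts = np.zeros(n_foreign)
    for e, f in alignments:
        projected[f] += en_posteriors[e]  # accumulate aligned English posteriors
        counts[f] += 1
    aligned = counts > 0
    projected[aligned] /= counts[aligned, None]  # average over multiple alignments
    # Unaligned foreign tokens get a uniform distribution (an assumed fallback
    # for the missing-label case the thesis handles with expectation regularization).
    projected[~aligned] = 1.0 / n_labels
    return projected
```

In this sketch a foreign token aligned to several English tokens receives the average of their posteriors, so noisy one-to-many alignments are smoothed rather than trusted individually.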
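The mixed L2/L1 norm mentioned for the product-of-experts ensemble can be illustrated as a group norm: an L2 norm within each expert's weight group and an L1-style sum across groups, which pushes entire groups of weights toward zero at once. A minimal sketch, assuming the weights are already partitioned into per-expert groups; it is not the exact formulation used in the thesis.

```python
import numpy as np

def mixed_l2_l1(weight_groups):
    """Mixed L2/L1 (group) norm: the sum of per-group L2 norms.

    weight_groups: iterable of 1-D weight arrays, one per expert/group
    """
    # Inner L2 norm per group, outer L1 sum across groups.
    return sum(np.linalg.norm(g) for g in weight_groups)
```

Because the outer sum is an L1 norm over group magnitudes, the penalty can drive a whole group's weights to zero together, which is one way such regularizers counteract overtraining of individual feature weights.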
Description
Type of resource | text
---|---
Form | electronic; electronic resource; remote
Extent | 1 online resource
Publication date | 2014
Issuance | monographic
Language | English
Creators/Contributors
Associated with | Wang, Mengqiu
---|---
Associated with | Stanford University, Department of Computer Science
Primary advisor | Manning, Christopher D.
Thesis advisor | Jurafsky, Dan, 1962-
Thesis advisor | Liang, Percy
Subjects
Genre | Theses
Bibliographic information
Statement of responsibility | Mengqiu Wang
---|---
Note | Submitted to the Department of Computer Science
Thesis | Ph.D., Stanford University, 2014
Location | electronic resource
Access conditions
- Copyright
- © 2014 by Mengqiu Wang
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).