Bilingual and cross-lingual learning of sequence models with bitext

Abstract/Contents

Abstract
Information extraction technologies such as detecting the names of people and places in natural language texts are becoming ever more prevalent as the amount of unstructured text data grows exponentially. Tremendous progress has been made in the past decade in learning supervised sequence models for such tasks, and current state-of-the-art results are in the lower 90s in terms of F1 score for resource-rich languages like English and widely studied datasets such as the CoNLL newswire corpus. However, the performance of existing supervised methods lags by a significant margin when evaluated on non-English languages, new datasets, and domains other than newswire. Furthermore, for resource-poor languages, where there is often little or no annotated training data, neither supervised nor existing unsupervised methods tend to work well. This thesis describes a series of models and experiments in response to these challenges in three specific areas. First, we address the problem of balancing between feature weight undertraining and overtraining in learning log-linear models. We explore two novel regularization techniques, a mixed L2/L1 norm in a product-of-experts ensemble and adaptive regularization with feature noising, and show that they can be very effective in improving system performance. Second, we challenge the conventional wisdom of employing a linear architecture and a sparse, discrete feature representation for sequence labeling tasks, and closely examine the connections and tradeoffs between linear and nonlinear architectures, as well as between discrete and continuous feature representations. We show that a nonlinear architecture enjoys a significant advantage over a linear one when used with continuous feature vectors, but does not seem to offer benefits over traditional sparse features. Lastly, we explore methods that leverage readily available unlabeled parallel text from translation as a rich source of constraints for learning bilingual models that transfer knowledge from English to resource-poor languages. We formalize these models as loopy Markov random fields and propose a suite of approximate inference methods for decoding. Evaluated on standard test sets for five non-English languages, our semi-supervised models yield significant improvements over state-of-the-art results for all five languages. We further propose a cross-lingual projection method that is capable of learning sequence models for languages with no annotated resources at all. Our method projects model posteriors from English to the foreign side over word alignments on bitext, and handles missing and noisy labels via expectation regularization. Trained with no annotated data at all, our model attains the same accuracy as supervised models trained with thousands of labeled examples.
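For readers unfamiliar with the mixed-norm regularization mentioned above, the sketch below gives the general definition of a mixed L_{p,q} norm over grouped weights. This is an illustrative definition only: the specific grouping of weights and the order of the inner and outer norms used in the thesis's product-of-experts ensemble may differ from the L_{2,1} instance shown here.

\[
  \|w\|_{p,q} \;=\; \Big( \sum_{g=1}^{G} \|w_g\|_p^{\,q} \Big)^{1/q},
  \qquad
  \|w\|_{2,1} \;=\; \sum_{g=1}^{G} \sqrt{\sum_{j \in g} w_j^2}
  % Sketch, not necessarily the thesis's exact formulation: w is a weight
  % vector partitioned into G groups w_1, ..., w_G. With p=2, q=1 this is
  % the group-lasso penalty, typically added to a log-linear training
  % objective as a term lambda * ||w||_{2,1}; it drives entire groups of
  % weights toward zero while shrinking weights within a group jointly.
\]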

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2014
Issuance monographic
Language English

Creators/Contributors

Associated with Wang, Mengqiu
Associated with Stanford University, Department of Computer Science.
Primary advisor Manning, Christopher D
Thesis advisor Jurafsky, Dan, 1962-
Thesis advisor Liang, Percy

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Mengqiu Wang.
Note Submitted to the Department of Computer Science.
Thesis (Ph.D.), Stanford University, 2014.
Location electronic resource

Access conditions

Copyright
© 2014 by Mengqiu Wang
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
