Mixed-initiative natural language translation
Abstract/Contents
- Abstract
- There are two classical applications of the automatic translation of natural language. Assimilation is translation when a gist of the meaning is sufficient, and speed and convenience are prioritized. Dissemination is translation with the intent to communicate, so there is usually a predefined quality threshold. The most common assimilation scenario is cross-lingual web browsing, where fully automatic machine translation (MT) best satisfies the speed and convenience requirements. Dissemination is the setting for professional translators, who produce translations with the intent to communicate. MT output does not yet come with quality guarantees, so it is best incorporated as an assistive technology in this setting. This dissertation proposes a mixed-initiative approach to translation for the dissemination scenario. In a mixed-initiative system, human users and intelligent machine agents collaborate to complete some task. The central question is how to design an efficient human/machine interface. By efficient we mean that human productivity should be enhanced, and the machine should be able to self-correct its model by observing human interactions. We separate human productivity into two measurable components: translation time and quality. We first compare unaided translation to post-editing, the simplest form of machine assistance. Human translators manipulate machine output to arrive at a final translation. We find that simple post-editing decisively improves translation along both coordinates, a result that motivates more advanced machine assistance. However, it is widely observed in prior work that users regard post-editing as a tedious task. The main contribution of this dissertation is therefore a more interactive mode of machine assistance that can improve both productivity and the user experience. We present Predictive Translation Memory (PTM), a new interactive, mixed-initiative translation system. The machine suggests future translations based on previous interactions. For example, if the user has typed part of a translation for a given input sentence, PTM can propose a completion. We also show how PTM can self-correct its model via incremental machine learning. A human evaluation shows that PTM helps translators produce higher-quality translations than post-editing when baseline MT quality is high. This is the desired result for dissemination. The translators are slightly slower, but we observe a significant learning curve, suggesting practice may close the time gap. In addition, PTM enables better translation model adaptation than post-editing. We describe novel machine learning techniques that result in significant reductions in human Translation Edit Rate (HTER), which is an interpretable measure of human effort. Our results suggest that adaptation could amplify time and quality gains by shifting the balance of routinizable work toward the machine agent.
Description
Type of resource | text |
---|---|
Form | electronic; electronic resource; remote |
Extent | 1 online resource. |
Publication date | 2014 |
Issuance | monographic |
Language | English |
Creators/Contributors
Associated with | Green, Spence |
---|---|
Associated with | Stanford University, Computer Science Department. |
Primary advisor | Heer, Jeffrey Michael |
Primary advisor | Manning, Christopher D |
Thesis advisor | DeNero, John |
Thesis advisor | Jurafsky, Dan, 1962- |
Subjects
Genre | Theses |
---|---|
Bibliographic information
Statement of responsibility | Spence Green. |
---|---|
Note | Submitted to the Department of Computer Science. |
Thesis | Thesis (Ph.D.)--Stanford University, 2014. |
Location | electronic resource |
Access conditions
- Copyright
- © 2014 by William Spence Green
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported license (CC BY-NC-SA).