Language Technologies Institute
Machine Learning Department (affiliated faculty)
School of Computer Science
Carnegie Mellon University
Keynote speech: Better Translation by Modeling Translation Models
Learning to translate from parallel data involves two related, but distinct, problems: (i) acquiring an inventory of translation rules—usually phrase pairs or synchronous grammar rules—and (ii) setting the parameters of a model that combines these rules to produce new translations. Tremendous progress has been made on the second problem: translation models are trained using state-of-the-art machine learning algorithms, from neural networks to models with millions of hand-engineered features. However, while better alignment and grammar induction algorithms remain a staple of MT conferences, the translation rule inventories we learn seldom generalize beyond the lexical n-grams observed in the parallel training corpus. This is a serious limitation, particularly when translating into languages that are morphologically richer than the source and when the two languages lexicalize different kinds of information.
In this talk, I propose a two-step approach to translation: (i) modeling the inventory of translation rules that could apply to a particular sentence prior to constructing translations, and then (ii) using these rules to translate the sentence. This approach has several advantages. Perhaps most importantly, the strong baseline models we have developed through years of experimentation are augmented, rather than replaced. As a result, models targeting specific phenomena (e.g., named entities or words whose translations have a great deal of morphological complexity) can be developed without harming baseline performance. Second, the first-pass model(s) can be trained using algorithmically convenient objectives (e.g., likelihood) and on large numbers of noisy translations, without sacrificing the advantages of tuning the full system to maximize BLEU on a high-quality development set. Finally, context-sensitive models of lexical translation are straightforward to incorporate. Several applications of this approach to a variety of translation problems will be presented.
This is joint work with Victor Chahuneau, Eva Schlinger, Yulia Tsvetkov, Noah Smith, and Austin Matthews.
Dr. Chris Dyer is an Assistant Professor of Language Technologies in the School of Computer Science (SCS) at Carnegie Mellon University. He also holds an affiliated appointment in the Machine Learning Department at the same institution. Dr. Dyer received his Ph.D. in Linguistics from the University of Maryland at College Park in December 2010, where his research with Prof. Philip Resnik developed tractable statistical models of machine translation that are robust to errors in automatic linguistic analysis components by simultaneously considering billions of alternative analyses during translation and deferring the analysis of uncertain inputs until the end of the translation pipeline. Tractability relied on automata-theoretic insights that previously had their primary application in compiler design. The techniques he developed have been widely adopted in other research labs and in commercial translation applications. The translation and learning software developed during Dr. Dyer's thesis work is publicly available and has been used in courses on natural language processing and machine translation at several institutions, in three Ph.D. dissertations, and in numerous publications by other authors. In addition to numerous scientific articles and one patent, Dr. Dyer co-authored a book with Dr. Jimmy Lin, Data-Intensive Text Processing with MapReduce, published by Morgan & Claypool. Since its publication, the book has been widely used in courses around the world. Following his graduate work, Dr. Dyer was a postdoctoral associate at Carnegie Mellon with Dr. Noah Smith. Their work developed probabilistic models of natural language that incorporate structured prior linguistic knowledge to achieve better predictive performance than uninformative priors alone, particularly for low-resource languages.
Nara Institute of Science and Technology
Speech: Deep Learning for Natural Language Processing and Machine Translation
Deep Learning is a family of methods that uses deep architectures to learn high-level feature representations from data. Recently, these methods have helped researchers achieve impressive results in fields such as speech recognition and computer vision. The first half of the tutorial will provide a review of Deep Learning, covering multi-layer perceptrons, deep belief nets, and stacked auto-encoders. The second half will survey recent Deep Learning applications in Natural Language Processing and Machine Translation. The goal is to establish a foundational understanding sufficient to start research in this exciting and growing area.
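As a rough illustration of the tutorial's starting point, here is a minimal multi-layer perceptron in plain NumPy, trained with vanilla gradient descent on XOR; all hyper-parameters and names are illustrative and not taken from the tutorial itself:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MLP:
    """A tiny two-layer perceptron trained with plain gradient descent."""

    def __init__(self, n_in, n_hidden, n_out):
        self.W1 = rng.normal(0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.5, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, X):
        self.h = sigmoid(X @ self.W1 + self.b1)  # learned hidden representation
        return sigmoid(self.h @ self.W2 + self.b2)

    def train_step(self, X, Y, lr=0.5):
        out = self.forward(X)
        # Backpropagate the squared error through both sigmoid layers.
        d_out = (out - Y) * out * (1 - out)
        d_h = (d_out @ self.W2.T) * self.h * (1 - self.h)
        self.W2 -= lr * self.h.T @ d_out
        self.b2 -= lr * d_out.sum(axis=0)
        self.W1 -= lr * X.T @ d_h
        self.b1 -= lr * d_h.sum(axis=0)
        return float(((out - Y) ** 2).mean())

# XOR is not linearly separable, so the hidden layer is essential.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)
net = MLP(2, 8, 1)
losses = [net.train_step(X, Y) for _ in range(5000)]
```

Deep belief nets and stacked auto-encoders, covered in the first half, build on exactly this kind of layer by pre-training each hidden representation before joint fine-tuning.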
Kevin Duh is an assistant professor at the Nara Institute of Science and Technology (NAIST), Graduate School of Information Science. He received his B.S. in 2003 from Rice University and his PhD in 2009 from the University of Washington, both in Electrical Engineering. Prior to joining NAIST, he worked at the NTT Communication Science Laboratories (2009-2012). His research interests are at the intersection of Natural Language Processing and Machine Learning, with particular focus on statistical machine translation, structured prediction, semi-supervised learning, and deep learning. Website: http://cl.naist.jp/~kevinduh
Department of Computer Science and Technology
Speech: Writing Tips for Machine Translation Papers
Writing is a critical research skill for graduate students and researchers. This talk introduces writing guidelines and tips tailored to machine translation. At a high level, I will cover the goals of writing, evaluation criteria, paper organization, and important principles. At a low level, the talk covers writing tips for sentences, paragraphs, and sections; organizing the underlying logic; the use of examples and visualization; common pitfalls; and specific advice for MT papers. The talk closes with analyses of excellent ACL and EMNLP papers.
Yang Liu is an Associate Professor in the Department of Computer Science and Technology, Tsinghua University. He received his PhD degree from the Institute of Computing Technology, Chinese Academy of Sciences in 2007. His research focuses on natural language processing and machine translation. He has published about 30 papers in leading NLP/AI journals and conferences such as Computational Linguistics, ACL, AAAI, EMNLP, and COLING. He won the COLING/ACL 2006 Meritorious Asian NLP Paper Award and the Second Prize of the Beijing Science and Technology Award. He serves as ACL 2014 Tutorial Co-Chair, ACL 2015 Local Arrangements Co-Chair, SIGHAN Information Officer, and General Secretary of the Computational Linguistics Technical Committee of the Chinese Information Processing Society. Homepage: http://nlp.csai.tsinghua.edu.cn/~ly/.
Center for Machine Intelligence
Institute for Interdisciplinary Information Sciences
Speech: Ensemble Learning for Machine Translation
Ensemble methods are machine learning algorithms that construct a set of classifiers and combine their results in a way that outperforms any individual member. Ensemble learning is an exciting area in which a rich and practical theory provides accuracy guarantees for predictors, while at the same time these techniques are the off-the-shelf choice for significant improvements in machine translation quality. We will discuss the bridge between applications and theory of ensemble learning as applied to contemporary machine translation, with a focus on two issues: (1) how to produce diverse classifiers through approaches such as corpus mining (via IR, pivot languages, and comparable corpora), bagging, and clustering; (2) how to combine the outputs of individual classifiers at the level of models and systems, for example synchronous model training, mixtures of phrase and language models, and typical system combination methods.
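The classic intuition behind vote-based combination can be sketched in a few lines: three weak classifiers whose errors fall on different examples are corrected by a majority vote. Everything below is an invented toy, not an MT system:

```python
from collections import Counter

def majority_vote(classifiers, x):
    """Ensemble prediction: every member votes and the plurality label wins."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]

def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)

# Toy task: the true label is 1 iff x >= 2. Each of the three hypothetical
# member classifiers makes a single error, on a *different* point, so the
# errors never coincide and the majority vote cancels all of them.
data = [(x, int(x >= 2)) for x in range(6)]
clf_a = lambda x: 1 if x >= 2 or x == 0 else 0   # wrong only on x = 0
clf_b = lambda x: 1 if x >= 2 and x != 3 else 0  # wrong only on x = 3
clf_c = lambda x: 1 if x >= 2 and x != 5 else 0  # wrong only on x = 5

members = [clf_a, clf_b, clf_c]
member_acc = [accuracy(c, data) for c in members]                   # each 5/6
ensemble_acc = accuracy(lambda x: majority_vote(members, x), data)  # 1.0
```

The same principle motivates combining diverse MT systems: the combination helps precisely when the members' errors are decorrelated, which is why producing *different* classifiers is issue (1) above.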
I am an assistant professor in the Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University. I joined IIIS in 2012; before that I was a project leader and senior researcher in the language technology group at DFKI, Germany. As a graduate student I was supervised by Hermann Ney at RWTH Aachen, with occasional (several-month-long) and very fruitful visits to the speech group at IBM Watson and the NLP group at MSR Redmond. My current research interests are in machine learning with a focus on highly competitive machine translation systems. Lately, I have developed an interest in techniques related to ensemble learning. I am a regular contributor to mainstream venues in Computational Linguistics such as ACL, COLING, and EMNLP. In the past I led teams that won first place in WMT (EN-DE in BLEU, 2010) and TC-STAR (CN-EN, 2005-2007), and I was a key member of teams that won first place (MSR, as an intern, 2008) and second place (Aachen, 2005) in the NIST evaluations. I am leading a similar group here at Tsinghua.
Research Assistant Professor
Natural Language Processing Group
National Laboratory of Pattern Recognition
Institute of Automation Chinese Academy of Sciences
Speech: Machine Translation by Minimizing the Semantic Gap in the Vector Embedding Space
Measuring the quality of translation rules and their compositions is an essential issue in the conventional statistical machine translation (SMT) framework. Conventional lexical and phrasal probabilities are calculated only from co-occurrence statistics in the bilingual corpus, and may not be reliable due to data sparseness. To address this issue, we propose to measure the quality of translation rules and their compositions in a semantic Vector Embedding Space (VES). We present a Recursive Neural Network (RNN) based translation framework with two sub-models. One is a bilingually-constrained recursive auto-encoder, which converts lexical translation rules into compact real-valued vectors in the semantic VES. The other is a type-dependent recursive neural network, which performs decoding by minimizing the semantic gap (meaning distance) between the source-language string and its translation candidates at each state in a bottom-up structure. The RNN-based translation model is trained with a max-margin objective that maximizes the margin between the reference translation and the n-best translations in forced decoding. In the experiments, we first show that the proposed vector representations of translation rules are reliable enough to be applied in translation modelling. We further show that the proposed type-dependent RNN-based model significantly improves translation quality in a large-scale end-to-end Chinese-to-English translation evaluation.
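A minimal sketch of the max-margin idea, assuming the source string, the reference, and the n-best candidates have already been embedded as vectors. The 3-d embeddings and the margin value below are invented purely to exercise the objective; this is not the talk's actual model:

```python
import numpy as np

def semantic_gap(src_vec, trg_vec):
    """Meaning distance between a source string and a translation candidate,
    both represented as points in the shared vector embedding space."""
    return float(np.linalg.norm(src_vec - trg_vec))

def max_margin_loss(src_vec, ref_vec, nbest_vecs, margin=1.0):
    """Hinge loss pushing the reference to lie closer to the source than
    every n-best candidate by at least `margin`."""
    ref_gap = semantic_gap(src_vec, ref_vec)
    return sum(max(0.0, margin + ref_gap - semantic_gap(src_vec, cand))
               for cand in nbest_vecs)

# Invented 3-d embeddings, purely to illustrate the objective.
src = np.array([1.0, 0.0, 0.0])
ref = np.array([0.9, 0.1, 0.0])      # small gap to the source
nbest = [np.array([0.0, 1.0, 0.0]),  # far candidate: margin already satisfied
         np.array([0.8, 0.0, 0.2])]  # near candidate: violates the margin
loss = max_margin_loss(src, ref, nbest)  # positive: a gradient step would follow
```

Only candidates that come within the margin of the reference contribute to the loss, so training effort concentrates on the confusable n-best translations.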
He received the B.S. degree in Computer Science from Jilin University in 2006 and the Ph.D. degree in Pattern Recognition and Intelligent Systems from NLPR, Institute of Automation, Chinese Academy of Sciences, in June 2011. He joined NLPR in July 2011 and now works as a research assistant professor. His research interests include natural language processing, statistical machine translation, and statistical learning. He has published more than 20 research papers, including papers in international journals such as IEEE TASLP, TACL, Language Resources and Evaluation (LRE), and the Journal of Computer Science and Technology (JCST), and in top conferences such as AAAI, ACL, EMNLP, and COLING. He won best paper awards at PACLIC 23 and NLPCC 2012 and the best poster award at the Young Scholar Symposium on Natural Language Processing (YSSNLP 2013).
Harbin Institute of Technology
Speech: Soft Dependency Matching for Hierarchical Phrase-based Machine Translation
In this talk, I will present a soft dependency matching model for hierarchical phrase-based (HPB) machine translation. When an HPB rule is extracted, we enrich it with dependency knowledge automatically learnt from the training data. This knowledge encodes not only the dependency relations among the components inside the rule, but also the dependency relations between the rule and its context. When a rule is applied to translate a sentence, the dependency knowledge is used to compute the syntactic structural consistency of the rule against the dependency tree of the sentence. We characterize this structural consistency with three features and integrate them into the standard SMT log-linear model to guide the translation process. Our method is evaluated on multiple Chinese-to-English machine translation test sets. The experimental results show that our soft matching model achieves improvements of 0.7-1.4 BLEU points over a strong baseline system.
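The integration step is the standard log-linear score: a weighted sum of feature values. Here is a sketch in which the three dependency-consistency feature names, their values, and the weights are all invented for illustration; they simply stand in for the features the talk describes:

```python
def loglinear_score(features, weights):
    """Score of one derivation under the standard SMT log-linear model:
    a weighted sum of (log-domain) feature values."""
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical feature values for a single rule application; the three
# dependency-consistency features sit alongside the usual dense features
# and are tuned together with them (e.g., by MERT).
features = {
    "lm": -4.2, "phrase_tm": -2.1, "word_penalty": -3.0,
    "dep_inside_match": 1.0,   # rule-internal dependencies agree with the parse
    "dep_context_match": 0.0,  # rule-vs-context dependencies: no agreement here
    "dep_cross": -1.0,         # the rule span crosses a dependency bracket
}
weights = {"lm": 0.5, "phrase_tm": 0.3, "word_penalty": -0.1,
           "dep_inside_match": 0.2, "dep_context_match": 0.2, "dep_cross": 0.2}
score = loglinear_score(features, weights)
```

Because the consistency signal enters as weighted features rather than as a hard filter, inconsistent rules are penalized but never categorically excluded, which is what makes the matching "soft".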
Hailong Cao received his PhD from Harbin Institute of Technology (HIT) in 2006 and worked as a postdoctoral researcher at NICT, Japan, on a machine translation project. He then returned to the School of Computer Science and Technology at HIT as a lecturer in the Machine Intelligence and Translation Lab (MITLAB). He now focuses on teaching and research in natural language processing (NLP). He is interested in statistical machine translation, syntactic parsing, and other areas as well. His papers have appeared at ACL, COLING, EMNLP, and elsewhere.
College of Information Science and Engineering
Speech: Attempts to Effectively Use Source-Language Structural Information for MT
This talk presents some work on the effective use of source-language structural information for machine translation, in particular: 1) a simple and effective approach to augmenting hierarchical phrase-based models with syntax-based translation rules; and 2) a skeleton-based model for MT. Experimental results show that these methods can significantly improve different types of SMT systems. This work was published at COLING'14 and ACL'14.
Xiao Tong obtained his Ph.D. degree from Northeastern University (NEU) and is now a lecturer at the College of Information Science and Engineering, NEU. He has visited several institutes for MT research, including Fuji Xerox (Japan), MSRA (China), and the University of Cambridge (UK). Dr. Xiao is a co-PI of the NiuTrans open-source project and led the development of several strong SMT systems for the CWMT and NTCIR translation tasks. His current research interests are syntactic and semantic MT and skeleton-based models. He has published papers in several top conferences and journals (ACL, AI, etc.).
Institute of Computing Technology
Chinese Academy of Sciences
Speech: Automatic Adaptation of Annotations
Manually annotated corpora are indispensable resources, yet for many annotation tasks, such as the creation of treebanks, there exist multiple corpora with different and incompatible annotation guidelines. This leads to an inefficient use of human expertise, which could be remedied by integrating knowledge across corpora with different annotation guidelines. In this talk we describe the problem of annotation adaptation and the intrinsic principles behind its solutions, and present a series of successively enhanced models that automatically adapt annotations between different formats. We evaluate our algorithms on Chinese word segmentation and dependency parsing. In both experiments, automatic annotation adaptation brings significant improvement, achieving state-of-the-art performance despite the use of purely local features in training.
Dr. Wenbin Jiang is an assistant researcher at the Institute of Computing Technology, Chinese Academy of Sciences. His research focuses on natural language processing, especially lexical analysis, syntactic analysis, and machine translation. In recent years, he has published more than 10 papers in international conferences and journals such as ACL and Computational Linguistics. He is also a committee member of the Youth Working Committee of the Chinese Information Processing Society of China.
Senior Algorithm Engineer, Alibaba.com
PhD candidate of University of Macau
Speech: Toward Better Chinese Word Segmentation for SMT via Bilingual Constraint
This talk will introduce our investigation into building a better Chinese word segmentation model for statistical machine translation. The aim is to leverage word boundary information, automatically learned from bilingual character-based alignments, to induce a preferable segmentation model. We propose treating the induced word boundaries as soft constraints that bias the continued learning of a supervised CRF model, initially trained on labeled treebank data, over the unlabeled bilingual data. The induced word boundary information is encoded as a graph propagation constraint, and the constrained model induction is accomplished with the posterior regularization algorithm. Experiments on a Chinese-to-English machine translation task show that the proposed model yields segmentations that improve translation quality.
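A rough sketch of the graph-propagation ingredient: seed nodes carry boundary probabilities and unlabeled nodes repeatedly average their neighbours until convergence. The five-node similarity graph is invented, and this is generic label propagation rather than the paper's exact posterior-regularization procedure:

```python
def propagate_labels(edges, seeds, n_nodes, iters=100):
    """Plain label propagation: each unlabeled node repeatedly takes the mean
    boundary probability of its neighbours; seed nodes stay clamped."""
    neigh = {i: [] for i in range(n_nodes)}
    for a, b in edges:
        neigh[a].append(b)
        neigh[b].append(a)
    y = [seeds.get(i, 0.5) for i in range(n_nodes)]
    for _ in range(iters):
        y = [seeds[i] if i in seeds
             else (sum(y[j] for j in neigh[i]) / len(neigh[i]) if neigh[i] else y[i])
             for i in range(n_nodes)]
    return y

# Invented 5-node similarity graph over character contexts; nodes 0 and 4 are
# seeds whose boundary probabilities come from bilingual character alignments.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
probs = propagate_labels(edges, seeds={0: 1.0, 4: 0.0}, n_nodes=5)
# On this chain the values interpolate linearly: ~[1.0, 0.75, 0.5, 0.25, 0.0]
```

The smoothed probabilities on the unlabeled nodes are what posterior regularization then uses as soft expectation constraints when continuing CRF training.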
Mr. Xiaodong Zeng is a senior algorithm engineer working on machine translation and natural language processing at Alibaba.com. He is also a PhD candidate at the University of Macau, jointly supervised by Dr. Derek F. Wong (University of Macau), Lidia S. Chao (University of Macau), and Prof. Isabel Trancoso (Instituto Superior Técnico). His main research is on semi-supervised learning for natural language processing via graph propagation. Mr. Zeng received a B.S. with honors in E-commerce technology from Beijing Normal University at Zhuhai in 2009, and an M.Sc. with distinction in computer and information science from the University of Macau in 2012. He worked as a research assistant in the Natural Language Processing & Portuguese-Chinese Machine Translation Laboratory from 2009 to 2014. He has published more than 15 research papers, including journal and conference papers in Pattern Recognition Letters, ACL, and CoNLL.