Effective Approaches to Attention-based Neural Machine Translation
- Thang Luong, Hieu Pham, Christopher D. Manning
- 17 August 2015
Computer Science, Linguistics
A global approach which always attends to all source words and a local one that only looks at a subset of source words at a time are examined, demonstrating the effectiveness of both approaches on the WMT translation tasks between English and German in both directions.
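For a feel of the mechanism, here is a minimal NumPy sketch of the global variant using the paper's simplest ("dot") scoring function; names like `encoder_states` and the toy dimensions are illustrative, not taken from the authors' code.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def global_attention(decoder_state, encoder_states):
    # Luong-style "dot" global attention: score every source position,
    # then form the context vector as the attention-weighted sum.
    scores = encoder_states @ decoder_state   # (src_len,)
    weights = softmax(scores)                 # distribution over source words
    context = weights @ encoder_states        # (hidden,)
    return context, weights

# Toy usage: 6 source positions, hidden size 4.
enc = np.random.randn(6, 4)
dec = np.random.randn(4)
context, weights = global_attention(dec, enc)
print(weights.sum())  # ~1.0
```

The local variant instead restricts the softmax to a window of source positions around a predicted alignment point.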
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
- Jiahui Yu, Yuanzhong Xu, Yonghui Wu
- 22 June 2022
Computer Science
Trans. Mach. Learn. Res.
The Pathways Autoregressive Text-to-Image (Parti) model is presented, which generates high-fidelity photorealistic images and supports content-rich synthesis involving complex compositions and world knowledge; the paper also explores and highlights limitations of the models.
Addressing the Rare Word Problem in Neural Machine Translation
- Thang Luong, I. Sutskever, Quoc V. Le, O. Vinyals, Wojciech Zaremba
- 29 October 2014
Computer Science
This paper proposes and implements an effective technique to address the problem of end-to-end neural machine translation's inability to correctly translate very rare words, and is the first to surpass the best result achieved on a WMT’14 contest task.
Better Word Representations with Recursive Neural Networks for Morphology
- Thang Luong, R. Socher, Christopher D. Manning
- 1 August 2013
Computer Science, Linguistics
This paper combines recursive neural networks, in which each morpheme is a basic unit, with neural language models that supply contextual information to learn morphologically-aware word representations, and proposes a novel model capable of building representations for morphologically complex words from their morphemes.
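A minimal sketch of the recursive composition step, assuming hypothetical morpheme embeddings (in the paper these are learned jointly with the language model):

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4

# Hypothetical morpheme embeddings; the actual model learns these.
morphemes = {m: rng.standard_normal(dim) for m in ["un", "fortunate", "ly"]}
W = rng.standard_normal((dim, 2 * dim))  # composition matrix
b = np.zeros(dim)

def compose(left, right):
    # One recursive-NN step: parent = tanh(W [left; right] + b).
    return np.tanh(W @ np.concatenate([left, right]) + b)

# Build "unfortunately" bottom-up from its morphemes:
# (un + fortunate) -> unfortunate, then (unfortunate + ly).
vec = compose(morphemes["un"], morphemes["fortunate"])
vec = compose(vec, morphemes["ly"])
print(vec.shape)  # (4,)
```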
Bilingual Word Representations with Monolingual Quality in Mind
- Thang Luong, Hieu Pham, Christopher D. Manning
- 1 June 2015
Computer Science, Linguistics
VS@HLT-NAACL
This work proposes a joint model that learns word representations from scratch, combining context co-occurrence information from the monolingual component with meaning-equivalent signals from the bilingual constraint to learn high-quality bilingual representations efficiently.
Symbolic Discovery of Optimization Algorithms
- Xiangning Chen, Chen Liang, Quoc V. Le
- 13 February 2023
Computer Science
Lion is a simple and effective optimization algorithm that requires a smaller learning rate than Adam due to the larger norm of the update produced by the sign function, and is more memory-efficient than Adam because it only keeps track of the momentum.
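A minimal NumPy sketch of the published update rule; the hyperparameter values here are illustrative defaults, not tuned settings:

```python
import numpy as np

def lion_step(w, g, m, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    # One Lion update. Only the momentum m is stored, so the optimizer
    # state is half of Adam's (which also tracks a second moment).
    update = np.sign(beta1 * m + (1 - beta1) * g)  # every entry is -1, 0, or 1
    w = w - lr * (update + wd * w)                 # decoupled weight decay
    m = beta2 * m + (1 - beta2) * g                # momentum update
    return w, m

# Toy usage on a quadratic bowl: the gradient of 0.5*||w||^2 is w itself.
w = np.array([1.0, -2.0, 3.0])
m = np.zeros_like(w)
for _ in range(1000):
    w, m = lion_step(w, g=w, m=m, lr=0.01)
print(w)  # settles near the origin (within roughly lr per coordinate)
```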
Learning Longer-term Dependencies in RNNs with Auxiliary Losses
- Trieu H. Trinh, Andrew M. Dai, Thang Luong, Quoc V. Le
- 12 February 2018
Computer Science
This paper proposes a simple method that improves the ability of RNNs to capture long-term dependencies by adding an unsupervised auxiliary loss to the original objective, making truncated backpropagation feasible for long sequences and also improving full BPTT.
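A simplified PyTorch sketch of the idea, reconstructing a randomly sampled input segment from the RNN states as the auxiliary objective (the paper uses a separate decoder over sampled segments; the 0.5 weight is illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

rnn = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
clf_head = nn.Linear(16, 2)  # main task: sequence classification
rec_head = nn.Linear(16, 8)  # auxiliary task: reconstruct inputs

x = torch.randn(4, 100, 8)               # (batch, time, features)
y = torch.randint(0, 2, (4,))

out, _ = rnn(x)                           # (4, 100, 16)
main_loss = F.cross_entropy(clf_head(out[:, -1]), y)

# Unsupervised auxiliary loss: reconstruct a random 10-step input segment
# from the corresponding RNN states, anchoring gradients deep in the sequence.
start = torch.randint(0, 90, (1,)).item()
aux_loss = F.mse_loss(rec_head(out[:, start:start + 10]),
                      x[:, start:start + 10])

loss = main_loss + 0.5 * aux_loss         # 0.5 is an illustrative weight
loss.backward()
```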
Just Pick a Sign: Optimizing Deep Multitask Models with Gradient Sign Dropout
- Zhao Chen, Jiquan Ngiam, Dragomir Anguelov
- 14 October 2020
Computer Science
This work presents Gradient Sign Dropout (GradDrop), a probabilistic masking procedure which samples gradients at an activation layer based on their level of consistency, and discusses how GradDrop reveals links between optimal multiloss training and gradient stochasticity.
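A rough NumPy sketch of the masking step, assuming per-task gradients are already available at an activation; the per-element consistency score follows the paper's sign-purity formulation:

```python
import numpy as np

def grad_drop(task_grads, rng=None):
    # GradDrop: stack per-task gradients at an activation, compute a
    # per-element sign-consistency score P (fraction of gradient mass that
    # is positive), then sample which sign survives at each element.
    rng = rng or np.random.default_rng()
    G = np.stack(task_grads)                      # (tasks, ...)
    P = 0.5 * (1.0 + G.sum(0) / (np.abs(G).sum(0) + 1e-12))
    U = rng.uniform(size=P.shape)                 # one draw shared by all tasks
    keep_positive = U < P                         # where positive grads survive
    mask = np.where(keep_positive, G > 0, G < 0)  # per-task keep mask
    return (G * mask).sum(0)                      # combined masked gradient

# Toy usage: two tasks whose gradients conflict on some elements.
g1 = np.array([ 1.0, 2.0, -1.0, 0.5])
g2 = np.array([-1.0, 2.0,  1.0, 0.5])
print(grad_drop([g1, g2]))
```

Where the two tasks agree in sign, the gradient always passes; where they conflict, one direction is sampled and the other is dropped.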
When Are Tree Structures Necessary for Deep Learning of Representations?
- Jiwei Li, Thang Luong, Dan Jurafsky, E. Hovy
- 28 February 2015
Computer Science
This paper benchmarks recursive neural models against sequential recurrent neural models, enforcing apples-to-apples comparisons as much as possible, and introduces a method that allows recurrent models to achieve similar performance: breaking long sentences into clause-like units at punctuation and processing them separately before combining.
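A toy sketch of the clause-splitting preprocessing; the exact punctuation set used by the paper may differ:

```python
import re

def clause_units(sentence):
    # Split a long sentence into clause-like units at punctuation; each unit
    # is fed to the recurrent model separately, then the pieces are combined.
    parts = re.split(r"[,;:]", sentence)
    return [p.strip() for p in parts if p.strip()]

print(clause_units(
    "When the markets opened, prices fell sharply; traders, wary of risk, sold."))
# ['When the markets opened', 'prices fell sharply', 'traders',
#  'wary of risk', 'sold.']
```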
Learning Distributed Representations for Multilingual Text Sequences
- Hieu Pham, Thang Luong, Christopher D. Manning
- 1 June 2015
Computer Science, Linguistics
VS@HLT-NAACL
This work is similar in spirit to the recent paragraph vector approach but extends it to the bilingual context so as to efficiently encode meaning-equivalent text sequences of multiple languages in the same semantic space.
...
...