Effective Approaches to Attention-based Neural Machine Translation
- Thang Luong, Hieu Pham, Christopher D. Manning
- 17 August 2015
Computer Science, Linguistics
A global approach which always attends to all source words and a local one that only looks at a subset of source words at a time are examined, demonstrating the effectiveness of both approaches on the WMT translation tasks between English and German in both directions.
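For a feel of the mechanism, here is a minimal NumPy sketch of the global variant using the paper's simplest ("dot") scoring function; names like `encoder_states` and the toy dimensions are illustrative, not taken from the authors' code.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def global_attention(decoder_state, encoder_states):
    # Luong-style "dot" global attention: score every source position,
    # then form the context vector as the attention-weighted sum.
    scores = encoder_states @ decoder_state   # (src_len,)
    weights = softmax(scores)                 # distribution over source words
    context = weights @ encoder_states        # (hidden,)
    return context, weights

# Toy usage: 6 source positions, hidden size 4.
enc = np.random.randn(6, 4)
dec = np.random.randn(4)
context, weights = global_attention(dec, enc)
print(weights.sum())  # ~1.0
```

The local variant instead restricts the softmax to a window of source positions around a predicted alignment point.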
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
- Jiahui Yu, Yuanzhong Xu, Yonghui Wu
- 22 June 2022
Computer Science
Trans. Mach. Learn. Res.
The Pathways Autoregressive Text-to-Image (Parti) model is presented, which generates high-fidelity photorealistic images and supports content-rich synthesis involving complex compositions and world knowledge; the paper also explores and highlights limitations of the models.
Addressing the Rare Word Problem in Neural Machine Translation
- Thang Luong, I. Sutskever, Quoc V. Le, O. Vinyals, Wojciech Zaremba
- 29 October 2014
Computer Science
This paper proposes and implements an effective technique to address the problem of end-to-end neural machine translation's inability to correctly translate very rare words, and is the first to surpass the best result achieved on a WMT’14 contest task.
Better Word Representations with Recursive Neural Networks for Morphology
- Thang Luong, R. Socher, Christopher D. Manning
- 1 August 2013
Computer Science, Linguistics
This paper combines recursive neural networks, in which each morpheme is a basic unit, with neural language models that supply contextual information to learn morphologically-aware word representations, and proposes a novel model capable of building representations for morphologically complex words from their morphemes.
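A minimal sketch of the recursive composition step, assuming hypothetical morpheme embeddings (in the paper these are learned jointly with the language model):

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4

# Hypothetical morpheme embeddings; the actual model learns these.
morphemes = {m: rng.standard_normal(dim) for m in ["un", "fortunate", "ly"]}
W = rng.standard_normal((dim, 2 * dim))  # composition matrix
b = np.zeros(dim)

def compose(left, right):
    # One recursive-NN step: parent = tanh(W [left; right] + b).
    return np.tanh(W @ np.concatenate([left, right]) + b)

# Build "unfortunately" bottom-up from its morphemes:
# (un + fortunate) -> unfortunate, then (unfortunate + ly).
vec = compose(morphemes["un"], morphemes["fortunate"])
vec = compose(vec, morphemes["ly"])
print(vec.shape)  # (4,)
```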
Bilingual Word Representations with Monolingual Quality in Mind
- Thang Luong, Hieu Pham, Christopher D. Manning
- 1 June 2015
Computer Science, Linguistics
VS@HLT-NAACL
This work proposes a joint model that learns word representations from scratch, combining context co-occurrence information from the monolingual component with meaning-equivalent signals from the bilingual constraint to learn high-quality bilingual representations efficiently.
Symbolic Discovery of Optimization Algorithms
- Xiangning Chen, Chen Liang, Quoc V. Le
- 13 February 2023
Computer Science
Lion is a simple and effective optimization algorithm that requires a smaller learning rate than Adam due to the larger norm of the update produced by the sign function, and is more memory-efficient than Adam because it only keeps track of the momentum.
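A minimal NumPy sketch of the published update rule; the hyperparameter values here are illustrative defaults, not tuned settings:

```python
import numpy as np

def lion_step(w, g, m, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    # One Lion update. Only the momentum m is stored, so the optimizer
    # state is half of Adam's (which also tracks a second moment).
    update = np.sign(beta1 * m + (1 - beta1) * g)  # every entry is -1, 0, or 1
    w = w - lr * (update + wd * w)                 # decoupled weight decay
    m = beta2 * m + (1 - beta2) * g                # momentum update
    return w, m

# Toy usage on a quadratic bowl: the gradient of 0.5*||w||^2 is w itself.
w = np.array([1.0, -2.0, 3.0])
m = np.zeros_like(w)
for _ in range(1000):
    w, m = lion_step(w, g=w, m=m, lr=0.01)
print(w)  # settles near the origin (within roughly lr per coordinate)
```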
Learning Longer-term Dependencies in RNNs with Auxiliary Losses
- Trieu H. Trinh, Andrew M. Dai, Thang Luong, Quoc V. Le
- 12 February 2018
Computer Science
This paper proposes a simple method that improves the ability of RNNs to capture long-term dependencies by adding an unsupervised auxiliary loss to the original objective, making truncated backpropagation feasible for long sequences and also improving full BPTT.
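A simplified PyTorch sketch of the idea, reconstructing a randomly sampled input segment from the RNN states as the auxiliary objective (the paper uses a separate decoder over sampled segments; the 0.5 weight is illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

rnn = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
clf_head = nn.Linear(16, 2)  # main task: sequence classification
rec_head = nn.Linear(16, 8)  # auxiliary task: reconstruct inputs

x = torch.randn(4, 100, 8)               # (batch, time, features)
y = torch.randint(0, 2, (4,))

out, _ = rnn(x)                           # (4, 100, 16)
main_loss = F.cross_entropy(clf_head(out[:, -1]), y)

# Unsupervised auxiliary loss: reconstruct a random 10-step input segment
# from the corresponding RNN states, anchoring gradients deep in the sequence.
start = torch.randint(0, 90, (1,)).item()
aux_loss = F.mse_loss(rec_head(out[:, start:start + 10]),
                      x[:, start:start + 10])

loss = main_loss + 0.5 * aux_loss         # 0.5 is an illustrative weight
loss.backward()
```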
Just Pick a Sign: Optimizing Deep Multitask Models with Gradient Sign Dropout
- Zhao Chen, Jiquan Ngiam, Dragomir Anguelov
- 14 October 2020
Computer Science
This work presents Gradient Sign Dropout (GradDrop), a probabilistic masking procedure which samples gradients at an activation layer based on their level of consistency, and discusses how GradDrop reveals links between optimal multiloss training and gradient stochasticity.
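A rough NumPy sketch of the masking step, assuming per-task gradients are already available at an activation; the per-element consistency score follows the paper's sign-purity formulation:

```python
import numpy as np

def grad_drop(task_grads, rng=None):
    # GradDrop: stack per-task gradients at an activation, compute a
    # per-element sign-consistency score P (fraction of gradient mass that
    # is positive), then sample which sign survives at each element.
    rng = rng or np.random.default_rng()
    G = np.stack(task_grads)                      # (tasks, ...)
    P = 0.5 * (1.0 + G.sum(0) / (np.abs(G).sum(0) + 1e-12))
    U = rng.uniform(size=P.shape)                 # one draw shared by all tasks
    keep_positive = U < P                         # where positive grads survive
    mask = np.where(keep_positive, G > 0, G < 0)  # per-task keep mask
    return (G * mask).sum(0)                      # combined masked gradient

# Toy usage: two tasks whose gradients conflict on some elements.
g1 = np.array([ 1.0, 2.0, -1.0, 0.5])
g2 = np.array([-1.0, 2.0,  1.0, 0.5])
print(grad_drop([g1, g2]))
```

Where the two tasks agree in sign, the gradient always passes; where they conflict, one direction is sampled and the other is dropped.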
When Are Tree Structures Necessary for Deep Learning of Representations?
- Jiwei Li, Thang Luong, Dan Jurafsky, E. Hovy
- 28 February 2015
Computer Science
This paper benchmarks recursive neural models against sequential recurrent neural models, enforcing apples-to-apples comparisons as much as possible, and introduces a method that allows recurrent models to achieve similar performance: breaking long sentences into clause-like units at punctuation and processing them separately before combining.
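A toy sketch of the clause-splitting preprocessing; the exact punctuation set used by the paper may differ:

```python
import re

def clause_units(sentence):
    # Split a long sentence into clause-like units at punctuation; each unit
    # is fed to the recurrent model separately, then the pieces are combined.
    parts = re.split(r"[,;:]", sentence)
    return [p.strip() for p in parts if p.strip()]

print(clause_units(
    "When the markets opened, prices fell sharply; traders, wary of risk, sold."))
# ['When the markets opened', 'prices fell sharply', 'traders',
#  'wary of risk', 'sold.']
```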
Learning Distributed Representations for Multilingual Text Sequences
- Hieu Pham, Thang Luong, Christopher D. Manning
- 1 June 2015
Computer Science, Linguistics
VS@HLT-NAACL
This work is similar in spirit to the recent paragraph vector approach but extends it to the bilingual context so as to efficiently encode meaning-equivalent text sequences of multiple languages in the same semantic space.
...
...