Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation
- Kyunghyun Cho, B. V. Merrienboer, Yoshua Bengio
- 3 June 2014
Computer Science
Qualitatively, the proposed RNN Encoder–Decoder model learns a semantically and syntactically meaningful representation of linguistic phrases.
Neural Machine Translation by Jointly Learning to Align and Translate
- Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio
- 1 September 2014
Computer Science
It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
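The (soft-)search described above is additive attention: the decoder scores every source position against its current state and takes a softmax-weighted sum of the encoder states as context. A minimal sketch of that idea in plain Python follows; all weight matrices (`W_s`, `W_h`, `v`) are illustrative stand-ins, not the paper's trained parameters.

```python
import math

def additive_attention(decoder_state, encoder_states, W_s, W_h, v):
    """Sketch of Bahdanau-style soft-search.

    decoder_state:  list of floats, current decoder hidden state s
    encoder_states: list of per-position encoder hidden states h_i
    W_s, W_h, v:    illustrative alignment-model parameters
    Returns (attention weights over positions, context vector).
    """
    # Alignment score per source position: e_i = v . tanh(W_s s + W_h h_i)
    scores = []
    for h in encoder_states:
        hidden = [
            math.tanh(
                sum(W_s[k][j] * decoder_state[k] for k in range(len(decoder_state)))
                + sum(W_h[k][j] * h[k] for k in range(len(h)))
            )
            for j in range(len(v))
        ]
        scores.append(sum(vj * hj for vj, hj in zip(v, hidden)))

    # Softmax over source positions -> soft alignment weights
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: expectation of encoder states under the weights
    context = [
        sum(w * h[d] for w, h in zip(weights, encoder_states))
        for d in range(len(encoder_states[0]))
    ]
    return weights, context
```

Because the weights are a softmax rather than a hard choice, every source position contributes to each target word, which is what lets the model avoid segmenting the source sentence explicitly.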
On the Properties of Neural Machine Translation: Encoder–Decoder Approaches
- Kyunghyun Cho, B. V. Merrienboer, Dzmitry Bahdanau, Yoshua Bengio
- 3 September 2014
Computer Science
SSST@EMNLP
It is shown that the neural machine translation performs relatively well on short sentences without unknown words, but its performance degrades rapidly as the length of the sentence and the number of unknown words increase.
Theano: A Python framework for fast computation of mathematical expressions
- Rami Al-Rfou, Guillaume Alain, Ying Zhang
- 9 May 2016
Computer Science, Mathematics
The performance of Theano is compared against Torch7 and TensorFlow on several machine learning models and recently-introduced functionalities and improvements are discussed.
Attention-Based Models for Speech Recognition
- J. Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, Yoshua Bengio
- 24 June 2015
Computer Science
The attention mechanism is extended with features needed for speech recognition, and a novel, generic method of adding location-awareness to the attention mechanism is proposed to alleviate the issue of a high phoneme error rate.
PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models
- Torsten Scholak, Nathan Schucher, Dzmitry Bahdanau
- 10 September 2021
Computer Science
On the challenging Spider and CoSQL text-to-SQL translation tasks, it is shown that PICARD transforms fine-tuned T5 models with passable performance into state-of-the-art solutions.
An Actor-Critic Algorithm for Sequence Prediction
- Dzmitry Bahdanau, Philemon Brakel, Yoshua Bengio
- 24 July 2016
Computer Science
An approach to training neural networks to generate sequences using actor-critic methods from reinforcement learning (RL) is presented, which conditions the critic network on the ground-truth output; this method leads to improved performance on both a synthetic task and German-English machine translation.
End-to-end attention-based large vocabulary speech recognition
- Dzmitry Bahdanau, J. Chorowski, Dmitriy Serdyuk, Philemon Brakel, Yoshua Bengio
- 18 August 2015
Computer Science
This work investigates an alternative method for sequence modelling based on an attention mechanism that allows a Recurrent Neural Network (RNN) to learn alignments between sequences of input frames and output labels.
StarCoder: may the source be with you!
- Raymond Li, Loubna Ben Allal, H. D. Vries
- 9 May 2023
Computer Science
Trans. Mach. Learn. Res.
This work performs the most comprehensive evaluation of Code LLMs to date and shows that StarCoderBase outperforms every open Code LLM that supports multiple programming languages and matches or outperforms the OpenAI code-cushman-001 model.
The Stack: 3 TB of permissively licensed source code
- Denis Kocetkov, Raymond Li, H. D. Vries
- 20 November 2022
Computer Science
Trans. Mach. Learn. Res.
It is found that near-deduplicating the data significantly boosts performance across all experiments, and it is possible to match previously reported HumanEval and MBPP performance using only permissively licensed data.
...