Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation
- Kyunghyun Cho, B. V. Merrienboer, Yoshua Bengio
- 3 June 2014
Computer Science
Qualitatively, the proposed RNN Encoder–Decoder model learns a semantically and syntactically meaningful representation of linguistic phrases.
Neural Machine Translation by Jointly Learning to Align and Translate
- Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio
- 1 September 2014
Computer Science
It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
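The (soft-)search described above is additive attention: the decoder scores every source position against its current state and takes a softmax-weighted sum of the encoder states as context. A minimal sketch of that idea in plain Python follows; all weight matrices (`W_s`, `W_h`, `v`) are illustrative stand-ins, not the paper's trained parameters.

```python
import math

def additive_attention(decoder_state, encoder_states, W_s, W_h, v):
    """Sketch of Bahdanau-style soft-search.

    decoder_state:  list of floats, current decoder hidden state s
    encoder_states: list of per-position encoder hidden states h_i
    W_s, W_h, v:    illustrative alignment-model parameters
    Returns (attention weights over positions, context vector).
    """
    # Alignment score per source position: e_i = v . tanh(W_s s + W_h h_i)
    scores = []
    for h in encoder_states:
        hidden = [
            math.tanh(
                sum(W_s[k][j] * decoder_state[k] for k in range(len(decoder_state)))
                + sum(W_h[k][j] * h[k] for k in range(len(h)))
            )
            for j in range(len(v))
        ]
        scores.append(sum(vj * hj for vj, hj in zip(v, hidden)))

    # Softmax over source positions -> soft alignment weights
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: expectation of encoder states under the weights
    context = [
        sum(w * h[d] for w, h in zip(weights, encoder_states))
        for d in range(len(encoder_states[0]))
    ]
    return weights, context
```

Because the weights are a softmax rather than a hard choice, every source position contributes to each target word, which is what lets the model avoid segmenting the source sentence explicitly.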
On the Properties of Neural Machine Translation: Encoder–Decoder Approaches
- Kyunghyun Cho, B. V. Merrienboer, Dzmitry Bahdanau, Yoshua Bengio
- 3 September 2014
Computer Science
SSST@EMNLP
It is shown that the neural machine translation performs relatively well on short sentences without unknown words, but its performance degrades rapidly as the length of the sentence and the number of unknown words increase.
Theano: A Python framework for fast computation of mathematical expressions
- Rami Al-Rfou, Guillaume Alain, Ying Zhang
- 9 May 2016
Computer Science, Mathematics
The performance of Theano is compared against Torch7 and TensorFlow on several machine learning models and recently-introduced functionalities and improvements are discussed.
Attention-Based Models for Speech Recognition
- J. Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, Yoshua Bengio
- 24 June 2015
Computer Science
The attention mechanism is extended with features needed for speech recognition, and a novel, generic method of adding location-awareness to the attention mechanism is proposed to alleviate the issue of a high phoneme error rate.
PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models
- Torsten Scholak, Nathan Schucher, Dzmitry Bahdanau
- 10 September 2021
Computer Science
On the challenging Spider and CoSQL text-to-SQL translation tasks, it is shown that PICARD transforms fine-tuned T5 models with passable performance into state-of-the-art solutions.
An Actor-Critic Algorithm for Sequence Prediction
- Dzmitry Bahdanau, Philemon Brakel, Yoshua Bengio
- 24 July 2016
Computer Science
An approach to training neural networks to generate sequences using actor-critic methods from reinforcement learning (RL) is presented, which conditions the critic network on the ground-truth output; this method leads to improved performance on both a synthetic task and German-English machine translation.
End-to-end attention-based large vocabulary speech recognition
- Dzmitry Bahdanau, J. Chorowski, Dmitriy Serdyuk, Philemon Brakel, Yoshua Bengio
- 18 August 2015
Computer Science
This work investigates an alternative method for sequence modelling based on an attention mechanism that allows a Recurrent Neural Network (RNN) to learn alignments between sequences of input frames and output labels.
StarCoder: may the source be with you!
- Raymond Li, Loubna Ben Allal, H. D. Vries
- 9 May 2023
Computer Science
Trans. Mach. Learn. Res.
This work performs the most comprehensive evaluation of Code LLMs to date and shows that StarCoderBase outperforms every open Code LLM that supports multiple programming languages and matches or outperforms the OpenAI code-cushman-001 model.
The Stack: 3 TB of permissively licensed source code
- Denis Kocetkov, Raymond Li, H. D. Vries
- 20 November 2022
Computer Science
Trans. Mach. Learn. Res.
It is found that near-deduplicating the data significantly boosts performance across all experiments, and it is possible to match previously reported HumanEval and MBPP performance using only permissively licensed data.
...