ImageNet classification with deep convolutional neural networks
- A. Krizhevsky, I. Sutskever, Geoffrey E. Hinton
- 3 December 2012
Computer Science
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into 1000 different classes, employing a recently developed regularization method called "dropout" that proved to be very effective.
Learning Transferable Visual Models From Natural Language Supervision
- Alec Radford, Jong Wook Kim, I. Sutskever
- 26 February 2021
Computer Science
It is demonstrated that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet.
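The caption-matching pre-training task described above can be sketched as a symmetric cross-entropy over an image-text similarity matrix, where the correct pairings lie on the diagonal. This is a minimal NumPy sketch; the embedding shapes and the temperature value are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def contrastive_pairing_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric cross-entropy over an image-text similarity matrix.

    img_emb, txt_emb: (N, d) arrays of paired embeddings; row i of each
    is a matching (image, text) pair. Temperature is an assumed value.
    """
    # L2-normalize so logits are cosine similarities scaled by temperature.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (N, N); correct pair on the diagonal

    def xent(lg):
        # Cross-entropy with the diagonal as the target class for each row.
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        idx = np.arange(len(lg))
        return -logp[idx, idx].mean()

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (xent(logits) + xent(logits.T))
```

With perfectly separable embeddings the loss approaches zero, since each row's softmax mass concentrates on its own diagonal entry.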
Distributed Representations of Words and Phrases and their Compositionality
- Tomas Mikolov, I. Sutskever, Kai Chen, G. Corrado, J. Dean
- 16 October 2013
Computer Science
This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.
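The negative-sampling objective mentioned above scores a true (center, context) pair against a few sampled noise words. Below is a toy NumPy sketch of that loss for a single pair; the embedding matrices `W_in` and `W_out` and their shapes are hypothetical illustration, not the paper's implementation.

```python
import numpy as np

def sgns_loss(center, context, negatives, W_in, W_out):
    """Skip-gram negative-sampling loss for one (center, context) pair.

    center, context: integer word indices.
    negatives: indices of sampled noise words.
    W_in, W_out: toy input/output embedding matrices (hypothetical params).
    """
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    v = W_in[center]
    # Push the true context word's score up...
    loss = -np.log(sigmoid(W_out[context] @ v))
    # ...and push each sampled noise word's score down.
    for k in negatives:
        loss -= np.log(sigmoid(-(W_out[k] @ v)))
    return loss
```

With all-zero embeddings every sigmoid evaluates to 0.5, so the loss is (1 + K) · ln 2 for K negatives, which makes the sketch easy to sanity-check.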
Language Models are Few-Shot Learners
- Tom B. Brown, Benjamin Mann, Dario Amodei
- 28 May 2020
Computer Science, Linguistics
GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.
Dropout: a simple way to prevent neural networks from overfitting
- Nitish Srivastava, Geoffrey E. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov
- 2014
Computer Science
It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
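The dropout technique summarized above randomly zeroes units during training so the network cannot co-adapt. A minimal sketch of the inverted-dropout variant (scaling at train time so no rescaling is needed at test time; function name and signature are illustrative):

```python
import numpy as np

def dropout(x, p_drop=0.5, train=True, rng=np.random.default_rng(0)):
    """Inverted dropout: zero units with probability p_drop during training,
    rescaling survivors by 1/(1 - p_drop) to preserve the expected activation."""
    if not train or p_drop == 0.0:
        return x  # at test time the layer is the identity
    mask = (rng.random(x.shape) >= p_drop).astype(x.dtype)
    return x * mask / (1.0 - p_drop)
```

Applied to a layer of ones with p_drop = 0.5, surviving units become 2.0 and dropped units become 0.0, so the expected value of each unit stays 1.0.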
Language Models are Unsupervised Multitask Learners
- Alec Radford, Jeff Wu, R. Child, D. Luan, Dario Amodei, I. Sutskever
- 2019
Computer Science, Linguistics
It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.
GPT-4 Technical Report
- OpenAI: Josh Achiam, Steven Adler, Barret Zoph
- 15 March 2023
Computer Science
GPT-4, a large-scale, multimodal model that accepts image and text inputs and produces text outputs, is developed; it is a Transformer-based model pre-trained to predict the next token in a document, and it exhibits human-level performance on various professional and academic benchmarks.
Sequence to Sequence Learning with Neural Networks
- I. Sutskever, O. Vinyals, Quoc V. Le
- 10 September 2014
Computer Science
This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short-term dependencies between the source and the _target sentence, which made the optimization problem easier.
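The source-reversal trick described above is purely a data transform: each source sentence's token order is flipped while the _target is left intact. A minimal sketch (function name and data layout are illustrative assumptions):

```python
def reverse_source(pairs):
    """Reverse each source sentence's token order, keeping targets intact,
    as in the data transform described above."""
    return [(list(reversed(src)), tgt) for src, tgt in pairs]
```

After the transform, the first source token sits adjacent to the start of the _target sequence, which is the short-term dependency the abstract refers to.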
Intriguing properties of neural networks
- Christian Szegedy, Wojciech Zaremba, R. Fergus
- 20 December 2013
Computer Science
It is found that there is no distinction between individual high-level units and random linear combinations of high-level units, according to various methods of unit analysis, and it is suggested that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks.
TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
- Martín Abadi, Ashish Agarwal, Xiaoqiang Zheng
- 14 March 2016
Computer Science, Engineering
The TensorFlow interface and an implementation of that interface that is built at Google are described, which has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields.
...