ImageNet classification with deep convolutional neural networks
- A. Krizhevsky, I. Sutskever, Geoffrey E. Hinton
- 3 December 2012
Computer Science
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into 1000 different classes, employing a recently developed regularization method called "dropout" that proved to be very effective.
Learning Transferable Visual Models From Natural Language Supervision
- Alec Radford, Jong Wook Kim, I. Sutskever
- 26 February 2021
Computer Science
It is demonstrated that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet.
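The caption-matching pre-training task described above can be sketched as a symmetric cross-entropy over an image-text similarity matrix, where the correct pairings lie on the diagonal. This is a minimal NumPy sketch; the embedding shapes and the temperature value are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def contrastive_pairing_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric cross-entropy over an image-text similarity matrix.

    img_emb, txt_emb: (N, d) arrays of paired embeddings; row i of each
    is a matching (image, text) pair. Temperature is an assumed value.
    """
    # L2-normalize so logits are cosine similarities scaled by temperature.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (N, N); correct pair on the diagonal

    def xent(lg):
        # Cross-entropy with the diagonal as the target class for each row.
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        idx = np.arange(len(lg))
        return -logp[idx, idx].mean()

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (xent(logits) + xent(logits.T))
```

With perfectly separable embeddings the loss approaches zero, since each row's softmax mass concentrates on its own diagonal entry.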
Distributed Representations of Words and Phrases and their Compositionality
- Tomas Mikolov, I. Sutskever, Kai Chen, G. Corrado, J. Dean
- 16 October 2013
Computer Science
This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.
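The negative-sampling objective mentioned above scores a true (center, context) pair against a few sampled noise words. Below is a toy NumPy sketch of that loss for a single pair; the embedding matrices `W_in` and `W_out` and their shapes are hypothetical illustration, not the paper's implementation.

```python
import numpy as np

def sgns_loss(center, context, negatives, W_in, W_out):
    """Skip-gram negative-sampling loss for one (center, context) pair.

    center, context: integer word indices.
    negatives: indices of sampled noise words.
    W_in, W_out: toy input/output embedding matrices (hypothetical params).
    """
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    v = W_in[center]
    # Push the true context word's score up...
    loss = -np.log(sigmoid(W_out[context] @ v))
    # ...and push each sampled noise word's score down.
    for k in negatives:
        loss -= np.log(sigmoid(-(W_out[k] @ v)))
    return loss
```

With all-zero embeddings every sigmoid evaluates to 0.5, so the loss is (1 + K) · ln 2 for K negatives, which makes the sketch easy to sanity-check.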
Language Models are Few-Shot Learners
- Tom B. Brown, Benjamin Mann, Dario Amodei
- 28 May 2020
Computer Science, Linguistics
GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.
Dropout: a simple way to prevent neural networks from overfitting
- Nitish Srivastava, Geoffrey E. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov
- 2014
Computer Science
It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
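The dropout technique summarized above randomly zeroes units during training so the network cannot co-adapt. A minimal sketch of the inverted-dropout variant (scaling at train time so no rescaling is needed at test time; function name and signature are illustrative):

```python
import numpy as np

def dropout(x, p_drop=0.5, train=True, rng=np.random.default_rng(0)):
    """Inverted dropout: zero units with probability p_drop during training,
    rescaling survivors by 1/(1 - p_drop) to preserve the expected activation."""
    if not train or p_drop == 0.0:
        return x  # at test time the layer is the identity
    mask = (rng.random(x.shape) >= p_drop).astype(x.dtype)
    return x * mask / (1.0 - p_drop)
```

Applied to a layer of ones with p_drop = 0.5, surviving units become 2.0 and dropped units become 0.0, so the expected value of each unit stays 1.0.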
Language Models are Unsupervised Multitask Learners
- Alec Radford, Jeff Wu, R. Child, D. Luan, Dario Amodei, I. Sutskever
- 2019
Computer Science, Linguistics
It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.
GPT-4 Technical Report
- OpenAI: Josh Achiam, Steven Adler, Barret Zoph
- 15 March 2023
Computer Science
GPT-4, a large-scale, multimodal model that accepts image and text inputs and produces text outputs, is developed; it is a Transformer-based model pre-trained to predict the next token in a document, and it exhibits human-level performance on various professional and academic benchmarks.
Sequence to Sequence Learning with Neural Networks
- I. Sutskever, O. Vinyals, Quoc V. Le
- 10 September 2014
Computer Science
This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short-term dependencies between the source and the _target sentence, which made the optimization problem easier.
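The source-reversal trick described above is purely a data transform: each source sentence's token order is flipped while the _target is left intact. A minimal sketch (function name and data layout are illustrative assumptions):

```python
def reverse_source(pairs):
    """Reverse each source sentence's token order, keeping targets intact,
    as in the data transform described above."""
    return [(list(reversed(src)), tgt) for src, tgt in pairs]
```

After the transform, the first source token sits adjacent to the start of the _target sequence, which is the short-term dependency the abstract refers to.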
Intriguing properties of neural networks
- Christian Szegedy, Wojciech Zaremba, R. Fergus
- 20 December 2013
Computer Science
It is found that there is no distinction between individual high-level units and random linear combinations of high-level units, according to various methods of unit analysis, and it is suggested that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks.
TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
- Martín Abadi, Ashish Agarwal, Xiaoqiang Zheng
- 14 March 2016
Computer Science, Engineering
The TensorFlow interface and an implementation of that interface that is built at Google are described, which has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields.
...