Regularized Evolution for Image Classifier Architecture Search
- Esteban Real, A. Aggarwal, Yanping Huang, Quoc V. Le
- 5 February 2018
Computer Science
This work evolves an image classifier---AmoebaNet-A---that surpasses hand-designed models for the first time and gives evidence that evolution can obtain results faster with the same hardware, especially at the earlier stages of the search.
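The core of the paper's method is aging evolution: the oldest population member is removed each cycle, so good architectures persist only by being rediscovered through mutation. A minimal sketch of that loop follows; the bit-string "architecture", the toy fitness, and all function names here are illustrative assumptions, not the paper's code.

```python
import collections
import random

def regularized_evolution(fitness, random_arch, mutate,
                          cycles=500, population_size=50, sample_size=10):
    """Aging-evolution sketch: tournament selection plus age-based removal."""
    population = collections.deque()
    history = []
    # Seed the population with random architectures.
    while len(population) < population_size:
        arch = random_arch()
        member = (arch, fitness(arch))
        population.append(member)
        history.append(member)
    for _ in range(cycles):
        # Tournament: sample a subset, mutate the fittest member of it.
        sample = random.sample(list(population), sample_size)
        parent = max(sample, key=lambda m: m[1])
        child_arch = mutate(parent[0])
        child = (child_arch, fitness(child_arch))
        population.append(child)
        history.append(child)
        population.popleft()  # Remove the oldest member (the "regularization").
    return max(history, key=lambda m: m[1])

def flip_one_bit(a):
    """Toy mutation: flip a single random bit."""
    i = random.randrange(len(a))
    return a[:i] + [a[i] ^ 1] + a[i + 1:]

# Toy run: the "architecture" is a 16-bit string; fitness counts ones.
best = regularized_evolution(
    fitness=sum,
    random_arch=lambda: [random.randint(0, 1) for _ in range(16)],
    mutate=flip_one_bit)
```

Removing the oldest rather than the worst member is the paper's key difference from standard tournament selection.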
Scaling Instruction-Finetuned Language Models
- Hyung Won Chung, Le Hou, Jason Wei
- 20 October 2022
Computer Science
It is found that instruction finetuning with the above aspects dramatically improves performance on a variety of model classes (PaLM, T5, U-PaLM), prompting setups, and evaluation benchmarks (MMLU, BBH, TyDiQA, MGSM, open-ended generation).
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
- Yanping Huang, Yonglong Cheng, Z. Chen
- 16 November 2018
Computer Science, Engineering
GPipe is introduced, a pipeline parallelism library that allows scaling any network that can be expressed as a sequence of layers by pipelining different sub-sequences of layers on separate accelerators, resulting in almost linear speedup when a model is partitioned across multiple accelerators.
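GPipe's pipelining splits a mini-batch into micro-batches so that different stages (accelerators) work on different micro-batches at the same clock tick. A small schedule sketch illustrates the idea; the function name and tick representation are assumptions for illustration, not GPipe's API.

```python
def gpipe_schedule(num_microbatches, num_stages):
    """Forward-pass pipeline schedule sketch: micro-batch m runs on
    stage s at tick m + s, so stages overlap on different micro-batches."""
    ticks = []
    for t in range(num_microbatches + num_stages - 1):
        # At tick t, stage s processes micro-batch t - s, if it exists.
        ticks.append([(s, t - s) for s in range(num_stages)
                      if 0 <= t - s < num_microbatches])
    return ticks

# 4 micro-batches over 3 pipeline stages.
schedule = gpipe_schedule(num_microbatches=4, num_stages=3)
```

The pipeline "bubble" is the (num_stages - 1) partially idle ticks at the start and end; as the number of micro-batches grows relative to the number of stages, this overhead shrinks, which is why speedup approaches linear.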
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
- Dmitry Lepikhin, HyoukJoong Lee, Z. Chen
- 30 June 2020
Computer Science
GShard enabled scaling a multilingual neural machine translation Transformer model with Sparsely-Gated Mixture-of-Experts beyond 600 billion parameters using automatic sharding. It is demonstrated that such a giant model can be trained efficiently on 2048 TPU v3 accelerators in 4 days, achieving far superior quality for translation from 100 languages to English compared to the prior art.
LaMDA: Language Models for Dialog Applications
- R. Thoppilan, Daniel De Freitas, Quoc Le
- 20 January 2022
Computer Science
It is demonstrated that fine-tuning with annotated data and enabling the model to consult external knowledge sources can lead to significant improvements towards the two key challenges of safety and factual grounding.
PaLM 2 Technical Report
- Rohan Anil, Andrew M. Dai, Yonghui Wu
- 17 May 2023
Computer Science, Linguistics
PaLM 2 is a new state-of-the-art language model with better multilingual and reasoning capabilities than its predecessor PaLM. It is more compute-efficient and enables inference-time control over toxicity without additional overhead or impact on other capabilities.
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
- Nan Du, Yanping Huang, Claire Cui
- 13 December 2021
Computer Science
This paper proposes and develops a family of language models named GLaM (Generalist Language Model), which uses a sparsely activated mixture-of-experts architecture to scale the model capacity while also incurring substantially less training cost compared to dense variants.
Just Pick a Sign: Optimizing Deep Multitask Models with Gradient Sign Dropout
- Zhao Chen, Jiquan Ngiam, Dragomir Anguelov
- 14 October 2020
Computer Science
This work presents Gradient Sign Dropout (GradDrop), a probabilistic masking procedure which samples gradients at an activation layer based on their level of consistency, and discusses how GradDrop reveals links between optimal multiloss training and gradient stochasticity.
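GradDrop keeps only gradients whose sign agrees with a sign sampled per coordinate from the tasks' sign consistency. A NumPy sketch of that masking step follows; the function name and the "gradient positive sign purity" formula shown are a paraphrase for illustration, not the authors' implementation.

```python
import numpy as np

def grad_drop(task_grads, rng):
    """GradDrop sketch: task_grads has shape (num_tasks, dim), the
    per-task gradients at one activation layer. Returns the masked sum."""
    g = np.asarray(task_grads, dtype=float)
    # Sign purity per coordinate: 1 or 0 means all tasks agree on the
    # sign; 0.5 means the signed mass fully conflicts.
    purity = 0.5 * (1.0 + g.sum(axis=0) / (np.abs(g).sum(axis=0) + 1e-12))
    # Sample one sign to keep per coordinate: positive with prob `purity`.
    keep_positive = rng.random(g.shape[1]) < purity
    # Drop every task gradient whose sign disagrees with the sampled sign.
    mask = np.where(keep_positive, g > 0, g < 0)
    return (g * mask).sum(axis=0)

rng = np.random.default_rng(0)
combined = grad_drop([[1.0, -2.0,  0.5],
                      [1.0,  3.0, -0.5]], rng)
```

Where tasks agree (first coordinate) everything passes; where they conflict, one sign's gradients are stochastically dropped, which is the gradient stochasticity the paper connects to optimal multi-loss training.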
Mixture-of-Experts with Expert Choice Routing
- Yan-Quan Zhou, Tao Lei, J. Laudon
- 18 February 2022
Computer Science
This work proposes a heterogeneous mixture-of-experts model employing an expert choice method that improves training convergence time by more than 2x and demonstrates higher performance when fine-tuning on 11 selected tasks in the GLUE and SuperGLUE benchmarks.
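Expert choice inverts the usual routing direction: instead of each token picking its top experts, each expert picks its top-k tokens, so every expert receives exactly its capacity and no auxiliary load-balancing loss is needed. A sketch under those assumptions (function name and capacity formula are illustrative, not the paper's code):

```python
import numpy as np

def expert_choice_routing(router_logits, capacity_factor=2.0):
    """Expert-choice sketch: router_logits has shape
    (num_tokens, num_experts); each expert selects its top-k tokens."""
    num_tokens, num_experts = router_logits.shape
    # Each expert's bucket size: tokens * capacity_factor / experts.
    k = int(num_tokens * capacity_factor / num_experts)
    # Softmax over experts gives token-to-expert affinity scores.
    scores = np.exp(router_logits - router_logits.max(axis=-1, keepdims=True))
    scores /= scores.sum(axis=-1, keepdims=True)
    # Each expert (column) takes the k tokens with highest affinity.
    chosen = np.argsort(-scores, axis=0)[:k, :]        # (k, num_experts)
    gates = np.take_along_axis(scores, chosen, axis=0)  # matching weights
    return chosen, gates

rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 4))   # 8 tokens, 4 experts
chosen, gates = expert_choice_routing(logits)
```

Note that load balance holds by construction (every expert processes exactly k tokens), while the number of experts attending to a given token becomes variable.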
ST-MoE: Designing Stable and Transferable Sparse Expert Models
- Barret Zoph, Irwan Bello, W. Fedus
- 17 February 2022
Computer Science, Linguistics
This work scales a sparse model (Stable and Transferable Mixture-of-Experts, or ST-MoE-32B) to 269B parameters, with a computational cost comparable to a 32B dense encoder-decoder Transformer, and for the first time achieves state-of-the-art performance in transfer learning.
...
...