Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World’s Largest and Most Powerful Generative Language Model

Originally published in Microsoft Research Blog, Oct 11, 2021

We are excited to introduce the DeepSpeed- and Megatron-powered Megatron-Turing Natural Language Generation model (MT-NLG), the largest and the most powerful monolithic transformer language model trained to date, with 530 billion parameters. It is the result of a research collaboration between Microsoft and NVIDIA to further parallelize and optimize the training of very large AI models.
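The excerpt does not spell out the parallelism layout, but as a rough illustration of how a DeepSpeed/Megatron-style setup splits training of a model at this scale, the sketch below shows the arithmetic of combining tensor, pipeline, and data parallelism. The GPU count and split degrees are hypothetical, chosen only to make the decomposition concrete; the layer count matches the 105-layer figure given later in the article.

```python
# Illustrative sketch (not the published MT-NLG configuration): how a
# Megatron/DeepSpeed-style "3D" parallel layout decomposes a GPU cluster
# into tensor-parallel, pipeline-parallel, and data-parallel groups.

def three_d_layout(world_size: int, tensor_parallel: int,
                   pipeline_parallel: int, num_layers: int) -> dict:
    """Return the derived data-parallel degree and layers per pipeline stage."""
    gpus_per_replica = tensor_parallel * pipeline_parallel
    assert world_size % gpus_per_replica == 0, "world size must divide evenly"
    assert num_layers % pipeline_parallel == 0, "layers must divide across stages"
    return {
        "gpus_per_model_replica": gpus_per_replica,
        "data_parallel_replicas": world_size // gpus_per_replica,
        "layers_per_pipeline_stage": num_layers // pipeline_parallel,
    }

if __name__ == "__main__":
    # Hypothetical numbers: 1,120 GPUs, 8-way tensor slicing, 35 pipeline
    # stages, and a 105-layer transformer.
    print(three_d_layout(world_size=1120, tensor_parallel=8,
                         pipeline_parallel=35, num_layers=105))
```

Each model replica occupies tensor_parallel × pipeline_parallel GPUs, and whatever remains of the cluster is used for data parallelism; DeepSpeed and Megatron handle the communication within and across these groups.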

As the successor to Turing NLG 17B and Megatron-LM, MT-NLG has three times as many parameters as the largest existing model of this type and demonstrates unmatched accuracy on a broad set of natural language tasks, such as:

  • Completion prediction
  • Reading comprehension
  • Commonsense reasoning
  • Natural language inference
  • Word sense disambiguation

The 105-layer, transformer-based MT-NLG improved upon the prior state-of-the-art models in zero-, one-, and few-shot settings and set the new standard for large-scale language models in both model scale and quality.
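As a rough sanity check on how a 105-layer decoder-only transformer reaches 530 billion parameters: each layer contributes approximately 12·h² weights (the attention projections plus the MLP block), with embeddings adding vocab_size·h on top. The hidden size and vocabulary size below are assumptions chosen to make the arithmetic land near the published figure; the excerpt does not state them.

```python
# Back-of-the-envelope parameter count for a decoder-only transformer.
# Per layer: ~4*h^2 for attention (Q, K, V, output projections) plus
# ~8*h^2 for the MLP block (two matrices with a 4x expansion), i.e. ~12*h^2.
# Token embeddings add roughly vocab_size * h on top.

def approx_params(num_layers: int, hidden: int, vocab_size: int) -> int:
    per_layer = 12 * hidden ** 2
    embeddings = vocab_size * hidden
    return num_layers * per_layer + embeddings

if __name__ == "__main__":
    # 105 layers as stated in the article; hidden size 20480 and a ~50k
    # vocabulary are assumptions used only to illustrate the arithmetic.
    total = approx_params(num_layers=105, hidden=20480, vocab_size=50257)
    print(f"~{total / 1e9:.0f}B parameters")  # lands in the vicinity of 530B
```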

Large-scale language models

Transformer-based language models in natural language processing (NLP) have driven rapid progress in recent years, fueled by computation at scale, large datasets, and advanced algorithms and software for training these models.

Language models with more parameters, more data, and more training time acquire a richer, more nuanced understanding of language. As a result, they generalize well as effective zero- or few-shot learners, with high accuracy on many NLP tasks and datasets. Exciting downstream applications include summarization, automatic dialogue generation, translation, semantic search, and code autocompletion. It’s no surprise that the number of parameters in state-of-the-art NLP models has grown at an exponential rate.
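To make the zero- and few-shot setting concrete: a handful of worked examples are placed directly in the prompt and the model is asked to continue the pattern, with no gradient updates or task-specific fine-tuning. The task and examples below are invented purely for illustration; the actual evaluation prompts used for MT-NLG are not described in this excerpt.

```python
# Minimal sketch of few-shot prompting: k labeled examples are concatenated
# into the context, followed by the query; the model is expected to continue
# the pattern. The examples here are invented for illustration only.

EXAMPLES = [
    ("The capital of France is", "Paris"),
    ("The capital of Japan is", "Tokyo"),
]

def build_few_shot_prompt(query: str, examples=EXAMPLES) -> str:
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{shots}\nQ: {query}\nA:"

if __name__ == "__main__":
    prompt = build_few_shot_prompt("The capital of Italy is")
    print(prompt)
    # The prompt would then be sent to the language model; with an empty
    # examples list the same template becomes a zero-shot query.
```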

To continue reading this article, click here.
