Megatron-LM: Training Multi-Billion Parameter Language Models...
Recent work in language modeling demonstrates that training large transformer
models advances the state of the art in Natural Language Processing
applications. However, very large models can be...