Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
#Large Language Models #Pretrained Models #Stanford #Paper #PDF
arxiv.org · May 27, 2023