Accurate Prediction of Antibody Function and Structure Using Bio-Inspired Antibody Language Model
In recent decades, antibodies have emerged as indispensable therapeutics for
combating diseases, particularly viral infections. However, their development
has been hindered by limited structural information and labor-intensive
engineering processes. Fortunately, significant advancements in deep learning
methods have facilitated the precise prediction of protein structure and
function by leveraging co-evolution information from homologous proteins.
Despite these advances, predicting the conformation of antibodies remains
challenging due to their unique evolution and the high flexibility of their
antigen-binding regions. To address this challenge, we present the
Bio-inspired Antibody Language Model (BALM). The model is trained on 336
million unlabeled, 40% non-redundant antibody sequences, capturing both
unique and conserved properties specific to antibodies. Notably,
BALM achieves exceptional performance across four antigen-binding prediction
tasks. Moreover, we introduce BALMFold, an end-to-end method derived from BALM,
capable of rapidly predicting full-atom antibody structures from individual
sequences. Remarkably, BALMFold outperforms well-established methods such as
AlphaFold2, IgFold, ESMFold, and OmegaFold on the antibody benchmark,
demonstrating significant potential to advance antibody engineering and
streamline therapeutic antibody development by reducing unnecessary
trials.