These leaderboards are used to track progress in protein language modeling.
Use these libraries to find protein language model implementations.
Experimental results on benchmark data sets demonstrate that ESM2 feature representations comprehensively outperform evolutionary-information-based hidden Markov model (HMM) features in prediction performance.
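As context for how such embedding features are typically obtained, here is a minimal sketch using the fair-esm package; the checkpoint choice, example sequence, and mean-pooling step are illustrative assumptions rather than the paper's exact pipeline.

```python
import torch
import esm

# Load a pretrained ESM2 checkpoint (650M parameters here; smaller ones work the same way).
model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
batch_converter = alphabet.get_batch_converter()
model.eval()

# Hypothetical example sequence; a real pipeline would iterate over a FASTA file.
data = [("example_protein", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")]
_, strs, tokens = batch_converter(data)

with torch.no_grad():
    out = model(tokens, repr_layers=[33])
per_residue = out["representations"][33]                   # (1, L+2, 1280), includes BOS/EOS tokens
seq_feature = per_residue[0, 1:len(strs[0]) + 1].mean(0)   # mean-pool residues -> 1280-d sequence feature
```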
A protein language model that takes as input a set of sequences in the form of a multiple sequence alignment, and is trained with a variant of the masked language modeling objective across many protein families, surpasses current state-of-the-art unsupervised structure-learning methods by a wide margin.
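For readers unfamiliar with the training signal, the sketch below shows a generic masked language modeling loss; it is a simplified stand-in (masking positions uniformly within one token tensor), not the MSA Transformer's exact variant, and the `model` callable is an assumed mapping from token ids to per-position vocabulary logits.

```python
import torch
import torch.nn.functional as F

def masked_lm_loss(model, tokens, mask_idx, pad_idx, mask_prob=0.15):
    """Generic masked-LM objective: hide a random subset of tokens and
    train the model to reconstruct them from the surrounding context."""
    targets = tokens.clone()
    maskable = tokens != pad_idx                       # never mask padding positions
    mask = (torch.rand(tokens.shape) < mask_prob) & maskable
    corrupted = tokens.clone()
    corrupted[mask] = mask_idx                         # replace selected tokens with the mask token
    logits = model(corrupted)                          # (batch, length, vocab), assumed output shape
    return F.cross_entropy(logits[mask], targets[mask])  # loss only on masked positions
```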
A structure-aware protein self-supervised learning method is proposed that effectively captures structural information of proteins via a pseudo bi-level optimization scheme, using a graph neural network model pretrained on protein sequences.
This work proposes and tests an iterative method that directly employs the masked language modeling objective to generate sequences using MSA Transformer, and demonstrates that the resulting sequences score as well as natural sequences on homology, coevolution, and structure-based measures.
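A hedged sketch of what such iterative masked-LM generation can look like for a single sequence; the `mlm` callable is an assumed stand-in for a masked language model returning per-position logits, whereas the paper applies this idea to rows of an MSA with the MSA Transformer.

```python
import torch

def iterative_mlm_sample(mlm, tokens, mask_idx, n_iters=10, frac=0.1, temperature=1.0):
    # tokens: 1-D LongTensor of token ids for a single sequence.
    tokens = tokens.clone()
    length = tokens.shape[0]
    for _ in range(n_iters):
        n_mask = max(1, int(frac * length))
        pos = torch.randperm(length)[:n_mask]          # positions to resample this round
        corrupted = tokens.clone()
        corrupted[pos] = mask_idx
        with torch.no_grad():
            logits = mlm(corrupted.unsqueeze(0))[0] / temperature  # (length, vocab), assumed shape
        probs = torch.softmax(logits[pos], dim=-1)
        tokens[pos] = torch.multinomial(probs, 1).squeeze(-1)      # sample new tokens at masked positions
    return tokens
```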
It is shown that DistilProtBert performs very well on singlet-, doublet-, and even triplet-shuffled versions of the human proteome, with AUCs of 0.92, 0.91, and 0.87, respectively, and it is suggested that, by examining the small number of false-positive classifications, the authors may be able to identify potential de novo natural-like proteins based on random shuffling of amino acid sequences.
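One simple, assumed reading of singlet/doublet/triplet shuffling is permuting non-overlapping k-mers of a sequence, as sketched below; the paper's exact shuffling procedure may differ.

```python
import random

def kmer_shuffle(seq, k=2, seed=None):
    """Shuffle a protein sequence at the level of non-overlapping k-mers
    (k=1 for singlet, 2 for doublet, 3 for triplet shuffling)."""
    rng = random.Random(seed)
    chunks = [seq[i:i + k] for i in range(0, len(seq), k)]
    rng.shuffle(chunks)
    return "".join(chunks)

# Hypothetical usage with a toy sequence.
print(kmer_shuffle("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", k=3, seed=0))
```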
HelixFold-Single first pre-trains a large-scale protein language model on thousands of millions of primary structures using the self-supervised learning paradigm; this model is then used as an alternative to MSAs for learning co-evolution information.
While deep mutational scan experiments provide an unbiased estimate of the mutational landscape, the community is encouraged to generate and curate rescue mutation experiments to inform the design of more sophisticated co-masking strategies and leverage large language models more effectively for downstream clinical prediction tasks.
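For reference, the standard single-position masked-marginal score that such co-masking strategies would generalize can be computed as below with the fair-esm package; the checkpoint, sequence, and mutation are hypothetical placeholders, not data from the paper.

```python
import torch
import esm

# Load a small ESM2 checkpoint (assumes the fair-esm package is installed).
model, alphabet = esm.pretrained.esm2_t6_8M_UR50D()
batch_converter = alphabet.get_batch_converter()
model.eval()

wt_seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"   # hypothetical wild-type sequence
pos, mut_aa = 10, "A"                          # hypothetical point mutation (0-based position)
wt_aa = wt_seq[pos]

_, _, tokens = batch_converter([("wt", wt_seq)])
masked = tokens.clone()
masked[0, pos + 1] = alphabet.mask_idx         # +1 accounts for the BOS token

with torch.no_grad():
    logits = model(masked)["logits"]
log_probs = torch.log_softmax(logits[0, pos + 1], dim=-1)
# Masked-marginal score: log p(mutant aa) - log p(wild-type aa) at the masked position.
score = (log_probs[alphabet.get_idx(mut_aa)] - log_probs[alphabet.get_idx(wt_aa)]).item()
```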
The similarities between protein and human languages that allow LMs to be extended to pLMs and applied to protein databases are introduced, and different types of methods for protein structure prediction (PSP) are discussed, in particular how pLM-based architectures function in the process of protein folding.
A sampling framework for evolving proteins in silico that supports mixing and matching a variety of unsupervised models, such as protein language models, and supervised models that predict protein function from sequence is introduced.
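A minimal sketch of how unsupervised and supervised scores could be mixed in such a sampler, assuming user-supplied `plm_logprob` and `fitness_pred` callables and a simple Metropolis acceptance rule; this is an illustration under those assumptions, not the framework's actual implementation.

```python
import math
import random

def evolve(seq, plm_logprob, fitness_pred, alphabet="ACDEFGHIKLMNPQRSTVWY",
           n_steps=1000, weight=1.0, temperature=1.0, seed=0):
    """Metropolis-style in-silico evolution: propose single-site mutations and
    accept them according to a combined unsupervised + supervised score.
    plm_logprob: sequence -> log-likelihood under a protein language model (assumed callable).
    fitness_pred: sequence -> predicted function/fitness (assumed callable)."""
    rng = random.Random(seed)
    score = lambda s: plm_logprob(s) + weight * fitness_pred(s)
    cur, cur_score = seq, score(seq)
    for _ in range(n_steps):
        i = rng.randrange(len(cur))
        cand = cur[:i] + rng.choice(alphabet) + cur[i + 1:]   # single-site substitution proposal
        cand_score = score(cand)
        # Accept uphill moves always, downhill moves with Boltzmann probability.
        if cand_score >= cur_score or rng.random() < math.exp((cand_score - cur_score) / temperature):
            cur, cur_score = cand, cand_score
    return cur, cur_score
```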