These leaderboards are used to track progress in protein language modeling.
Use these libraries to find protein language model implementations.
Experimental results on benchmark data sets demonstrate that ESM2 feature representations comprehensively outperform evolutionary-information-based hidden Markov model (HMM) features in prediction performance.
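As context for how such embedding features are typically obtained, here is a minimal sketch using the fair-esm package; the checkpoint choice, example sequence, and mean-pooling step are illustrative assumptions rather than the paper's exact pipeline.

```python
import torch
import esm

# Load a pretrained ESM2 checkpoint (650M parameters here; smaller ones work the same way).
model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
batch_converter = alphabet.get_batch_converter()
model.eval()

# Hypothetical example sequence; a real pipeline would iterate over a FASTA file.
data = [("example_protein", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")]
_, strs, tokens = batch_converter(data)

with torch.no_grad():
    out = model(tokens, repr_layers=[33])
per_residue = out["representations"][33]                   # (1, L+2, 1280), includes BOS/EOS tokens
seq_feature = per_residue[0, 1:len(strs[0]) + 1].mean(0)   # mean-pool residues -> 1280-d sequence feature
```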
A protein language model that takes as input a set of sequences in the form of a multiple sequence alignment, and is trained with a variant of the masked language modeling objective across many protein families, surpasses current state-of-the-art unsupervised structure-learning methods by a wide margin.
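For readers unfamiliar with the training signal, the sketch below shows a generic masked language modeling loss; it is a simplified stand-in (masking positions uniformly within one token tensor), not the MSA Transformer's exact variant, and the `model` callable is an assumed mapping from token ids to per-position vocabulary logits.

```python
import torch
import torch.nn.functional as F

def masked_lm_loss(model, tokens, mask_idx, pad_idx, mask_prob=0.15):
    """Generic masked-LM objective: hide a random subset of tokens and
    train the model to reconstruct them from the surrounding context."""
    targets = tokens.clone()
    maskable = tokens != pad_idx                       # never mask padding positions
    mask = (torch.rand(tokens.shape) < mask_prob) & maskable
    corrupted = tokens.clone()
    corrupted[mask] = mask_idx                         # replace selected tokens with the mask token
    logits = model(corrupted)                          # (batch, length, vocab), assumed output shape
    return F.cross_entropy(logits[mask], targets[mask])  # loss only on masked positions
```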
A structure-aware protein self-supervised learning method is proposed that effectively captures structural information of proteins via a pseudo bi-level optimization scheme, using a graph neural network model pretrained on protein sequences.
This work proposes and tests an iterative method that directly employs the masked language modeling objective to generate sequences using MSA Transformer, and demonstrates that the resulting sequences score as well as natural sequences on homology, coevolution, and structure-based measures.
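A hedged sketch of what such iterative masked-LM generation can look like for a single sequence; the `mlm` callable is an assumed stand-in for a masked language model returning per-position logits, whereas the paper applies this idea to rows of an MSA with the MSA Transformer.

```python
import torch

def iterative_mlm_sample(mlm, tokens, mask_idx, n_iters=10, frac=0.1, temperature=1.0):
    # tokens: 1-D LongTensor of token ids for a single sequence.
    tokens = tokens.clone()
    length = tokens.shape[0]
    for _ in range(n_iters):
        n_mask = max(1, int(frac * length))
        pos = torch.randperm(length)[:n_mask]          # positions to resample this round
        corrupted = tokens.clone()
        corrupted[pos] = mask_idx
        with torch.no_grad():
            logits = mlm(corrupted.unsqueeze(0))[0] / temperature  # (length, vocab), assumed shape
        probs = torch.softmax(logits[pos], dim=-1)
        tokens[pos] = torch.multinomial(probs, 1).squeeze(-1)      # sample new tokens at masked positions
    return tokens
```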
It is shown that DistilProtBert performs very well on singlet-, doublet-, and even triplet-shuffled versions of the human proteome, with AUCs of 0.92, 0.91, and 0.87, respectively, and it is suggested that, by examining the small number of false-positive classifications, the authors may be able to identify potential de novo natural-like proteins based on random shuffling of amino acid sequences.
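One simple, assumed reading of singlet/doublet/triplet shuffling is permuting non-overlapping k-mers of a sequence, as sketched below; the paper's exact shuffling procedure may differ.

```python
import random

def kmer_shuffle(seq, k=2, seed=None):
    """Shuffle a protein sequence at the level of non-overlapping k-mers
    (k=1 for singlet, 2 for doublet, 3 for triplet shuffling)."""
    rng = random.Random(seed)
    chunks = [seq[i:i + k] for i in range(0, len(seq), k)]
    rng.shuffle(chunks)
    return "".join(chunks)

# Hypothetical usage with a toy sequence.
print(kmer_shuffle("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", k=3, seed=0))
```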
HelixFold-Single first pre-trains a large-scale protein language model on thousands of millions of primary structures using the self-supervised learning paradigm; this model is then used as an alternative to MSAs for learning co-evolution information.
While deep mutational scan experiments provide an unbiased estimate of the mutational landscape, the community is encouraged to generate and curate rescue mutation experiments to inform the design of more sophisticated co-masking strategies and leverage large language models more effectively for downstream clinical prediction tasks.
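For reference, the standard single-position masked-marginal score that such co-masking strategies would generalize can be computed as below with the fair-esm package; the checkpoint, sequence, and mutation are hypothetical placeholders, not data from the paper.

```python
import torch
import esm

# Load a small ESM2 checkpoint (assumes the fair-esm package is installed).
model, alphabet = esm.pretrained.esm2_t6_8M_UR50D()
batch_converter = alphabet.get_batch_converter()
model.eval()

wt_seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"   # hypothetical wild-type sequence
pos, mut_aa = 10, "A"                          # hypothetical point mutation (0-based position)
wt_aa = wt_seq[pos]

_, _, tokens = batch_converter([("wt", wt_seq)])
masked = tokens.clone()
masked[0, pos + 1] = alphabet.mask_idx         # +1 accounts for the BOS token

with torch.no_grad():
    logits = model(masked)["logits"]
log_probs = torch.log_softmax(logits[0, pos + 1], dim=-1)
# Masked-marginal score: log p(mutant aa) - log p(wild-type aa) at the masked position.
score = (log_probs[alphabet.get_idx(mut_aa)] - log_probs[alphabet.get_idx(wt_aa)]).item()
```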
The similarities between protein and human languages that allow LMs to be extended to pLMs and applied to protein databases are introduced, and different types of methods for protein structure prediction (PSP) are discussed, in particular how pLM-based architectures function in the process of protein folding.
A sampling framework for evolving proteins in silico that supports mixing and matching a variety of unsupervised models, such as protein language models, and supervised models that predict protein function from sequence is introduced.
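A minimal sketch of how unsupervised and supervised scores could be mixed in such a sampler, assuming user-supplied `plm_logprob` and `fitness_pred` callables and a simple Metropolis acceptance rule; this is an illustration under those assumptions, not the framework's actual implementation.

```python
import math
import random

def evolve(seq, plm_logprob, fitness_pred, alphabet="ACDEFGHIKLMNPQRSTVWY",
           n_steps=1000, weight=1.0, temperature=1.0, seed=0):
    """Metropolis-style in-silico evolution: propose single-site mutations and
    accept them according to a combined unsupervised + supervised score.
    plm_logprob: sequence -> log-likelihood under a protein language model (assumed callable).
    fitness_pred: sequence -> predicted function/fitness (assumed callable)."""
    rng = random.Random(seed)
    score = lambda s: plm_logprob(s) + weight * fitness_pred(s)
    cur, cur_score = seq, score(seq)
    for _ in range(n_steps):
        i = rng.randrange(len(cur))
        cand = cur[:i] + rng.choice(alphabet) + cur[i + 1:]   # single-site substitution proposal
        cand_score = score(cand)
        # Accept uphill moves always, downhill moves with Boltzmann probability.
        if cand_score >= cur_score or rng.random() < math.exp((cand_score - cur_score) / temperature):
            cur, cur_score = cand, cand_score
    return cur, cur_score
```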