De novo peptide sequencing is the process of determining the amino acid sequence of a peptide without prior knowledge of the DNA or protein it comes from. The technique is used in proteomics to analyze proteins and peptides, especially when the genomic sequence of the organism is unknown or the protein sequence is not available in databases. The process typically involves mass spectrometry (MS): peptides are ionized and fragmented, and the mass spectrometer measures the mass-to-charge ratios of the resulting fragments. By analyzing the mass differences between fragments, a sequencing algorithm (in recent work, usually a machine learning model) can infer the sequence of amino acids in the peptide. The method is particularly useful for studying proteins from organisms with unsequenced genomes, for characterizing post-translational modifications, and for discovering new proteins or variants.
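The mass-difference idea can be illustrated with a minimal sketch. It assumes an idealized spectrum containing only a complete, noise-free ladder of singly charged b-ions; real spectra have missing peaks, noise, and several ion types, which is why the models listed on this page are needed. The function names (`b_ion_ladder`, `read_ladder`) and the tolerance value are hypothetical choices for illustration, not taken from any of the tools below.

```python
# Monoisotopic residue masses in daltons. Leucine and isoleucine are isobaric,
# so only "L" is listed; mass alone cannot tell them apart.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # mass of the charge-carrying proton


def b_ion_ladder(peptide):
    """Simulate singly charged b-ion m/z values: each prefix mass plus one proton."""
    total = PROTON
    ladder = []
    for aa in peptide:
        total += RESIDUE_MASS[aa]
        ladder.append(total)
    return ladder


def match_residue(delta, tol=0.01):
    """Return the single residue whose mass is within tol of delta, else None."""
    hits = [aa for aa, mass in RESIDUE_MASS.items() if abs(mass - delta) <= tol]
    return hits[0] if len(hits) == 1 else None


def read_ladder(peaks, tol=0.01):
    """Read a sequence from the differences between consecutive ladder peaks."""
    previous = PROTON  # the empty prefix: no residues, one proton
    residues = []
    for mz in sorted(peaks):
        aa = match_residue(mz - previous, tol)
        residues.append(aa if aa else "?")  # "?" marks an unexplained gap
        previous = mz
    return "".join(residues)


peaks = b_ion_ladder("SAMPLER")
print(read_ladder(peaks))
```

Removing a peak from the ladder makes the corresponding gap equal to the sum of two residue masses, which this sketch reports as `?`. Resolving such gaps, and separating true fragment peaks from noise, is the hard part of the task.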
The proposed DeepNovo architecture combines recent advances in convolutional neural networks and recurrent neural networks to learn features of tandem mass spectra, fragment ions, and sequence patterns of peptides to solve the complex optimization task of de novo sequencing.
PointNovo is presented, a neural network-based de novo peptide sequencing model that can robustly handle mass spectrometry data at any resolution level while keeping the computational complexity unchanged.
Casanovo is described, a machine learning model that uses a transformer neural network architecture to translate the sequence of peaks in a tandem mass spectrum into the sequence of amino acids that comprise the generating peptide.
Tandem mass spectrometry (MS/MS) stands as the predominant high-throughput technique for comprehensively analyzing protein content within biological samples. This methodology is a cornerstone driving the advancement of proteomics. In recent years, substantial strides have been made in Data-Independent Acquisition (DIA) strategies, facilitating impartial and non-targeted fragmentation of precursor ions. The DIA-generated MS/MS spectra present a formidable obstacle because they are highly multiplexed: each spectrum contains fragmented product ions originating from multiple precursor peptides. This poses a particularly acute challenge in de novo peptide/protein sequencing, where current methods are ill-equipped to handle multiplexed spectra. In this paper, we introduce Casanovo-DIA, a deep-learning model based on the transformer architecture that deciphers peptide sequences from DIA mass spectrometry data. Our results show significant improvements over existing state-of-the-art (SOTA) methods, including DeepNovo-DIA and PepNet. Casanovo-DIA enhances precision by 15.14% to 34.8% and recall by 11.62% to 31.94% at the amino acid level, and boosts precision by 59% to 81.36% at the peptide level. Integrating DIA data with our Casanovo-DIA model holds considerable promise for uncovering novel peptides and for profiling biological samples more comprehensively. Casanovo-DIA is freely available under the GNU GPL license at https://github.com/Biocomputing-Research-Group/Casanovo-DIA.
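For readers unfamiliar with the amino-acid-level and peptide-level metrics quoted above, the following is a simplified sketch of how they can be computed. It assumes an amino acid counts as correct when it equals the ground-truth residue at the same position; published evaluations typically use a mass-tolerance-based matching rule instead, so this is not the evaluation code of Casanovo-DIA or any other tool. The function name `evaluate` and the use of `None` for "no prediction" are hypothetical.

```python
def evaluate(predictions, truths):
    """Compute simplified de novo sequencing metrics.

    predictions: list of predicted sequences (str), or None where the model
                 made no prediction for a spectrum.
    truths:      list of ground-truth sequences (str), same length.
    """
    matched = predicted_aa = true_aa = 0
    correct_peptides = predicted_peptides = 0

    for pred, truth in zip(predictions, truths):
        true_aa += len(truth)
        if pred is None:
            continue  # no prediction: hurts recall, does not affect precision
        predicted_peptides += 1
        predicted_aa += len(pred)
        # Assumption: position-wise exact match, not mass-tolerance matching.
        matched += sum(p == t for p, t in zip(pred, truth))
        if pred == truth:
            correct_peptides += 1

    return {
        # fraction of predicted amino acids that are correct
        "aa_precision": matched / predicted_aa if predicted_aa else 0.0,
        # fraction of ground-truth amino acids that were recovered
        "aa_recall": matched / true_aa if true_aa else 0.0,
        # fraction of predicted peptides that are entirely correct
        "peptide_precision": (
            correct_peptides / predicted_peptides if predicted_peptides else 0.0
        ),
    }


scores = evaluate(
    ["PEPTIDE", "PEPTLDE", None],
    ["PEPTIDE", "PEPTIDE", "GASP"],
)
print(scores)
```

In this toy input, one of the two predicted peptides is fully correct, and the second differs from the truth at a single position, so the peptide-level score drops much faster than the amino-acid-level scores.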
This paper combines two modules, de novo sequencing and database search, into a single deep learning framework for peptide identification, and integrates a de Bruijn graph assembly technique to offer a complete solution for reconstructing protein sequences from tandem mass spectrometry data.
A deep-learning-based tool that uses data-independent-acquisition (DIA) mass spectrometry data to sequence peptides without using a database. DIA coupled with de novo sequencing allowed the authors to identify novel peptides in human antibodies and antigens.
This paper introduces DeepNovoV2, the state-of-the-art model for peptide sequencing that combines an order-invariant network structure (T-Net) and recurrent neural networks and provides a complete end-to-end training and prediction framework to sequence patterns of peptides.
The deep learning and learning-to-rank techniques implemented in pNovo 3 significantly improve the precision of de novo sequencing, and such a machine learning framework is worth extending to other research fields that need to distinguish similar sequences.
PepNet is presented, a fully convolutional neural network (CNN) for high-accuracy de novo peptide sequencing that can sequence a large fraction of spectra that were not identified by database search engines, and thus could be used as a complementary tool to database search engines for peptide identification in proteomics.