1
Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey
2
InstructMol: Multi-Modal Integration for Building a Versatile and Reliable Molecular Assistant in Drug Discovery
3
The Impact of Large Language Models on Scientific Discovery: a Preliminary Study using GPT-4
4
MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter
5
BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations
6
BioMedGPT: Open Multimodal Generative Pre-trained Transformer for BioMedicine
7
GIT-Mol: A Multi-modal Large Language Model for Molecular Science with Graph, Image, and Text
8
Prot2Text: Multimodal Protein's Function Generation with GNNs and Transformers
9
Llama 2: Open Foundation and Fine-Tuned Chat Models
10
Empowering Molecule Discovery for Molecule-Caption Translation With Large Language Models: A ChatGPT Perspective
11
MolFM: A Multimodal Molecular Foundation Model
12
Goat: Fine-tuned LLaMA Outperforms GPT-4 on Arithmetic Tasks
13
MolXPT: Wrapping Molecules with Text for Generative Pre-training
14
LLaMA: Open and Efficient Foundation Language Models
15
Multilingual translation for zero-shot biomedical classification using BioTranslator
16
A text-guided protein design framework
17
Unifying Molecular and Textual Representations via Multi-task Language Modelling
18
Galactica: A Large Language Model for Science
19
GLM-130B: An Open Bilingual Pre-trained Model
20
A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language
21
Interpretable bilinear attention network with domain adaptation improves drug–target prediction
22
PEER: A Comprehensive and Multi-Task Benchmark for Protein Sequence Understanding
23
Translation between Molecules and Natural Language
24
A deep-learning system bridging molecule structure and biomedical text with comprehension comparable to human professionals
25
BERN2: an advanced neural biomedical named entity recognition and normalization tool
26
Pre-training Molecular Graph Representation with 3D Geometry
27
Motif-based Graph Self-Supervised Learning for Molecular Property Prediction
28
Machine Learning in Drug Discovery: A Review
29
Geometry-enhanced molecular representation learning for property prediction
30
Investigating the Limitations of Transformers with Simple Arithmetic Tasks
31
STOUT: SMILES to IUPAC names using neural machine translation
32
ZINC20 - A Free Ultralarge-Scale Chemical Database for Ligand Discovery
33
Graph Contrastive Learning with Augmentations
34
ChemBERTa: Large-Scale Self-Supervised Pretraining for Molecular Property Prediction
36
Language Models are Few-Shot Learners
37
TransformerCPI: improving compound-protein interaction prediction by sequence-based deep learning with self-attention mechanism and label reversal experiments
38
MolTrans: Molecular Interaction Transformer for drug–target interaction prediction
39
bioRxiv: the preprint server for biology
40
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
41
Self-referencing embedded strings (SELFIES): A 100% robust molecular string representation
42
Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences
43
DeepConv-DTI: Prediction of drug-target interactions via deep learning with convolution on protein sequences
44
PubChem 2019 update: improved access to chemical data
45
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
46
Fréchet ChemNet Distance: A Metric for Generative Models for Molecules in Drug Discovery
47
Decoupled Weight Decay Regularization
48
Attention is All you Need
49
MoleculeNet: a benchmark for molecular machine learning
50
Deep Residual Learning for Image Recognition
51
An Introduction to Convolutional Neural Networks
52
Get Your Atoms in Order - An Open-Source Implementation of a Novel and Robust Molecular Canonicalization Algorithm
53
Harnessing Computational Biology for Exact Linear B-Cell Epitope Prediction: A Novel Amino Acid Composition-Based Feature Descriptor
54
Improving compound–protein interaction prediction by building up highly credible negative samples
55
PubMed: The Bibliographic Database
56
Extended-Connectivity Fingerprints
57
Levenshtein Distance: Information theory, Computer science, String (computer science), String metric, Damerau–Levenshtein distance, Spell checker, Hamming distance
58
Prediction of drug–target interaction networks from the integration of chemical and genomic spaces
59
Detection of IUPAC and IUPAC-like chemical names
60
BindingDB: a web-accessible database of experimentally determined protein–ligand binding affinities
61
METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments
62
ROUGE: A Package for Automatic Evaluation of Summaries
63
Reoptimization of MDL Keys for Use in Drug Discovery
64
Bleu: a Method for Automatic Evaluation of Machine Translation
65
Prediction of Membrane Protein Types Based on the Hydrophobic Index of Amino Acids
67
Support-Vector Networks
68
SMILES. 2. Algorithm for generation of unique SMILES notation
69
Improved tools for biological sequence comparison
70
SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules
71
Rapid and sensitive protein similarity searches
72
Uni-Mol: A Universal 3D Molecular Representation Learning Framework
73
The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023
74
International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA
75
Text2Mol: Cross-Modal Molecule Retrieval with Natural Language Queries
76
ProtTrans: Toward Understanding the Language of Life Through Self-Supervised Learning
77
RDKit: Open-source cheminformatics software
78
GraphDTA: Predicting drug-target binding affinity with graph neural networks
79
BioSNAP Datasets: Stanford Biomedical Network Dataset Collection
80
Random decision forests
81
Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models (2023)
82
Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality (2023)
83
Empowering AI Drug Discovery with Explicit and Implicit Knowledge (2023)
84
Stanford Alpaca: An Instruction-Following LLaMA Model (2023)
85
Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data (2023)
86
Molecular Contrastive Learning of Representations via Graph Neural Networks (2022)
87
UniRef: comprehensive and non-redundant UniProt reference (doi:10.1093/bioinformatics/btm098)
88
GPT-4 Technical Report (2023)
89
Language models of protein sequences at the scale of evolution enable accurate structure prediction (2022)