GIT-Mol is introduced, a multi-modal large language model that integrates graph, image, and text information, together with GIT-Former, a novel architecture capable of aligning all modalities into a unified latent space.
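Since GIT-Former is only described at a high level here, the following is a minimal, hypothetical sketch (not the authors' implementation) of how a Q-Former-style module with learnable query tokens and cross-attention could map graph, image, and text encoder outputs into one shared latent space. All class names, dimensions, and hyperparameters below are illustrative assumptions.

```python
# Hypothetical sketch of cross-modal alignment via shared learnable queries.
# Assumes each modality encoder (GNN, ViT, LM) already projects its tokens
# to a common hidden size; none of these names come from the GIT-Mol paper.
import torch
import torch.nn as nn

class CrossModalAligner(nn.Module):
    def __init__(self, dim=256, num_queries=32, num_heads=8):
        super().__init__()
        # Learnable queries shared across modalities define the unified latent space.
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, modality_tokens):
        # modality_tokens: (batch, seq_len, dim) from a graph, image, or text encoder.
        q = self.queries.unsqueeze(0).expand(modality_tokens.size(0), -1, -1)
        aligned, _ = self.cross_attn(q, modality_tokens, modality_tokens)
        return self.norm(aligned)  # (batch, num_queries, dim) unified representation

# Usage with dummy encoder outputs of different lengths but equal hidden size.
aligner = CrossModalAligner()
graph_tokens = torch.randn(2, 50, 256)   # e.g. node embeddings from a GNN
image_tokens = torch.randn(2, 196, 256)  # e.g. patch embeddings from a ViT
text_tokens  = torch.randn(2, 64, 256)   # e.g. token embeddings from a language model
z_graph, z_image, z_text = (aligner(t) for t in (graph_tokens, image_tokens, text_tokens))
print(z_graph.shape, z_image.shape, z_text.shape)  # all torch.Size([2, 32, 256])
```

Because the same query tokens attend to every modality, the outputs share a fixed shape and embedding space regardless of the input sequence length, which is one common way to align heterogeneous encoders before feeding a language model.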
Authors
Peng Liu, Yiming Ren, Zhixiang Ren