We investigate how well CLIP understands texture in natural images described by natural language. To this end, we analyze CLIP's ability to: (1) perform zero-shot learning on various texture and material classification datasets; (2) represent compositional properties of texture, such as red dots or yellow stripes, on the Describable Texture in Detail (DTD²) dataset; and (3) aid fine-grained categorization of birds in photographs described by the color and texture of their body parts.
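The zero-shot protocol in (1) works by turning each class name into a natural-language prompt, embedding both prompts and images in CLIP's joint space, and assigning each image to the class whose prompt embedding is most similar. A minimal sketch of that mechanism follows; random unit vectors stand in for CLIP's actual image and text encoders, and the prompt template, class list, and embedding dimension are illustrative assumptions, not the paper's exact setup.

```python
# Sketch of CLIP-style zero-shot texture classification.
# Placeholder embeddings are used instead of real CLIP encoders.
import numpy as np

def normalize(x):
    """Project rows onto the unit sphere, as CLIP does before matching."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
texture_classes = ["banded", "dotted", "striped", "zigzagged"]  # hypothetical subset

# Text side: one prompt per class, embedded (here: random placeholders).
prompts = [f"a photo of a {c} texture" for c in texture_classes]
text_emb = normalize(rng.normal(size=(len(prompts), 512)))

# Image side: a small batch of image embeddings (placeholders).
image_emb = normalize(rng.normal(size=(3, 512)))

# Zero-shot prediction: scaled cosine similarity, softmax over classes.
logits = 100.0 * image_emb @ text_emb.T   # CLIP scales by a learned temperature
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
pred = [texture_classes[i] for i in logits.argmax(axis=-1)]
```

With real CLIP encoders, only the two `normalize(rng.normal(...))` lines would change: they would be replaced by the model's encoded prompts and images, while the similarity-and-argmax step stays the same.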