CCNet: Criss-Cross Attention for Semantic Segmentation (2018-11-28)

TL;DR

For each pixel, a novel criss-cross attention module in CCNet harvests the contextual information of all the pixels on its criss-cross path. By taking a further recurrent operation, each pixel can finally capture the full-image dependencies from all pixels.

Abstract

Full-image dependencies provide useful contextual information that benefits visual understanding problems. In this work, we propose a Criss-Cross Network (CCNet) for obtaining such contextual information in a more effective and efficient way. Concretely, for each pixel, a novel criss-cross attention module in CCNet harvests the contextual information of all the pixels on its criss-cross path. By taking a further recurrent operation, each pixel can finally capture the full-image dependencies from all pixels. Overall, CCNet has the following merits: 1) GPU memory friendly. Compared with the non-local block, the proposed recurrent criss-cross attention module requires 11x less GPU memory. 2) High computational efficiency. Compared with the non-local block, the recurrent criss-cross attention reduces FLOPs by about 85% when computing full-image dependencies. 3) State-of-the-art performance. We conduct extensive experiments on the popular semantic segmentation benchmarks Cityscapes and ADE20K, and on the instance segmentation benchmark COCO. In particular, our CCNet achieves mIoU scores of 81.4 and 45.22 on the Cityscapes test set and the ADE20K validation set, respectively, which are new state-of-the-art results. The source code is available at https://github.com/speedinghzl/CCNet.
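
The claim that a second (recurrent) criss-cross pass gives every pixel access to the whole image can be checked with a small connectivity sketch. This is not the CCNet implementation: it tracks only which positions can contribute to a given pixel, not the attention weights. The function names (`criss_cross_path`, `reachable_after`, `connections_per_pixel`) are hypothetical, and the non-local block is simplified here to dense all-pairs links. The link counts illustrate where the savings come from; they do not reproduce the 11x memory or 85% FLOPs figures reported in the abstract.

```python
def criss_cross_path(i, j, height, width):
    """Positions sharing a row or a column with (i, j), including (i, j) itself."""
    row = {(i, c) for c in range(width)}
    col = {(r, j) for r in range(height)}
    return row | col


def reachable_after(i, j, height, width, steps):
    """Positions whose information can reach (i, j) after `steps` criss-cross passes."""
    reached = {(i, j)}
    for _ in range(steps):
        # Each already-reached position pulls in everything on its own criss-cross path.
        reached = set().union(
            *(criss_cross_path(r, c, height, width) for r, c in reached)
        )
    return reached


def connections_per_pixel(height, width):
    """Links per pixel as (dense all-pairs, one criss-cross pass); a simplified count."""
    return height * width, height + width - 1


if __name__ == "__main__":
    H, W = 5, 7  # assumed toy feature-map size
    print(len(reachable_after(2, 3, H, W, steps=1)))  # 11: one row plus one column
    print(len(reachable_after(2, 3, H, W, steps=2)))  # 35: every position in the map
    print(connections_per_pixel(H, W))                # (35, 11)
```

After one pass a pixel sees only its own row and column. After two passes it also sees the row and column of each of those positions, which together cover the full map, while each pass links a pixel to only H + W - 1 positions rather than H x W.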

Authors

Xinggang Wang

Chang Huang

Humphrey Shi
