Learning deep representations by mutual information estimation and maximization (2018-08-20)

TL;DR

It is shown that structure matters: incorporating knowledge about locality in the input into the objective can significantly improve a representation’s suitability for downstream tasks and is an important step towards flexible formulations of representation learning objectives for specific end-goals.

Abstract

This work investigates unsupervised learning of representations by maximizing mutual information between an input and the output of a deep neural network encoder. Importantly, we show that structure matters: incorporating knowledge about locality in the input into the objective can significantly improve a representation’s suitability for downstream tasks. We further control characteristics of the representation by matching to a prior distribution adversarially. Our method, which we call Deep InfoMax (DIM), outperforms a number of popular unsupervised learning methods and compares favorably with fully-supervised learning on several classification tasks with some standard architectures. DIM opens new avenues for unsupervised learning of representations and is an important step towards flexible formulations of representation learning objectives for specific end-goals.
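The abstract's central idea, maximizing mutual information between local features of the input and the encoder's global output, can be illustrated with a minimal pure-Python sketch. It assumes a Jensen-Shannon-style estimator, I ≈ E_P[-sp(-T(c, y))] - E_{P x P~}[sp(T(c', y))] with sp(z) = log(1 + e^z), as one possible choice of MI estimator. It also assumes a dot-product critic T standing in for a learned network, negative pairs formed by matching each global vector with local features from other examples in the batch, and feature vectors supplied directly instead of coming from a real encoder. The names `critic`, `jsd_mi_estimate` and `local_dim_loss` are hypothetical. The adversarial prior-matching term is not covered here.

```python
import math
from typing import List

Vector = List[float]


def softplus(z: float) -> float:
    """Numerically stable sp(z) = log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def critic(local: Vector, global_vec: Vector) -> float:
    """Hypothetical dot-product critic T(c, y); a trained network in practice."""
    return sum(a * b for a, b in zip(local, global_vec))


def jsd_mi_estimate(local_feats: List[List[Vector]], global_feats: List[Vector]) -> float:
    """JSD-style MI estimate between local features and global summaries.

    local_feats[n][i] is the feature at location i of example n, and
    global_feats[n] is the global vector for example n (both assumed given).
    Positive pairs come from the same example (m == n); negative pairs combine
    the global vector of example n with local features of another example m.
    """
    if len(local_feats) != len(global_feats):
        raise ValueError("local_feats and global_feats must have the same batch size")
    if len(global_feats) < 2:
        raise ValueError("need at least two examples to form negative pairs")
    pos_sum, pos_count = 0.0, 0
    neg_sum, neg_count = 0.0, 0
    for n, y in enumerate(global_feats):
        for m, locals_m in enumerate(local_feats):
            for c in locals_m:
                score = critic(c, y)
                if m == n:
                    pos_sum += -softplus(-score)  # term of E_P[-sp(-T)]
                    pos_count += 1
                else:
                    neg_sum += softplus(score)  # term of E_{P x P~}[sp(T)]
                    neg_count += 1
    return pos_sum / pos_count - neg_sum / neg_count


def local_dim_loss(local_feats: List[List[Vector]], global_feats: List[Vector]) -> float:
    """Loss to minimize: the negative of the local MI estimate."""
    return -jsd_mi_estimate(local_feats, global_feats)


if __name__ == "__main__":
    # Two examples with two locations each; local features align with their own global vector.
    local = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
    matched = jsd_mi_estimate(local, [[1.0, 0.0], [0.0, 1.0]])
    swapped = jsd_mi_estimate(local, [[0.0, 1.0], [1.0, 0.0]])
    print(f"matched={matched:.4f} swapped={swapped:.4f}")  # matched=-1.0064 swapped=-2.0064
```

The estimate is higher when each global vector agrees with its own example's local features than when the global vectors are swapped between examples. In an actual training setup, the critic and encoder would be neural networks updated by gradient descent on a loss like `local_dim_loss`.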

Authors

Yoshua Bengio

R. Devon Hjelm

A. Fedorov

References (80 items)

Noise Contrastive Estimation and Negative Sampling for Conditional Models: Consistency and Statistical Efficiency

Invariant Information Clustering for Unsupervised Image Classification and Segmentation

Invariant Information Distillation for Unsupervised Image Segmentation and Clustering

Representation Learning with Contrastive Predictive Coding

Image-to-image translation for cross-domain disentanglement

MINE: Mutual Information Neural Estimation

Isolating Sources of Disentanglement in Variational Autoencoders

Which Training Methods for GANs do actually Converge?

Geometrical Insights for Implicit Generative Modeling

Learning Independent Features with Adversarial Nets for Non-linear ICA

Deep Adaptive Image Clustering

Multi-task Self-Supervised Visual Learning

Independently Controllable Factors

Semi-supervised Learning with GANs: Manifold Invariance with Improved Inference

Unsupervised Learning by Predicting Noise

Improved Training of Wasserstein GANs

Independently Controllable Features

Learning Discrete Representations via Information Maximizing Self-Augmented Training

Boundary-Seeking Generative Adversarial Networks

Towards Principled Methods for Training Generative Adversarial Networks

beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework

Regularization With Stochastic Transformations and Perturbations for Deep Semi-Supervised Learning

InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

Improved Techniques for Training GANs

Adversarially Learned Inference

f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization

Adversarial Feature Learning

Density estimation using Real NVP

Context Encoders: Feature Learning by Inpainting

One-Shot Generalization in Deep Generative Models

Exploring the Limits of Language Modeling

Pixel Recurrent Neural Networks

Deep Residual Learning for Image Recognition

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

Unsupervised Deep Embedding for Clustering Analysis

Adversarial Autoencoders

From Facial Parts Responses to Face Detection: A Deep Learning Approach

LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop

Unsupervised Visual Representation Learning by Context Prediction

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Deep Learning Face Attributes in the Wild

NICE: Non-linear Independent Components Estimation

Dynamic functional connectivity analysis reveals transient states of dysconnectivity in schizophrenia

Semi-supervised Learning with Deep Generative Models

Generative Adversarial Nets

Auto-Encoding Variational Bayes

Learning word embeddings efficiently with noise-contrastive estimation

Distributed Representations of Words and Phrases and their Compositionality

ImageNet classification with deep convolutional neural networks

Disentangling Factors of Variation for Facial Expression Recognition

Representation Learning: A Review and New Perspectives

A Kernel Two-Sample Test

Capturing inter-subject variability with group independent component analysis of fMRI data: A simulation study

An Analysis of Single-Layer Networks in Unsupervised Feature Learning

Contractive Auto-Encoders: Explicit Invariance During Feature Extraction

Noise-contrastive estimation: A new estimation principle for unnormalized statistical models

Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion

Prediction, Cognition and the Brain

Extracting and composing robust features with denoising autoencoders

Modulation of temporally coherent brain networks estimated using ICA at rest and during cognitive tasks

Linear and nonlinear ICA based on mutual information

Multiscale structural similarity for image quality assessment

Training Products of Experts by Minimizing Contrastive Divergence

Slow Feature Analysis: Unsupervised Learning of Invariances

Independent component analysis: algorithms and applications

Nonlinear independent component analysis: Existence and uniqueness results

An Information-Maximization Approach to Blind Separation and Blind Deconvolution

An information-theoretic unsupervised learning algorithm for neural networks

Learning Factorial Codes by Predictability Minimization

The self-organizing map

Self-organization in a perceptual network

Asymptotic evaluation of certain Markov process expectations for large time

Deep Variational Information Bottleneck

Deep generative models for speech and images, 2017

We used the contractive penalty (found in Mescheder et al