Mobile Video Object Detection with Temporally-Aware Feature Maps (2017-11-17)

TL;DR

This approach combines fast single-image object detection with convolutional long short-term memory (LSTM) layers to create an interweaved recurrent-convolutional architecture that is substantially faster than existing video detection methods and significantly reduces computational cost.

Abstract

This paper introduces an online model for object detection in videos designed to run in real time on low-powered mobile and embedded devices. Our approach combines fast single-image object detection with convolutional long short-term memory (LSTM) layers to create an interweaved recurrent-convolutional architecture. Additionally, we propose an efficient Bottleneck-LSTM layer that significantly reduces computational cost compared to regular LSTMs. Our network achieves temporal awareness by using Bottleneck-LSTMs to refine and propagate feature maps across frames. This approach is substantially faster than existing detection methods in video, outperforming the fastest single-frame models in model size and computational cost while attaining accuracy comparable to much more expensive single-frame models on the ImageNet VID 2015 dataset. Our model reaches a real-time inference speed of up to 15 FPS on a mobile CPU.
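
The idea of refining and propagating feature maps across frames can be illustrated with a small NumPy sketch. This is not the authors' implementation: the class name PointwiseBottleneckLSTM, the use of 1x1 (pointwise) convolutions, the ReLU activations, the gate layout, and the random weights are all assumptions made to keep the example short. It only shows the general pattern of fusing the current frame's features with the previous hidden state through a narrow bottleneck before computing the LSTM gates.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class PointwiseBottleneckLSTM:
    """Hypothetical recurrent layer over feature maps of shape (H, W, C).

    Assumption: every convolution is 1x1, so it reduces to a matrix
    product over the channel axis. A real layer would use spatial kernels.
    """

    def __init__(self, in_channels, state_channels, seed=0):
        rng = np.random.default_rng(seed)
        self.state_channels = state_channels
        # Bottleneck: fuse [x_t, h_{t-1}] down to state_channels.
        self.w_b = rng.normal(0.0, 0.5, (in_channels + state_channels, state_channels))
        # One matrix yields all four gates from the bottleneck features.
        self.w_g = rng.normal(0.0, 0.5, (state_channels, 4 * state_channels))

    def init_state(self, height, width):
        shape = (height, width, self.state_channels)
        return np.zeros(shape), np.zeros(shape)

    def step(self, x, state):
        h_prev, c_prev = state
        fused = np.concatenate([x, h_prev], axis=-1)
        b = np.maximum(fused @ self.w_b, 0.0)           # bottleneck features (ReLU)
        i, f, o, g = np.split(b @ self.w_g, 4, axis=-1)  # input, forget, output, candidate
        c = sigmoid(f) * c_prev + sigmoid(i) * np.maximum(g, 0.0)
        h = sigmoid(o) * np.maximum(c, 0.0)
        return h, (h, c)


def run_sequence(layer, frames):
    """Apply the layer frame by frame, carrying the state forward."""
    state = layer.init_state(*frames[0].shape[:2])
    outputs = []
    for x in frames:
        h, state = layer.step(x, state)
        outputs.append(h)
    return outputs


# Three random "feature maps" standing in for per-frame detector features.
rng = np.random.default_rng(1)
frames = [rng.normal(size=(4, 5, 16)) for _ in range(3)]
layer = PointwiseBottleneckLSTM(in_channels=16, state_channels=8)
outputs = run_sequence(layer, frames)
```

Because the state is carried between calls to step, the refined feature map for a frame depends on the frames that came before it, which is the sense in which such a layer is temporally aware. In a detector, each refined map would be passed on to the detection layers in place of the single-frame features.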

Authors

Mason Liu

Menglong Zhu
