Long-Term Visual Object Tracking with Event Cameras: An Associative Memory Augmented Tracker and A Benchmark Dataset (2024-03-09)

TL;DR

This paper introduces FELT, a new long-term, large-scale frame-event visual object tracking dataset, and AMTTrack, a novel Associative Memory Transformer based RGB-Event long-term visual tracker. AMTTrack follows a one-stream tracking framework and aggregates the multi-scale RGB/event template and search tokens via a Hopfield retrieval layer.

Abstract

Existing event stream based trackers are evaluated on short-term tracking datasets. Real-world scenarios, however, involve long-term tracking, and the performance of existing tracking algorithms in these scenarios remains unclear. In this paper, we first propose a new long-term, large-scale frame-event visual object tracking dataset, termed FELT. It contains 1,044 long-term videos comprising 1.9 million pairs of RGB frames and event streams, 60 different target objects, and 14 challenging attributes. To build a solid benchmark, we retrain and evaluate 21 baseline trackers on our dataset for future work to compare against. In addition, we propose a novel Associative Memory Transformer based RGB-Event long-term visual tracker, termed AMTTrack. It follows a one-stream tracking framework and aggregates the multi-scale RGB/event template and search tokens effectively via the Hopfield retrieval layer. The framework also embodies another aspect of associative memory: it maintains dynamic template representations through an associative memory update scheme, which addresses the appearance variation encountered in long-term tracking. Extensive experiments on the FELT, FE108, VisEvent, and COESOT datasets validate the effectiveness of our proposed tracker. Both the dataset and source code will be released at https://github.com/Event-AHU/FELT_SOT_Benchmark
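
The abstract names a Hopfield retrieval layer and an associative memory update scheme without giving their equations. As a rough illustration only, the sketch below implements the generic retrieval rule of modern Hopfield networks (a softmax-weighted readout of stored patterns) in NumPy, plus a simple momentum-based memory update. It is not AMTTrack's implementation: the function names `hopfield_retrieve` and `ema_update`, the inverse temperature `beta`, and the `momentum` blending rule are all assumptions introduced here for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def hopfield_retrieve(queries, memory, beta=1.0):
    """Generic modern-Hopfield readout (illustrative, not the paper's layer).

    queries: (n_q, d) array of query tokens, e.g. search-region tokens.
    memory:  (n_m, d) array of stored patterns, e.g. template tokens.
    beta:    assumed inverse temperature; larger values sharpen retrieval.
    Returns an (n_q, d) array: each row is a softmax-weighted mix of memory rows.
    """
    scores = beta * (queries @ memory.T)   # (n_q, n_m) similarities
    weights = softmax(scores, axis=-1)     # each row sums to 1
    return weights @ memory                # (n_q, d) retrieved patterns


def ema_update(memory, index, new_pattern, momentum=0.9):
    """Hypothetical memory update: blend a new pattern into one memory slot.

    This exponential-moving-average rule is an assumption for illustration;
    the abstract does not specify the actual update scheme.
    """
    updated = memory.copy()
    updated[index] = momentum * memory[index] + (1.0 - momentum) * new_pattern
    return updated


if __name__ == "__main__":
    memory = np.eye(3)                      # three orthogonal stored patterns
    query = np.array([[0.9, 0.1, 0.0]])     # noisy version of pattern 0
    print(hopfield_retrieve(query, memory, beta=50.0))
    print(ema_update(memory, 0, np.array([0.0, 1.0, 0.0]), momentum=0.5))
```

With a large `beta` the readout snaps to the closest stored pattern, which is the behaviour that makes such a layer usable as a content-addressable template memory; with a small `beta` it averages over several patterns.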

Authors

Shiao Wang


Bowei Jiang


Xiao Wang

