Activity Detection

3260 papers • 126 benchmarks • 313 datasets

Detecting activities in extended videos.

Benchmarks

These leaderboards are used to track progress in Activity Detection.

Dataset: AVA-Speech

Libraries

Use these libraries to find Activity Detection models and implementations.

alibaba-damo-academy/FunASR

3 papers 3,012

Datasets

AVA

Toyota Smarthome Dataset

ROAD

MEVA

TSU

AVA-Speech

Subtasks

No subtasks available.

Most implemented papers

An End-to-End Architecture for Keyword Spotting and Voice Activity Detection

Awni Y. Hannun, Christopher T. Lengerich • Sun Nov 27 2016

Novel inference algorithms for an end-to-end Recurrent Neural Network trained with the Connectionist Temporal Classification loss function are developed which allow the model to achieve high accuracy on both keyword spotting and voice activity detection without retraining.

48

Home Action Genome

0

R-C3D: Region Convolutional 3D Network for Temporal Activity Detection

Kate Saenko, Huijuan Xu, Abir Das • Tue Mar 21 2017

A new model, Region Convolutional 3D Network (R-C3D), is introduced, which encodes the video streams using a three-dimensional fully convolutional network, then generates candidate temporal regions containing activities, and finally classifies selected regions into specific activities.
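Candidate temporal regions such as those described above are usually compared and pruned by their overlap in time. The following is a minimal sketch of temporal intersection-over-union and a greedy non-maximum suppression over scored segments. It is a generic helper, not code from the R-C3D paper; the names `temporal_iou`, `temporal_nms`, and the 0.5 threshold are illustrative assumptions.

```python
from typing import List, Tuple

# A segment is (start_seconds, end_seconds); this representation is an assumption.
Segment = Tuple[float, float]


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection-over-union of two 1D time intervals."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def temporal_nms(
    segments: List[Segment],
    scores: List[float],
    iou_threshold: float = 0.5,  # hypothetical default, tune per dataset
) -> List[int]:
    """Greedy NMS: keep the highest-scoring segments, drop heavy overlaps.

    Returns the indices of the kept segments, in descending score order.
    """
    order = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        # Keep segment i only if it does not overlap too much with any kept one.
        if all(temporal_iou(segments[i], segments[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    proposals = [(0.0, 10.0), (1.0, 11.0), (20.0, 30.0)]
    confidences = [0.9, 0.8, 0.7]
    print(temporal_nms(proposals, confidences))
```

In this usage example the second proposal overlaps the first almost entirely, so only the first and third indices are kept.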

753 0

rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method

Zheng-Hua Tan, A. Sarkar, N. Dehak • Sat Jun 08 2019

A modified version of rVAD is presented in which computationally intensive pitch extraction is replaced by a computationally efficient spectral flatness calculation. This significantly reduces computational complexity at the cost of moderately inferior VAD performance, which is an advantage when processing large amounts of data or running on low-resource devices.
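Spectral flatness is commonly defined as the ratio of the geometric mean to the arithmetic mean of a power spectrum: values near 1 indicate a noise-like frame, values near 0 a tonal or voiced one. The sketch below computes that standard measure for a single frame in pure Python. It is not the rVAD implementation; the function name, the `eps` floor, and the assumption that the caller supplies a precomputed power spectrum are all illustrative.

```python
import math
from typing import Sequence


def spectral_flatness(power_spectrum: Sequence[float], eps: float = 1e-12) -> float:
    """Geometric mean divided by arithmetic mean of one frame's power spectrum.

    `eps` (an assumed small floor) avoids log(0) for empty frequency bins.
    """
    if len(power_spectrum) == 0:
        raise ValueError("power_spectrum must contain at least one bin")
    bins = [max(float(v), 0.0) + eps for v in power_spectrum]
    # The geometric mean is computed in the log domain for numerical stability.
    mean_log = sum(math.log(v) for v in bins) / len(bins)
    arithmetic_mean = sum(bins) / len(bins)
    return math.exp(mean_log) / arithmetic_mean


if __name__ == "__main__":
    noise_like = [1.0, 1.0, 1.0, 1.0]      # flat spectrum
    tonal = [1000.0, 0.001, 0.001, 0.001]  # energy concentrated in one bin
    print(spectral_flatness(noise_like), spectral_flatness(tonal))
```

The measure needs only logarithms and sums over the bins of a frame, which is consistent with the efficiency argument made in the summary above.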

145 0

Fine-Grained Activity Recognition in Baseball Videos

M. Ryoo, A. Piergiovanni • Sun Apr 08 2018

This paper experimentally compares various recognition approaches that capture temporal structure in activity videos, first by classifying segmented videos and then by extending those approaches to continuous videos, and finds that learning temporal structure is valuable for fine-grained activity recognition.

82 0

Pyannote.Audio: Neural Building Blocks for Speaker Diarization

Pavel Korshunov, Marvin Lavechin, Hadrien Titeux, H. Bredin, Ruiqing Yin, Juan Manuel Coria, G. Gelly, D. Fustes, Wassim Bouaziz, Marie-Philippe Gill • Sun Nov 03 2019

This work introduces pyannote.audio, an open-source toolkit written in Python for speaker diarization, which provides a set of trainable end-to-end neural building blocks that can be combined and jointly optimized to build speaker diarization pipelines.

397 0

Multi-Speaker and Wide-Band Simulated Conversations as Training Data for End-to-End Neural Diarization

L. Burget, Federico Landini, M. Díez, Alicia Lozano-Diez • Fri Nov 11 2022

This work creates simulated conversations (SC) with multiple speakers per conversation and shows that they allow for substantially better performance than simulated mixtures (SM), while also reducing the dependence on a fine-tuning stage.

23 0

Learning Latent Super-Events to Detect Multiple Activities in Videos

M. Ryoo, A. Piergiovanni • Mon Dec 04 2017

The approach is designed to be fully differentiable, enabling end-to-end learning of latent super-event representations jointly with the activity detector using them.

94 0

Personal VAD: Speaker-Conditioned Voice Activity Detection

Shuo-yiin Chang, Quan Wang, Li Wan, I. López-Moreno, Shaojin Ding • Sun Aug 11 2019

This system is useful for gating the inputs to a streaming on-device speech recognition system so that it triggers only for the target user, which helps reduce computational cost and battery consumption, especially in scenarios where a keyword detector is not preferred.

89 0

Harvesting Ambient RF for Presence Detection Through Deep Learning

Yang Liu, Tiexing Wang, Yuexin Jiang, Biao Chen • Wed Feb 12 2020

The proposed deep-learning-based RF sensing achieves near-perfect presence detection over multiple extended test periods and outperforms leading-edge passive infrared sensors.

28 0
