3260 papers • 126 benchmarks • 313 datasets
In speech processing, keyword spotting deals with the identification of keywords in utterances. (Image credit: Simon Grest)
These leaderboards are used to track progress in Keyword Spotting
Use these libraries to find Keyword Spotting models and implementations
An audio dataset of spoken words designed to help train and evaluate keyword spotting systems is described, along with a suggested methodology for reproducible and comparable accuracy metrics for this task.
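A minimal sketch of how one might load this dataset for a keyword spotting experiment, assuming the torchaudio package; the download path and the log-Mel front end are illustrative choices, not part of the paper:

```python
import torch
import torchaudio

# Download the Speech Commands dataset to ./data (several GB on first run).
dataset = torchaudio.datasets.SPEECHCOMMANDS(root="./data", download=True)

waveform, sample_rate, label, speaker_id, utterance_number = dataset[0]
print(label, sample_rate, waveform.shape)  # e.g. a 1-second, 16 kHz clip

# A common front end for KWS models: log-Mel filterbank features.
mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=sample_rate, n_fft=400, hop_length=160, n_mels=40
)
features = torch.log(mel(waveform) + 1e-6)  # shape: (1, 40, frames)
```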
It is shown that neural network architectures can be optimized to fit within the memory and compute constraints of microcontrollers without sacrificing accuracy, and the depthwise separable convolutional neural network (DS-CNN) is explored and compared against other neural network architectures.
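A minimal sketch of the building block behind this kind of model: a depthwise convolution followed by a pointwise (1x1) convolution. Channel counts and input shape below are illustrative, not the published configuration.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1):
        super().__init__()
        # Depthwise: one filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size, stride=stride,
                                   padding=kernel_size // 2, groups=in_ch, bias=False)
        self.bn1 = nn.BatchNorm2d(in_ch)
        # Pointwise: 1x1 convolution mixes information across channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.bn1(self.depthwise(x)))
        return self.relu(self.bn2(self.pointwise(x)))

# Example: a 40x101 log-Mel feature map (1 input channel) -> 64 channels.
x = torch.randn(1, 1, 40, 101)
block = DepthwiseSeparableConv(1, 64)
print(block(x).shape)  # torch.Size([1, 64, 40, 101])
```

Splitting the convolution this way cuts parameters and multiply-accumulate operations sharply relative to a standard convolution, which is what makes the architecture attractive for microcontrollers.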
The Keyword Transformer (KWT), a fully self-attentional architecture that exceeds state-of-the-art performance across multiple tasks without any pre-training or additional data, is introduced.
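A minimal sketch of the core idea (not the released KWT code): treat spectrogram time frames as tokens, prepend a learnable class token, and classify with a standard Transformer encoder. All dimensions below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TinyKeywordTransformer(nn.Module):
    def __init__(self, n_mfcc=40, n_frames=98, d_model=192, n_heads=3,
                 n_layers=4, n_classes=12):
        super().__init__()
        self.proj = nn.Linear(n_mfcc, d_model)             # frame -> token embedding
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))
        self.pos = nn.Parameter(torch.zeros(1, n_frames + 1, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):                  # x: (batch, n_frames, n_mfcc)
        tok = self.proj(x)
        cls = self.cls.expand(x.size(0), -1, -1)
        tok = torch.cat([cls, tok], dim=1) + self.pos
        out = self.encoder(tok)
        return self.head(out[:, 0])        # classify from the class token

model = TinyKeywordTransformer()
print(model(torch.randn(2, 98, 40)).shape)  # torch.Size([2, 12])
```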
A self-supervised speech pre-training method called TERA, which stands for Transformer Encoder Representations from Alteration, is introduced, and it is shown that the proposed method is transferable to downstream datasets not used in pre-training.
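A minimal sketch of alteration-based self-supervised pre-training in this spirit (not the TERA code): corrupt a block of spectrogram frames and train an encoder to reconstruct the clean frames. The toy encoder, shapes, and corrupted region are assumptions for illustration.

```python
import torch
import torch.nn as nn

spec = torch.randn(8, 100, 80)            # (batch, frames, mel bins)
corrupted = spec.clone()
corrupted[:, 40:55, :] = 0.0              # time alteration: zero out a block of frames

# A toy frame-wise encoder standing in for the Transformer encoder.
encoder = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 80))
reconstruction = encoder(corrupted)

# L1 reconstruction loss on the altered region only.
loss = (reconstruction[:, 40:55, :] - spec[:, 40:55, :]).abs().mean()
loss.backward()
```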
A TM-based keyword spotting pipeline is explored to demonstrate low complexity and a faster rate of convergence compared to NNs, to investigate scalability with an increasing number of keywords, and to explore the potential for enabling low-power on-chip KWS.
Honk, an open-source PyTorch reimplementation of convolutional neural networks for keyword spotting that are included as examples in TensorFlow, is described and provides a starting point for future work on the keyword spotting task.
This paper collects and annotates 2036 archival document images from different locations and time periods and proposes a new evaluation scheme based on baselines, which requires no binarization and can handle skewed as well as rotated text lines.
This work explores the application of deep residual learning and dilated convolutions to the keyword spotting task, using the recently-released Google Speech Commands Dataset as a benchmark and establishes an open-source state-of-the-art reference to support the development of future speech-based interfaces.
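A minimal sketch of a residual block with dilated convolutions in the spirit of these models; channel count and dilation rate are illustrative, not the exact published configuration.

```python
import torch
import torch.nn as nn

class DilatedResBlock(nn.Module):
    def __init__(self, channels=45, dilation=2):
        super().__init__()
        # padding = dilation keeps the spatial size unchanged for a 3x3 kernel.
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=dilation,
                               dilation=dilation, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=dilation,
                               dilation=dilation, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        y = self.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        return self.relu(x + y)            # identity shortcut around the dilated convs

x = torch.randn(1, 45, 40, 101)            # (batch, channels, mel bins, frames)
print(DilatedResBlock()(x).shape)          # torch.Size([1, 45, 40, 101])
```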
A model is proposed, inspired by the recent success of dilated convolutions in sequence modeling applications, that allows deeper architectures to be trained in resource-constrained configurations and applies a custom target labeling that back-propagates loss from specific frames of interest, yielding higher accuracy while requiring only detection of the end of the keyword.
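A minimal sketch of back-propagating loss only from frames of interest, e.g. the frames where a keyword ends; the shapes, toy mask, and labeling scheme below are assumptions for illustration, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

batch, frames, n_classes = 4, 100, 2
logits = torch.randn(batch, frames, n_classes, requires_grad=True)
targets = torch.randint(0, n_classes, (batch, frames))

# Mask selecting only the end-of-keyword frames (here: a toy region).
frame_mask = torch.zeros(batch, frames, dtype=torch.bool)
frame_mask[:, 60:65] = True  # pretend the keyword ends around frame 60

# Per-frame cross-entropy, kept unreduced so it can be masked.
loss_per_frame = F.cross_entropy(
    logits.reshape(-1, n_classes), targets.reshape(-1), reduction="none"
).reshape(batch, frames)

# Only the masked frames contribute gradient.
loss = (loss_per_frame * frame_mask).sum() / frame_mask.sum().clamp(min=1)
loss.backward()
```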
The Audio Spectrogram Transformer is introduced, the first convolution-free, purely attention-based model for audio classification, which achieves new state-of-the-art results on various audio classification benchmarks.