We study the performance of customer intent classifiers designed to predict the most popular intent received through the ASOS.com Customer Care Department, namely “Where is my order?”. These queries are characterised by colloquialism, label noise and short message length. We conduct extensive experiments with two well-established classification models: logistic regression via n-grams, to account for sequences in the data, and recurrent neural networks, which extract these sequential patterns automatically. With the embedding layer fixed to GloVe coordinates, a Mann-Whitney U test indicated that the F1 score on a held-out set of messages was lower for recurrent neural network classifiers than for linear n-gram classifiers (M1=0.828, M2=0.815; U=1,196, P=1.46e-20), unless all layers were jointly trained with all other network parameters (M1=0.831, M2=0.828; U=4,280, P=8.24e-4). This plain neural network produced top performance on a denoised set of labels (0.887 F1), matching human annotators (0.889 F1) and outperforming linear classifiers (0.865 F1). When these models are calibrated to achieve precision above human performance (0.93 precision), our results indicate a small difference in recall of 0.05 for the plain neural networks (training in under 1 hour) and 0.07 for the linear n-grams (training in under 10 minutes), revealing the latter as a judicious choice of model architecture in modern AI production systems.
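The calibration step mentioned in the abstract, fixing a precision target and reading off the recall that remains, can be illustrated with a minimal sketch. This is not the authors' procedure: the function name `recall_at_precision`, the decision rule "predict positive when score >= threshold", and the toy scores and labels are all assumptions made for illustration. Only the 0.93 precision target comes from the abstract.

```python
def recall_at_precision(scores, labels, target_precision):
    """Find the lowest score threshold whose precision meets the target.

    Predicts positive when score >= threshold. Returns a tuple
    (threshold, precision, recall), or None if no threshold qualifies.
    Hypothetical helper for illustration, not taken from the paper.
    """
    total_positives = sum(labels)
    if total_positives == 0:
        return None  # recall is undefined without any positive example

    # Rank examples from the most to the least confident prediction.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    best = None
    true_positives = 0
    for count, (score, label) in enumerate(ranked, start=1):
        true_positives += label
        # Examples with identical scores fall on the same side of any
        # threshold, so evaluate only after the last one of a tie.
        if count < len(ranked) and ranked[count][0] == score:
            continue
        precision = true_positives / count
        recall = true_positives / total_positives
        if precision >= target_precision:
            # Recall never decreases as the threshold drops, so the last
            # qualifying threshold is the one with the highest recall.
            best = (score, precision, recall)
    return best


# Made-up classifier scores and gold labels (1 = "Where is my order?").
toy_scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40]
toy_labels = [1, 1, 0, 1, 1, 0]

print(recall_at_precision(toy_scores, toy_labels, target_precision=0.93))
```

Running the same function on the held-out scores of two classifiers at the same target would yield the kind of recall comparison the abstract reports, under the assumption that both models output a comparable confidence score per message.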