Punctuation Restoration
These leaderboards are used to track progress in Punctuation Restoration.
No benchmarks available.
Use these libraries to find Punctuation Restoration models and implementations.
No subtasks available.
BARTpho, the first public large-scale monolingual sequence-to-sequence models pre-trained for Vietnamese, is presented and found to be more effective than mBART on these two tasks.
An approach to automatic punctuation restoration with BERT models for English and Hungarian is presented, achieving a macro-averaged F1-score of 79.8 for English and 82.2 for Hungarian.
A multitask modeling approach is described as a system to restore punctuation in multiple high-resource languages, Germanic (English and German) and Romance (French), and low-resource languages, Indo-Aryan (Hindi) and Dravidian (Tamil), that does not require extensive knowledge of the grammar or syntax of a given language, for both spoken and written text.
A token-level supervised contrastive learning method is proposed that aims to maximize the distance between representations of different punctuation marks in the embedding space, obtaining up to 3.2% absolute F1 improvement on the test set.
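A minimal sketch of what such a token-level supervised contrastive objective could look like, assuming per-token embeddings and punctuation labels have already been extracted from the encoder; the function name, temperature value, and SupCon-style formulation are illustrative assumptions, not the paper's exact loss:

import torch
import torch.nn.functional as F

def token_supcon_loss(embeddings, labels, temperature=0.1):
    # embeddings: (N, D) token representations at word boundaries
    # labels:     (N,)   punctuation class per token (e.g. 0=NONE, 1=COMMA, 2=PERIOD, 3=QUESTION)
    z = F.normalize(embeddings, dim=-1)                    # work in cosine-similarity space
    sim = z @ z.T / temperature                            # (N, N) pairwise similarity logits
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))        # a token is never its own pair
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_counts = pos_mask.sum(1).clamp(min=1)              # avoid division by zero
    per_token = -log_prob.masked_fill(~pos_mask, 0.0).sum(1) / pos_counts
    return per_token[pos_mask.any(1)].mean()               # skip tokens with no positive pair

Minimizing this pulls together tokens that carry the same punctuation label and pushes the classes apart; in practice such a term would typically be added to the usual cross-entropy tagging loss.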
This work incorporates an external POS tagger and fuses its predicted labels into the existing language model to provide syntactic information, and proposes sequence boundary sampling (SBS) to learn punctuation positions more efficiently as a sequence tagging task.
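One way such label fusion could be wired up, assuming a Hugging Face-style encoder whose token states are concatenated with embeddings of externally predicted POS tags before token-level classification; the class and argument names are hypothetical, and sequence boundary sampling (a training-time chunking strategy) is not shown:

import torch
import torch.nn as nn

class PosFusedPunctuator(nn.Module):
    # Illustrative sketch: fuse contextual token states with embeddings of POS tags
    # predicted by an external tagger, then classify the punctuation mark that
    # follows each token (sequence tagging).
    def __init__(self, encoder, num_pos_tags, num_punct_classes, pos_dim=64):
        super().__init__()
        self.encoder = encoder                              # e.g. a pre-trained transformer
        self.pos_embed = nn.Embedding(num_pos_tags, pos_dim)
        hidden = encoder.config.hidden_size
        self.classifier = nn.Linear(hidden + pos_dim, num_punct_classes)

    def forward(self, input_ids, attention_mask, pos_tag_ids):
        token_states = self.encoder(input_ids=input_ids,
                                    attention_mask=attention_mask).last_hidden_state
        fused = torch.cat([token_states, self.pos_embed(pos_tag_ids)], dim=-1)
        return self.classifier(fused)                       # (batch, seq_len, num_punct_classes)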
A unified multimodal punctuation restoration framework named UniPunc is proposed to punctuate mixed sentences with a single model that jointly represents audio and non-audio samples in a shared latent space, from which the model learns a hybrid representation and punctuates both kinds of samples.
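A rough sketch of the hybrid idea, where samples lacking audio fall back to a learned placeholder so one model serves both kinds of input; the module names, cross-attention fusion, residual connection, and Hugging Face-style text encoder are assumptions for illustration, not UniPunc's published architecture:

import torch
import torch.nn as nn

class HybridPunctuator(nn.Module):
    def __init__(self, text_encoder, audio_encoder, hidden, num_classes):
        super().__init__()
        self.text_encoder = text_encoder
        self.audio_encoder = audio_encoder
        self.no_audio_token = nn.Parameter(torch.zeros(1, 1, hidden))   # placeholder for audio-less samples
        self.cross_attn = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, text_inputs, audio_features=None):
        h_text = self.text_encoder(**text_inputs).last_hidden_state     # (B, T, H)
        if audio_features is None:
            h_audio = self.no_audio_token.expand(h_text.size(0), -1, -1)
        else:
            h_audio = self.audio_encoder(audio_features)                 # (B, S, H)
        fused, _ = self.cross_attn(h_text, h_audio, h_audio)             # text attends to audio (or placeholder)
        return self.classifier(fused + h_text)                           # per-token punctuation logits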
A novel Feature Fusion framework based on two-type Attentions (FFA) is proposed to alleviate the shortage of independent attention in punctuation restoration, introducing a two-stream architecture.
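Purely as a loose illustration of a two-stream design (not FFA's actual modules): one stream below is the pretrained encoder's contextual states, the other is an extra, independently learned self-attention over those same states, and the two are fused before per-token classification:

import torch
import torch.nn as nn

class TwoStreamFusion(nn.Module):
    def __init__(self, hidden, num_classes, num_heads=8):
        super().__init__()
        self.extra_attn = nn.MultiheadAttention(hidden, num_heads, batch_first=True)
        self.fuse = nn.Linear(2 * hidden, hidden)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, encoder_states, key_padding_mask=None):
        # Second stream: re-attend over the encoder states with independent parameters.
        attended, _ = self.extra_attn(encoder_states, encoder_states, encoder_states,
                                      key_padding_mask=key_padding_mask)
        # Fuse both streams and classify the punctuation mark after each token.
        fused = torch.relu(self.fuse(torch.cat([encoder_states, attended], dim=-1)))
        return self.classifier(fused)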
Vakyansh, an end-to-end toolkit for speech recognition in Indic languages, introduces automatic data pipelines for data creation, model training, model evaluation and deployment, in the hope of inspiring the speech community to develop speech-first applications using ASR models in Indic languages.
This work presents an approach for automatic punctuation of text using a pretrained IndicBERT model for 11 Indic languages, namely Hindi, Tamil, Telugu, Kannada, Gujarati, Marathi, Odia, Bengali, Assamese, Malayalam and Punjabi.
This work adopts a slot-filling approach that predicts the presence and type of punctuation marks at each word boundary, similar to the Masked Language Model objective used during BERT pre-training, but instead of predicting the masked word, the model predicts the masked punctuation.
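A hedged toy sketch of the masked-punctuation idea using an off-the-shelf masked language model: a mask slot is inserted after every word and scored only against punctuation tokens. A real system would be fine-tuned on punctuation targets and would also allow a "no punctuation" decision, which this example omits; the candidate set and model checkpoint are illustrative choices.

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

words = "hello how are you i am fine".split()
# Insert a mask slot after every word; each slot is filled with a punctuation decision.
text = " ".join(w + " " + tokenizer.mask_token for w in words)
enc = tokenizer(text, return_tensors="pt")

candidates = [",", ".", "?"]
candidate_ids = tokenizer.convert_tokens_to_ids(candidates)
mask_positions = (enc.input_ids[0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]

with torch.no_grad():
    logits = model(**enc).logits                           # (1, seq_len, vocab_size)

for word, pos in zip(words, mask_positions):
    punct_scores = logits[0, pos, candidate_ids]           # restrict the slot to punctuation tokens
    print(word, candidates[punct_scores.argmax().item()])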