Punctuation Restoration
These leaderboards are used to track progress in Punctuation Restoration.
No benchmarks available.
Use these libraries to find Punctuation Restoration models and implementations.
No subtasks available.
BARTpho, the first public large-scale monolingual sequence-to-sequence models pre-trained for Vietnamese, is presented and found to be more effective than mBART on these two tasks.
An approach to automatic punctuation restoration with BERT models for English and Hungarian is presented, achieving a macro-averaged F1-score of 79.8 for English and 82.2 for Hungarian.
A multitask modeling approach is described as a system to restore punctuation in multiple high-resource languages, Germanic (English and German) and Romance (French), and low-resource languages, Indo-Aryan (Hindi) and Dravidian (Tamil), that does not require extensive knowledge of the grammar or syntax of a given language, for both spoken and written text.
A token-level supervised contrastive learning method is proposed that aims to maximize the distance between representations of different punctuation marks in the embedding space, obtaining up to 3.2% absolute F1 improvement on the test set.
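A minimal sketch of what such a token-level supervised contrastive objective could look like, assuming per-token embeddings and punctuation labels have already been extracted from the encoder; the function name, temperature value, and SupCon-style formulation are illustrative assumptions, not the paper's exact loss:

import torch
import torch.nn.functional as F

def token_supcon_loss(embeddings, labels, temperature=0.1):
    # embeddings: (N, D) token representations at word boundaries
    # labels:     (N,)   punctuation class per token (e.g. 0=NONE, 1=COMMA, 2=PERIOD, 3=QUESTION)
    z = F.normalize(embeddings, dim=-1)                    # work in cosine-similarity space
    sim = z @ z.T / temperature                            # (N, N) pairwise similarity logits
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))        # a token is never its own pair
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_counts = pos_mask.sum(1).clamp(min=1)              # avoid division by zero
    per_token = -log_prob.masked_fill(~pos_mask, 0.0).sum(1) / pos_counts
    return per_token[pos_mask.any(1)].mean()               # skip tokens with no positive pair

Minimizing this pulls together tokens that carry the same punctuation label and pushes the classes apart; in practice such a term would typically be added to the usual cross-entropy tagging loss.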
This work incorporates an external POS tagger and fuses its predicted labels into the existing language model to provide syntactic information, and proposes sequence boundary sampling (SBS) to learn punctuation positions more efficiently as a sequence tagging task.
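One way such label fusion could be wired up, assuming a Hugging Face-style encoder whose token states are concatenated with embeddings of externally predicted POS tags before token-level classification; the class and argument names are hypothetical, and sequence boundary sampling (a training-time chunking strategy) is not shown:

import torch
import torch.nn as nn

class PosFusedPunctuator(nn.Module):
    # Illustrative sketch: fuse contextual token states with embeddings of POS tags
    # predicted by an external tagger, then classify the punctuation mark that
    # follows each token (sequence tagging).
    def __init__(self, encoder, num_pos_tags, num_punct_classes, pos_dim=64):
        super().__init__()
        self.encoder = encoder                              # e.g. a pre-trained transformer
        self.pos_embed = nn.Embedding(num_pos_tags, pos_dim)
        hidden = encoder.config.hidden_size
        self.classifier = nn.Linear(hidden + pos_dim, num_punct_classes)

    def forward(self, input_ids, attention_mask, pos_tag_ids):
        token_states = self.encoder(input_ids=input_ids,
                                    attention_mask=attention_mask).last_hidden_state
        fused = torch.cat([token_states, self.pos_embed(pos_tag_ids)], dim=-1)
        return self.classifier(fused)                       # (batch, seq_len, num_punct_classes)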
A unified multimodal punctuation restoration framework named UniPunc is proposed to punctuate mixed sentences with a single model that jointly represents audio and non-audio samples in a shared latent space, from which the model learns a hybrid representation and punctuates both kinds of samples.
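A rough sketch of the hybrid idea, where samples lacking audio fall back to a learned placeholder so one model serves both kinds of input; the module names, cross-attention fusion, residual connection, and Hugging Face-style text encoder are assumptions for illustration, not UniPunc's published architecture:

import torch
import torch.nn as nn

class HybridPunctuator(nn.Module):
    def __init__(self, text_encoder, audio_encoder, hidden, num_classes):
        super().__init__()
        self.text_encoder = text_encoder
        self.audio_encoder = audio_encoder
        self.no_audio_token = nn.Parameter(torch.zeros(1, 1, hidden))   # placeholder for audio-less samples
        self.cross_attn = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, text_inputs, audio_features=None):
        h_text = self.text_encoder(**text_inputs).last_hidden_state     # (B, T, H)
        if audio_features is None:
            h_audio = self.no_audio_token.expand(h_text.size(0), -1, -1)
        else:
            h_audio = self.audio_encoder(audio_features)                 # (B, S, H)
        fused, _ = self.cross_attn(h_text, h_audio, h_audio)             # text attends to audio (or placeholder)
        return self.classifier(fused + h_text)                           # per-token punctuation logits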
A novel Feature Fusion framework based on two-type Attentions (FFA) is proposed to alleviate the shortage of independent attention in punctuation restoration, introducing a two-stream architecture.
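Purely as a loose illustration of a two-stream design (not FFA's actual modules): one stream below is the pretrained encoder's contextual states, the other is an extra, independently learned self-attention over those same states, and the two are fused before per-token classification:

import torch
import torch.nn as nn

class TwoStreamFusion(nn.Module):
    def __init__(self, hidden, num_classes, num_heads=8):
        super().__init__()
        self.extra_attn = nn.MultiheadAttention(hidden, num_heads, batch_first=True)
        self.fuse = nn.Linear(2 * hidden, hidden)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, encoder_states, key_padding_mask=None):
        # Second stream: re-attend over the encoder states with independent parameters.
        attended, _ = self.extra_attn(encoder_states, encoder_states, encoder_states,
                                      key_padding_mask=key_padding_mask)
        # Fuse both streams and classify the punctuation mark after each token.
        fused = torch.relu(self.fuse(torch.cat([encoder_states, attended], dim=-1)))
        return self.classifier(fused)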
Vakyansh, an end-to-end toolkit for speech recognition in Indic languages, introduces automatic data pipelines for data creation, model training, model evaluation and deployment, in the hope of inspiring the speech community to develop speech-first applications using ASR models in Indic languages.
This work presents an approach for automatic punctuation of text using a pretrained IndicBERT model for 11 Indic languages, namely Hindi, Tamil, Telugu, Kannada, Gujarati, Marathi, Odia, Bengali, Assamese, Malayalam and Punjabi.
This work adopts a slot-filling approach that predicts the presence and type of punctuation marks at each word boundary, similar to the Masked Language Model objective used during BERT pre-training, but instead of predicting the masked word, the model predicts the masked punctuation.
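A hedged toy sketch of the masked-punctuation idea using an off-the-shelf masked language model: a mask slot is inserted after every word and scored only against punctuation tokens. A real system would be fine-tuned on punctuation targets and would also allow a "no punctuation" decision, which this example omits; the candidate set and model checkpoint are illustrative choices.

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

words = "hello how are you i am fine".split()
# Insert a mask slot after every word; each slot is filled with a punctuation decision.
text = " ".join(w + " " + tokenizer.mask_token for w in words)
enc = tokenizer(text, return_tensors="pt")

candidates = [",", ".", "?"]
candidate_ids = tokenizer.convert_tokens_to_ids(candidates)
mask_positions = (enc.input_ids[0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]

with torch.no_grad():
    logits = model(**enc).logits                           # (1, seq_len, vocab_size)

for word, pos in zip(words, mask_positions):
    punct_scores = logits[0, pos, candidate_ids]           # restrict the slot to punctuation tokens
    print(word, candidates[punct_scores.argmax().item()])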