1
Robust Speech Recognition via Large-Scale Weak Supervision
2
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
3
IDK-MRC: Unanswerable Questions for Indonesian Machine Reading Comprehension
4
Language Models are Multilingual Chain-of-Thought Reasoners
5
Masader Plus: A New Interface for Exploring +500 Arabic NLP Datasets
6
No Language Left Behind: Scaling Human-Centered Machine Translation
7
Emotion dataset from Indonesian public opinion
8
GEMv2: Multilingual NLG Benchmarking in a Single Line of Code
9
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
10
NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages
11
Predicting the Category and the Length of Punishment in Indonesian Courts Based on Previous Court Decision Documents
12
Lifting the Curse of Multilinguality by Pre-training Modular Transformers
13
Sentiment Analysis in Karonese Tweet using Machine Learning
14
One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia
15
Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation
16
IndicNLG Benchmark: Multilingual Datasets for Diverse NLG Tasks in Indic Languages
17
Cross-Lingual Dialogue Dataset Creation via Outline-Based Generation
18
Documenting Geographically and Contextually Diverse Data Sources: The BigScience Catalogue of Language Data and Resources
19
CVSS Corpus and Massively Multilingual Speech-to-Speech Translation
20
Analisis Perbandingan Nilai Akurasi Mekanisme Attention Bahdanau dan Luong pada Neural Machine Translation Bahasa Indonesia ke Bahasa Melayu Ketapang dengan Arsitektur Recurrent Neural Network
21
Few-shot Learning with Multilingual Generative Language Models
22
FLAVA: A Foundational Language And Vision Alignment Model
23
NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation
24
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
25
IndoNLI: A Natural Language Inference Dataset for Indonesian
26
Causal and Masked Language Modeling of Javanese Language using Transformer-based Architectures
27
Masader: Metadata Sourcing for Arabic Text and Speech Data Resources
28
Visually Grounded Reasoning across Languages and Cultures
29
Pre-trained transformer-based language models for Sundanese
30
Greenformer: Factorization Toolkit for Efficient Deep Neural Networks
31
IndoBERTweet: A Pretrained Language Model for Indonesian Twitter with Effective Domain-Specific Vocabulary Initialization
32
CoVoST 2 and Massively Multilingual Speech Translation
33
XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages
34
X-Fact: A New Benchmark Dataset for Multilingual Fact Checking
35
Findings of the AmericasNLP 2021 Shared Task on Open Machine Translation for Indigenous Languages of the Americas
36
AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples
37
Code-mixed sentiment analysis of Indonesian language and Javanese language using Lexicon based approach
38
ALUE: Arabic Language Understanding Evaluation
39
MasakhaNER: Named Entity Recognition for African Languages
40
Multimodal End-to-End Sparse Model for Emotion Recognition
41
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
42
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
43
Effect of mono corpus quantity on statistical machine translation Indonesian – Lampung dialect of nyo
44
BinaryBERT: Pushing the Limit of BERT Quantization
45
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
46
VOXLINGUA107: A Dataset for Spoken Language Recognition
47
Attention-based CNN-BiLSTM for Dialect Identification on Javanese Text
48
Sundanese Twitter Dataset for Emotion Classification
49
IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP
50
Liputan6: A Large-scale Indonesian Dataset for Text Summarization
51
Tree Rotations for Dependency Trees: Converting the Head-Directionality of Noun Phrases
52
mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer
53
WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization
54
Short Answer Grading Using Contextual Word Embedding and Linear Regression
55
TernaryBERT: Distillation-aware Ultra-low Bit BERT
56
Towards Computational Linguistics in Minangkabau Language: Studies on Sentiment Analysis and Machine Translation
57
IndoNLU: Benchmark and Resources for Evaluating Indonesian Natural Language Understanding
58
Parsing Indonesian Sentence into Abstract Meaning Representation using Machine Learning Approach
59
Sequence-to-Sequence Learning for Indonesian Automatic Question Generator
60
CLICK-ID: A novel dataset for Indonesian clickbait headlines
61
TICO-19: the Translation Initiative for Covid-19
62
Compressing Neural Machine Translation Models with 4-bit Precision
63
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
64
English Intermediate-Task Training Improves Zero-Shot Cross-Lingual Transfer Too
65
XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning
66
Building the Old Javanese Wordnet
67
Cross-Lingual Machine Speech Chain for Javanese, Sundanese, Balinese, and Bataks Speech Recognition and Synthesis
68
MAD-X: An Adapter-based Framework for Multi-task Cross-lingual Transfer
69
CLUE: A Chinese Language Understanding Evaluation Benchmark
70
XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation
71
Improving the role of language model in statistical machine translation (Indonesian-Javanese)
72
The State and Fate of Linguistic Diversity and Inclusion in the NLP World
73
XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization
74
XPersona: Evaluating Multilingual Personalized Chatbot
75
PhoBERT: Pre-trained language models for Vietnamese
76
THE LANGUAGE CHOICE OF CHINESE COMMUNITY IN MEDAN: A SOCIOLINGUISTICS STUDY
77
Abusive Language Detection on Indonesian Online News Comments
78
Zero-Shot Code-Switching ASR and TTS with Multilingual Machine Speech Chain
79
Unsupervised Cross-lingual Representation Learning at Scale
80
Converting an Indonesian Constituency Treebank to the Penn Treebank Format
81
Normalization of Indonesian-English Code-Mixed Twitter Data
82
Lightweight and Efficient End-To-End Speech Recognition Using Low-Rank Transformer
83
CORD: A Consolidated Receipt Dataset for Post-OCR Parsing
84
Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
85
Improving Bi-LSTM Performance for Indonesian Sentiment Analysis Using Paragraph Vector
86
Improving Joint Layer RNN based Keyphrase Extraction by Using Syntactical Features
87
Aspect and Opinion Terms Extraction Using Double Embeddings and Attention Mechanism for Indonesian Hotel Reviews
88
Multi-label Aspect Categorization with Convolutional Neural Networks and Extreme Gradient Boosting
89
KaWAT: A Word Analogy Task Dataset for Indonesian
90
Hate Speech Detection on Indonesian Long Text Documents Using Machine Learning Approach
91
Pengaruh Kuantitas Korpus Monolingual Terhadap Akurasi Mesin Penerjemah Statistik
92
Peningkatan Mesin Penerjemah Statistik dengan Menambah Kuantitas Korpus Monolingual (Studi Kasus : Bahasa Indonesia - Sunda)
93
Penggunaan Pivot Language pada Mesin Penerjemah Statistik Bahasa Inggris ke Bahasa Melayu Sambas
94
Chinese Ethnic Communication Pattern in the Environment of Indigenous People in Lhokseumawe, Indonesia
95
Aspect Detection and Sentiment Classification Using Deep Neural Network for Indonesian Aspect-Based Sentiment Analysis
96
Emotion Classification on Indonesian Twitter Dataset
97
Colloquial Indonesian Lexicon
98
Stance Classification Towards Political Figures on Blog Writing
99
Investigating Bi-LSTM and CRF with POS Tag Embedding for Indonesian Named Entity Tagger
100
Indosum: A New Benchmark Dataset for Indonesian Text Summarization
101
Crowd-Sourced Speech Corpora for Javanese, Sundanese, Sinhala, Nepali, and Bangladeshi Bengali
102
A Step-by-Step Process for Building TTS Voices Using Open Source Data and Frameworks for Bangla, Javanese, Khmer, Nepali, Sinhala, and Sundanese
103
Dialect and Identity: A Case Study of Javanese Use in WhatsApp and Line
104
Vocabulary Alignment for Collaborative Agents: a Study with Real-World Multilingual How-to Instructions
105
Pengaruh Metode Dictionary Lookup pada Cleaning Korpus Terhadap Akurasi Mesin Penerjemah Statistik Indonesia–Melayu Pontianak
106
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
107
When and Why Are Pre-Trained Word Embeddings Useful for Neural Machine Translation?
108
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
109
CLASSIFICATION OF CUSTOMERS EMOTION USING NAÏVE BAYES CLASSIFIER (Case Study: Natasha Skin Care)
110
Inset lexicon: Evaluation of a word list for Indonesian sentiment analysis in microblogs
111
Study of hoax news detection using naïve bayes classifier in Indonesian language
112
Modified DBpedia entities expansion for tagging automatically NER dataset
113
Hate speech detection in the Indonesian language: A dataset and preliminary study
114
Automatic open domain information extraction from Indonesian text
115
Experiments on coreference resolution for Indonesian language with lexical and shallow syntactic features
116
Meningkatkan Akurasi Pada Mesin Penerjemah Bahasa Indonesia Ke Bahasa Melayu Pontianak Dengan Part Of Speech
117
TUNING FOR QUALITY UNTUK UJI AKURASI MESIN PENERJEMAH STATISTIK (MPS) BAHASA INDONESIA - BAHASA DAYAK KANAYATN
118
Multilingualism and the West Kalimantan Hakka
119
KERANCUAN FONO-ORTOGRAFIS DAN ORTO-FONOLOGIS BAHASA INDONESIA RAGAM LISAN DAN TULIS
120
Multilingual Open Relation Extraction Using Cross-lingual Projection
121
Designing an Indonesian part of speech tagset and manually tagged Indonesian corpus
122
Creating Indonesian-Javanese parallel corpora using wikipedia articles
123
LOCAL LANGUAGES IN INDONESIA: LANGUAGE MAINTENANCE OR LANGUAGE SHIFT?
124
Code Switching and Code Mixing in Indonesia: Study in Sociolinguistics?
125
Towards language preservation: Design and collection of graphemically balanced and parallel speech corpora of Indonesian ethnic languages
126
Universal Dependency Annotation for Multilingual Parsing
127
Towards language preservation: Preliminary collection and vowel analysis of Indonesian ethnic speech data
128
IDENTIC Corpus: Morphologically Enriched Indonesian-English Parallel Corpus
129
PERUBAHAN DAN PERKEMBANGAN BAHASA: Tinjauan Historis dan Sosiolinguistik
132
Resource Report: Building Parallel Text Corpora for Multi-Domain Translation System
133
A Two-Level Morphological Analyser for the Indonesian Language
134
A machine learning approach for Indonesian question answering system
135
The Indonesian Language: Its History and Role in Modern Society
136
SALSA version 3.0: a single recognizer-based multilingual speech-based web browser
137
Ethnologue: Languages of the World
138
SALSA version 1.0: a speech-based web browser for Hong Kong English
139
Crosslingual Generalization through Multitask Finetuning
140
Poetry Generation for Indonesian Pantun : Comparison Between SeqGAN and GPT-2
141
cld3: Google’s Compact Language Detector 3
142
Postagged Sundanese monolingual corpus
143
IndicXTREME: A Multi-Task Benchmark For Evaluating Indic Languages
144
BigBIO: A framework for data-centric biomedical natural language processing
145
The State of Multilingual AI
146
Kyokushoushugi ni motoduku heiretsu tsuriibanku no kouchiku [Building a parallel treebank based on minimalism]
147
Normalisation of Indonesian-English Code-Mixed Text and its Effect on Emotion Classification
148
Abusive Language and Hate Speech Detection for Javanese and Sundanese Languages in Tweets: Dataset and Preliminary Study
149
A Multi-Pass Sieve Coreference Resolution for Indonesian
150
MAD-G: Multilingual Adapter Generation for Efficient Cross-Lingual Transfer
151
IndoCollex: A testbed for morphological transformation of Indonesian colloquial words
152
WuDaoCorpora: A super large-scale Chinese corpora for pre-training language models
153
Multilingual Translation from Denoising Pre-Training
154
SIGMORPHON 2021 Shared Task on Morphological Reinflection: Generalization Across Languages
155
1,3 juta anak di NTT belum bisa berbahasa Indonesia
156
From masked language modeling to translation: Non-English auxiliary tasks
157
IndoNLG: Benchmark and resources
158
Benchmarking multidomain English-Indonesian machine translation
159
On the Syntax of West Kalimantan: Asymmetries and A’-Movement in Malayic and Land Dayak Languages
160
International Conference on Asian Language Processing (IALP), pages 310–315
161
National Strategy for Artificial Intelligence 2020-2045 (2020) (Indonesian)
162
IndicNLPSuite: Monolingual corpora, evaluation benchmarks and pre-trained multilingual language models for Indian
163
Costs to consider in adopting NLP for your business
164
Building Cendana: a Treebank for Informal Indonesian
165
Interpersonal meaning annotation for Asian language corpora: The case of TUFS Asian Language Parallel Corpus (TALPCo)
166
A gold standard dependency treebank for Indonesian
167
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
168
Multi-label Hate Speech and Abusive Language Detection in Indonesian Twitter
169
Penggunaan Bahasa Indonesia sebagai Pivot Language pada Mesin Penerjemah Madura-Sunda dengan Metode Transfer dan Triangulation
170
Pembangkitan deskripsi gambar dalam Bahasa Indonesia dengan pendekatan semantic compositional networks
171
A Dataset and Preliminaries Study for Abusive Language Detection in Indonesian Social Media
172
Cross-Lingual and Supervised Learning Approach for Indonesian Word Sense Disambiguation Task
173
XNLI: Evaluating cross-lingual sentence representations
174
Semi-supervised Textual Entailment on Indonesian Wikipedia Data
175
Pengembangan kemampuan berbahasa indonesia siswa sekolah dasar desa terpencil melalui metode karyawisata berbasis potensi lokal
176
Cross-lingual Name Tagging and Linking for 282 Languages
177
Syntactic Variation Of Buginese, A Language In Austronesian Great Family
178
PERFECTIVE ASPECT AND NEGATION IN PONTIANAK TEOCHEW
179
Recent progress in developing grapheme-based speech recognition for Indonesian ethnic languages: Javanese, Sundanese, Balinese and Bataks
180
Named entity recognition for Indonesian text using hidden Markov model
181
Usage of Indonesian possessive verbal predicates: a statistical analysis based on questionnaire and storytelling surveys
183
Sundanese complementation
185
Distributed speech translation technologies
186
Head-final and head-initial relative clauses in Jambi Teochew
187
Quality and Intelligibility Assessment of Indonesian HMM-Based Speech Synthesis System
188
The Austronesian Languages
189
Voice and verb morphology in Minangkabau, a language of West Sumatra, Indonesia
190
Development of Indonesian Large Vocabulary Continuous Speech Recognition System within A-STAR Project
191
Development of HMM-based Indonesian Speech Synthesis
192
Malay dialects of the Batanghari river basin (Jambi, Sumatra)
193
A Large Vocabulary Continuous Speech Recognition System for Indonesian Language
195
Indonesian speech recognition for hearing and speaking impaired people
196
Balinese morphosyntax: a lexical-functional approach
197
JATI will be employed to build an ontology, in which knowledge is extracted from the semantic representation in Minimal Recursion Semantics (MRS) (Copestake
198
Preferred argument structure in an active language: Arguments against the category ‘intransitive subject’
200
A grammar of Acehnese on the basis of a dialect of north Aceh