Introduced in NusaCrowd: Open Source Initiative for Indonesian NLP Resources (2022)
NusaCrowd is a collaborative initiative to collect and unify existing resources for Indonesian languages, including opening access to previously non-public resources. Through this initiative, the authors have brought together 137 datasets and 117 standardized data loaders. The quality of the datasets has been assessed both manually and automatically, and their effectiveness has been demonstrated in multiple experiments.
Source: https://arxiv.org/pdf/2212.09648v2.pdf