RuDSI: Graph-based Word Sense Induction Dataset for Russian (2022-09-28)

TL;DR

RuDSI is a new benchmark for word sense induction (WSI) in Russian, created through manual annotation and semi-automatic clustering of Word Usage Graphs (WUGs), with no external word senses imposed on annotators.

Abstract

We present RuDSI, a new benchmark for word sense induction (WSI) in Russian. The dataset was created using manual annotation and semi-automatic clustering of Word Usage Graphs (WUGs). RuDSI is completely data-driven (based on texts from the Russian National Corpus), with no external word senses imposed on annotators. We present and analyze RuDSI, describe our annotation workflow, show how graph clustering parameters affect the dataset, report the performance of several baseline WSI methods on RuDSI, and discuss possibilities for improving these scores.
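As a rough illustration of how a clustering parameter can change the senses induced from a Word Usage Graph, the sketch below groups usages of one target word by thresholding pairwise relatedness judgments and taking connected components. This is a deliberately simplified stand-in, not the clustering procedure used for RuDSI; the usage IDs, the 1-4 judgment scale, the weights, and the `cluster_usages` helper are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical pairwise judgments for usages of one target word:
# (usage_a, usage_b, median relatedness on an assumed 1-4 scale).
judgments = [
    ("u1", "u2", 4.0),
    ("u2", "u3", 3.5),
    ("u1", "u3", 3.0),
    ("u4", "u5", 4.0),
    ("u3", "u4", 1.5),
    ("u1", "u5", 1.0),
]


def cluster_usages(judgments, threshold=2.5):
    """Group usages into connected components over edges >= threshold.

    A simplified stand-in for WUG clustering (an assumption of this sketch).
    """
    neighbours = defaultdict(set)
    nodes = set()
    for a, b, weight in judgments:
        nodes.update((a, b))
        if weight >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    clusters, seen = [], set()
    for start in sorted(nodes):
        if start in seen:
            continue
        # Depth-first traversal collects everything reachable from `start`.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


clusters = cluster_usages(judgments)
strict_clusters = cluster_usages(judgments, threshold=3.75)
print(clusters)
print(strict_clusters)
```

Raising the threshold in this toy graph splits one induced sense into two, which is the kind of parameter sensitivity the abstract refers to, though the real effect depends on the actual clustering algorithm and annotation data.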

Authors

Andrey Kutuzov

A. Aksenova

Ekaterina Gavrishina
