Word sense induction (WSI) is widely known as the unsupervised version of word sense disambiguation (WSD). The problem is stated as follows: given a target word (e.g., “cold”) and a collection of sentences that use it (e.g., “I caught a cold”, “The weather is cold”), cluster the sentences according to the different senses/meanings of the word. We do not need to know the sense/meaning of each cluster, but all sentences inside a cluster should use the target word in the same sense. Description from NLP Progress.
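As a minimal illustration of the task itself (not of any particular published system), one can represent each usage of the target word by its surrounding context and cluster those representations. The sentences, the TF-IDF vectorizer, and the cluster count below are all illustrative choices:

```python
# Toy WSI sketch: cluster usages of the ambiguous word "cold" by their
# bag-of-words contexts. Real systems typically use contextual embeddings;
# TF-IDF + k-means keeps this example self-contained.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

sentences = [
    "I caught a cold and stayed in bed",
    "The weather is cold in winter",
    "She has a bad cold and a fever",
    "Bring a coat because the wind is cold",
]

# Represent each usage by its context, i.e. every word except the target.
contexts = [" ".join(w for w in s.lower().split() if w != "cold")
            for s in sentences]

vectors = TfidfVectorizer().fit_transform(contexts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

for sentence, label in zip(sentences, labels):
    print(label, sentence)  # usages sharing a label share an induced sense
```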
These leaderboards are used to track progress in Word Sense Induction.
Use these libraries to find Word Sense Induction models and implementations.
This paper proposes the Adaptive Skip-gram model, a nonparametric Bayesian extension of Skip-gram capable of automatically learning the required number of representations for each word at the desired semantic resolution; it derives an efficient online variational learning algorithm for the model and empirically demonstrates its effectiveness on the word sense induction task.
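The variational learning procedure is beyond a short snippet, but the core representational idea, keeping several vectors per word and selecting the one that best matches the current context, can be sketched as follows; all vectors here are random placeholders:

```python
# Illustrative only: multi-prototype representations with context-based sense
# selection, the idea underlying models like Adaptive Skip-gram. The learned
# number of senses and the actual training algorithm are not reproduced here.
import numpy as np

rng = np.random.default_rng(0)
dim = 50
sense_vectors = {"cold": rng.normal(size=(3, dim))}  # 3 placeholder senses
word_vectors = {w: rng.normal(size=dim) for w in ["caught", "fever", "wind"]}

def pick_sense(word, context_words):
    """Return the index of the sense vector that best fits the context."""
    ctx = np.mean([word_vectors[w] for w in context_words], axis=0)
    return int(np.argmax(sense_vectors[word] @ ctx))

print(pick_sense("cold", ["caught", "fever"]))
```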
This paper proposes a simple method to learn a word representation for any given context that requires only the usual single-sense representation, plus coefficients that can be learned in a single pass over the data.
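The exact parameterization is not given here; as a hedged sketch, a context-specific representation can be formed as a coefficient-weighted combination of the word's single static vector and its context vectors, where the fixed `alpha` is only a stand-in for the coefficients the paper learns in one pass:

```python
# Hypothetical illustration: a context-dependent vector built from a single
# static word vector plus context vectors; `alpha` stands in for learned
# coefficients and is not the paper's actual parameterization.
import numpy as np

rng = np.random.default_rng(1)
dim = 50
vocab = {w: rng.normal(size=dim) for w in ["cold", "caught", "fever", "wind"]}

def contextual_vector(target, context_words, alpha=0.6):
    ctx = np.mean([vocab[w] for w in context_words if w in vocab], axis=0)
    return alpha * vocab[target] + (1 - alpha) * ctx

vec = contextual_vector("cold", ["caught", "fever"])
print(vec.shape)  # (50,)
```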
This work extends a previous method to support a dynamic rather than a fixed number of clusters, as other prominent methods do, and proposes a method for interpreting the resulting clusters by associating each with its most informative substitutes.
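A hedged sketch of both ideas: using scikit-learn's `AgglomerativeClustering` with a distance threshold so the number of clusters is data-driven, and labeling each cluster by its most frequent substitutes. The substitute lists are invented toy data, not the paper's pipeline:

```python
# Dynamic cluster count via a distance threshold instead of a fixed k, plus
# cluster interpretation via the most common substitutes per cluster.
from collections import Counter
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer

substitutes = [  # lexical substitutes proposed for each usage of "cold"
    "flu illness virus", "chilly freezing icy",
    "flu infection virus", "freezing icy frosty",
]
X = TfidfVectorizer().fit_transform(substitutes).toarray()

# n_clusters=None + distance_threshold lets the data decide the cluster count
labels = AgglomerativeClustering(
    n_clusters=None, distance_threshold=1.0
).fit_predict(X)

for c in set(labels):
    words = " ".join(s for s, l in zip(substitutes, labels) if l == c).split()
    print(c, Counter(words).most_common(2))  # cluster's top substitutes
```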
RuDSI is a new benchmark for word sense induction (WSI) in Russian, created via manual annotation and semi-automatic clustering of Word Usage Graphs (WUGs), with no external word senses imposed on annotators.
This paper describes a previously proposed WSI methodology based on a Hierarchical Dirichlet Process (HDP), a nonparametric topic model; the approach requires no parameter tuning, uses the English ukWaC corpus as an external resource, and achieves encouraging results on the shared task.
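In that spirit, a minimal sketch (toy contexts rather than ukWaC, and gensim's `HdpModel` with default settings rather than the paper's exact configuration): treat each usage's context as a pseudo-document, let the HDP infer a data-driven number of topics, and read each usage's dominant topic as its induced sense:

```python
# Hedged HDP-for-WSI sketch using gensim; contexts are toy data.
from gensim.corpora import Dictionary
from gensim.models import HdpModel

contexts = [
    ["caught", "bed", "fever"],
    ["weather", "winter", "wind"],
    ["fever", "sneeze", "medicine"],
    ["snow", "wind", "freezing"],
]

dictionary = Dictionary(contexts)
corpus = [dictionary.doc2bow(ctx) for ctx in contexts]
hdp = HdpModel(corpus, id2word=dictionary, random_state=0)  # no fixed k

for ctx, bow in zip(contexts, corpus):
    topics = hdp[bow]  # (topic_id, probability) pairs for this usage
    sense = max(topics, key=lambda t: t[1])[0] if topics else -1
    print(sense, ctx)  # dominant topic = induced sense
```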
Two new automated semantic evaluations are applied to three distinct latent topic models, revealing that LDA and LSA each have different strengths: LDA is best at learning descriptive topics, while LSA is best at creating a compact semantic representation of documents and words in a corpus.
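For concreteness, both model families can be fit on the same bag-of-words corpus with gensim; the documents below are toy data and the topic counts are arbitrary:

```python
# LDA vs. LSA (LSI) on one toy corpus: LDA exposes descriptive topics, while
# LSA maps documents into a compact latent vector space.
from gensim.corpora import Dictionary
from gensim.models import LdaModel, LsiModel

docs = [
    ["cold", "fever", "medicine", "doctor"],
    ["weather", "cold", "snow", "winter"],
    ["doctor", "patient", "fever", "flu"],
]
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]

lda = LdaModel(corpus, id2word=dictionary, num_topics=2, random_state=0)
lsa = LsiModel(corpus, id2word=dictionary, num_topics=2)

print(lda.print_topics())  # human-readable word distributions per topic
print(lsa[corpus[0]])      # document 0 as a 2-dimensional latent vector
```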
This paper builds a weighted graph of synonyms extracted from commonly available resources such as Wiktionary, applies word sense induction to deal with ambiguous words, and then clusters the disambiguated version of the input graph into synsets.
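The induction and clustering algorithms themselves are paper-specific; as a stand-in, the final graph-clustering step can be sketched with networkx community detection over a toy weighted synonym graph:

```python
# Not the paper's pipeline: a toy weighted synonym graph partitioned into
# synset-like groups, with greedy modularity communities standing in for the
# paper's own clustering algorithm.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

edges = [
    ("cold", "chilly", 0.9), ("cold", "freezing", 0.8),
    ("chilly", "freezing", 0.7), ("cold", "flu", 0.4),
    ("flu", "influenza", 0.9), ("flu", "grippe", 0.8),
]
G = nx.Graph()
G.add_weighted_edges_from(edges)

for community in greedy_modularity_communities(G, weight="weight"):
    print(sorted(community))  # each community approximates one synset
```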
This paper follows the Skip-gram framework and presents three sememe-encoded models that learn representations of sememes, senses, and words, applying an attention scheme to detect word senses in different contexts.
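The actual models couple sememe, sense, and word embeddings during Skip-gram training; the attention step alone can be sketched as a softmax over candidate sense embeddings conditioned on the context, with all embeddings here as random placeholders:

```python
# Illustrative attention over sense embeddings: the context decides how much
# each candidate sense contributes to the word's representation. Vectors are
# random placeholders, not trained sememe/sense embeddings.
import numpy as np

rng = np.random.default_rng(2)
dim = 16
senses = rng.normal(size=(3, dim))   # candidate sense embeddings of a word
context = rng.normal(size=dim)       # e.g. an averaged context embedding

logits = senses @ context
weights = np.exp(logits - logits.max())
weights /= weights.sum()             # softmax attention over senses
word_in_context = weights @ senses   # attention-weighted sense mixture

print(weights.round(3))
```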
It is shown that word embedding models trained on small but balanced corpora can be superior to those trained on large but noisy data, not only in intrinsic evaluation but also in downstream tasks like word sense induction.