A prevalent use case of topic models is topic discovery. However, most topic model evaluation methods rely on abstract metrics such as perplexity or topic coherence. The topic coverage approach instead measures a model's performance by matching its generated topics to a fixed set of reference topics: topics discovered by humans and represented in a machine-readable format. In this way, models are evaluated in the context of their intended use, by essentially simulating topic modeling in a fixed setting defined by a text collection and a set of reference topics. The reference topics serve as a ground truth against which both topic models and other measures of model performance can be evaluated. This coverage approach enables large-scale automatic evaluation of existing and future topic models.
(Image credit: Papersgraph)
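As an illustration, the sketch below shows one simple way such a coverage measure could be computed: a reference topic counts as covered if at least one model topic matches it above a similarity threshold. The top-word representation, the Jaccard matcher, and the 0.4 threshold are illustrative assumptions, not the specific measures proposed in any of the papers listed here.

def jaccard(topic_a, topic_b):
    # Similarity between two topics given as lists of their top words.
    a, b = set(topic_a), set(topic_b)
    return len(a & b) / len(a | b)

def topic_coverage(model_topics, reference_topics, threshold=0.4):
    # Fraction of reference topics matched by at least one model topic.
    covered = sum(
        1 for ref in reference_topics
        if any(jaccard(ref, mod) >= threshold for mod in model_topics)
    )
    return covered / len(reference_topics)

# Toy example: the politics topic is matched, the sports topic is not, so coverage is 0.5.
reference = [["election", "vote", "party", "candidate"],
             ["match", "goal", "league", "season"]]
model = [["vote", "election", "poll", "party"],
         ["stock", "market", "trade", "price"]]
print(topic_coverage(model, reference))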
These leaderboards are used to track progress in topic coverage.
Use these libraries to find topic coverage models and implementations.
An approach to topic model evaluation based on measuring topic coverage is investigated, and measures of coverage based on matching between model topics and reference topics are proposed.
Experimental results show that RankAE significantly outperforms other unsupervised methods and is able to generate high-quality summaries in terms of relevance and topic coverage.
A large-scale general Meeting Understanding and Generation Benchmark (MUG) is established to benchmark the performance of a wide range of SLP tasks, including topic segmentation, topic-level and session-level extractive summarization and topic title generation, keyphrase extraction, and action item detection.
A retrieval-enhanced framework that creates training data from a general-domain unlabeled corpus for zero-shot learning, achieving a 4.3% gain over the strongest baselines and saving around 70% of the time compared to baselines using large NLG models.
This work leverages large language models for dialogue augmentation in the task of emotional support conversation (ESC) to prompt a fine-tuned language model to complete full dialogues from available dialogue posts of various topics, which are then postprocessed based on heuristics.