Reuters Corpus Volume 1
Introduced in RCV1: A New Benchmark Collection for Text Categorization Research (2003)
The RCV1 dataset is a benchmark dataset for text categorization. It is a collection of newswire articles produced by Reuters in 1996-1997. It contains 804,414 manually labeled newswire documents, categorized with respect to three controlled vocabularies: industries, topics, and regions.
Source: Random Projections for Linear Support Vector Machines
Image Source: https://www.nasdaq.com/publishers/reuters