A new corpus of Weibo messages annotated for both name and nominal mentions is presented and a joint training objective for the embeddings that makes use of both (NER) labeled and unlabeled raw text is proposed.
We consider the task of named entity recognition for Chinese social media. The long line of work in Chinese NER has focused on formal domains, and NER for social media has been largely restricted to English. We present a new corpus of Weibo messages annotated for both name and nominal mentions. Additionally, we evaluate three types of neural embeddings for representing Chinese text. Finally, we propose a joint training objective for the embeddings that makes use of both (NER) labeled and unlabeled raw text. Our methods yield a 9% improvement over a stateof-the-art baseline.