Part of Speech Tagging
A novel neural network architecture is introduced that automatically benefits from both word- and character-level representations by using a combination of bidirectional LSTM, CNN, and CRF, making it applicable to a wide range of sequence labeling tasks.
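As an illustration, here is a minimal PyTorch sketch of a character-CNN + bidirectional-LSTM tagger in the spirit of that architecture; all dimensions, vocabulary sizes, and the class name are assumptions rather than the paper's settings, and a CRF layer (e.g. the third-party pytorch-crf package) would sit on top of the emission scores for structured decoding.

```python
# Minimal sketch of a BiLSTM-CNN tagger; a CRF would normally consume
# the emission scores for structured decoding. Sizes are illustrative.
import torch
import torch.nn as nn

class CharCNNBiLSTMTagger(nn.Module):
    def __init__(self, n_words, n_chars, n_tags,
                 word_dim=100, char_dim=30, char_filters=30, hidden=200):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, word_dim)
        self.char_emb = nn.Embedding(n_chars, char_dim)
        # Character-level CNN: one filter bank over each word's characters.
        self.char_cnn = nn.Conv1d(char_dim, char_filters, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(word_dim + char_filters, hidden,
                            bidirectional=True, batch_first=True)
        self.emit = nn.Linear(2 * hidden, n_tags)  # per-token emission scores

    def forward(self, words, chars):
        # words: (batch, seq)   chars: (batch, seq, max_word_len)
        b, s, c = chars.shape
        ch = self.char_emb(chars).view(b * s, c, -1).transpose(1, 2)
        ch = torch.relu(self.char_cnn(ch)).max(dim=2).values.view(b, s, -1)
        x = torch.cat([self.word_emb(words), ch], dim=-1)
        h, _ = self.lstm(x)
        return self.emit(h)  # feed these emissions to a CRF for decoding

model = CharCNNBiLSTMTagger(n_words=10000, n_chars=100, n_tags=17)
scores = model(torch.randint(0, 10000, (2, 8)), torch.randint(0, 100, (2, 8, 12)))
print(scores.shape)  # torch.Size([2, 8, 17])
```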
Overall, it is found that the similarity between the percentage of words that get split into subwords in the source and target data (the "split word ratio difference") is the strongest predictor of model performance on target data.
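As a toy illustration of how such a statistic could be computed, the following sketch uses a Hugging Face tokenizer and made-up word lists; neither the tokenizer choice nor the corpora reflect the paper's actual setup.

```python
# Illustrative computation of a "split word ratio difference": the gap
# between the fraction of words a subword tokenizer splits in the source
# corpus vs. the target corpus. Tokenizer and word lists are assumptions.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

def split_ratio(words):
    """Fraction of words tokenized into more than one subword."""
    split = sum(1 for w in words if len(tokenizer.tokenize(w)) > 1)
    return split / len(words)

source_words = "the quick brown fox jumps over the lazy dog".split()
target_words = "unkempt marsupials extemporaneously vocalize".split()

diff = abs(split_ratio(source_words) - split_ratio(target_words))
print(f"split word ratio difference: {diff:.2f}")
```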
This study proposes a BLSTM-RNN with word embeddings for the part-of-speech (POS) tagging task, achieving performance comparable to the Stanford POS tagger.
The effects of transfer learning for deep hierarchical recurrent networks across domains, applications, and languages are examined, and it is shown that significant improvement can often be obtained.
The new toolkit, named PKUSEG, targets multi-domain word segmentation, providing separate models for domains such as web, medicine, and tourism, and supports POS tagging and model training to adapt to various application scenarios.
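A sketch of typical usage follows, assuming the released pkuseg-python package; the domain name and example sentence are illustrative choices, not part of the summary above.

```python
# Sketch of PKUSEG usage: load a domain-specific model with POS tagging
# enabled and segment a sentence. Domain and sentence are illustrative.
import pkuseg

seg = pkuseg.pkuseg(model_name="medicine", postag=True)
print(seg.cut("患者服用阿司匹林后症状缓解"))  # list of (word, POS) pairs
```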
This work presents a novel bi-LSTM model that combines the POS tagging loss function with an auxiliary loss function accounting for rare words; it obtains state-of-the-art performance across 22 languages and works especially well for morphologically complex languages.
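One way such an auxiliary loss can be instantiated is by predicting each word's log frequency alongside its tag; the PyTorch sketch below combines the two losses, with all shapes and the 0.1 mixing weight being assumptions for illustration.

```python
# Sketch of a multi-task tagging loss: main cross-entropy POS loss plus
# an auxiliary loss predicting each word's log frequency (a rare-word
# signal of the kind the summary describes). Sizes are illustrative.
import torch
import torch.nn as nn

hidden = torch.randn(2, 8, 400)           # bi-LSTM states: (batch, seq, 2*hidden)
pos_head = nn.Linear(400, 17)             # POS tag scores
freq_head = nn.Linear(400, 1)             # predicted log word frequency

gold_tags = torch.randint(0, 17, (2, 8))
gold_logfreq = torch.rand(2, 8)

pos_loss = nn.functional.cross_entropy(pos_head(hidden).view(-1, 17),
                                       gold_tags.view(-1))
aux_loss = nn.functional.mse_loss(freq_head(hidden).squeeze(-1), gold_logfreq)
loss = pos_loss + 0.1 * aux_loss          # 0.1 is an arbitrary mixing weight
loss.backward()
```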
A sequence labeling framework is proposed with a secondary training objective: learning to predict the surrounding words for every word in the dataset. This incentivises the system to learn general-purpose patterns of semantic and syntactic composition that are useful for improving accuracy on different sequence labeling tasks.
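A minimal sketch of such a secondary objective appears below, here reduced to predicting the next word from each encoder state; the vocabulary size, dimensions, and the way the losses would be mixed are all assumptions.

```python
# Sketch of a language-modeling secondary objective for a tagger: each
# hidden state is also trained to predict the following word. Sizes are
# illustrative assumptions.
import torch
import torch.nn as nn

V, H = 10000, 400
hidden = torch.randn(2, 8, H)            # per-token encoder states
lm_head = nn.Linear(H, V)                # predicts the following word

word_ids = torch.randint(0, V, (2, 8))
logits = lm_head(hidden[:, :-1, :])      # predict word t+1 from state t
targets = word_ids[:, 1:]
lm_loss = nn.functional.cross_entropy(logits.reshape(-1, V),
                                      targets.reshape(-1))
# The total loss would add this to the main tagging loss with a small weight.
```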
Evaluating different word embedding models trained on a large Portuguese corpus, including both Brazilian and European variants, suggests that word analogies are not appropriate for word embedding evaluation; task-specific evaluations appear to be a better option.
It is found that a number of probing tests show a significant positive correlation with the downstream tasks, especially for morphologically rich languages, and that these tests can be used to explore word embeddings or black-box neural models for linguistic cues in a multilingual setting.
Experiments on handwriting recognition and joint Chinese word segmentation/POS tagging show that segmental recurrent neural networks obtain substantially higher accuracies than models that do not explicitly represent segments.