MBT: A Memory-Based Part of Speech Tagger-Generator (1996-07-01)

TL;DR

A large-scale application of the memory-based approach to part of speech tagging is shown to be feasible, obtaining a tagging accuracy that is on a par with that of known statistical approaches, and with attractive space and time complexity properties when using IGTree, a tree-based formalism for indexing and searching huge case bases.

Abstract

We introduce a memory-based approach to part of speech tagging. Memory-based learning is a form of supervised learning based on similarity-based reasoning. The part of speech tag of a word in a particular context is extrapolated from the most similar cases held in memory. Supervised learning approaches are useful when a tagged corpus is available as an example of the desired output of the tagger. Based on such a corpus, the tagger-generator automatically builds a tagger which is able to tag new text the same way, diminishing development time for the construction of a tagger considerably. Memory-based tagging shares this advantage with other statistical or machine learning approaches. Additional advantages specific to a memory-based approach include (i) the relatively small tagged corpus size sufficient for training, (ii) incremental learning, (iii) explanation capabilities, (iv) flexible integration of information in case representations, (v) its non-parametric nature, (vi) reasonably good results on unknown words without morphological analysis, and (vii) fast learning and tagging. In this paper we show that a large-scale application of the memory-based approach is feasible: we obtain a tagging accuracy that is on a par with that of known statistical approaches, and with attractive space and time complexity properties when using IGTree, a tree-based formalism for indexing and searching huge case bases. The use of IGTree has as additional advantage that optimal context size for disambiguation is dynamically computed.

1 Introduction

Part of Speech (POS) tagging is a process in which syntactic categories are assigned to words. It can be seen as a mapping from sentences to strings of tags. Automatic tagging is useful for a number of applications: as a preprocessing stage to parsing, in information retrieval, in text to speech systems, in corpus linguistics, etc. The two factors determining the syntactic category of a word are its lexical probability (e.g. without context, man is more probably a noun than a verb), and its contextual probability (e.g. after a pronoun, man is more probably a verb than a noun, as in they man the boats). Several approaches have been proposed to construct automatic taggers. Most work on statistical methods has used n-gram models or Hidden Markov Model-based taggers (e.g. Church, 1988; DeRose, 1988; Cutting et al. 1992; Merialdo, 1994, etc.). In
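The idea of extrapolating a tag from the most similar cases held in memory can be illustrated with a minimal sketch. It is not the MBT implementation and does not use IGTree: it assumes a toy case base, a three-feature case layout (previous tag, focus word, next word), and a plain unweighted overlap metric, all of which are illustrative choices. The names `CASES`, `overlap`, and `tag` are hypothetical.

```python
from collections import Counter

# Toy case base (hypothetical data). Each case pairs a feature tuple
# (previous tag, focus word, next word) with the tag of the focus word.
CASES = [
    (("DET", "man", "walks"), "NOUN"),
    (("DET", "man", "is"), "NOUN"),
    (("ADJ", "man", "is"), "NOUN"),
    (("PRON", "man", "the"), "VERB"),
    (("PRON", "see", "the"), "VERB"),
]


def overlap(a, b):
    """Count the feature positions on which two cases agree."""
    return sum(x == y for x, y in zip(a, b))


def tag(features, cases=CASES):
    """Return the majority tag among the most similar stored cases."""
    best = max(overlap(features, stored) for stored, _ in cases)
    votes = Counter(
        stored_tag
        for stored, stored_tag in cases
        if overlap(features, stored) == best
    )
    return votes.most_common(1)[0][0]


# "man" after a determiner resembles the stored noun cases.
print(tag(("DET", "man", "runs")))
# "man" after a pronoun matches the stored verb case exactly.
print(tag(("PRON", "man", "the")))
# A word absent from the case base is tagged from its context alone.
print(tag(("PRON", "crew", "the")))
```

In this sketch every feature counts equally, so the focus word and its context contribute to similarity in the same way. A weighted metric or a tree-based index would change how the nearest cases are found, not the principle of reusing their tags.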

References (37 items)

Beyond Word N-Grams

Domain-specific knowledge acquisition for conceptual sentence analysis

Part-of-Speech Tagging With Neural Networks

Oblivious Decision Trees and Abstract Cases

Tagging English Text with a Probabilistic Model

Are rules and modules really necessary for explaining language?

A Case-Based Approach to Knowledge Acquisition for Domain-Specific Sentence Analysis

Using Decision Trees to Improve Case-Based Learning

Memory-based lexical acquisition and processing

C4.5: Programs for Machine Learning

A Practical Part-of-Speech Tagger

A Simple Rule-Based Part of Speech Tagger

Computer Systems That Learn

Real-Time Morphology: Symbolic Rules or Analogical Networks?

Acquiring Disambiguation Rules From Text

The computational analysis of English : a corpus-based approach

The Computational Analysis of English—A Corpus‐Based Approach

A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text

Toward memory-based reasoning

Analogy, computation and linguistic theory

IGTree: Using Trees

Analogical natural language processing

Proceedings Third Workshop on Very Large Corpora

Part-of-speech Tagging Using a Variable Context Markov Model, Proceedings of ACL

Morphological Tagging Based Entirely on Bayesian Inference

Natural language processing

A weighted nearest neighbour algorithm for learning

Generalization performance of backpropagation learning on a syllabification task

Virtuele Grammatica's en Creatieve Algoritmen

Analogical Modeling of Language

Grammatical Category Disambiguation by Statistical Optimization

Case-Based Reasoning

Parallel Networks that Learn to Pronounce English Text

Parallel Networks that Learn to Pronounce

Automatic Grammatical Tagging of English

Experiments in induction

A grammatical approach to grammatical coding