Towards a Malay derivational lexicon: learning affixes using expectation maximization

Bibliographic Details
Main Authors: Sulaiman, Suriani; Gasser, Michael; Kübler, Sandra
Format: Proceeding Paper
Language: English
Published: 2011
Online Access: http://irep.iium.edu.my/32082/1/W11-3005.pdf
http://irep.iium.edu.my/32082/
http://aclweb.org/anthology//W/W11/W11-3005.pdf
Description
Summary: We propose an unsupervised training method to guide the learning of Malay derivational morphology from a set of morphological segmentations produced by a naïve morphological analyzer. Using a morphology-based language model, we first estimate the probability of a given segmentation. We train the model with EM to find the segmentation that maximizes the probability of each morpheme. We extract the set of affix patterns produced by our algorithm and evaluate them against two references: a list of affix patterns extracted from our hand-segmented derivational wordlist and a derivational history produced by a stemmer.
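
Illustration: the EM step described in the summary can be sketched roughly as follows. This is not the authors' implementation. It assumes a simple unigram morpheme model in place of the paper's morphology-based language model, and the function name `em_segment`, the `CANDIDATES` table, and the candidate segmentations in it are hypothetical stand-ins for the output of a naïve analyzer.

```python
from collections import defaultdict
from math import prod

# Hypothetical candidate segmentations, as a naive analyzer might propose them.
CANDIDATES = {
    "membaca": [("membaca",), ("mem", "baca"), ("me", "mbaca")],
    "membeli": [("membeli",), ("mem", "beli"), ("me", "mbeli")],
    "makanan": [("makanan",), ("makan", "an"), ("maka", "nan")],
    "minuman": [("minuman",), ("minum", "an"), ("minu", "man")],
}


def em_segment(candidates, iterations=20):
    """Run EM over candidate segmentations under a unigram morpheme model.

    candidates maps each word to a list of segmentations (tuples of morphemes).
    Returns the morpheme probabilities and the highest-scoring segmentation
    of each word.
    """
    # Start from a uniform distribution over every morpheme in any candidate.
    morphemes = {m for segs in candidates.values() for seg in segs for m in seg}
    prob = {m: 1.0 / len(morphemes) for m in morphemes}

    for _ in range(iterations):
        counts = defaultdict(float)
        for segs in candidates.values():
            # E-step: posterior of each candidate given current probabilities.
            scores = [prod(prob[m] for m in seg) for seg in segs]
            total = sum(scores)
            for seg, score in zip(segs, scores):
                for m in seg:
                    counts[m] += score / total
        # M-step: turn expected morpheme counts back into probabilities.
        norm = sum(counts.values())
        prob = {m: counts.get(m, 0.0) / norm for m in morphemes}

    best = {
        word: max(segs, key=lambda seg: prod(prob[m] for m in seg))
        for word, segs in candidates.items()
    }
    return prob, best


if __name__ == "__main__":
    probabilities, segmentations = em_segment(CANDIDATES)
    for word, seg in segmentations.items():
        print(word, "->", " + ".join(seg))
```

A plain unigram model of this kind tends to favour segmentations with fewer morphemes, since each extra morpheme multiplies in another probability below one. A richer model, such as the morphology-based language model the summary mentions, would be one way to counter that bias.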