Morphological segmentation and analysis of Bangla text

Bibliographic Details
Main Authors: Saha, G C, Saha, Hasi, Che Mat, Ruzinoor, Khan, Nur Hossain, Sarker, Bappa
Format: Article
Language: English
Published: Faculty of Computing, Universiti Teknologi Malaysia 2016
Online Access: http://repo.uum.edu.my/21406/1/IJIDM%204%203%202016%20%2015%2020.pdf
http://repo.uum.edu.my/21406/
http://ijidm.org/wp-content/uploads/IJIDM-04-03-03.pdf
Description
Summary: This paper deals with lexicon and system development for word segmentation in the Bangla language. Our goal is to develop a morphological segmentation algorithm that works well for Bangla and to address the problem of unsupervised word segmentation across different languages. From a hand-corrected Bangla corpus, 5000 common words were manually segmented into prefixes, roots and suffixes. These formed the sample lexicon used as the seed for the next step. A system was developed in C to automate the segmentation process based on this hand-made lexical database. The system was evaluated on several pages of Bangla text and achieved a success rate of about 83.05%. In our observation, the system should reach full success if the lexicon database is doubled in size. The system may have a substantial impact, particularly in helping people learn and use Bangla, which could greatly improve their socio-economic life.
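The lexicon-driven segmentation described in the summary can be illustrated with a minimal sketch in C. This is not the authors' implementation: the record does not describe their data structures or matching strategy, so the `Segmentation` struct, the `segment` function, and the exhaustive prefix/suffix search are assumptions made for illustration. The handful of lexicon entries are hypothetical stand-ins for the 5000-word hand-made database, and the source file is assumed to be UTF-8 encoded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: each field points at a lexicon entry. */
typedef struct {
    const char *prefix; /* "" when the word has no prefix */
    const char *root;
    const char *suffix; /* "" when the word has no suffix */
} Segmentation;

/* Illustrative seed lexicon (assumed entries, not taken from the paper).
   The empty string stands for "no prefix" or "no suffix". */
static const char *const PREFIXES[] = {"", "অ", "বি"};
static const char *const ROOTS[]    = {"ছেলে", "জানা", "দেশ"};
static const char *const SUFFIXES[] = {"", "রা", "কে", "ের"};

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Return the root entry that equals the len bytes at s, or NULL. */
static const char *find_root(const char *s, size_t len)
{
    for (size_t i = 0; i < COUNT(ROOTS); i++) {
        if (strlen(ROOTS[i]) == len && memcmp(ROOTS[i], s, len) == 0)
            return ROOTS[i];
    }
    return NULL;
}

/* Try every prefix/suffix pair and accept the first split whose middle
   part is a known root. Returns 1 on success, 0 if no split is found.
   Matching is byte-wise, which is safe for UTF-8 as long as every
   lexicon entry consists of whole characters. */
int segment(const char *word, Segmentation *out)
{
    assert(word != NULL && out != NULL);
    size_t wlen = strlen(word);

    for (size_t p = 0; p < COUNT(PREFIXES); p++) {
        size_t plen = strlen(PREFIXES[p]);
        if (plen > wlen || memcmp(word, PREFIXES[p], plen) != 0)
            continue;
        for (size_t s = 0; s < COUNT(SUFFIXES); s++) {
            size_t slen = strlen(SUFFIXES[s]);
            if (plen + slen >= wlen) /* the root must be non-empty */
                continue;
            if (memcmp(word + wlen - slen, SUFFIXES[s], slen) != 0)
                continue;
            const char *root = find_root(word + plen, wlen - plen - slen);
            if (root != NULL) {
                out->prefix = PREFIXES[p];
                out->root = root;
                out->suffix = SUFFIXES[s];
                return 1;
            }
        }
    }
    return 0;
}
```

Calling `segment("বিদেশের", &r)` with this toy lexicon fills `r` with the prefix `বি`, the root `দেশ` and the suffix `ের`. A word whose root is missing from the lexicon makes `segment` return 0, which is consistent with the summary's observation that coverage depends on the size of the lexicon database.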