CROSS-LINGUAL ANNOTATION PROJECTION FOR THE DEVELOPMENT OF MALAY CORPUS

Bibliographic Details
Main Author: ZAMIN, NORSHUHANI
Format: Thesis
Language: English
Published: 2014
Online Access: http://utpedia.utp.edu.my/21305/1/2014%20-COMPUTER%20%26%20INFORMATION%20SCIENCES%20-%20CROSS-LINGUAL%20ANNOTATION%20PROJECTION%20FOR%20THE%20DEVELOPMENT%20OF%20MALAY%20CORPOS%20-%20NORSHUHANI%20ZAMIN.pdf
http://utpedia.utp.edu.my/21305/
Description
Abstract: Cross-lingual annotation projection methods can draw on rich-resourced languages to improve the performance of Natural Language Processing (NLP) tasks in less-resourced languages. In this research, Malay serves as the less-resourced language and English as the rich-resourced language. The research is proposed to ease the deadlock in Malay computational linguistics research, which stems from the shortage of Malay tools and annotated corpora, by exploiting state-of-the-art English tools. The aim of the research is to investigate a suitable cross-lingual annotation projection based on word alignment between two languages with syntactic differences. A word alignment method known as MEWA (Malay-English Word Aligner), which integrates a Dice coefficient and a bigram string similarity measure with little supervision, is proposed.
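
The two measures named in the abstract can be illustrated with a minimal sketch. This is not the MEWA implementation: the record does not say how the thesis combines the measures, so the code below only shows one common textbook form of each, namely a Dice coefficient over sentence-level co-occurrence counts and a Dice-style similarity over character bigrams. All function names and the toy sentence pairs are hypothetical and chosen for illustration.

```python
from collections import Counter
from itertools import product


def char_bigrams(word):
    """Return the multiset of adjacent character pairs in a lowercased word."""
    w = word.lower()
    return Counter(w[i:i + 2] for i in range(len(w) - 1))


def bigram_similarity(a, b):
    """Dice-style string similarity: 2 * |shared bigrams| / (|A| + |B|)."""
    ba, bb = char_bigrams(a), char_bigrams(b)
    total = sum(ba.values()) + sum(bb.values())
    if total == 0:
        return 0.0  # both words are too short to have any bigram
    shared = sum((ba & bb).values())  # multiset intersection
    return 2.0 * shared / total


def dice_cooccurrence(sentence_pairs):
    """Dice coefficient per word pair: 2 * C(e, m) / (C(e) + C(m)).

    Counts are numbers of aligned sentence pairs containing the word(s).
    """
    count_e, count_m, count_em = Counter(), Counter(), Counter()
    for english, malay in sentence_pairs:
        e_words, m_words = set(english), set(malay)
        count_e.update(e_words)
        count_m.update(m_words)
        count_em.update(product(e_words, m_words))
    return {
        (e, m): 2.0 * c / (count_e[e] + count_m[m])
        for (e, m), c in count_em.items()
    }


# Toy parallel corpus (illustrative only, not taken from the thesis).
pairs = [
    (["i", "eat", "rice"], ["saya", "makan", "nasi"]),
    (["i", "drink", "water"], ["saya", "minum", "air"]),
]
scores = dice_cooccurrence(pairs)
print(scores[("i", "saya")])                      # co-occur in every pair
print(scores[("eat", "saya")])                    # co-occur in one pair only
print(bigram_similarity("computer", "komputer"))  # cognate-like spelling
```

In this sketch the co-occurrence score favours word pairs that appear together consistently across aligned sentences, while the bigram score favours pairs with similar spelling, such as loanwords. An aligner could use the second signal to support the first when co-occurrence evidence is sparse, though whether MEWA does so in this way is an assumption here.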