A language identifier for Indonesian and Malay text document
Main Authors: | , , , |
---|---|
Format: | Conference or Workshop Item |
Published: |
Institute of Electrical and Electronics Engineers Inc.
2016
|
Online Access: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-84995701431&doi=10.1109%2fISMSC.2015.7594040&partnerID=40&md5=d9715785f362d63c5eefd4f58185acc8 http://eprints.utp.edu.my/30800/ |
Summary: | The number of online text documents on the Internet is growing rapidly. Documents written in languages from all over the world can be found with a single click, and this growth increases the amount of information available online. However, no one can understand all the languages in which these digital documents are written. Hence, there is a significant need for a language identifier to help users understand the information. To date, language identification research has focused mainly on European languages and remains limited for Asian languages, while the identification of similar languages among popular languages has attracted the attention of many researchers. In this research, a new language identification method for languages with similar typology, Malay and Indonesian, is proposed. The algorithm is tested on a set of Indonesian and Malay text documents to support the limited research on language identification for Asian languages. An experiment on 100 Indonesian and Malay text documents produced satisfactorily accurate results. © 2015 IEEE. |
---|