An application of Malay short-form word conversion using Levenshtein distance / Azilawati Azizan, NurAine Saidin, Nurkhairizan Khairudin & Rohana Ismail

Formerly, short-form words were widely used in journalism. Nowadays, however, they are widely used by many people, especially in online communication. These short-form words cause problems in data mining, especially in online text processing. It...

Bibliographic Details
Main Authors: Azilawati Azizan, NurAine Saidin, Nurkhairizan Khairudin, Rohana Ismail
Format: Article
Language: English
Published: UiTM Press 2020
Subjects: Malaysia, Algorithms
Online Access: https://ir.uitm.edu.my/id/eprint/38191/2/38191.pdf
https://ir.uitm.edu.my/id/eprint/38191/
https://mijuitm.com.my/view-articles/
id my.uitm.ir.38191
record_format eprints
spelling my.uitm.ir.38191 2023-06-26T04:51:35Z https://ir.uitm.edu.my/id/eprint/38191/ An application of Malay short-form word conversion using Levenshtein distance / Azilawati Azizan, NurAine Saidin, Nurkhairizan Khairudin & Rohana Ismail. (2020) Mathematical Sciences and Informatics Journal (MIJ) <https://ir.uitm.edu.my/view/publication/Mathematical_Sciences_and_Informatics_Journal_=28MIJ=29.html>, 1 (2). pp. 34-42. ISSN 2735-0703. UiTM Press, 2020-11. Article, peer reviewed, text, en. https://ir.uitm.edu.my/id/eprint/38191/2/38191.pdf https://mijuitm.com.my/view-articles/
institution Universiti Teknologi Mara
building Tun Abdul Razak Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Mara
content_source UiTM Institutional Repository
url_provider http://ir.uitm.edu.my/
language English
topic Malaysia
Algorithms
description Formerly, short-form words were widely used in journalism. Nowadays, however, they are widely used by many people, especially in online communication. These short-form words cause problems in data mining, especially in online text processing, and lead to inaccurate text mining results. Yet only a few works have investigated Malay short-form word identification and conversion. Therefore, this work aims to develop an application that can identify Malay short-form words and convert them into their full words. To develop this application, the short-form rules need to be carefully examined. The formal rules from Dewan Bahasa & Pustaka (DBP) are used as the primary reference for generating the short-form word identification algorithm. For the conversion algorithm, Levenshtein Distance (LD) is used to measure similarity, and a rule-based technique complements the LD technique. As a result, 70.27% of the Malay short-form words were correctly converted into their full words. The conversion rate is quite promising, and this work can be further strengthened by incorporating more rules into the algorithm.
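The conversion idea in the description (Levenshtein distance with a rule-based complement) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the tiny lexicon, the function names, and the subsequence filter are all assumptions made for illustration, and the filter is only a hypothetical stand-in for the DBP-derived rules, which this record does not spell out.

```python
from typing import List, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def is_subsequence(short: str, word: str) -> bool:
    """Hypothetical rule: every letter of the short form appears in the word, in order."""
    remaining = iter(word)
    return all(ch in remaining for ch in short)


def closest_by_distance(short: str, lexicon: List[str]) -> str:
    """Pure LD: pick the lexicon word with the smallest edit distance."""
    return min(lexicon, key=lambda w: levenshtein(short, w))


def expand_short_form(short: str, lexicon: List[str]) -> Optional[str]:
    """Rule filter first, then LD ranking among the surviving candidates."""
    candidates = [w for w in lexicon if is_subsequence(short, w)]
    if not candidates:
        return None
    return min(candidates, key=lambda w: levenshtein(short, w))


# Illustrative lexicon only; a real system would use a full Malay word list.
LEXICON = ["yang", "dengan", "tidak", "saya", "dan", "dari"]

if __name__ == "__main__":
    for short in ["yg", "dgn", "tdk"]:
        print(short, "->", expand_short_form(short, LEXICON))
```

In this toy lexicon, distance alone maps "dgn" to "dan" (one substitution), whereas the filtered version keeps only "dengan", which hints at why a rule-based step might be paired with LD.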
format Article
author Azilawati Azizan
NurAine Saidin
Nurkhairizan Khairudin
Rohana Ismail
title An application of Malay short-form word conversion using Levenshtein distance / Azilawati Azizan, NurAine Saidin, Nurkhairizan Khairudin & Rohana Ismail
publisher UiTM Press
publishDate 2020
url https://ir.uitm.edu.my/id/eprint/38191/2/38191.pdf
https://ir.uitm.edu.my/id/eprint/38191/
https://mijuitm.com.my/view-articles/