An application of Malay short-form word conversion using Levenshtein distance
Formerly, short-form words were widely used in the field of journalism. Nowadays, however, short-form words are widely used by many people, especially in online communication. These short-form words cause problems in the field of data mining, especially in tasks involving online text processing...
Main Authors: | Rohana, Ismail; Azilawati, Azizan; NurAine, Saidin |
---|---|
Format: | Article |
Language: | English |
Published: | 2020 |
Subjects: | P Philology. Linguistics; QA Mathematics |
Online Access: | http://eprints.unisza.edu.my/7277/1/FH02-FIK-20-43364.pdf http://eprints.unisza.edu.my/7277/2/FH02-FIK-21-50265.pdf http://eprints.unisza.edu.my/7277/ |
id |
my-unisza-ir.7277 |
---|---|
record_format |
eprints |
spelling |
my-unisza-ir.7277 2022-05-24T04:35:34Z http://eprints.unisza.edu.my/7277/ An application of Malay short-form word conversion using Levenshtein distance. Rohana, Ismail; Azilawati, Azizan; NurAine, Saidin. Subjects: P Philology. Linguistics; QA Mathematics. 2020-12. Article. PeerReviewed. Full text (en): http://eprints.unisza.edu.my/7277/1/FH02-FIK-20-43364.pdf, http://eprints.unisza.edu.my/7277/2/FH02-FIK-21-50265.pdf. Citation: Rohana, Ismail and Azilawati, Azizan and NurAine, Saidin (2020) An application of Malay short-form word conversion using levenshtein distance. Mathematical Sciences and Informatics Journal, 1 (1). pp. 34-42. ISSN 2735-0703 |
institution |
Universiti Sultan Zainal Abidin |
building |
UNISZA Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Sultan Zainal Abidin |
content_source |
UNISZA Institutional Repository |
url_provider |
https://eprints.unisza.edu.my/ |
language |
English |
topic |
P Philology. Linguistics QA Mathematics |
description |
Formerly, short-form words were widely used in the field of journalism.
Nowadays, however, short-form words are widely used by many
people, especially in online communication. These short-form words
cause problems in the field of data mining, especially in tasks involving
online text processing, and they lead to inaccurate text mining
results. Yet only a few works have investigated
Malay short-form word identification and conversion. Therefore, this
work aims to develop an application that can identify Malay short-form
words and convert them into their full words. To develop this
application, the short-form rules need to be carefully examined. The
formal rules from Dewan Bahasa & Pustaka (DBP) are used as the
primary reference for generating the short-form word identification
algorithm. For the conversion algorithm, Levenshtein Distance
(LD) is used to measure similarity. A rule-based technique is
also used as a complement to the LD technique. As a result, 70.27% of
the Malay short-form words were correctly converted into their
full words. The conversion rate is quite promising, and this work can
be further strengthened by incorporating more rules into the algorithm. |
format |
Article |
author |
Rohana, Ismail Azilawati, Azizan NurAine, Saidin |
title |
An application of Malay short-form word conversion using Levenshtein distance |
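The description above says that Levenshtein Distance (LD) is used to measure similarity in the conversion step. The following is a minimal sketch of that general idea, not the authors' implementation: it computes the standard unit-cost edit distance and returns the closest entries from a word list. The function names and the tiny `LEXICON` are assumptions made for illustration; the record does not describe the word list or the DBP rules the application actually uses.

```python
from typing import Iterable, List


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (insert, delete, substitute) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning the first i characters of a into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep when the characters match
            ))
        previous = current
    return previous[-1]


def closest_full_words(short_form: str, lexicon: Iterable[str]) -> List[str]:
    """Return every lexicon word at the minimum edit distance from short_form."""
    scored = sorted((levenshtein(short_form, word), word) for word in lexicon)
    if not scored:
        return []
    best = scored[0][0]
    return [word for distance, word in scored if distance == best]


# Hypothetical mini-lexicon of full Malay words, for illustration only.
LEXICON = ["yang", "dengan", "tidak", "untuk", "makan"]

if __name__ == "__main__":
    print(levenshtein("yg", "yang"))          # 2
    print(closest_full_words("yg", LEXICON))
```

`closest_full_words` returns a list rather than a single word because several candidates can share the same minimum distance. Some additional tie-breaking step is then needed, and a rule-based complement like the one the description mentions is one plausible place for it.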