Staff View: Text clustering for reducing semantic information in Malay semantic representation

The volume of generated text has increased dramatically in this era. Text exists in both structured and unstructured forms. The enormous amount of unstructured text is easily understood by humans but cannot be easily processed by computers. It needs efficient techniques to redu...

Bibliographic Details
Main Authors:	Tuan Norhafizah Tuan Zakaria, Mohd Juzaiddin Ab Aziz, Mohd Rosmadi Mokhtar, Saadiyah Darus
Format:	Article
Language:	English
Published:	Penerbit Universiti Kebangsaan Malaysia 2020
Online Access:	http://journalarticle.ukm.my/16833/1/02.pdf http://journalarticle.ukm.my/16833/ https://www.ukm.my/apjitm/articles-year.php

id	my-ukm.journal.16833
record_format	eprints
spelling	my-ukm.journal.16833 2021-06-20T04:35:55Z http://journalarticle.ukm.my/16833/ Text clustering for reducing semantic information in Malay semantic representation Tuan Norhafizah Tuan Zakaria, Mohd Juzaiddin Ab Aziz, Mohd Rosmadi Mokhtar, Saadiyah Darus. The volume of generated text has increased dramatically in this era. Text exists in both structured and unstructured forms. The enormous amount of unstructured text is easily understood by humans but cannot be easily processed by computers, so efficient techniques are needed to reduce the information into more valuable vectors. In this article, we introduce a text clustering method that uses Malay linguistic information to reduce the unstructured semantic information derived from Wikipedia Bahasa Melayu articles. The proposed method uses linguistic features of the Malay language to handle the morphological issues of Malay words. We incorporate semantic information from a semantic lexical resource for Malay, Wikipedia Bahasa Melayu (WikiBM). An experiment was then conducted to evaluate the effect of text clustering on the semantic similarity value, using the gloss definitions of WikiBM articles. We used Jaccard similarity to calculate the overlap between vectors built from the WikiBM text, and then computed the correlation using Pearson’s correlation. The score for the original text definitions was compared with the score for the new text definitions produced by the text clustering method. From the experiment, we conclude that the correlation value increased (from 0.39 to 0.43) after the semantic information was reduced to more valuable vectors using the text clustering method. Penerbit Universiti Kebangsaan Malaysia 2020-12 Article PeerReviewed application/pdf en http://journalarticle.ukm.my/16833/1/02.pdf Tuan Norhafizah Tuan Zakaria, Mohd Juzaiddin Ab Aziz, Mohd Rosmadi Mokhtar and Saadiyah Darus (2020) Text clustering for reducing semantic information in Malay semantic representation.
Asia-Pacific Journal of Information Technology and Multimedia, 9 (2). pp. 11-24. ISSN 2289-2192 https://www.ukm.my/apjitm/articles-year.php
institution	Universiti Kebangsaan Malaysia
building	Tun Sri Lanang Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Kebangsaan Malaysia
content_source	UKM Journal Article Repository
url_provider	http://journalarticle.ukm.my/
language	English
description	The volume of generated text has increased dramatically in this era. Text exists in both structured and unstructured forms. The enormous amount of unstructured text is easily understood by humans but cannot be easily processed by computers, so efficient techniques are needed to reduce the information into more valuable vectors. In this article, we introduce a text clustering method that uses Malay linguistic information to reduce the unstructured semantic information derived from Wikipedia Bahasa Melayu articles. The proposed method uses linguistic features of the Malay language to handle the morphological issues of Malay words. We incorporate semantic information from a semantic lexical resource for Malay, Wikipedia Bahasa Melayu (WikiBM). An experiment was then conducted to evaluate the effect of text clustering on the semantic similarity value, using the gloss definitions of WikiBM articles. We used Jaccard similarity to calculate the overlap between vectors built from the WikiBM text, and then computed the correlation using Pearson’s correlation. The score for the original text definitions was compared with the score for the new text definitions produced by the text clustering method. From the experiment, we conclude that the correlation value increased (from 0.39 to 0.43) after the semantic information was reduced to more valuable vectors using the text clustering method.
format	Article
author	Tuan Norhafizah Tuan Zakaria, Mohd Juzaiddin Ab Aziz, Mohd Rosmadi Mokhtar, Saadiyah Darus
spellingShingle	Tuan Norhafizah Tuan Zakaria, Mohd Juzaiddin Ab Aziz, Mohd Rosmadi Mokhtar, Saadiyah Darus. Text clustering for reducing semantic information in Malay semantic representation
author_facet	Tuan Norhafizah Tuan Zakaria, Mohd Juzaiddin Ab Aziz, Mohd Rosmadi Mokhtar, Saadiyah Darus
author_sort	Tuan Norhafizah Tuan Zakaria
title	Text clustering for reducing semantic information in Malay semantic representation
title_short	Text clustering for reducing semantic information in Malay semantic representation
title_full	Text clustering for reducing semantic information in Malay semantic representation
title_fullStr	Text clustering for reducing semantic information in Malay semantic representation
title_full_unstemmed	Text clustering for reducing semantic information in Malay semantic representation
title_sort	text clustering for reducing semantic information in malay semantic representation
publisher	Penerbit Universiti Kebangsaan Malaysia
publishDate	2020
url	http://journalarticle.ukm.my/16833/1/02.pdf http://journalarticle.ukm.my/16833/ https://www.ukm.my/apjitm/articles-year.php
_version_	1703961596412297216
score	13.211869
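
The evaluation named in the description (Jaccard similarity over definition text, then Pearson's correlation) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: whitespace tokenization, the sample glosses, and the reference ratings are all hypothetical, the record does not say what the similarity scores were correlated against, and the Malay morphological processing and clustering steps are not reproduced.

```python
import math


def jaccard(tokens_a, tokens_b):
    """Jaccard similarity of two token collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(tokens_a), set(tokens_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty inputs
    return len(a & b) / len(a | b)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical gloss pairs (assumption: plain whitespace tokenization).
gloss_pairs = [
    ("kereta ialah kenderaan bermotor", "bas ialah kenderaan awam bermotor"),
    ("kucing ialah haiwan peliharaan", "anjing ialah haiwan peliharaan"),
    ("sungai ialah aliran air", "komputer ialah mesin elektronik"),
]
# Hypothetical reference ratings for the same pairs (assumption).
reference_scores = [0.6, 0.8, 0.1]

similarities = [jaccard(a.split(), b.split()) for a, b in gloss_pairs]
print(pearson(similarities, reference_scores))
```

Under these assumptions, the comparison reported in the description would amount to running the same computation twice, once on the original definitions and once on the reduced definitions, and comparing the two correlation values.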
