Assessing Suitable Word Embedding Model for Malay Language through Intrinsic Evaluation

Word embeddings were created to represent words meaningfully and efficiently, an essential step in most Natural Language Processing tasks. In this paper, several Malay-language word embedding models were trained on a Malay text corpus. These models were trained usin...

Bibliographic Details
Main Authors: Phua, Y.-T., Yew, K.-H., Foong, O.-M., Teow, M.Y.-W.
Format: Conference or Workshop Item
Published: Institute of Electrical and Electronics Engineers Inc. 2020
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85097520726&doi=10.1109%2fICCI51257.2020.9247707&partnerID=40&md5=68e584984f71f741dbb5c313d6dcf19e
http://eprints.utp.edu.my/29870/
id my.utp.eprints.29870
record_format eprints
spelling my.utp.eprints.29870 2022-03-25T03:05:09Z NonPeerReviewed Phua, Y.-T. and Yew, K.-H. and Foong, O.-M. and Teow, M.Y.-W. (2020) Assessing Suitable Word Embedding Model for Malay Language through Intrinsic Evaluation. In: UNSPECIFIED.
institution Universiti Teknologi Petronas
building UTP Resource Centre
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Petronas
content_source UTP Institutional Repository
url_provider http://eprints.utp.edu.my/
description Word embeddings were created to represent words meaningfully and efficiently, an essential step in most Natural Language Processing tasks. In this paper, several Malay-language word embedding models were trained on a Malay text corpus using Word2Vec and fastText, each with both the CBOW and Skip-gram architectures, and GloVe. The trained models were assessed through intrinsic evaluation on semantic similarity and word analogies. In the experiment, the custom-trained fastText Skip-gram model achieved a Pearson correlation coefficient of 0.5509 on the word similarity evaluation and an accuracy of 36.80 on the word analogy evaluation. It outperformed the pre-trained fastText models, which achieved only 0.477 and 22.96 on the word similarity and word analogy evaluations, respectively. The results show that there is still room for improvement in both the pre-processing tasks and the evaluation datasets. © 2020 IEEE.
format Conference or Workshop Item
author Phua, Y.-T.
Yew, K.-H.
Foong, O.-M.
Teow, M.Y.-W.
title Assessing Suitable Word Embedding Model for Malay Language through Intrinsic Evaluation
publisher Institute of Electrical and Electronics Engineers Inc.
publishDate 2020
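
The abstract in this record names two intrinsic measures: a Pearson correlation coefficient for word similarity and an accuracy figure for word analogies. The sketch below illustrates how such measures are commonly computed. It is not the authors' code: the record does not describe their tooling or datasets, so the toy vectors, the word pairs, the use of cosine similarity, the vector-offset (3CosAdd) analogy rule, and reporting accuracy as a percentage are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def similarity_score(vectors, pairs):
    """Correlate model cosine similarities with human ratings.

    pairs holds (word1, word2, human_rating) tuples; pairs containing an
    out-of-vocabulary word are skipped.
    """
    model_scores, human_scores = [], []
    for w1, w2, rating in pairs:
        if w1 in vectors and w2 in vectors:
            model_scores.append(cosine(vectors[w1], vectors[w2]))
            human_scores.append(rating)
    return pearson(model_scores, human_scores)


def analogy_accuracy(vectors, questions):
    """Percentage of analogies 'a : b :: c : ?' answered correctly.

    Uses the vector-offset rule (assumed here): the prediction is the
    vocabulary word closest to b - a + c, excluding a, b and c.
    """
    correct = 0
    total = 0
    for a, b, c, expected in questions:
        if any(w not in vectors for w in (a, b, c, expected)):
            continue
        target = [vb - va + vc
                  for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
        candidates = [w for w in vectors if w not in (a, b, c)]
        predicted = max(candidates, key=lambda w: cosine(vectors[w], target))
        correct += predicted == expected
        total += 1
    return 100.0 * correct / total if total else 0.0


# Hypothetical 2-D toy embeddings for a few Malay words (not real model output).
toy_vectors = {
    "lelaki": [1.0, 0.0],     # man
    "perempuan": [1.0, 1.0],  # woman
    "raja": [3.0, 0.0],       # king
    "ratu": [3.0, 1.0],       # queen
    "kucing": [0.0, 2.0],     # cat
}

# Hypothetical human similarity ratings and analogy questions.
toy_pairs = [("raja", "ratu", 8.0),
             ("lelaki", "perempuan", 7.0),
             ("raja", "kucing", 1.0)]
toy_questions = [("lelaki", "perempuan", "raja", "ratu")]

if __name__ == "__main__":
    print("similarity (Pearson):", similarity_score(toy_vectors, toy_pairs))
    print("analogy accuracy (%):", analogy_accuracy(toy_vectors, toy_questions))
```

To score a real model, the toy dictionary would be replaced by a mapping from words to trained vectors, and the toy pairs and questions by actual Malay evaluation sets.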