Text Analysis of 2019 Auditor General’s Report / Aisyah Hamizah Azmi, Faresha Farhana Rahman and Noor Dzulaiqa Izafitriah Mohd Alias

Bibliographic Details
Main Authors: Azmi, Aisyah Hamizah, Rahman, Faresha Farhana, Mohd Alias, Noor Dzulaiqa Izafitriah
Format: Student Project
Language: English
Published: 2021
Subjects:
Online Access: https://ir.uitm.edu.my/id/eprint/59915/1/59915.pdf
https://ir.uitm.edu.my/id/eprint/59915/
Description
Summary: Text data analysis has become an essential tool for extracting information from the enormous volume of online documents. One document that can be analysed is the Malaysian Auditor General’s Report. This research was motivated by the aim of helping the National Audit Department extract valuable details from the report and visualise them in a form that is simple to monitor. The first objective of this research is to explore the word patterns of the Auditor General’s Report for 2019, using collocation analysis. The collocation telaga tiub is found to have the highest association strength, measured by the standardized lambda. Since telaga and tiub are the pair of words most likely to follow each other directly, this research also investigates the words related to telaga by employing cluster analysis, which is the second objective. The clustering method used is Ward’s Minimum Variance. Two clusters of words are formed. The first cluster can be classified as the authorities responsible for the Telaga Tiub project, namely Kementerian Pendidikan Malaysia and Jabatan Mineral dan Geologi. The second cluster represents the agencies that can benefit from the Telaga Tiub project. As for the third objective, this research focuses on determining the words that are significantly related to the specific terms penyelewengan, pembaziran, gagal, kecuaian and ketirisan, using multiple Fisher’s Exact tests. The term penyelewengan is found to be highly significantly associated with the words wujud and pengawal. The words hpkk, pengawal, memandang, diharapkan and mengelakkan are found to be highly significantly associated with the term pembaziran. The term gagal is highly significantly associated with the words bayaran, deposit, membayar, syarat and guaman, whereas the words skop, kolam, spesifikasi and uwet are highly significantly associated with the term kecuaian. The last term, ketirisan, is found to be highly significantly associated with the words lkim, hasil, sewa and dikutip.
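
As an illustration of the third objective, the following is a minimal sketch of a two-sided Fisher’s Exact test on a 2x2 co-occurrence table for one term and one word. It is not the authors’ implementation: the function name `fisher_exact_two_sided`, the choice of text segments as the counting unit, and the counts in `toy_counts` are all assumptions made here for illustration and are not taken from the report.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's Exact test p-value for the table [[a, b], [c, d]].

    Assumed layout (hypothetical, for illustration):
      a: segments containing both the term and the word
      b: segments containing the term but not the word
      c: segments containing the word but not the term
      d: segments containing neither
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x co-occurrences with margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-9)
    )


# Hypothetical counts for one term against several candidate words.
toy_counts = {
    "word_a": (8, 2, 1, 5),
    "word_b": (3, 1, 1, 3),
}

for word, table in toy_counts.items():
    print(word, round(fisher_exact_two_sided(*table), 4))
```

Because the test is repeated for many candidate words, the resulting p-values would normally be adjusted for multiple comparisons before any word is called significant; the summary does not state which adjustment, if any, was used.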