Framework for mining XML format business process log data

With the advent of the Internet, the volume of semi-structured and unstructured data has increased dramatically. As a result, many frequent subtree mining (FSM) algorithms and methods have been developed to extract information from semi-structured data, specifically data with a hierarchical nature. Howev...

Bibliographic Details
Main Author: Ang, Jin Sheng
Format: Thesis
Language: English
Published: 2024
Subjects:
Online Access: https://etd.uum.edu.my/11012/1/permission%20to%20deposit-allow%20embargo%2012%20months-s904045.pdf
https://etd.uum.edu.my/11012/2/s904045_01.pdf
https://etd.uum.edu.my/11012/3/s904045_02.pdf
https://etd.uum.edu.my/11012/
id my.uum.etd.11012
record_format eprints
spelling my.uum.etd.11012 2024-02-29T00:24:50Z https://etd.uum.edu.my/11012/ Framework for mining XML format business process log data Ang, Jin Sheng T58.5-58.64 Information technology QA299.6-433 Analysis 2024 Thesis NonPeerReviewed Ang, Jin Sheng (2024) Framework for mining XML format business process log data. Doctoral thesis, Universiti Utara Malaysia.
institution Universiti Utara Malaysia
building UUM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Utara Malaysia
content_source UUM Electronic Theses
url_provider http://etd.uum.edu.my/
language English
topic T58.5-58.64 Information technology
QA299.6-433 Analysis
description With the advent of the Internet, the volume of semi-structured and unstructured data has increased dramatically. As a result, many frequent subtree mining (FSM) algorithms and methods have been developed to extract information from semi-structured data, specifically data with a hierarchical nature. However, many existing FSM algorithms and methods neglect or fail to preserve structural information, which hinders the extraction of meaningful insights from such data. In addition, statistical analysis and data mining techniques are difficult to apply to eXtensible Markup Language (XML) documents. This study introduces an alternative approach for mining XML documents that can be modelled as tree-structured data. The Flatten Sequential Structure Model (FSSM) was developed to transform tree-structured data into a structured format while preserving its structural integrity, thus facilitating comprehensive statistical analysis and data mining. FSSM is divided into two phases. The first phase converts tree-structured data into a flat structure that retains the structural information. The second phase converts the output of the first phase into a structured format, on which statistical analysis or classification can then be conducted. The effectiveness of the methods and framework was assessed by applying them to both simulation datasets and real-life event logs, namely those of the Business Process Intelligence Challenge (BPIC). After the FSSM phases were applied to the simulation and real-life event log data, descriptive statistics, t-tests, and chi-square tests were executed successfully. The association rules obtained outnumbered those from existing FSM methods. The Random Forest model outperformed the others with a classification accuracy of 0.75 on the simulation data, while the decision tree achieved the highest accuracy (0.7474) on the BPIC 2017 dataset. On the BPIC 2018 dataset, all three models performed well, exceeding 0.99 in classification accuracy. The results indicate that transforming complex hierarchical data into a format suitable for statistical analysis simplifies the analysis process and makes it more accessible to researchers in various fields.
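The two-phase idea in the description can be illustrated with a minimal sketch. This is not the thesis's FSSM implementation, whose details this record does not give. It only assumes that phase one lists each leaf value together with a path encoding its position in the tree, and that phase two turns those records into one row per unit of analysis. The sample log, the tag names, and the functions flatten and to_rows are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical event-log snippet; tag names are illustrative only.
SAMPLE = """
<log>
  <trace id="t1">
    <event><name>submit</name><cost>10</cost></event>
    <event><name>approve</name><cost>5</cost></event>
  </trace>
  <trace id="t2">
    <event><name>submit</name><cost>7</cost></event>
  </trace>
</log>
"""


def flatten(node, path=""):
    """Phase 1 (sketch): return every leaf below `node` as (path, text).

    The path records each ancestor tag and its 1-based position among
    same-tag siblings, so the tree structure survives the flattening.
    """
    records = []
    counts = {}
    for child in node:
        counts[child.tag] = counts.get(child.tag, 0) + 1
        child_path = f"{path}/{child.tag}[{counts[child.tag]}]"
        if len(child) == 0:
            # Leaf element: keep its text under the structural path.
            records.append((child_path, (child.text or "").strip()))
        else:
            records.extend(flatten(child, child_path))
    return records


def to_rows(root, unit="trace"):
    """Phase 2 (sketch): one row (dict) per `unit` element, one column per leaf path."""
    rows = []
    for elem in root.iter(unit):
        row = {"id": elem.get("id")}
        row.update(dict(flatten(elem)))
        rows.append(row)
    return rows


root = ET.fromstring(SAMPLE.strip())
rows = to_rows(root)
for row in rows:
    print(row)
```

Rows built this way can be loaded into any tabular tool for descriptive statistics or classification. Columns missing from a row (for example, a second event in a shorter trace) would need an explicit missing-value policy, which this sketch leaves open.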
format Thesis
author Ang, Jin Sheng
title Framework for mining XML format business process log data
publishDate 2024