Framework for mining XML format business process log data
With the advent of the Internet, the volume of semi-structured and unstructured data has increased dramatically. As a result, many frequent subtree mining (FSM) algorithms and methods have been developed to extract information from semi-structured data, specifically data with a hierarchical nature. Howev...
Main Author: | Ang, Jin Sheng |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2024 |
Subjects: | T58.5-58.64 Information technology; QA299.6-433 Analysis |
Online Access: | https://etd.uum.edu.my/11012/1/permission%20to%20deposit-allow%20embargo%2012%20months-s904045.pdf https://etd.uum.edu.my/11012/2/s904045_01.pdf https://etd.uum.edu.my/11012/3/s904045_02.pdf https://etd.uum.edu.my/11012/ |
id |
my.uum.etd.11012 |
---|---|
record_format |
eprints |
spelling |
my.uum.etd.11012 2024-02-29T00:24:50Z https://etd.uum.edu.my/11012/ Framework for mining XML format business process log data Ang, Jin Sheng T58.5-58.64 Information technology QA299.6-433 Analysis 2024 Thesis NonPeerReviewed text en https://etd.uum.edu.my/11012/1/permission%20to%20deposit-allow%20embargo%2012%20months-s904045.pdf text en https://etd.uum.edu.my/11012/2/s904045_01.pdf text en https://etd.uum.edu.my/11012/3/s904045_02.pdf Ang, Jin Sheng (2024) Framework for mining XML format business process log data. Doctoral thesis, Universiti Utara Malaysia. |
institution |
Universiti Utara Malaysia |
building |
UUM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Utara Malaysia |
content_source |
UUM Electronic Theses |
url_provider |
http://etd.uum.edu.my/ |
language |
English |
topic |
T58.5-58.64 Information technology QA299.6-433 Analysis |
spellingShingle |
T58.5-58.64 Information technology QA299.6-433 Analysis Ang, Jin Sheng Framework for mining XML format business process log data |
description |
With the advent of the Internet, the volume of semi-structured and unstructured data has increased dramatically. As a result, many frequent subtree mining (FSM) algorithms and methods have been developed to extract information from semi-structured data, specifically data with a hierarchical nature. However, many existing FSM algorithms and methods neglect or fail to preserve structural information, which hinders the extraction of meaningful insights from such data. In addition, statistical analysis and data mining techniques are difficult to apply to eXtensible Markup Language (XML) documents. This study introduces an alternative approach for mining XML documents that can be modelled as trees. The Flatten Sequential Structure Model (FSSM) was developed to transform tree-structured data into a structured format while preserving its structural integrity, thus facilitating comprehensive statistical analysis and data mining. FSSM consists of two phases. The first phase converts tree-structured data into a flat structure that retains the structural information. The second phase converts the output of the first phase into a structured format, after which statistical analysis or classification can be conducted. The effectiveness of the methods and framework was assessed by applying them to both simulation datasets and real-life event logs, namely those of the Business Process Intelligence Challenge (BPIC). After the FSSM phases were applied to the simulation and real-life event log data, descriptive statistics, t-tests, and chi-square tests were successfully executed. The association rules obtained outnumbered those from existing FSM methods. The Random Forest model outperformed the others with a classification accuracy of 0.75 on the simulation data, while the decision tree achieved the highest accuracy (0.7474) on the BPIC 2017 dataset. On the BPIC 2018 dataset, all three models performed well, exceeding 0.99 in classification accuracy. The results indicate that transforming complex hierarchical data into a format suitable for statistical analysis simplifies the analysis process and makes it more accessible to researchers in various fields. |
format |
Thesis |
author |
Ang, Jin Sheng |
title |
Framework for mining XML format business process log data |
title_sort |
framework for mining xml format business process log data |
publishDate |
2024 |
url |
https://etd.uum.edu.my/11012/1/permission%20to%20deposit-allow%20embargo%2012%20months-s904045.pdf https://etd.uum.edu.my/11012/2/s904045_01.pdf https://etd.uum.edu.my/11012/3/s904045_02.pdf https://etd.uum.edu.my/11012/ |
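The two-phase idea in the description (flatten a tree while keeping its structural information, then convert the result into a structured table) can be illustrated with a small Python sketch. This is not the thesis's FSSM implementation, whose details are not given in this record. The sample log, the `flatten` and `to_rows` helpers, and the indexed path notation are all hypothetical choices made for illustration only.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical toy event log; real BPIC logs are far larger and use a different schema.
SAMPLE_LOG = """<log>
  <trace id="c1">
    <event><name>submit</name><cost>10</cost></event>
    <event><name>approve</name><cost>5</cost></event>
  </trace>
  <trace id="c2">
    <event><name>submit</name><cost>7</cost></event>
  </trace>
</log>"""


def flatten(element, path=""):
    """Phase 1 (illustrative): yield (path, text) pairs for leaf nodes.

    Each path records the tag and sibling position of every ancestor,
    so the position of a value in the tree is not lost.
    """
    counts = defaultdict(int)
    for child in element:
        counts[child.tag] += 1
        child_path = f"{path}/{child.tag}[{counts[child.tag]}]"
        if len(child) == 0:  # leaf node: emit its text
            yield child_path, (child.text or "").strip()
        else:  # inner node: recurse into its children
            yield from flatten(child, child_path)


def to_rows(xml_text, record_tag="trace"):
    """Phase 2 (illustrative): one dict per record, one key per flattened path."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iter(record_tag):
        row = {"id": record.get("id")}
        row.update(flatten(record))
        rows.append(row)
    return rows


rows = to_rows(SAMPLE_LOG)

# Build a rectangular table: union of all columns, None where a record has no value.
columns = sorted({key for row in rows for key in row})
table = [[row.get(col) for col in columns] for row in rows]
```

Once the data is in this rectangular form, each column can be passed to ordinary statistical tests or classifiers. Missing cells (here `None`) would need an explicit handling policy, which this sketch does not attempt to choose.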