MARC View: Instance matching framework for heterogeneous semantic web content over linked data environment

Over the past decade, instance matching has been used to discover relationships among heterogeneous Resource Description Framework (RDF) data that may represent the same real-world entity in a Linked Data environment. The exponential growth of data being experienced in the...

Bibliographic Details
Main Author:	Mansir, Abubakar
Format:	Thesis
Language:	English
Published:	2021
Subjects:	Semantic Web; Heterogeneous distributed computing systems
Online Access:	http://psasir.upm.edu.my/id/eprint/104010/1/FSKTM%202022%209%20IR.pdf http://psasir.upm.edu.my/id/eprint/104010/

id	my.upm.eprints.104010
record_format	eprints
spelling	my.upm.eprints.104010 2023-07-03T08:14:03Z http://psasir.upm.edu.my/id/eprint/104010/ Instance matching framework for heterogeneous semantic web content over linked data environment Mansir, Abubakar 2021-06 Thesis NonPeerReviewed text en http://psasir.upm.edu.my/id/eprint/104010/1/FSKTM%202022%209%20IR.pdf Mansir, Abubakar (2021) Instance matching framework for heterogeneous semantic web content over linked data environment. Doctoral thesis, Universiti Putra Malaysia. Semantic Web Heterogeneous distributed computing systems
institution	Universiti Putra Malaysia
building	UPM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Putra Malaysia
content_source	UPM Institutional Repository
url_provider	http://psasir.upm.edu.my/
language	English
topic	Semantic Web; Heterogeneous distributed computing systems
spellingShingle	Semantic Web Heterogeneous distributed computing systems Mansir, Abubakar Instance matching framework for heterogeneous semantic web content over linked data environment
description	Over the past decade, instance matching has been used to discover relationships among heterogeneous Resource Description Framework (RDF) data that may represent the same real-world entity in a Linked Data environment. The exponential growth of data in recent times, in terms of volume, variety and velocity, makes it difficult for existing instance matching frameworks to discover relationships effectively and generate a matching output. These frameworks perform a large number of comparisons when discovering matching attributes at the initial stage, which leads to attributes being missed when generating training samples and thus to incomplete alignments as matching output. Manual parameter configuration is another problem with existing matching frameworks, which makes them weak at handling highly heterogeneous data. A further consequence of these problems is the time taken to generate an alignment and the high memory usage during the process. An effective and scalable instance matching framework is needed to improve matching performance. In this study, an instance matching framework is proposed to address the identified problems and to generate more accurate matching output (alignment) in minimal running time. The framework adapts the methods used in the benchmark studies, adding new components and modifying some existing ones to boost matching performance. The proposed framework comprises the following interacting components: serialisation and pre-processing, unsupervised training set generation, property alignment and two-fold similarity generation. Serialisation translates RDF data from the N-Triples file format to the Comma Separated Values (CSV) file format, while pre-processing performs basic text filtering.
In the attribute discovery component, potential matching attributes are discovered by clustering the attributes of matching instances into similar and non-similar clusters, which yields potential attribute pairs for matching. These discovered attributes serve as input to a modified training set generation component, where training sets are generated based on the clusters of potential attributes. Property alignment checks for irregular data in the generated sets to optimise matching performance. The last component generates similarity with self-configuring behaviour. Experiments were conducted to evaluate the performance of the individual components and the output of the framework as a whole. The evaluation was performed on real-world datasets provided in different Ontology Alignment Evaluation Initiative (OAEI) campaigns as benchmark data for the instance matching track. The output of each algorithm was evaluated, and the results show that each algorithm performs well and outperforms the existing algorithms on all test cases in terms of better output generation and effective handling of heterogeneity across different domains, which is a necessary concern in all data-intensive problems. The proposed framework demonstrated a significant improvement over the benchmark frameworks, Agreement Maker Light (AML), RiMOM-Instance Matching (RiMOM-IM) and Unsupervised Instance Matcher, in terms of alignment accuracy within a minimal time frame, with the ability to accommodate the growing size of Linked Data (LD) in today’s web content.
format	Thesis
author	Mansir, Abubakar
author_facet	Mansir, Abubakar
author_sort	Mansir, Abubakar
title	Instance matching framework for heterogeneous semantic web content over linked data environment
title_short	Instance matching framework for heterogeneous semantic web content over linked data environment
title_full	Instance matching framework for heterogeneous semantic web content over linked data environment
title_fullStr	Instance matching framework for heterogeneous semantic web content over linked data environment
title_full_unstemmed	Instance matching framework for heterogeneous semantic web content over linked data environment
title_sort	instance matching framework for heterogeneous semantic web content over linked data environment
publishDate	2021
url	http://psasir.upm.edu.my/id/eprint/104010/1/FSKTM%202022%209%20IR.pdf http://psasir.upm.edu.my/id/eprint/104010/
_version_	1770553015800954880
score	13.251813
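
The description above mentions a serialisation step that translates RDF data from N-Triples to CSV. The following is a minimal Python sketch of what such a step could look like; it is not the thesis's implementation. The regular expression, the function names (`ntriples_to_csv`, `strip_term`), the three-column layout and the `example.org` sample data are all assumptions made for illustration, and the pattern covers only a simplified subset of N-Triples (escape sequences in literals are not decoded).

```python
import csv
import io
import re

# Simplified N-Triples pattern (an assumption, not the full grammar):
# subject and predicate, then an object that is an IRI, a blank node, or a
# quoted literal with an optional language tag or datatype.
TRIPLE = re.compile(
    r'^\s*(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+'
    r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)\s*\.\s*$'
)


def strip_term(term):
    """Drop angle brackets from IRIs and quotes/suffixes from literals."""
    if term.startswith("<"):
        return term[1:-1]
    if term.startswith('"'):
        return term[1:term.rindex('"')]
    return term  # blank node label, kept as-is


def ntriples_to_csv(lines, out):
    """Write one CSV row (subject, predicate, object) per parsable triple."""
    writer = csv.writer(out)
    writer.writerow(["subject", "predicate", "object"])
    count = 0
    for line in lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE.match(line)
        if match is None:
            continue  # skip lines this simplified pattern cannot parse
        writer.writerow([strip_term(t) for t in match.groups()])
        count += 1
    return count


# Hypothetical sample data for illustration only.
sample = [
    '<http://example.org/p1> <http://xmlns.com/foaf/0.1/name> "Ada Lovelace"@en .',
    '<http://example.org/p1> <http://example.org/knows> <http://example.org/p2> .',
    '# a comment line',
]
buffer = io.StringIO()
rows_written = ntriples_to_csv(sample, buffer)
csv_text = buffer.getvalue()
```

A flat subject/predicate/object layout keeps every triple as one row, so later steps such as attribute discovery could group rows by predicate; whether the thesis uses this layout is not stated in the record.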
