Framework for a semantic data transformation in solving data quality issues in big data

Purpose - Today, organizations and companies generate a tremendous amount of data. At the same time, an enormous amount of data is received and acquired from various sources and stored, which brings us to the era of Big Data (BD). BD is a term used to describe massive datasets that...

Bibliographic Details
Main Authors: Onyeabor, Grace, Ta'a, Azman
Format: Conference or Workshop Item
Language: English
Published: 2017
Subjects: QA75 Electronic computers. Computer science
Online Access: http://repo.uum.edu.my/24488/1/SICONSEM%202017%2019%2021.pdf
http://repo.uum.edu.my/24488/
id my.uum.repo.24488
record_format eprints
spelling my.uum.repo.24488 2020-11-03T07:36:09Z http://repo.uum.edu.my/24488/ 2017-12-05 Conference or Workshop Item PeerReviewed application/pdf en Onyeabor, Grace and Ta'a, Azman (2017) Framework for a semantic data transformation in solving data quality issues in big data. In: Sintok International Conference on Social Science and Management (SICONSEM 2017), 5 December 2017, Adya Hotel, Langkawi Island, Kedah, Malaysia.
institution Universiti Utara Malaysia
building UUM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Utara Malaysia
content_source UUM Institutional Repository
url_provider http://repo.uum.edu.my/
language English
topic QA75 Electronic computers. Computer science
description Purpose - Today, organizations and companies generate a tremendous amount of data. At the same time, an enormous amount of data is received and acquired from various sources and stored, which brings us to the era of Big Data (BD). BD is a term used to describe massive datasets of diverse formats, created at very high speed, whose management is nearly impossible with traditional database management systems (Kanchi et al., 2015). With the dawn of BD, Data Quality (DQ) has become imperative. Volume, velocity and variety, the initial 3Vs of BD, are usually used to describe its main properties. But to extract value (another V property) and make BD effective and efficient for organizational decision making, the significance of a further V, veracity, is gradually coming to light. Veracity directly denotes inconsistency and DQ issues. Today, veracity is the biggest challenge in data analysis when compared to other aspects such as volume and velocity. Trusting the acquired data goes a long way in implementing decisions from an automated decision-making system, and veracity helps to validate that data (Agarwal, Ravikumar, & Saha, 2016). DQ is an important issue in every business. To be successful, companies need high-quality data on inventory, supplies, customers, vendors and other vital enterprise information in order to run their data analysis applications (e.g. decision support systems, data mining, customer relationship management) efficiently and produce accurate results (McAfee & Brynjolfsson, 2012). During the transformation of huge volumes of data, data mismatches, miscalculations and/or loss of useful data may occur, leading to an unsuccessful data transformation (Tesfagiorgish & JunYi, 2015), which in turn leads to poor data quality.
The addition of external data, particularly RDF data, raises further challenges for data transformation compared with the traditional transformation process. For example, a drawback of using BD in the business analysis process is that the data is almost schema-less, and RDF data contains poor or complex schema. Traditional data transformation tools cannot process such inconsistent and heterogeneous data because they do not support semantic-aware data, they are entirely schema-dependent, and they do not focus on expressive semantic relationships to integrate data from different sources. Thus, BD requires more powerful tools to transform data semantically. While research in this area has so far offered different frameworks, to the best of the researchers' knowledge, not much research has been done in relation to the transformation of DQ in BD. What has been done has not gone beyond general cleansing of incoming data (Merino et al., 2016). The proposed framework presents a method for analyzing DQ using BD from various domains and applying semantic technologies in the ETL transformation stage to create a semantic model that enables quality in the data.
format Conference or Workshop Item
author Onyeabor, Grace
Ta'a, Azman
title Framework for a semantic data transformation in solving data quality issues in big data
publishDate 2017
url http://repo.uum.edu.my/24488/1/SICONSEM%202017%2019%2021.pdf
http://repo.uum.edu.my/24488/