Staff View: Authenticating sensitive diacritical texts using residual, data representation and pattern matching methods / Saqib Iqbal Hakak

Bibliographic Details
Main Author:	Saqib Iqbal , Hakak
Format:	Thesis
Published:	2018
Subjects:	QA75 Electronic computers. Computer science
Online Access:	http://studentsrepo.um.edu.my/10408/1/Saqib_Iqbal_Hakak.pdf http://studentsrepo.um.edu.my/10408/2/Saqib_Iqbal_Hakak_%E2%80%93_Thesis.pdf http://studentsrepo.um.edu.my/10408/

id	my.um.stud.10408
record_format	eprints
institution	Universiti Malaya
building	UM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Malaya
content_source	UM Student Repository
url_provider	http://studentsrepo.um.edu.my/
topic	QA75 Electronic computers. Computer science
description	Diacritics play an important role in conveying the meaning of a sentence by indicating its proper pronunciation. Any text that requires diacritics is sensitive, because any disarrangement of the diacritics (intentional or unintentional) will result in complete misinterpretation of the text. Diacritics such as punctuation symbols, extended letters (e.g. kashidas) and other symbols can easily be tampered with to alter the original meaning of the text. Few studies have focused on authenticating such sensitive diacritical content (SDC), and most of them remove the diacritics before authentication, which makes the process questionable. Moreover, the proliferation of such sensitive content in different languages and formats on the internet has further exacerbated the authentication problem in the search and retrieval phases. To address these issues, this thesis presents three methods for authenticating SDC, with the aim of improving the search and retrieval phases. The first method is based on a residual approach that authenticates any two similar sample texts written in different styles using one common database, which minimizes the overhead of maintaining multiple databases. This is achieved using logical operations and character segmentation. The second method is based on the representation of the diacritical text within the database and aims to improve retrieval performance when authenticating a single sentence (verse). This is achieved by creating individual nodes based on the total number of characters and placing each diacritical verse within its respective node. The third method is based on a pattern matching approach, in which multiple input patterns are authenticated against a given text. The purpose of exploring pattern matching is to authenticate multiple diacritical verses with improved time and space efficiency. The method works by splitting the given pattern into two halves and searching for each half. The halves are searched using two different algorithms, based on a split approach and a parallel approach respectively. To show their practicality, the proposed methods are tested on sensitive diacritical text, including the Arabic Digital Holy Quran (DHQ). The first reason for selecting the DHQ is its availability in different styles, such as the Uthmani and plain Arabic styles, which makes evaluation of the first method possible. The second reason is the complexity of the diacritics and encoding scheme within the DHQ, which degrades authentication performance when data representation and search/retrieval strategies are inefficient; this made evaluation of the second method feasible and practical. Finally, to evaluate the pattern matching approach, sensitive texts in different languages, including Arabic, French, Italian, English and Chinese, were used. The findings show that the first method converts Uthmani and plain Quranic verses into one common style with an accuracy of about 87%. The second method authenticates a single DHQ verse with an improvement in search time of approximately 70% over existing methods. The third method successfully authenticates multiple verses of different sensitive diacritical texts with improved computational time and memory consumption.
format	Thesis
author	Saqib Iqbal , Hakak
author_facet	Saqib Iqbal , Hakak
author_sort	Saqib Iqbal , Hakak
title	Authenticating sensitive diacritical texts using residual, data representation and pattern matching methods / Saqib Iqbal Hakak
publishDate	2018
url	http://studentsrepo.um.edu.my/10408/1/Saqib_Iqbal_Hakak.pdf http://studentsrepo.um.edu.my/10408/2/Saqib_Iqbal_Hakak_%E2%80%93_Thesis.pdf http://studentsrepo.um.edu.my/10408/
_version_	1738506362924564480
spelling	my.um.stud.10408 2020-02-02T19:10:04Z Authenticating sensitive diacritical texts using residual, data representation and pattern matching methods / Saqib Iqbal Hakak Saqib Iqbal , Hakak QA75 Electronic computers. Computer science 2018-07 Thesis NonPeerReviewed application/pdf http://studentsrepo.um.edu.my/10408/1/Saqib_Iqbal_Hakak.pdf application/pdf http://studentsrepo.um.edu.my/10408/2/Saqib_Iqbal_Hakak_%E2%80%93_Thesis.pdf Saqib Iqbal , Hakak (2018) Authenticating sensitive diacritical texts using residual, data representation and pattern matching methods / Saqib Iqbal Hakak. PhD thesis, University of Malaya. http://studentsrepo.um.edu.my/10408/
score	13.211869
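
The second and third methods summarized in the description field can be illustrated with a rough sketch. This is not the thesis's implementation: the function names (build_length_index, authenticate_verse, find_by_halves, authenticate_patterns), the use of Unicode NFC normalization, and the sample strings are all assumptions made for illustration. The first part groups verses into buckets keyed by character count, so a lookup only compares against verses of the same length. The second part splits a pattern into two halves, locates the first half, and then checks that the second half follows. The thesis's own split and parallel algorithms are not reproduced here.

```python
from collections import defaultdict
import unicodedata


def build_length_index(verses):
    """Group verses into buckets ("nodes") keyed by character count.

    Assumption: NFC normalization is applied so that the same text encoded
    with precomposed or combining diacritics lands in the same bucket.
    """
    index = defaultdict(set)
    for verse in verses:
        normalized = unicodedata.normalize("NFC", verse)
        index[len(normalized)].add(normalized)
    return index


def authenticate_verse(index, candidate):
    """Return True only if the candidate matches a stored verse exactly,
    diacritics included. Only the bucket for its length is consulted."""
    normalized = unicodedata.normalize("NFC", candidate)
    return normalized in index.get(len(normalized), set())


def find_by_halves(text, pattern):
    """Return the start offsets of pattern in text by locating the first
    half of the pattern and then verifying the second half behind it."""
    if not pattern:
        return []
    mid = (len(pattern) + 1) // 2  # first half is never empty
    first, second = pattern[:mid], pattern[mid:]
    matches = []
    start = text.find(first)
    while start != -1:
        # The second half must begin exactly where the first half ends.
        if text.startswith(second, start + mid):
            matches.append(start)
        start = text.find(first, start + 1)
    return matches


def authenticate_patterns(text, patterns):
    """Map each input pattern to the offsets at which it occurs in text."""
    return {p: find_by_halves(text, p) for p in patterns}
```

With an index built from the hypothetical verses "élève" and "côté", authenticate_verse accepts "élève" but rejects "eleve": the two strings share a bucket because they have the same length, yet they differ once diacritics are compared. Keeping the diacritics in the comparison, rather than stripping them first, is the point the abstract raises against earlier approaches.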
