Stylometric authorship balanced attribution prediction method

Stylometric authorship attribution is an important approach in the text mining field that has received growing attention because of the delicacy of the task. The approach analyzes texts such as novels and plays written by famous authors and tries to measure their writing style by choosi...

Bibliographic Details
Main Author: Mustafa, Tareef Kamil
Format: Thesis
Language: English
Published: 2011
Online Access: http://psasir.upm.edu.my/id/eprint/27377/1/FSKTM%202011%2016R.pdf
http://psasir.upm.edu.my/id/eprint/27377/
id my.upm.eprints.27377
record_format eprints
spelling my.upm.eprints.27377 2014-02-27T00:53:54Z http://psasir.upm.edu.my/id/eprint/27377/ 2011-08 Thesis NonPeerReviewed application/pdf en Mustafa, Tareef Kamil (2011) Stylometric authorship balanced attribution prediction method. PhD thesis, Universiti Putra Malaysia.
institution Universiti Putra Malaysia
building UPM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Putra Malaysia
content_source UPM Institutional Repository
url_provider http://psasir.upm.edu.my/
language English
description Stylometric authorship attribution is an important approach in the text mining field that has received growing attention because of the delicacy of the task. The approach analyzes texts such as novels and plays written by famous authors and tries to measure their writing style by choosing attributes that belong uniquely to each author, on the assumption that every author has a distinctive artistic way of writing that no other author shares. Two major problems hold back progress in this field: the accuracy of the predictions and the dependence on human expert judgment. Existing techniques rely either on statistical attributes such as frequent words or on more sophisticated semantic techniques such as lexicons. Nonetheless, their results remain considerably less accurate than desired. In this research, we propose a new stylometric method, Stylometric Authorship Balanced Attribution (SABA), that overcomes these problems by predicting with higher accuracy and without human judgment, meaning that the method does not rely on domain experts. The new method is implemented by merging three methods: the computational approach, the Winnow algorithm, and the Burrows-delta method. The proposed method also uses a set of attributes that are more effective than those of the frequent words method. These effective attributes are the word pair and the word trio, both of which are multi-word attributes. They give higher stylometric prediction accuracy than achieved so far, because they provide more evidence of an author's artistic writing style for authorship recognition and prediction. The proposed SABA method is compared against three other methods: the computational approach, the Winnow algorithm, and the Burrows-delta method. The results showed that the proposed method produces superior prediction accuracy and even gives a completely correct result in the final stage of the experiment.
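The abstract names word pairs and word trios as attributes and Burrows's Delta as one of the three merged methods, but it does not say how SABA combines them. The following is therefore only a minimal sketch of two of the named building blocks, not the thesis's method: it extracts relative frequencies of consecutive word pairs (n=2) or trios (n=3) and computes a plain Burrows's Delta (mean absolute difference of z-scores) over them. The function names `tokens`, `ngram_freqs`, and `burrows_delta`, the parameter `top_k`, and the tokenization rule are all assumptions made for illustration.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def tokens(text):
    """Lowercase word tokens (assumed tokenization: letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_freqs(text, n):
    """Relative frequency of every run of n consecutive words.

    n=2 gives word pairs, n=3 gives word trios.
    """
    toks = tokens(text)
    grams = [" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)]
    if not grams:
        return {}
    counts = Counter(grams)
    return {g: c / len(grams) for g, c in counts.items()}


def burrows_delta(candidates, disputed, n=2, top_k=50):
    """Return {author: delta}; the smallest delta is the closest style.

    candidates: dict mapping author name -> known text by that author.
    disputed:   text of unknown authorship.
    top_k:      hypothetical cap on how many frequent n-grams to use.
    """
    profiles = {a: ngram_freqs(t, n) for a, t in candidates.items()}
    target = ngram_freqs(disputed, n)

    # Pick the n-grams with the highest summed frequency across candidates.
    pooled = Counter()
    for profile in profiles.values():
        pooled.update(profile)
    features = [g for g, _ in pooled.most_common(top_k)]

    # Mean and standard deviation of each feature across candidate authors;
    # features that do not vary carry no signal and are skipped.
    stats = {}
    for g in features:
        values = [p.get(g, 0.0) for p in profiles.values()]
        sd = pstdev(values)
        if sd > 0:
            stats[g] = (mean(values), sd)
    if not stats:
        raise ValueError("no feature varies across the candidate authors")

    # Delta = mean absolute difference between z-scores of the two texts.
    deltas = {}
    for author, profile in profiles.items():
        diffs = [
            abs((profile.get(g, 0.0) - mu) / sd - (target.get(g, 0.0) - mu) / sd)
            for g, (mu, sd) in stats.items()
        ]
        deltas[author] = mean(diffs)
    return deltas
```

With a dictionary of known texts per author, `min(deltas, key=deltas.get)` picks the candidate whose word-pair profile is closest to the disputed text. Passing `n=3` switches the same computation to word trios.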
format Thesis
author Mustafa, Tareef Kamil
title Stylometric authorship balanced attribution prediction method