Dissimilarity algorithm on conceptual graphs to mine text outliers

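Graphical text representations such as Conceptual Graphs (CGs) attempt to capture the structure and semantics of documents. As such, they are the preferred text representation for a wide range of problems in natural language processing, information retrieval and text mining. In many of these applications, it is necessary to measure the dissimilarity (or similarity) between the knowledge represented in CGs. In this paper, we present a dissimilarity algorithm to detect outliers in a collection of texts represented in the Conceptual Graph Interchange Format (CGIF). To avoid the NP-complete problem of graph matching, we introduce the use of a standard CG in the dissimilarity computation. We evaluate our method on real-world financial statements, with the aim of identifying outlying performance indicators. For evaluation purposes, we compare the proposed dissimilarity function with a Dice-coefficient similarity function used in related previous work. Experimental results indicate that our method outperforms the existing method and correlates better with human judgements. Compared with other text outlier detection methods, this approach captures the semantics of documents through the use of CGs and detects outliers conveniently through a simple dissimilarity function. Furthermore, the proposed algorithm retains linear complexity as the number of CGs increases.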
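The central idea in the abstract, scoring every document graph against one standard CG instead of matching graphs pairwise, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: each CG is assumed to be flattened into a set of concept labels and a set of (concept, relation, concept) triples, and `ConceptualGraph`, `dissimilarity`, its weight `alpha`, `find_outliers`, the 0.5 threshold and the toy data are all hypothetical. Only `dice` is a standard formula, the Dice coefficient 2|A ∩ B| / (|A| + |B|); the paper's own dissimilarity function is not given in this record.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptualGraph:
    """Hypothetical flattened CG: concept labels plus (concept, relation, concept) triples.

    Parsing real CGIF text into this form is assumed and not shown here.
    """
    concepts: frozenset[str]
    relations: frozenset[tuple[str, str, str]]


def dice(a: frozenset, b: frozenset) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|); two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def dissimilarity(g: ConceptualGraph, standard: ConceptualGraph, alpha: float = 0.5) -> float:
    """Hypothetical placeholder score in [0, 1], NOT the paper's function:
    a weighted mix of concept-level and relation-level mismatch against the standard CG."""
    concept_gap = 1 - dice(g.concepts, standard.concepts)
    relation_gap = 1 - dice(g.relations, standard.relations)
    return alpha * concept_gap + (1 - alpha) * relation_gap


def find_outliers(graphs: dict[str, ConceptualGraph], standard: ConceptualGraph,
                  threshold: float = 0.5) -> list[str]:
    """Score each graph once against the single standard CG (no pairwise matching),
    so the number of comparisons equals len(graphs); flag scores above the assumed threshold."""
    return sorted(name for name, g in graphs.items()
                  if dissimilarity(g, standard) > threshold)


# Toy data invented for this sketch, loosely styled on financial-statement text.
standard = ConceptualGraph(
    concepts=frozenset({"Revenue", "Increase", "Profit"}),
    relations=frozenset({("Revenue", "attr", "Increase"), ("Profit", "attr", "Increase")}),
)
graphs = {
    "doc_a": standard,  # identical to the standard: score 0.0
    "doc_b": ConceptualGraph(
        concepts=frozenset({"Revenue", "Decrease", "Profit"}),
        relations=frozenset({("Revenue", "attr", "Decrease"), ("Profit", "attr", "Decrease")}),
    ),
    "doc_c": ConceptualGraph(
        concepts=frozenset({"Revenue", "Increase"}),
        relations=frozenset({("Revenue", "attr", "Increase")}),
    ),
}

print(find_outliers(graphs, standard))  # ['doc_b']
```

Here doc_b scores 2/3 and doc_c about 0.27, so only doc_b crosses the assumed threshold. Because each graph is compared once with the standard CG, the number of comparisons grows linearly with the collection, which is the property the abstract highlights; exhaustive pairwise comparison would instead need n(n-1)/2 comparisons for n graphs.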


Bibliographic Details
Main Authors: Kamaruddin, Siti Sakira, Hamdan, Abdul Razak, Abu Bakar, Azuraliza, Mat Nor, Fauzias
Format: Conference or Workshop Item
Language: English
Published: 2009
Subjects: QA Mathematics
Online Access: http://repo.uum.edu.my/15346/1/05341910.pdf
http://repo.uum.edu.my/15346/
http://doi.org/10.1109/DMO.2009.5341910
Review Status: Peer reviewed
Citation: Kamaruddin, Siti Sakira and Hamdan, Abdul Razak and Abu Bakar, Azuraliza and Mat Nor, Fauzias (2009) Dissimilarity algorithm on conceptual graphs to mine text outliers. In: 2nd Conference on Data Mining and Optimization, 2009 (DMO '09), 27-28 October 2009, Selangor, Malaysia. doi:10.1109/DMO.2009.5341910
Institution: Universiti Utara Malaysia (UUM Library), Malaysia
Repository: UUM Institutional Repository, http://repo.uum.edu.my/