Hybridized feature set for accurate Arabic dark web pages classification

Security informatics and computational intelligence are gaining more importance in detecting terrorist activities as the extremist groups are misusing many of the available Internet services to incite violence and hatred. However, inadequate performance of statistical based computational intelligenc...

Full description

Saved in:
Bibliographic Details
Main Authors: Sabbah, T., Selamat, A.
Format: Conference or Workshop Item
Published: Springer Verlag 2015
Subjects:
Online Access:http://eprints.utm.my/id/eprint/59310/
http://dx.doi.org/10.1007/978-3-319-22689-7_13
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Security informatics and computational intelligence are gaining more importance in detecting terrorist activities as the extremist groups are misusing many of the available Internet services to incite violence and hatred. However, inadequate performance of statistical based computational intelligence methods reduces intelligent techniques efficiency in supporting counterterrorism efforts, and limits the early detection opportunities of potential terrorist activities. In this paper, we propose a feature set hybridization method, based on feature selection and extraction methods, for accurate content classification in Arabic dark web pages. The proposed method hybridizes the feature sets so that the generated feature set contains less number of features that capable of achieving higher classification performance. A selected dataset from Dark Web Forum Portal (DWFP) is used to test the performance of the proposed method that based on Term Frequency - Inverse Document Frequency (TFIDF) as feature selection method on one hand, while Random Projection (RP) and Principal Component Analysis (PCA) feature selection methods on the other hand. Classification results using the Support Vector Machine (SVM) classifier show that a high classification performance has been achieved base on the hybridization of TFIDF and PCA, where 99 % of F1 and accuracy performance has been achieved.