Anomaly detection in system log files using machine learning algorithms / Zahedeh Zamanian

In recent years, the rapid growth of information technology and easy access to computers, digital devices and the internet have made security management and the investigation of malicious activity a main concern of organizations and governments. People are an organization's greatest asset, but they may also be t...

Bibliographic Details
Main Author: Zahedeh, Zamanian
Format: Thesis
Published: 2019
Subjects: QA75 Electronic computers. Computer science
Online Access: http://studentsrepo.um.edu.my/10748/1/Zahedeh.pdf
http://studentsrepo.um.edu.my/10748/2/Zahedeh__%E2%80%93_Dissertation.pdf
http://studentsrepo.um.edu.my/10748/
id my.um.stud.10748
record_format eprints
spelling my.um.stud.10748 2020-08-16T23:23:32Z Anomaly detection in system log files using machine learning algorithms / Zahedeh Zamanian Zahedeh, Zamanian QA75 Electronic computers. Computer science 2019-05 Thesis NonPeerReviewed application/pdf http://studentsrepo.um.edu.my/10748/1/Zahedeh.pdf application/pdf http://studentsrepo.um.edu.my/10748/2/Zahedeh__%E2%80%93_Dissertation.pdf Zahedeh, Zamanian (2019) Anomaly detection in system log files using machine learning algorithms / Zahedeh Zamanian. Masters thesis, University of Malaya. http://studentsrepo.um.edu.my/10748/
institution Universiti Malaya
building UM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaya
content_source UM Student Repository
url_provider http://studentsrepo.um.edu.my/
topic QA75 Electronic computers. Computer science
description In recent years, the rapid growth of information technology and easy access to computers, digital devices and the internet have made security management and the investigation of malicious activity a main concern of organizations and governments. People are an organization's greatest asset, but they may also be its greatest threat because of their access to highly confidential information and their knowledge of the organization's systems. Insider threat activity has a huge impact on business, so methods are needed to detect insider threats inside an organization. Log files are a rich source of information that can help to detect, understand and predict these kinds of threats. However, the sheer size of the log files generated by systems makes manual log analysis impractical. Moreover, log files contain many irrelevant and redundant features that act as noise. Log files are also heterogeneous and cannot be fed directly into machine learning algorithms. Furthermore, many companies use signature-based detection, which cannot catch more advanced attackers who use unfamiliar attack methods. This study uses machine learning to detect anomalies in system log files. It uses the synthetic CERT Insider Threat v6.2 dataset, which covers five domains: file, logon/logoff, http, device and email. It generates 200 features from the raw system log files that can be fed to machine learning algorithms. It uses principal component analysis (PCA) as a feature extraction method to extract 117 independent and discriminative features that retain 95% of the variance. It applies the unsupervised Isolation Forest and One-Class SVM algorithms to detect anomalies. Isolation Forest achieved an area under the curve (AUC) of 96.6% with PCA; without PCA, its lowest AUC was 76%. In contrast, the AUC for One-Class SVM was 69.3% with PCA and 59.8% without PCA. Isolation Forest achieved a true positive rate (TPR) of 93.2% with PCA and 89.2% without PCA. The TPR for One-Class SVM was 68.1% with PCA and 55.4% without PCA. The highest false positive rate (FPR), 26%, was obtained by One-Class SVM without PCA, and the lowest, 2.8%, by Isolation Forest with PCA.
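The pipeline summarized in the description (a feature matrix, PCA retaining 95% of the variance, then unsupervised Isolation Forest and One-Class SVM scored by AUC, TPR and FPR) might be sketched with scikit-learn roughly as follows. This is not the thesis's code. The toy Gaussian data stands in for the CERT-derived features, and the scaling step, the hyperparameters (`contamination`, `nu`, `gamma`) and the helper names `make_toy_features`, `evaluate` and `run_pipeline` are all assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM


def make_toy_features(n_normal=950, n_anomalous=50, n_features=200, seed=0):
    """Hypothetical stand-in for the 200 log-derived features (not CERT data)."""
    rng = np.random.default_rng(seed)
    normal = rng.normal(0.0, 1.0, size=(n_normal, n_features))
    anomalous = rng.normal(3.0, 1.0, size=(n_anomalous, n_features))
    X = np.vstack([normal, anomalous])
    # Labels are used only for evaluation: 1 = anomalous, 0 = normal.
    y = np.concatenate([np.zeros(n_normal, dtype=int),
                        np.ones(n_anomalous, dtype=int)])
    return X, y


def evaluate(detector, X, y):
    """Fit an unsupervised detector and report AUC, TPR and FPR."""
    detector.fit(X)  # the labels y are not seen during fitting
    # decision_function is higher for inliers, so negate it to get
    # a score that is higher for more anomalous rows.
    scores = -detector.decision_function(X)
    # predict returns -1 for outliers and +1 for inliers.
    pred = (detector.predict(X) == -1).astype(int)
    tn, fp, fn, tp = confusion_matrix(y, pred, labels=[0, 1]).ravel()
    return {
        "auc": roc_auc_score(y, scores),
        "tpr": tp / (tp + fn),  # true positive rate
        "fpr": fp / (fp + tn),  # false positive rate
    }


def run_pipeline(seed=0):
    X, y = make_toy_features(seed=seed)
    # Assumption: standardize features before PCA.
    X_scaled = StandardScaler().fit_transform(X)
    # A float n_components keeps the fewest components whose cumulative
    # explained variance reaches that fraction (here 95%).
    pca = PCA(n_components=0.95, svd_solver="full")
    X_reduced = pca.fit_transform(X_scaled)
    detectors = {
        "isolation_forest": IsolationForest(
            n_estimators=100, contamination=0.05, random_state=seed),
        "one_class_svm": OneClassSVM(kernel="rbf", gamma="scale", nu=0.05),
    }
    results = {name: evaluate(det, X_reduced, y)
               for name, det in detectors.items()}
    return pca.n_components_, results


if __name__ == "__main__":
    n_components, results = run_pipeline()
    print("components kept:", n_components)
    for name, metrics in results.items():
        print(name, metrics)
```

The numbers this prints describe only the toy data and are not comparable to the figures reported in the thesis. Running the same `evaluate` call on `X_scaled` instead of `X_reduced` would give the corresponding "without PCA" comparison.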
format Thesis
author Zahedeh, Zamanian
title Anomaly detection in system log files using machine learning algorithms / Zahedeh Zamanian
publishDate 2019
url http://studentsrepo.um.edu.my/10748/1/Zahedeh.pdf
http://studentsrepo.um.edu.my/10748/2/Zahedeh__%E2%80%93_Dissertation.pdf
http://studentsrepo.um.edu.my/10748/