An improved vulnerability exploitation prediction model with novel cost function and custom trained word vector embedding

Cost functions; Forecasting; Large dataset; Multilayer neural networks; Network layers; Network security; Open source software; Software reliability; National vulnerability database; Over fitting problem; Performance metrics; Prediction model; Recent researches; Resampling technique; System software...

Bibliographic Details
Main Authors: Hoque M.S., Jamil N., Amin N., Lam K.-Y.
Other Authors: 57220806665
Format: Article
Published: MDPI AG 2023
id my.uniten.dspace-26158
record_format dspace
spelling my.uniten.dspace-26158 2023-05-29T17:07:18Z An improved vulnerability exploitation prediction model with novel cost function and custom trained word vector embedding Hoque M.S. Jamil N. Amin N. Lam K.-Y. 57220806665 36682671900 7102424614 7403657062 Cost functions; Forecasting; Large dataset; Multilayer neural networks; Network layers; Network security; Open source software; Software reliability; National vulnerability database; Over fitting problem; Performance metrics; Prediction model; Recent researches; Resampling technique; System softwares; Unique identifiers; Predictive analytics; algorithm; computer security; machine learning; reproducibility; Algorithms; Computer Security; Machine Learning; Neural Networks, Computer; Reproducibility of Results Successful cyber-attacks are caused by the exploitation of vulnerabilities in the software and/or hardware of systems deployed on premises or in the cloud. Although hundreds of vulnerabilities are discovered every year, only a small fraction of them are actually exploited, so there is a severe class imbalance between exploited and non-exploited vulnerabilities. The open-source National Vulnerability Database (NVD), the largest repository that indexes and maintains all known vulnerabilities, assigns a unique identifier to each vulnerability. Each registered vulnerability also receives a severity (CVSS) score based on the impact it might inflict if exploited. Recent research has shown that the CVSS score is not the only factor in selecting a vulnerability for exploitation, and that other attributes in the National Vulnerability Database can be used effectively as predictive features to identify the most exploitable vulnerabilities.
Since cybersecurity management is highly resource-intensive, organizations, such as those operating cloud systems, benefit when the vulnerabilities in their system software or hardware that are most likely to be exploited can be predicted as accurately and reliably as possible, so that the available resources are used to fix those first. Existing research has developed vulnerability exploitation prediction models that address the class imbalance with algorithmic and artificial data resampling techniques, but these models still overfit heavily to the majority class, which renders them practically unreliable. In this research, we have designed a novel cost function to address the existing class imbalance. We have also used the large text corpus available in the extracted dataset to develop a custom-trained word vector that better captures the context of the local text data, for use as an embedding layer in neural networks. Our vulnerability exploitation prediction models, powered by the novel cost function and the custom-trained word vector, have achieved very high overall performance metrics for accuracy, precision, recall, F1-score and AUC score, with values of 0.92, 0.89, 0.98, 0.94 and 0.97, respectively, thereby outperforming existing models while overcoming the overfitting problem caused by class imbalance. © 2021 by the authors. Licensee MDPI, Basel, Switzerland. Final 2023-05-29T09:07:18Z 2023-05-29T09:07:18Z 2021 Article 10.3390/s21124220 2-s2.0-85108114510 https://www.scopus.com/inward/record.uri?eid=2-s2.0-85108114510&doi=10.3390%2fs21124220&partnerID=40&md5=94a6e0312f1fab8c61ddaacc941d9583 https://irepository.uniten.edu.my/handle/123456789/26158 21 12 4220 All Open Access, Gold, Green MDPI AG Scopus
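Illustrative note: the abstract does not state the form of the authors' cost function, so the sketch below is not their method. It only shows one common, generic way a cost function can counter class imbalance: a binary cross-entropy in which each sample is weighted by the inverse frequency of its class. The function names `class_weights` and `weighted_bce`, and the toy data, are assumptions made for illustration.

```python
import math

def class_weights(labels):
    """Inverse-frequency weights: each class contributes equally in total.

    Assumes both classes (0 and 1) occur at least once in `labels`.
    """
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    return {0: n / (2 * n_neg), 1: n / (2 * n_pos)}

def weighted_bce(labels, probs, weights, eps=1e-12):
    """Mean binary cross-entropy with a per-class weight on each sample."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += weights[y] * loss
    return total / len(labels)

# Toy data (hypothetical): 9 non-exploited vulnerabilities, 1 exploited.
labels = [0] * 9 + [1]
# A lazy classifier that assigns a low exploitation probability to everything.
probs = [0.1] * 10

weights = class_weights(labels)
unweighted = weighted_bce(labels, probs, {0: 1.0, 1: 1.0})
weighted = weighted_bce(labels, probs, weights)
print(weights, unweighted, weighted)
```

With these weights, missing the single exploited sample costs far more than it does under the unweighted loss, so a model can no longer score well by predicting the majority class for every input.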
institution Universiti Tenaga Nasional
building UNITEN Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Tenaga Nasional
content_source UNITEN Institutional Repository
url_provider http://dspace.uniten.edu.my/
description Cost functions; Forecasting; Large dataset; Multilayer neural networks; Network layers; Network security; Open source software; Software reliability; National vulnerability database; Over fitting problem; Performance metrics; Prediction model; Recent researches; Resampling technique; System softwares; Unique identifiers; Predictive analytics; algorithm; computer security; machine learning; reproducibility; Algorithms; Computer Security; Machine Learning; Neural Networks, Computer; Reproducibility of Results
author2 57220806665
author_facet 57220806665
Hoque M.S.
Jamil N.
Amin N.
Lam K.-Y.
format Article
author Hoque M.S.
Jamil N.
Amin N.
Lam K.-Y.
spellingShingle Hoque M.S.
Jamil N.
Amin N.
Lam K.-Y.
An improved vulnerability exploitation prediction model with novel cost function and custom trained word vector embedding
author_sort Hoque M.S.
title An improved vulnerability exploitation prediction model with novel cost function and custom trained word vector embedding
title_sort improved vulnerability exploitation prediction model with novel cost function and custom trained word vector embedding
publisher MDPI AG
publishDate 2023