An empirical assessment of ML models for 5G network intrusion detection: a data leakage-free approach

This paper thoroughly compares thirteen unique Machine Learning (ML) models utilized for Intrusion detection systems (IDS) in a meticulously controlled environment. Unlike previous studies, we introduce a novel approach that meticulously avoids data leakage, enhancing the reliability of our findings...

Full description

Saved in:
Bibliographic Details
Main Authors: Bouke, Mohamed Aly, Abdullah, Azizol
Format: Article
Language:English
Published: Elsevier 2024
Online Access:http://psasir.upm.edu.my/id/eprint/113366/1/113366.pdf
http://psasir.upm.edu.my/id/eprint/113366/
https://linkinghub.elsevier.com/retrieve/pii/S2772671124001700
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:This paper thoroughly compares thirteen unique Machine Learning (ML) models utilized for Intrusion detection systems (IDS) in a meticulously controlled environment. Unlike previous studies, we introduce a novel approach that meticulously avoids data leakage, enhancing the reliability of our findings. The study draws upon a comprehensively labeled 5G-NIDD dataset covering a broad spectrum of network behaviors, from benign real-user traffic to various attack scenarios. Our data preprocessing and experimental design have been carefully structured to eradicate any data leakage, a standout feature of our methodology that significantly improves the robustness and dependability of our results compared to prior studies. The ML models are evaluated using various performance metrics, including accuracy, precision, recall, F1-score, ROC AUC, and execution time. Our results reveal that the K-Nearest Neighbors model is superior in accuracy and ROC AUC, while the Voting Classifier stands out in precision and F1-score. Decision Tree, Bagging, and Extra Trees models exhibit strong recall scores. In contrast, the AdaBoost model falls short across all assessed metrics. Despite displaying only modest performance on other metrics, the Naive Bayes model excels in computational efficiency, offering the quickest execution time. This paper emphasizes the importance of understanding various ML models' distinct strengths, drawbacks, and trade-offs for network intrusion detection. It highlights that no single model is universally superior, and the choice hinges on the nature of the dataset, specific application requirements, and the computational resources available.