Enhancing high-dimensional streaming data analysis: optimizing Online Feature Selection for handling drift using optimization technique and ensemble learning

Bibliographic Details
Main Author: Kamaru-Zaman, Ezzatul Akmal
Format: Thesis
Language: English
Published: 2024
Online Access: https://ir.uitm.edu.my/id/eprint/122888/1/122888.pdf
https://ir.uitm.edu.my/id/eprint/122888/
Description
Summary: In the era of data-driven decision-making, managing dynamic data streams characterized by evolving data distributions and high dimensionality presents a formidable challenge for online feature selection. This research addresses the challenge by developing solutions that optimize Online Feature Selection (OFS) to manage feature irrelevancy and redundancy, tackle Feature Drift, and rigorously validate the proposed algorithms on high-dimensional dynamic data streams. The research follows a structured methodology and introduces two novel methods: PSO-OSFS (Particle Swarm Optimization for Online Streaming Feature Selection), an optimized online feature selection method, and its enhancement, PSO-OSFS+ HEFT, designed to handle feature drift. The PSO-OSFS method is underpinned by an adaptive threshold particle representation for particle swarm optimization and an enhanced fitness function that minimizes the mean absolute deviation of dependency among feature subsets. The adaptive threshold particle representation introduces a novel aspect by defining the significance-level threshold as a value between 0.01 and 0.1.
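The two PSO-OSFS ingredients named in the summary can be illustrated with a minimal Python sketch. It is not the thesis implementation; it rests on several assumptions that do not come from the record. Each particle is taken to be a single significance level clamped to the stated range of 0.01 to 0.1. Features are assumed to arrive with precomputed relevance p-values (`p_values`) and a precomputed pairwise dependency matrix (`dependency`). "Mean absolute deviation of dependency" is read as the spread of pairwise dependency values around their mean within the selected subset. All function names and the PSO coefficients are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean

ALPHA_MIN, ALPHA_MAX = 0.01, 0.1  # significance-level range stated in the summary


def select_features(p_values, alpha):
    """Keep indices of features whose (assumed, precomputed) p-value is below alpha."""
    return [i for i, p in enumerate(p_values) if p < alpha]


def mad_fitness(selected, dependency):
    """Mean absolute deviation of pairwise dependency values in the subset (lower is better)."""
    pairs = [dependency[i][j] for i, j in combinations(selected, 2)]
    if not pairs:  # fewer than two features: no pair to compare
        return float("inf")
    centre = mean(pairs)
    return mean(abs(d - centre) for d in pairs)


def pso_threshold(p_values, dependency, n_particles=8, n_iter=30, seed=0):
    """Search for a significance threshold with a basic one-dimensional PSO."""
    rng = random.Random(seed)
    w, c1, c2 = 0.7, 1.5, 1.5  # common textbook coefficients, not taken from the thesis

    def score(alpha):
        return mad_fitness(select_features(p_values, alpha), dependency)

    pos = [rng.uniform(ALPHA_MIN, ALPHA_MAX) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    best_pos = pos[:]                      # personal best positions
    best_fit = [score(a) for a in pos]     # personal best fitness values
    g = min(range(n_particles), key=best_fit.__getitem__)
    g_pos, g_fit = best_pos[g], best_fit[g]  # global best

    for _ in range(n_iter):
        for k in range(n_particles):
            vel[k] = (w * vel[k]
                      + c1 * rng.random() * (best_pos[k] - pos[k])
                      + c2 * rng.random() * (g_pos - pos[k]))
            # Clamp the particle so the threshold stays inside [0.01, 0.1].
            pos[k] = min(ALPHA_MAX, max(ALPHA_MIN, pos[k] + vel[k]))
            fit = score(pos[k])
            if fit < best_fit[k]:
                best_pos[k], best_fit[k] = pos[k], fit
                if fit < g_fit:
                    g_pos, g_fit = pos[k], fit
    return g_pos, select_features(p_values, g_pos)
```

Under this reading, a subset with a single pair of features scores zero trivially, so a practical fitness function would presumably combine this term with a relevance or accuracy criterion; the record does not say how the thesis handles that.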