Enhancing high-dimensional streaming data analysis: optimizing Online Feature Selection for handling drift using optimization technique and ensemble learning
| Main Author: | |
|---|---|
| Format: | Thesis |
| Language: | en |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://ir.uitm.edu.my/id/eprint/122888/1/122888.pdf https://ir.uitm.edu.my/id/eprint/122888/ |
| Summary: | In the era of data-driven decision-making, managing dynamic data streams characterized by evolving data distributions and high dimensionality presents a formidable challenge for online feature selection. This research addresses the challenge by developing innovative solutions for optimizing Online Feature Selection (OFS) to manage feature irrelevancy and redundancy, tackling the issue of feature drift, and rigorously validating the proposed algorithms on high-dimensional dynamic data streams. The research follows a structured methodology and introduces two novel methods: PSO-OSFS (Particle Swarm Optimization for Online Streaming Feature Selection), an optimized online feature selection method, and its enhancement, PSO-OSFS+ HEFT, designed to handle feature drift. The PSO-OSFS method is underpinned by an adaptive threshold particle representation for particle swarm optimization and an enhanced fitness function that minimizes the mean absolute deviation of dependency among feature subsets. The adaptive threshold particle representation introduces a novel way of defining the significance-level threshold, which ranges from 0.01 to 0.1. |
|---|
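
The two ingredients named in the summary, a particle that carries its own significance threshold and a fitness based on mean absolute deviation of dependency, can be illustrated with a minimal sketch. This is not the thesis's implementation: the particle layout (feature scores followed by one threshold entry), the 0.5 selection cutoff, the names `decode_particle` and `mad_fitness`, and the precomputed `dependency` values are all assumptions made for illustration. Only the 0.01 to 0.1 threshold range and the idea of minimizing mean absolute deviation come from the summary.

```python
# Significance-level range stated in the summary.
ALPHA_MIN, ALPHA_MAX = 0.01, 0.1


def decode_particle(particle):
    """Split a particle into a feature mask and a significance threshold.

    Assumed layout (hypothetical): every entry but the last is a feature
    score in [0, 1]; the last entry in [0, 1] is mapped linearly onto
    [ALPHA_MIN, ALPHA_MAX].
    """
    *scores, raw_alpha = particle
    mask = [score > 0.5 for score in scores]   # assumed selection cutoff
    clipped = min(max(raw_alpha, 0.0), 1.0)    # keep the threshold in range
    alpha = ALPHA_MIN + clipped * (ALPHA_MAX - ALPHA_MIN)
    return mask, alpha


def mad_fitness(dependency, mask):
    """Mean absolute deviation of dependency over the selected features.

    `dependency` holds one precomputed dependency value per feature; how
    those values are measured is left open here. Lower is better, and an
    empty selection is given an infinite (worst) fitness.
    """
    selected = [d for d, keep in zip(dependency, mask) if keep]
    if not selected:
        return float("inf")
    mean = sum(selected) / len(selected)
    return sum(abs(d - mean) for d in selected) / len(selected)


# Usage: three features, threshold entry at the midpoint of its range.
mask, alpha = decode_particle([0.9, 0.1, 0.7, 0.5])
fitness = mad_fitness([0.2, 0.9, 0.6], mask)
```

In a full particle swarm loop, each particle would be decoded this way at every iteration, the decoded threshold would be handed to whatever independence test the streaming feature selection step uses, and the swarm would move toward particles with lower fitness.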
