Automated Feature Engineering Using Meta-Learning for Efficient and Generalizable Data Science Pipelines
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | en |
| Published: | INTI International University, 2026 |
| Subjects: | |
| Online Access: | http://eprints.intimal.edu.my/2301/1/jods2026_04.pdf http://eprints.intimal.edu.my/2301/2/854 http://eprints.intimal.edu.my/2301/ http://ipublishing.intimal.edu.my/jods.html |
| Summary: | Feature engineering remains one of the most time-intensive and expertise-dependent stages in machine learning pipelines, often limiting scalability and reproducibility. Despite advances in automated machine learning, existing systems largely emphasize model and hyperparameter optimization while leaving feature construction partially manual and task-specific. This reveals a critical research gap: the absence of a transferable, experience-driven mechanism capable of generalizing feature engineering knowledge across heterogeneous datasets. To address this limitation, this study proposes a meta-learning–based automated feature engineering framework that models transformation selection as a learnable mapping between dataset meta-characteristics and transformation utility. The framework constructs a reusable meta-knowledge layer trained on historical task–transformation–performance relationships and applies ranked transformation strategies to unseen datasets under computational constraints. Experiments conducted on diverse classification and regression datasets demonstrate that the proposed approach achieves up to 4.2% improvement in F1-score and 8.3% reduction in RMSE compared to raw-feature baselines, while maintaining performance comparable to or exceeding manually engineered pipelines. In addition, development time is reduced by up to 55%, and search complexity decreases by approximately 60% through ranking-based pruning. These findings confirm that feature engineering can be formalized as a transferable meta-learning problem, enabling scalable, efficient, and generalizable data science workflows. The study advances the automation of representation construction and supports the integration of intelligent meta-knowledge reuse in next-generation AutoML systems. |
|---|
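The summary describes transformation selection as a mapping from dataset meta-characteristics to transformation utility, followed by ranking-based pruning. The record does not give the authors' implementation, so the following is only a minimal sketch of that idea under stated assumptions: the three meta-features, the nearest-neighbour lookup used as the meta-model, the transformation names, and every value in `HISTORY` are hypothetical and chosen purely for illustration.

```python
import math
from statistics import mean, pstdev

def meta_features(columns):
    """Summarize a dataset (a list of numeric columns) as a small meta-feature vector.
    Assumed features: log10(row count), column count, mean absolute skewness."""
    n_rows = len(columns[0])
    skews = []
    for col in columns:
        mu, sigma = mean(col), pstdev(col)
        # Population skewness; a constant column contributes 0.
        skews.append(0.0 if sigma == 0 else mean(((x - mu) / sigma) ** 3 for x in col))
    return [math.log10(n_rows), float(len(columns)), mean(abs(s) for s in skews)]

def rank_transformations(history, query, k_neighbors=2):
    """Rank transformations by their mean recorded gain on the k past datasets
    whose meta-features are closest to the query (unscaled Euclidean distance,
    a simplification; a real system would normalize the features)."""
    nearest = sorted(history, key=lambda rec: math.dist(rec[0], query))[:k_neighbors]
    names = set().union(*(gains for _, gains in nearest))
    scores = {t: mean(g.get(t, 0.0) for _, g in nearest) for t in names}
    # Highest expected gain first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))

def prune(ranked, budget):
    """Keep only the top-ranked candidates that fit the evaluation budget."""
    return ranked[:budget]

# Hypothetical meta-knowledge: (meta-feature vector, {transformation: observed gain}).
HISTORY = [
    ([3.0, 10.0, 2.1], {"log": 0.04, "standardize": 0.01, "poly2": -0.01}),
    ([3.2, 12.0, 1.8], {"log": 0.03, "standardize": 0.02, "poly2": 0.00}),
    ([4.5, 50.0, 0.1], {"log": -0.01, "standardize": 0.01, "poly2": 0.03}),
]

# An unseen, strongly right-skewed toy dataset with two columns.
new_dataset = [[1, 2, 3, 4, 100], [1, 1, 2, 2, 50]]
candidates = prune(rank_transformations(HISTORY, meta_features(new_dataset)), budget=2)
print(candidates)  # ['log', 'standardize']
```

Only the transformations that survive `prune` would then be evaluated on the new dataset, which is how a ranking step of this kind could shrink the search space.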
