Automated Feature Engineering Using Meta-Learning for Efficient and Generalizable Data Science Pipelines

Bibliographic Details
Main Authors: Helda, Yudhiastuti; Shafiq, Hussain; Irfa, Shabbir
Format: Article
Language: en
Published: INTI International University 2026
Online Access: http://eprints.intimal.edu.my/2301/1/jods2026_04.pdf
http://eprints.intimal.edu.my/2301/2/854
http://eprints.intimal.edu.my/2301/
http://ipublishing.intimal.edu.my/jods.html
Description
Summary: Feature engineering remains one of the most time-intensive and expertise-dependent stages in machine learning pipelines, often limiting scalability and reproducibility. Despite advances in automated machine learning, existing systems largely emphasize model and hyperparameter optimization while leaving feature construction partially manual and task-specific. This reveals a critical research gap: the absence of a transferable, experience-driven mechanism capable of generalizing feature engineering knowledge across heterogeneous datasets. To address this limitation, this study proposes a meta-learning–based automated feature engineering framework that models transformation selection as a learnable mapping between dataset meta-characteristics and transformation utility. The framework constructs a reusable meta-knowledge layer trained on historical task–transformation–performance relationships and applies ranked transformation strategies to unseen datasets under computational constraints. Experiments conducted on diverse classification and regression datasets demonstrate that the proposed approach achieves up to 4.2% improvement in F1-score and 8.3% reduction in RMSE compared to raw-feature baselines, while maintaining performance comparable to or exceeding manually engineered pipelines. In addition, development time is reduced by up to 55%, and search complexity decreases by approximately 60% through ranking-based pruning. These findings confirm that feature engineering can be formalized as a transferable meta-learning problem, enabling scalable, efficient, and generalizable data science workflows. The study advances the automation of representation construction and supports the integration of intelligent meta-knowledge reuse in next-generation AutoML systems.
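Illustration: the summary describes transformation selection as a mapping from dataset meta-characteristics to transformation utility, followed by ranking-based pruning. The Python sketch below shows one way such a mapping could look, and is not the authors' implementation. Everything in it is an assumption made for illustration: the three meta-features, the transformation names, the gain values in `HISTORY`, the inverse-distance weighting, and the function names `predict_utility` and `rank_and_prune`.

```python
import math
from collections import defaultdict

# Hypothetical history: (dataset meta-features, transformation, observed gain).
# Meta-features here are (log10 rows, log10 columns, mean absolute skewness).
# All values are invented for illustration, not taken from the study.
HISTORY = [
    ((3.0, 1.0, 2.5), "log", 0.040),
    ((3.0, 1.0, 2.5), "standardize", 0.010),
    ((3.0, 1.0, 2.5), "pairwise_product", 0.000),
    ((5.0, 2.0, 0.2), "log", -0.005),
    ((5.0, 2.0, 0.2), "standardize", 0.015),
    ((5.0, 2.0, 0.2), "pairwise_product", 0.030),
]


def predict_utility(meta, history):
    """Estimate each transformation's gain as a distance-weighted average
    of the gains observed on past datasets (closer datasets count more)."""
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for past_meta, transform, gain in history:
        weight = 1.0 / (1.0 + math.dist(meta, past_meta))
        weighted_sum[transform] += weight * gain
        weight_total[transform] += weight
    return {t: weighted_sum[t] / weight_total[t] for t in weighted_sum}


def rank_and_prune(meta, history, budget):
    """Rank transformations by predicted gain and keep only the top `budget`,
    so that only the shortlist has to be evaluated on the new dataset."""
    utility = predict_utility(meta, history)
    ranked = sorted(utility, key=utility.get, reverse=True)
    return ranked[:budget]


new_dataset_meta = (3.2, 1.1, 2.0)  # a small, skewed, previously unseen dataset
shortlist = rank_and_prune(new_dataset_meta, HISTORY, budget=2)
print(shortlist)
```

In this toy setup, a budget of 2 out of 3 candidate transformations means one candidate is never evaluated on the new dataset; the search-space reduction reported in the summary would depend on the study's own candidate set and ranking model, which this record does not specify.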