Distance-based undersampling for imbalance dataset: a comprehensive simulation study