Exploring New Frontiers in Imbalanced Learning: Data Complexity-Based Solutions
Class imbalance is a frequently occurring scenario in classification tasks. Learning from imbalanced data poses a significant challenge that has motivated extensive research in this area. My thesis focuses on developing new solutions to this problem based on data-intrinsic characteristics.
The main contributions from my MSc thesis are as follows.
Efficacy Analysis -- A detailed experimental study has been carried out on a wide range of imbalanced datasets to evaluate the performance of popular and state-of-the-art imbalanced learning techniques. A critical discussion of these approaches has been provided. Through this in-depth analysis, several major limitations of the established approaches have been identified, and the primary factors contributing to the difficulty of learning from imbalanced data have been determined.
UniSyn: A Unified Sampling Framework to Jointly Address Class Imbalance and Overlapping -- A novel data resampling methodology has been developed with the aim of minimizing the drawbacks of the established approaches and addressing all the data difficulty factors of imbalanced learning.
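To make the idea of jointly treating imbalance and overlap concrete, the sketch below combines two standard ingredients: SMOTE-style interpolation to oversample the minority class, and Tomek-link-style cleaning to remove majority instances lying in the overlap region. This is an illustrative stand-in under those assumptions, not the actual UniSyn algorithm; the function name `resample_joint` is hypothetical.

```python
import numpy as np

def resample_joint(X, y, k=5, rng=None):
    """Illustrative joint resampler (NOT UniSyn itself):
    1. oversample the minority class by interpolating between minority
       neighbours (SMOTE-style) until the classes are the same size;
    2. drop majority points whose nearest neighbour is a minority point
       (Tomek-link-style cleaning of the overlap region)."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    minority, majority = classes[np.argmin(counts)], classes[np.argmax(counts)]
    X_min, X_maj = X[y == minority], X[y == majority]

    # Oversampling: synthesize points on segments between minority neighbours.
    synth = []
    for _ in range(len(X_maj) - len(X_min)):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        j = rng.choice(np.argsort(d)[1:k + 1])      # a random near neighbour
        synth.append(X_min[i] + rng.random() * (X_min[j] - X_min[i]))
    X_min_all = np.vstack([X_min, np.array(synth)]) if synth else X_min

    # Cleaning: remove a majority point if its nearest minority point is
    # closer than its nearest fellow majority point (overlap indicator).
    keep = []
    for i, p in enumerate(X_maj):
        d_min = np.linalg.norm(X_min - p, axis=1).min()
        d_maj = np.linalg.norm(np.delete(X_maj, i, axis=0) - p, axis=1).min()
        keep.append(d_maj <= d_min)
    X_maj_clean = X_maj[np.array(keep)]

    X_out = np.vstack([X_min_all, X_maj_clean])
    y_out = np.concatenate([np.full(len(X_min_all), minority),
                            np.full(len(X_maj_clean), majority)])
    return X_out, y_out
```

The ordering matters: oversampling first would let synthetic points influence which majority instances look "overlapping", so the cleaning step here deliberately measures distances against the original minority points only.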
iBRF: Improved Balanced Random Forest Classifier -- A novel resampling-based ensemble technique has been developed. It is a modified version of the original Balanced Random Forest (BRF) classifier that achieves better predictive performance and improved generalization.
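For readers unfamiliar with the baseline, the sketch below shows the balanced-bagging idea that underlies BRF: every tree is grown on a bootstrap in which each class is sampled down to the minority-class size. It is a minimal sketch of the *original* BRF mechanism only; the `BalancedForest` class name is hypothetical, and the additional modifications introduced by iBRF are not shown.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class BalancedForest:
    """Minimal balanced-bagging forest (illustrates plain BRF, not iBRF)."""

    def __init__(self, n_trees=25, random_state=0):
        self.n_trees = n_trees
        self.rng = np.random.default_rng(random_state)

    def fit(self, X, y):
        classes, counts = np.unique(y, return_counts=True)
        n_min = counts.min()
        self.trees_ = []
        for _ in range(self.n_trees):
            # Balanced bootstrap: draw n_min samples (with replacement)
            # from every class, so each tree sees a balanced training set.
            idx = np.concatenate([
                self.rng.choice(np.where(y == c)[0], size=n_min, replace=True)
                for c in classes
            ])
            tree = DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx])
            self.trees_.append(tree)
        return self

    def predict(self, X):
        # Majority vote over the trees (assumes integer labels 0..K-1).
        votes = np.stack([t.predict(X) for t in self.trees_])
        return np.apply_along_axis(
            lambda v: np.bincount(v.astype(int)).argmax(), 0, votes)
```

Because every tree trains on a balanced sample, no single tree is biased toward the majority class, while the ensemble as a whole still sees most of the majority data across its bootstraps.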
iCost: A Novel Instance Complexity-Based Cost-Sensitive Learning Framework -- A novel cost-sensitive approach has been developed that offers better prediction performance compared to traditional cost-sensitive learning approaches. The underlying learning difficulty of different instances is taken into consideration during penalization, ensuring a more plausible weighting mechanism.
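The sketch below illustrates the general idea of instance-complexity-based costs with a simple k-NN difficulty proxy: each minority instance is weighted by the fraction of its k nearest neighbours that belong to the other class, so harder minority instances receive a larger misclassification cost. The function name `complexity_weights`, the proxy, and the `base_cost` parameter are all assumptions for illustration; the actual iCost weighting may differ.

```python
import numpy as np

def complexity_weights(X, y, k=5, base_cost=2.0):
    """Illustrative per-instance costs (NOT the exact iCost scheme):
    majority instances keep weight 1; each minority instance gets
    base_cost * (1 + difficulty), where difficulty is the fraction of
    its k nearest neighbours belonging to the other class."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    w = np.ones(len(y))
    for i in np.where(y == minority)[0]:
        d = np.linalg.norm(X - X[i], axis=1)
        nn = np.argsort(d)[1:k + 1]              # k nearest, excluding self
        difficulty = np.mean(y[nn] != minority)  # cross-class neighbour rate
        w[i] = base_cost * (1.0 + difficulty)    # harder => costlier to miss
    return w
```

The resulting vector can be passed to any learner that accepts per-sample weights, e.g. `LogisticRegression().fit(X, y, sample_weight=complexity_weights(X, y))` in scikit-learn, which is what makes such a weighting scheme classifier-agnostic.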