Machine Learning Projects

Identifying Bias in Computer-Aided Diagnosis (CAD) tools

Tools: SHAP, Scikit-learn

Synopsis:

This work examines the use of AI in medical diagnosis and the transparency that a decision-support tool needs in order to be reliable.
The stochastic nature of ML algorithms, together with the choice of feature selection technique, introduces bias into the prediction framework. This raises concerns about the reliability of CAD systems.
To make the diagnostic system more trustworthy and accountable, and to explain the reasoning behind its predictions, explainable AI (XAI) can be incorporated into the ML framework.
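SHAP explanations are built on Shapley values, which split a prediction into per-feature contributions. The sketch below computes them exactly from scratch for a single instance, assuming that "absent" features are replaced by a baseline vector (for example, training-set means). It is an illustration of the idea only, not the `shap` library's API or the paper's pipeline; `exact_shapley`, the toy linear model, and its weights are hypothetical. The `shap` library's explainers (such as `shap.TreeExplainer`) compute or approximate these values far more efficiently than this O(2^d) enumeration.

```python
from itertools import combinations
from math import factorial

import numpy as np


def exact_shapley(predict, x, baseline):
    """Exact Shapley values for one instance x (hypothetical helper).

    Assumption: features outside a coalition S take their baseline values,
    so v(S) = predict(x on S, baseline elsewhere). Needs 2^d model calls,
    so it is only practical for a handful of features.
    """
    d = len(x)
    phi = np.zeros(d)

    def value(subset):
        z = baseline.copy()
        idx = list(subset)
        if idx:
            z[idx] = x[idx]
        return predict(z)

    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            # Shapley weight |S|! (d - |S| - 1)! / d!
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for s in combinations(others, size):
                phi[i] += weight * (value(s + (i,)) - value(s))
    return phi


# Hypothetical linear "model": for it, phi_i should equal w_i * (x_i - baseline_i).
w = np.array([2.0, -1.0, 0.5])
b = 0.3
predict = lambda z: float(w @ z + b)

x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros(3)
phi = exact_shapley(predict, x, baseline)
print(phi)  # contributions that sum to predict(x) - predict(baseline)
```

The final property (contributions summing to the gap between the prediction and the baseline prediction) is what makes Shapley-based explanations additive and easy to read per feature.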

Paper: A paper based on this work has been published in the journal Informatics in Medicine Unlocked (Elsevier). It proposes new biomarkers for the premenopausal and postmenopausal populations, and the diagnostic accuracy of the proposed system exceeds that of existing methods, including the state-of-the-art ROMA algorithm.

The paper is titled "An ML-based decision support system for reliable diagnosis of ovarian cancer by leveraging explainable AI".

Code

Paper

A Sophisticated ML Framework for the Accurate Detection of Phishing Websites

Tools: Scikit-learn, Pandas

Synopsis: Phishing is an increasingly sophisticated form of cyberattack, and designing an ML framework that performs well across a broad variety of datasets is a challenge.

In this project, a stacking ensemble classifier is constructed that combines a variety of ML algorithms so that its performance generalizes across datasets. A greedy selection mechanism chooses the combination of base learners, while recursive feature elimination (RFE) extracts the most representative features for each individual classifier. Finally, a neural network, capable of learning complex non-linear relationships in the data, serves as the meta-learner.
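A minimal scikit-learn sketch of this design, under several assumptions: the greedy mechanism is taken to be forward selection (repeatedly add the base learner that most improves cross-validated accuracy, and stop when nothing improves), each base learner gets its own RFE step inside a pipeline, and an `MLPClassifier` acts as the meta-learner. The candidate learners, `n_features`, the synthetic data, and the helper names `rfe_pipeline`, `stack_score`, and `greedy_select` are hypothetical, not the project's actual configuration.

```python
import warnings

from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.exceptions import ConvergenceWarning
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

warnings.filterwarnings("ignore", category=ConvergenceWarning)

# Hypothetical pool of candidate base learners (all expose coef_ or feature_importances_ for RFE).
CANDIDATES = {
    "logreg": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "forest": RandomForestClassifier(n_estimators=30, random_state=0),
}


def rfe_pipeline(estimator, n_features):
    """Base learner with its own RFE step, so each classifier keeps its own feature subset."""
    return Pipeline([
        ("rfe", RFE(clone(estimator), n_features_to_select=n_features, step=2)),
        ("clf", clone(estimator)),
    ])


def stack_score(names, X, y, n_features=8):
    """Mean 3-fold CV accuracy of a stack built from the named base learners."""
    stack = StackingClassifier(
        estimators=[(n, rfe_pipeline(CANDIDATES[n], n_features)) for n in names],
        final_estimator=MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
        cv=3,
    )
    return cross_val_score(stack, X, y, cv=3, scoring="accuracy").mean()


def greedy_select(X, y):
    """Greedy forward selection of base learners (assumed selection rule)."""
    selected, best = [], 0.0
    remaining = list(CANDIDATES)
    while remaining:
        scores = {n: stack_score(selected + [n], X, y) for n in remaining}
        name = max(scores, key=scores.get)
        if scores[name] <= best:  # no candidate improves the ensemble: stop
            break
        selected.append(name)
        remaining.remove(name)
        best = scores[name]
    return selected, best


X, y = make_classification(n_samples=300, n_features=20, n_informative=6, random_state=0)
selected, best = greedy_select(X, y)
print(selected, round(best, 3))
```

Greedy selection evaluates far fewer ensembles than an exhaustive search over all subsets of learners, at the cost of not guaranteeing the best possible combination.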

The performance of the developed system was tested on four different phishing datasets. A paper on this work is currently under review.

Code

Paper (arXiv)

Survival Analysis

Synopsis: Unlike standard regression or classification methods, survival analysis can handle censored data (subjects whose event was not observed during follow-up) and model time-to-event data, providing insight into not just whether an event happens, but when it happens and how different factors influence that timing.

Tools: Scikit-survival, Lifelines

Methods:

Kaplan-Meier estimation
Cox proportional hazards model
Random survival forest (RSF)
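To show how censoring enters the first of these methods, here is a plain-Python Kaplan-Meier estimator, S(t) = product over event times t_i ≤ t of (1 - d_i / n_i), where d_i is the number of events at t_i and n_i the number of subjects still at risk. It is an illustrative sketch with hypothetical toy data; in practice `lifelines.KaplanMeierFitter` or `sksurv.nonparametric.kaplan_meier_estimator` would be used. Censored subjects never count as events, but they stay in the risk set up to their censoring time.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier estimate of S(t) at each distinct event time.

    durations: follow-up time per subject
    events: 1 if the event (e.g. death) was observed, 0 if right-censored
    Returns a list of (t, S(t)) pairs.
    """
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects sharing time t; ties with censoring stay at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed  # both events and censored subjects leave the risk set
    return curve


# Hypothetical toy cohort: two subjects are censored (event = 0).
curve = kaplan_meier([5, 6, 6, 2, 4, 4], [1, 0, 1, 1, 1, 0])
print(curve)  # S(t) steps down to 5/6, 2/3, 4/9, 2/9 at t = 2, 4, 5, 6
```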

Data:

Heart failure patients (https://archive.ics.uci.edu/dataset/519/heart+failure+clinical+records)

SEER dataset (https://seer.cancer.gov/)

Paper:

Code
