Machine Learning Based Method for Insurance Fraud Detection on Class Imbalance Datasets With Missing Values

UdayKiran Gopidasu; Govinda Teja Sai; G.Mohan Vijay Govinda Raju; G.SriRam ShivaShankar

Author(s):

UdayKiran Gopidasu, Bharath Institute of Higher Education and Research; Govinda Teja Sai, Bharath Institute of Higher Education and Research; G.Mohan Vijay Govinda Raju, Bharath Institute of Higher Education and Research; G.SriRam ShivaShankar, Bharath Institute of Higher Education and Research

Keywords:

Insurance Fraud Detection, Super Learning, Ensemble Learning, SMOTE, XGBoost, SHAP, LIME, Explainable AI, Class Imbalance, Machine Learning

Abstract

Insurance fraud is a major challenge for the insurance industry and causes significant financial losses every year. Fraudulent claims are difficult to detect because they are rare compared to legitimate claims, which yields highly imbalanced datasets. Real-world insurance datasets also often contain missing values and complex feature relationships, which further complicate detection. This project proposes a machine learning approach that combines Super Learning (ensemble learning) with Explainable Artificial Intelligence (XAI) to improve fraud detection performance. The dataset contains insurance claim attributes such as policy details, incident information, and claim amounts. Preprocessing handles missing values and categorical variables, and class imbalance is addressed using SMOTE. Five machine learning algorithms are implemented and compared: Logistic Regression, Decision Tree, Random Forest, Support Vector Machine (SVM), and XGBoost. These models are then combined in a Super Learning framework to improve predictive accuracy. The explainability techniques SHAP and LIME are used to identify the features that contribute most to fraud predictions. Experimental results show that the Super Learner model achieves 93.8% accuracy, outperforming the individual algorithms while maintaining interpretability through XAI methods.
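
The abstract names SMOTE as the remedy for class imbalance but does not show how it works. The following is a minimal sketch of the core idea only, not the authors' implementation: each synthetic fraud sample is a random point on the line segment between a real fraud sample and one of its k nearest fraud neighbours. The function name `smote`, its parameters, and the toy data are all hypothetical, and the sketch assumes features are already numeric (that is, missing values have been imputed and categorical variables encoded). In practice a library implementation such as the `SMOTE` class in imbalanced-learn would likely be used instead.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    minority: list of numeric feature vectors from the rare class.
    k: number of nearest minority neighbours to interpolate towards.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_new):
        # Pick a random real minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Choose one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Place the synthetic sample at a random point between the two.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: two numeric features per claim (assumption).
legit = [[float(i % 7), float(i % 5)] for i in range(20)]
fraud = [[10.0, 10.0], [11.0, 10.5], [10.5, 11.5], [12.0, 11.0]]

# Oversample fraud until both classes have the same number of samples.
new_fraud = smote(fraud, n_new=len(legit) - len(fraud), k=3)
balanced_fraud = fraud + new_fraud
```

Oversampling of this kind is normally applied to the training split only, so that the evaluation data keeps the original class ratio and contains no synthetic claims.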

Other Details

Paper ID: IJSRDV14I20135
Published in: Volume : 14, Issue : 2
Publication Date: 01/05/2026
Page(s): 157-159
