Effect of Data Scaling Methods on Machine Learning Algorithms and Model Performance

Published in Technologies (2021-07-24)

TL;DR

Classification and Regression Trees (CART), combined with Robust Scaler (RS) or Quantile Transformer (QT) scaling, outperforms all other ML algorithms evaluated, with 100% accuracy, 100% precision, 99% recall, and 100% F1 score. The study outcomes demonstrate that a model's performance varies depending on the data scaling method.
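For readers who want to see how the four reported metrics relate to one another, here is a minimal sketch that computes them from confusion-matrix counts. The function name `classification_metrics` and the counts in the usage lines are hypothetical, chosen only for illustration; they are not figures from the paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives for a binary classifier.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts (NOT taken from the paper): one missed positive case.
metrics = classification_metrics(tp=99, fp=0, fn=1, tn=100)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because F1 is the harmonic mean of precision and recall, it always lies between the two, so reported figures that look slightly inconsistent are usually the result of rounding.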

Abstract

Heart disease, one of the main causes of the high mortality rate around the world, requires a sophisticated and expensive diagnosis process. In the recent past, much of the literature has presented machine learning approaches as an opportunity to diagnose heart disease patients efficiently. However, dataset challenges such as missing data, inconsistent data, and mixed data (inconsistent or missing values in both numerical and categorical form) are often obstacles in medical diagnosis. Such inconsistency leads to a higher probability of misprediction and misleading results. Data preprocessing steps like feature reduction, data conversion, and data scaling are employed to form a standard dataset; such measures play a crucial role in reducing inaccuracy in the final prediction. This paper evaluates eleven machine learning (ML) algorithms: Logistic Regression (LR), Linear Discriminant Analysis (LDA), K-Nearest Neighbors (KNN), Classification and Regression Trees (CART), Naive Bayes (NB), Support Vector Machine (SVM), XGBoost (XGB), Random Forest Classifier (RF), Gradient Boost (GB), AdaBoost (AB), and Extra Tree Classifier (ET). Each is combined with six different data scaling methods: Normalization (NR), Standscale (SS), MinMax (MM), MaxAbs (MA), Robust Scaler (RS), and Quantile Transformer (QT). The evaluation uses a dataset comprising information on patients with heart disease. The results show that CART, along with RS or QT, outperforms all other ML algorithms with 100% accuracy, 100% precision, 99% recall, and 100% F1 score. The study outcomes demonstrate that a model's performance varies depending on the data scaling method.
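To make the difference between scaling methods concrete, the sketch below applies four of them to a single feature column. It assumes that the abbreviations SS, MM, MA, and RS refer to the usual z-score, min-max, max-abs, and median/interquartile-range definitions; the abstract does not spell out the formulas, so this is an illustration rather than the authors' implementation. The function names and the sample values are hypothetical. NR and QT are omitted to keep the example short.

```python
import statistics


def standard_scale(xs):
    """Z-score: subtract the mean, divide by the population standard deviation."""
    mean = statistics.fmean(xs)
    std = statistics.pstdev(xs)
    return [(x - mean) / std for x in xs]


def min_max_scale(xs):
    """Map the smallest value to 0 and the largest to 1."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def max_abs_scale(xs):
    """Divide by the largest absolute value, so results lie in [-1, 1]."""
    largest = max(abs(x) for x in xs)
    return [x / largest for x in xs]


def robust_scale(xs):
    """Subtract the median, divide by the interquartile range (Q3 - Q1)."""
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return [(x - median) / (q3 - q1) for x in xs]


# Hypothetical feature column with one extreme value (not data from the paper).
# Note: a constant column would cause division by zero in every function here.
values = [180.0, 200.0, 220.0, 240.0, 600.0]

for scaler in (standard_scale, min_max_scale, max_abs_scale, robust_scale):
    scaled = [round(v, 3) for v in scaler(values)]
    print(f"{scaler.__name__:>15}: {scaled}")
```

With an outlier present, min-max scaling squeezes the four ordinary readings into a narrow band near 0, while the median/IQR version keeps them evenly spread and pushes only the outlier far away. Differences of this kind are one plausible reason why the same classifier can score differently under different scalers.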

Authors

M. Ahsan

M. Mahmud

P. Saha

Research Impact

Citations: 324

References: 38

Datasets: 0

5

Kishor Datta Gupta

Z. Siddique

Field of Study

Computer Science

Journal Information

Name

Technologies

Venue Information

Name

Technologies

Type

journal

URL

http://www.e-helvetica.nb.admin.ch/directAccess?callnumber=bel-318028

Alternate Names

  • Technol (Basel)
  • Technologies (Basel)