By Navya Jain
Heart disease remains one of the leading causes of mortality globally. Timely prediction and diagnosis are crucial for effective intervention and saving lives. This project leverages machine learning techniques to predict the likelihood of heart disease using a dataset comprising various medical attributes.
We will perform exploratory data analysis, visualize key patterns, preprocess the dataset, train multiple machine learning models, and assess their performance to identify the most effective approach.
Github Repository: https://github.com/navyajain7105/Heart-Disease-Prediction-using-Machine-Learning
We use a publicly available Heart Disease Dataset from the UCI repository. It contains 303 patient records and the following key features:
age, sex, cp (chest pain type), trestbps (resting blood pressure), chol (cholesterol), fbs (fasting blood sugar), restecg, thalach, exang, etc.
target: 1 = disease present, 0 = no disease
Dataset: https://drive.google.com/file/d/1Yq2jySiAgl8tWx_NM7X0S1xgTF39Vdif/view?usp=sharing
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn import svm
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from keras.models import Sequential
from keras.layers import Dense
from google.colab import files
uploaded = files.upload()
dataset = pd.read_csv("heart.csv")
dataset.shape
dataset.head(5) # first 5 rows from the top
dataset.describe()
dataset.info()
We check for missing values and data types. The dataset is already clean: no null values!
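For instance, a minimal pandas check (not shown in the original notebook, just a sketch) confirms this:
print(dataset.isnull().sum()) # number of missing values per column; every count is zero here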
We want to see how balanced the classes are:
y = dataset["target"]
sns.countplot(x=y) # seaborn countplot draws a bar plot of the class counts
target_temp = dataset.target.value_counts()
# counts of each unique value
print(target_temp)
Output: Around 165 patients have heart disease, while 138 do not, so the classes are fairly balanced.
print(dataset.corr()["target"].abs().sort_values(ascending=False))
This shows how each variable relates to the target; features like exang, cp, oldpeak, and thalach have a strong correlation with target.
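To see all pairwise relationships at once, a correlation heatmap is a common companion plot (not part of the original code, just a sketch using the seaborn and matplotlib imports above):
plt.figure(figsize=(10, 8))
sns.heatmap(dataset.corr(), annot=True, fmt=".2f", cmap="coolwarm") # annotated correlation matrix
plt.show()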
I have used a variety of machine learning algorithms, implemented in Python, to predict the presence of heart disease in a patient. This is a classification problem, with a range of medical parameters as input features and a binary target variable indicating whether heart disease is present or not.
Train and Test Split
from sklearn.model_selection import train_test_split
predictors = dataset.drop("target", axis=1)
# removing the target (last) column
target = dataset["target"]
X_train, X_test, Y_train, Y_test = train_test_split(predictors, target, test_size=0.20, random_state=0)
Logistic Regression:
lr = LogisticRegression()
lr.fit(X_train, Y_train)
# The model learns the relationship between the features (predictors) and the target.
Y_pred_lr = lr.predict(X_test)
# Makes predictions on the test data (X_test) using the trained model and stores the results in Y_pred_lr.
Naive Bayes:
nb = GaussianNB()
nb.fit(X_train, Y_train)
Y_pred_nb = nb.predict(X_test)
SVM:
sv = svm.SVC(kernel='linear')
# Initializes an SVM classifier (SVC) with a linear kernel, so the decision boundary is a straight line (hyperplane) separating the classes.
sv.fit(X_train, Y_train)
# finds the optimal hyperplane
Y_pred_svm = sv.predict(X_test)
K Nearest Neighbors:
knn = KNeighborsClassifier(n_neighbors=7)
# KNN classifier with 7 neighbors (the model uses the 7 nearest neighbors to classify a data point).
knn.fit(X_train, Y_train)
Y_pred_knn = knn.predict(X_test)
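The choice of 7 neighbors can be sanity-checked by sweeping k, in the same spirit as the random_state search used for the tree models below; this loop is not in the original, just a sketch:
best_k, best_score = 1, 0
for k in range(1, 21):
    knn_k = KNeighborsClassifier(n_neighbors=k)
    knn_k.fit(X_train, Y_train)
    score = round(accuracy_score(knn_k.predict(X_test), Y_test) * 100, 2)
    if score > best_score:
        best_k, best_score = k, score
print(best_k, best_score) # k with the highest test accuracy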
Decision Tree:
max_accuracy = 0
# Loop over the random state of the Decision Tree classifier to find the one that gives the best accuracy.
for x in range(200):
    dt = DecisionTreeClassifier(random_state=x)
    # For each value of x, a new DecisionTreeClassifier is created with a different random_state to check how the model performs with different initializations.
    dt.fit(X_train, Y_train)
    Y_pred_dt = dt.predict(X_test)
    current_accuracy = round(accuracy_score(Y_pred_dt, Y_test) * 100, 2)
    if current_accuracy > max_accuracy:
        max_accuracy = current_accuracy
        best_x = x
# print(max_accuracy)
# print(best_x)
dt = DecisionTreeClassifier(random_state=best_x)
dt.fit(X_train, Y_train)
Y_pred_dt = dt.predict(X_test)
Random Forest:
max_accuracy = 0
for x in range(2000):
    rf = RandomForestClassifier(random_state=x)
    rf.fit(X_train, Y_train)
    Y_pred_rf = rf.predict(X_test)
    current_accuracy = round(accuracy_score(Y_pred_rf, Y_test) * 100, 2)
    if current_accuracy > max_accuracy:
        max_accuracy = current_accuracy
        best_x = x
# print(max_accuracy)
# print(best_x)
rf = RandomForestClassifier(random_state=best_x)
rf.fit(X_train, Y_train)
Y_pred_rf = rf.predict(X_test)
Neural Network:
model = Sequential()
# Initializes a sequential model: layers are added one after another, the most common type of neural network architecture.
model.add(Dense(11, activation='relu', input_dim=13))
# First hidden layer:
# Dense(11): the layer has 11 neurons.
# activation='relu': ReLU (Rectified Linear Unit) is commonly used in hidden layers to introduce non-linearity and help the model learn complex patterns.
# input_dim=13: the input to the model has 13 features (matching the dataset's number of predictors).
model.add(Dense(1, activation='sigmoid'))
# Output layer:
# Dense(1): a single output neuron, typical for binary classification tasks (predicting 0 or 1).
# activation='sigmoid': the sigmoid activation squashes the output between 0 and 1, representing the probability of the positive class.
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Compile the model:
# loss='binary_crossentropy': binary cross-entropy is the loss function for binary classification; it measures the gap between the true labels and the predicted probabilities of the positive class.
# optimizer='adam': Adam is an efficient gradient-based optimizer that adapts the learning rate during training.
# metrics=['accuracy']: accuracy is tracked during training and evaluation.
model.fit(X_train, Y_train, epochs=300)
Y_pred_nn = model.predict(X_test)
rounded = [round(x[0]) for x in Y_pred_nn]
# Converts the predicted probabilities (e.g., 0.72) into hard class labels (0 or 1).
Y_pred_nn = rounded
The accuracy for each model, whether Logistic Regression, Decision Tree, Neural Network, or any other, is calculated using the same pattern:
score_model = round(accuracy_score(Y_pred_model, Y_test) * 100, 2)
print("The accuracy score achieved using Model is: " + str(score_model) + " %")
We compare model performance using accuracy.
The accuracy score achieved using Logistic Regression is: 85.25 %
The accuracy score achieved using Naive Bayes is: 85.25 %
The accuracy score achieved using Support Vector Machine is: 81.97 %
The accuracy score achieved using K-Nearest Neighbors is: 67.21 %
The accuracy score achieved using Decision Tree is: 81.97 %
The accuracy score achieved using Random Forest is: 90.16 %
The accuracy score achieved using Neural Network is: 81.97 %
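Assuming each model's score has been stored following the pattern above (the score_* variable names here are illustrative), the results can also be compared side by side in a bar chart:
scores = [score_lr, score_nb, score_svm, score_knn, score_dt, score_rf, score_nn]
algorithms = ["Logistic Regression", "Naive Bayes", "SVM", "KNN", "Decision Tree", "Random Forest", "Neural Network"]
sns.barplot(x=algorithms, y=scores)
plt.xticks(rotation=45)
plt.ylabel("Accuracy (%)")
plt.show()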
This project demonstrates the power of machine learning in healthcare. With a clean dataset, proper preprocessing, and model tuning, we can achieve over 90% accuracy with Random Forest, followed by 85% with Logistic Regression and Naive Bayes, in predicting heart disease.