In this article, we'll walk through how to code and experiment with different machine learning algorithms for heart disease prediction. This step-by-step guide will cover data loading, implementing various classification models, and evaluating their performance. Whether you're a beginner or looking to refine your ML skills, this guide will help you apply multiple algorithms to a real-world dataset in a simple and straightforward way.
1. Downloading the Dataset
The first step in creating this model is to find and download a dataset. For this project, we'll be using the popular "Heart Disease Dataset" CSV file on Kaggle. Kaggle is a platform that hosts thousands of different datasets for programmers to use.
The dataset can be found here: https://www.kaggle.com/datasets/johnsmith88/heart-disease-dataset.
2. Loading the Dataset into Colab
Next, we need to load the dataset into an IDE. For this project, I'm using Google Colab. There are two main ways to load the data:
- Uploading the file directly to Colab
- Uploading it to Google Drive and then mounting the Drive in Colab
I highly recommend mounting from Drive. When you upload the file directly to Colab, the dataset is stored in a temporary session, meaning that every time the runtime disconnects, you have to go through the tedious process of re-uploading it. By mounting your Google Drive instead, you can access the dataset anytime with a single command, without waiting for re-uploads.
Here's the process for getting the dataset into Drive. After downloading and unzipping the CSV on your computer, upload it to a folder in Drive (I named my folder "Heart Disease Project").
It should look something like this:
Now, it's time to mount the file in Colab. To efficiently store and visualize the dataset, we'll use a Pandas DataFrame. To do that, we first need to import both the Pandas library and the Google Drive module.
Here's the code to set it up (a minimal version of the cell):
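```python
# Pandas lets us store and explore the dataset as a DataFrame
import pandas as pd

# The drive module lets Colab talk to Google Drive
from google.colab import drive

# Mount Drive at /content/drive (Colab will ask you to authorize access)
drive.mount("/content/drive")
```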
After running this cell, we'll be ready to use Pandas and the Google Drive library functions. The drive.mount("/content/drive") line gives Colab access to your Google Drive so the file can actually be read from it.
To store the dataset as a Pandas DataFrame, all we have to do is run the following line of code:
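```python
# Read the CSV into a DataFrame; adjust the path and filename to match
# your own Drive folder (here I assume the file is named heart.csv)
df = pd.read_csv("/content/drive/MyDrive/Heart Disease Project/heart.csv")
```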
We name the DataFrame variable df, and pd.read_csv tells Pandas to parse the CSV into a DataFrame object.
Here's what the DataFrame looks like (you can preview it with df.head()):
3. Setting Up the Data
Now that we've successfully loaded and stored the dataset, it's time to prepare it for model training. Data preparation typically involves several preprocessing steps, such as handling missing values, encoding categorical variables, and normalizing data points. But since we're dealing with a very clean and streamlined dataset, we can skip these steps and go straight to splitting the data into training and testing sets. The training set is what the model will learn from, and the testing set is what its performance will be evaluated on.
To begin, we set the variable X to all of the feature data (the inputs) that lead to the target column, which indicates whether or not the patient has heart disease. This lets the model learn from the features. Likewise, we set the y variable to the target data for each patient, which represents the outcome we want the model to predict (e.g., heart disease present or not).
Here's the code to implement this (in this dataset, the label column is named target):
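```python
# X: every column except the target (the features the model learns from)
X = df.drop("target", axis=1)

# y: the target column (whether or not the patient has heart disease)
y = df["target"]
```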
After defining the X and y variables, we can split them into training and testing sets with the scikit-learn library, using the following code.
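```python
from sklearn.model_selection import train_test_split

# Split the features and labels: 80% for training, 20% for testing;
# random_state=42 makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```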
Here's what each part of this code means:
- X_train and y_train represent the training data that the model will learn from.
- X_test and y_test represent the testing data that the model will be evaluated on.
- test_size=0.2 means we allocate 20% of the data for testing, and the remaining 80% is used for training.
- random_state=42 ensures that the data split is the same every time you run the code, so the results are reproducible.
4. Implementing and Testing the Models
Finally, it's time for the best part: coding the models. We'll be testing six different algorithms: Logistic Regression, Support Vector Machines, Random Forests, XGBoost, Naive Bayes, and Decision Trees. The code and the accuracy value for each algorithm will be presented. If you'd like more detail on how exactly some of these algorithms work, check out my other post: "A Beginner's Guide to Machine Learning Algorithms: Understanding the Key Methods".
Logistic Regression
Here's the code to implement Logistic Regression to classify patients' likelihood of heart disease:
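```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Create an instance of the Logistic Regression model
# (if you see a convergence warning, you can pass max_iter=1000)
LR = LogisticRegression()

# Train the model on the training data (features and target)
LR.fit(X_train, y_train)

# Generate predictions on the test data
y_pred = LR.predict(X_test)

# Compare the predictions with the actual results
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")
```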
Let's break it down.
- Importing Libraries: LogisticRegression is imported from sklearn to create the model, and accuracy_score is imported to evaluate model performance.
- Creating the Model: LR = LogisticRegression() creates an instance of the Logistic Regression model.
- Training the Model: LR.fit(X_train, y_train) trains the model on the training data (features and target).
- Making Predictions: y_pred = LR.predict(X_test) generates predictions on the test data.
- Evaluating Accuracy: accuracy = accuracy_score(y_test, y_pred) calculates how many of the predictions match the actual results.
- Output: print(f"Accuracy: {accuracy}") displays the model's accuracy on the test data.
The skeleton of this code, from the call to fit() to the variables used for training, is exactly the same for every algorithm. The only difference is subbing in the model of choice: LogisticRegression() for logistic regression, SVC() for support vector machines, RandomForestClassifier() for random forests, and so on. This consistent structure lets you experiment with different models by swapping out the classifier while keeping the rest of the code intact. With that said, here is the code for the rest of the algorithms.
Support Vector Machine
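A sketch using scikit-learn's SVC with default settings (the variable names here are just one way to write it):

```python
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Create and train the Support Vector Machine classifier
SVM = SVC()
SVM.fit(X_train, y_train)

# Predict on the test set and report accuracy
y_pred = SVM.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")
```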
Random Forests
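The same template with RandomForestClassifier (I pass random_state=42 so the result is reproducible, though that part is optional):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Create and train the Random Forest classifier
RF = RandomForestClassifier(random_state=42)
RF.fit(X_train, y_train)

# Predict on the test set and report accuracy
y_pred = RF.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")
```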
XGBoost
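XGBoost lives outside scikit-learn, but its XGBClassifier follows the same fit/predict pattern (Colab usually has the xgboost package preinstalled; otherwise, pip install xgboost):

```python
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

# Create and train the gradient-boosted tree classifier
XGB = XGBClassifier()
XGB.fit(X_train, y_train)

# Predict on the test set and report accuracy
y_pred = XGB.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")
```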
Naive Bayes
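Here I use the Gaussian variant (GaussianNB), a common choice when the features are numeric:

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Create and train the Gaussian Naive Bayes classifier
NB = GaussianNB()
NB.fit(X_train, y_train)

# Predict on the test set and report accuracy
y_pred = NB.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")
```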
Decision Trees
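And finally, a single decision tree with DecisionTreeClassifier (again with an optional random_state for reproducibility):

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Create and train the Decision Tree classifier
DT = DecisionTreeClassifier(random_state=42)
DT.fit(X_train, y_train)

# Predict on the test set and report accuracy
y_pred = DT.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")
```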
5. Evaluating Performance
After running and evaluating all of the algorithms, we can see how each one performs on the heart disease prediction task. In this specific case, Random Forest came out on top with the best accuracy. Random Forests tend to perform well on complex datasets like this one because they can handle a mix of features, capture non-linear relationships, and reduce overfitting by averaging many decision trees.
It's important to note, however, that model performance can vary depending on the data, the problem at hand, and the hyperparameters used. While Random Forest worked best for this task, in other scenarios or with different types of data, models like Logistic Regression, SVM, or Naive Bayes might perform better, especially on simpler, more linear problems. Always consider experimenting with multiple models and tuning hyperparameters to find the best fit for your specific task.
6. Conclusion
In this guide, we experimented with different machine learning models for predicting heart disease. We explored Logistic Regression, SVM, Random Forests, XGBoost, Naive Bayes, and Decision Trees. The general process for training and testing remained the same across all models; we just swapped out the algorithm.
Each model has its strengths and behaves differently depending on the data and the task at hand. There is no single "best algorithm," so it's always a good idea to try several models and see what works best for your specific problem and dataset.
And with that, you've reached the end of this article. Keep these concepts in mind as you experiment with your own projects!