GDGOC COMSATS Attock ML/DL Fellowship: Week 4 — Building a Content-Based Movie Recommendation System. | by Mushaf Khalil | Apr, 2025

In Week 4 of the GDGOC COMSATS Attock ML/DL Fellowship, our team took on an exciting project: building a content-based movie recommendation system. The project let us apply our machine-learning knowledge to a real-world application, deepening our understanding and skills.

We were fortunate to be guided by our mentor, Nimra Waqar, a dedicated and knowledgeable leader in the field of machine learning and deep learning. Her expertise and support were instrumental in navigating the complexities of the project and ensuring our team's success.

We aimed to create a system that recommends movies based on content similarity. We used the TMDB 5000 Movie Dataset, focusing on features such as genres, keywords, cast, and crew to determine similarities between movies.

The project was a collaborative effort among four team members:

Mushaf Khalil: Led the recommendation engine development, including vectorization and similarity computations.
Eman Noor: Managed data preprocessing and feature engineering.
Maaz Masood: Developed the Streamlit application for user interaction.
Meher Ali: Handled the backend and frontend integration.

Each member's contribution was essential to the project's success.

Data Preprocessing

We merged the movies and credits datasets on the 'title' column and extracted relevant information from nested JSON fields. By combining the overview, genres, keywords, cast, and director into a single 'tags' feature, we created a comprehensive representation of each movie's content. Text-cleaning steps, such as converting to lowercase and removing stop words, were applied to standardize the data.
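The team's preprocessing code is not shown in this post, so the following is only a minimal sketch of how the 'tags' feature might be assembled from one merged row. The helper names (`extract_names`, `extract_director`, `build_tags`), the tiny stop-word list, the choice to keep three cast members, and the joining of multi-word names into single tokens are all assumptions made for illustration. The sample row is made up, but it follows the dataset's layout of JSON lists of objects with a "name" key, plus a "job" key for crew.

```python
import json

def extract_names(json_text, limit=None):
    """Return the 'name' values from a JSON list of objects, optionally capped."""
    names = [item["name"] for item in json.loads(json_text)]
    return names if limit is None else names[:limit]

def extract_director(crew_json):
    """Return the names of crew members whose job is 'Director'."""
    return [m["name"] for m in json.loads(crew_json) if m.get("job") == "Director"]

# Placeholder stop-word list (assumption); a real run would use a fuller list.
STOP_WORDS = frozenset({"a", "an", "the", "of", "and", "in", "to"})

def build_tags(row):
    """Combine overview, genres, keywords, cast, and director into one lowercase string."""
    def collapse(names):
        # Join multi-word names so each person or genre stays a single token.
        return [name.replace(" ", "") for name in names]

    tokens = row["overview"].split()
    tokens += collapse(extract_names(row["genres"]))
    tokens += collapse(extract_names(row["keywords"]))
    tokens += collapse(extract_names(row["cast"], limit=3))  # top 3 is an assumption
    tokens += collapse(extract_director(row["crew"]))
    return " ".join(t.lower() for t in tokens if t.lower() not in STOP_WORDS)

# Made-up row in the shape of the merged dataset.
row = {
    "overview": "A pilot explores the edge of space",
    "genres": '[{"id": 1, "name": "Science Fiction"}]',
    "keywords": '[{"id": 2, "name": "space travel"}]',
    "cast": '[{"name": "Ada Example"}, {"name": "Bo Sample"}, '
            '{"name": "Cy Test"}, {"name": "Di Extra"}]',
    "crew": '[{"job": "Director", "name": "Eve Maker"}, '
            '{"job": "Editor", "name": "Flo Cutter"}]',
}
tags = build_tags(row)
print(tags)
```

With a pandas DataFrame, a function like `build_tags` could be applied row by row once rows with a missing overview have been dropped or filled.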

Feature Extraction and Similarity Computation

Using TF-IDF vectorization, we transformed the 'tags' into numerical vectors that capture how important each word is to a given movie. Cosine similarity was then calculated between these vectors to measure how close two movies are to each other, enabling effective recommendations.
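To make the two steps concrete, here is a dependency-free sketch of TF-IDF weighting followed by cosine similarity. It is not the team's code. In practice, scikit-learn's `TfidfVectorizer` and `cosine_similarity` do the same job, and the smoothed IDF formula below mirrors that library's default. The whitespace tokenizer, the function names, and the three toy tag strings are assumptions for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, scaled to unit length."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
        vec = {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1)
               for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def cosine_similarity(a, b):
    """Dot product of two unit-length sparse vectors, which equals their cosine."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

# Toy 'tags' strings (made up for this sketch).
tags = [
    "space alien war action",
    "space alien rescue action",
    "romance paris comedy",
]
vectors = tfidf_vectors(tags)
similarity = [[cosine_similarity(a, b) for b in vectors] for a in vectors]
```

The first two strings share three of their four words, so their score is well above zero, while the third shares no words with either and scores exactly zero against both.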

Model Serialization

To streamline deployment, we serialized the required components:

import pickle

# Bundle the movies DataFrame and similarity matrix
data_bundle = {
    'movies': movies,
    'similarity': similarity
}

# Save to a single pickle file
with open('movie_recommender_data.pkl', 'wb') as f:
    pickle.dump(data_bundle, f)

This approach let the app load the movies and the similarity matrix from a single file at deployment time.

We developed an interactive web application using Streamlit to give users an intuitive interface for discovering movie recommendations.

Features

Movie Selection: A dropdown menu lets users select a movie title from the dataset.
Recommendation Display: Upon selection, the system displays the top 5 similar movies.
Visual Enhancements: Included movie posters and brief overviews to enrich the user experience.

Sample Code Snippet

import streamlit as st
import pickle

# Load serialized data
with open('movie_recommender_data.pkl', 'rb') as f:
    data = pickle.load(f)
movies = data['movies']
similarity = data['similarity']

def recommend(movie_title):
    # Return an empty list for unknown titles so callers can always iterate
    if movie_title not in movies['title'].values:
        return []
    # Assumes a default 0..n-1 index, so the label is also the row position
    movie_index = movies[movies['title'] == movie_title].index[0]
    distances = list(enumerate(similarity[movie_index]))
    # The top match is normally the movie itself, so skip it and keep the next five
    recommended = sorted(distances, key=lambda x: x[1], reverse=True)[1:6]
    return [movies.iloc[i]['title'] for i, _ in recommended]

# Streamlit UI
st.title('Movie Recommendation System')
selected_movie = st.selectbox('Select a movie:', movies['title'].values)
if st.button('Recommend'):
    recommendations = recommend(selected_movie)
    if not recommendations:
        st.write('Movie not found in the dataset.')
    for movie in recommendations:
        st.write(movie)

This setup gives users an intuitive interface for finding movies similar to ones they already like.

Data Preprocessing is Crucial: The quality of recommendations depends heavily on the quality of data preprocessing and feature extraction.
Model Optimization: Balancing model complexity against performance is essential to keep the web application responsive.
User Experience Matters: A simple and intuitive user interface enhances user engagement and satisfaction.
Collaboration Enhances Learning: Working as a team allowed us to share knowledge, divide tasks effectively, and learn from each other's experiences.
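On the responsiveness point, one option is to precompute each movie's nearest neighbours once and store only those, instead of shipping the full similarity matrix. With around 5,000 titles the full matrix has roughly 25 million entries. The project is not known to have done this. The function name `top_neighbours` and the toy matrix below are assumptions for illustration only.

```python
import heapq

def top_neighbours(similarity, k=5):
    """For each row, return the k most similar other items as (index, score) pairs."""
    neighbours = []
    for i, row in enumerate(similarity):
        # Exclude the item itself, then keep the k highest scores.
        candidates = ((j, score) for j, score in enumerate(row) if j != i)
        neighbours.append(heapq.nlargest(k, candidates, key=lambda pair: pair[1]))
    return neighbours

# Toy 4x4 similarity matrix (made up for this sketch).
similarity = [
    [1.0, 0.8, 0.1, 0.3],
    [0.8, 1.0, 0.2, 0.4],
    [0.1, 0.2, 1.0, 0.9],
    [0.3, 0.4, 0.9, 1.0],
]
compact = top_neighbours(similarity, k=2)
print(compact[0])
```

The web app would then look up a short precomputed list per movie rather than sorting a full row on every request, trading flexibility (a fixed k) for a smaller file and faster responses.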

Looking ahead, we aim to make the following enhancements:

Incorporate Collaborative Filtering: Combining content-based filtering with collaborative filtering could improve recommendation accuracy.
Include User Ratings: Integrating user ratings can help personalize recommendations further.
Enhance UI: Displaying movie posters, genres, and overviews can give users more context for each recommendation.
Deploy on Cloud Platforms: Hosting the application on cloud platforms like Heroku or AWS can make it accessible to a broader audience.

I extend my sincere gratitude to our mentor, Nimra Waqar, for her invaluable guidance and support throughout the project. Her insights and encouragement were instrumental in our success. I also appreciate the collaborative efforts of my fellow team members, whose dedication and expertise made this project a rewarding experience.
