Let's explore the multi-classifier strategy.
Objective
- Predict the price range (0 to 3, where 3 is high-end) from given smartphone specs
Method
- Analyze the distribution type of each input feature
- Select classifiers
- Combine and finalize the results
- Evaluate the results
Dataset
Mobile Price Classification, Kaggle
- 3,000 samples
- 22 columns:
'battery_power', 'blue', 'clock_speed', 'dual_sim', 'fc', 'four_g', 'int_memory', 'm_dep', 'mobile_wt', 'n_cores', 'pc', 'px_height', 'px_width', 'ram', 'sc_h', 'sc_w', 'talk_time', 'three_g', 'touch_screen', 'wifi', 'price_range', 'id'
Visualizing data
After removing the unnecessary column (`id`), we'll plot frequency histograms and quantile-quantile (Q-Q) plots against the normal distribution for each input feature:
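As a sketch of that step (the helper name and layout are assumptions, not from the original), the histograms and Q-Q plots can be produced per feature with matplotlib and `scipy.stats.probplot`:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
from scipy import stats

def plot_feature_distributions(df, features):
    """Plot a frequency histogram and a normal Q-Q plot for each feature."""
    fig, axes = plt.subplots(len(features), 2,
                             figsize=(10, 3 * len(features)), squeeze=False)
    for i, col in enumerate(features):
        # left: frequency histogram of the raw feature values
        axes[i, 0].hist(df[col], bins=30)
        axes[i, 0].set_title(f'{col} histogram')
        # right: quantiles of the feature vs. quantiles of a normal distribution
        stats.probplot(df[col], dist='norm', plot=axes[i, 1])
        axes[i, 1].set_title(f'{col} Q-Q plot')
    fig.tight_layout()
    return fig
```

Features whose Q-Q points hug the diagonal are candidates for the Gaussian model; heavily discrete or skewed ones go to the Bernoulli/Multinomial buckets.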
After resampling, we secured 250K data points per class:
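The original doesn't show the resampling code; a minimal sketch, assuming upsampling with replacement via `sklearn.utils.resample` (the helper name is hypothetical):

```python
import pandas as pd
from sklearn.utils import resample

def upsample_classes(df, target_col, n_per_class, seed=42):
    """Resample each class (with replacement) to exactly n_per_class rows."""
    parts = [
        resample(group, replace=True, n_samples=n_per_class, random_state=seed)
        for _, group in df.groupby(target_col)
    ]
    # concatenate and shuffle so classes aren't grouped in blocks
    return pd.concat(parts).sample(frac=1, random_state=seed).reset_index(drop=True)

# e.g. df = upsample_classes(df, 'price_range', 250_000)
```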
Creating train/test data
X = df.drop('price_range', axis=1)
y = df['price_range']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
(898186, 20) (898186,) (99799, 20) (99799,)
We'll train the following models based on our (x|y) distributions:
- GaussianNB
- BernoulliNB
- MultinomialNB
First, we classify the input features into binary, multinomial, and Gaussian distributions:
target = ['price_range']
binary = ['blue', 'dual_sim', 'four_g', 'three_g', 'touch_screen', 'wifi']
multinomial = ['fc', 'pc', 'sc_h', 'sc_w']
# everything else is treated as approximately Gaussian
gaussian = df.drop(columns=[*target, *binary, *multinomial]).columns
Train each NB model on its corresponding input features:
def train_nb_classifier(model, X_train, X_test, model_name):
    model.fit(X_train, y_train)
    probabilities = model.predict_proba(X_test)
    y_pred = np.argmax(probabilities, axis=1)
    accuracy = accuracy_score(y_test, y_pred)
    print(f'--------- {model_name} ---------')
    print(f"Accuracy: {accuracy:.4f}")
    print(classification_report(y_test, y_pred))
    return y_pred, probabilities, model
# gaussian
scaler = MinMaxScaler()
X_train_gaussian_scaled = scaler.fit_transform(X_train[gaussian])
X_test_gaussian_scaled = scaler.transform(X_test[gaussian])
y_pred_gnb, prob_gnb, gnb = train_nb_classifier(model=GaussianNB(), X_train=X_train_gaussian_scaled, X_test=X_test_gaussian_scaled, model_name='Gaussian')

# bernoulli
y_pred_bnb, prob_bnb, bnb = train_nb_classifier(model=BernoulliNB(), X_train=X_train[binary], X_test=X_test[binary], model_name='Bernoulli')

# multinomial
y_pred_mnb, prob_mnb, mnb = train_nb_classifier(model=MultinomialNB(), X_train=X_train[multinomial], X_test=X_test[multinomial], model_name='Multinomial')
Note that we only scaled the Gaussian features, to avoid skewing the other data types.
Combining results
Combining the results using a simple average and a weighted average:
# combined (average)
prob_averaged = (prob_gnb + prob_bnb + prob_mnb) / 3
y_pred_averaged = np.argmax(prob_averaged, axis=1)
accuracy = accuracy_score(y_test, y_pred_averaged)
print('--------- Average ---------')
print(f"Averaged Probability Ensemble Accuracy: {accuracy:.4f}")
print(classification_report(y_test, y_pred_averaged))
# combined (weighted average)
weight_gnb = 0.9  # higher weight
weight_bnb = 0.05
weight_mnb = 0.05
prob_weighted_average = (weight_gnb * prob_gnb + weight_bnb * prob_bnb + weight_mnb * prob_mnb)
y_pred_weighted = np.argmax(prob_weighted_average, axis=1)
accuracy_weighted = accuracy_score(y_test, y_pred_weighted)
print('--------- Weighted Average ---------')
print(f"Weighted Average Ensemble Accuracy: {accuracy_weighted:.4f}")
print(classification_report(y_test, y_pred_weighted))
Stacking
Optionally, we'll stack the results with Logistic Regression as the meta-learner.
LR is one of the common meta-learner choices because of its simplicity, interpretability, effectiveness with probability inputs from base classifiers, and built-in regularization.
X_meta_test = np.hstack((prob_gnb, prob_bnb, prob_mnb))
prob_train_gnb = gnb.predict_proba(X_train_gaussian_scaled)
prob_train_bnb = bnb.predict_proba(X_train[binary])
prob_train_mnb = mnb.predict_proba(X_train[multinomial])
X_meta_train = np.hstack((prob_train_gnb, prob_train_bnb, prob_train_mnb))
meta_learner = LogisticRegression(random_state=42, solver='liblinear', multi_class='auto')
meta_learner.match(X_meta_train, y_train)
y_pred_stacked = meta_learner.predict(X_meta_test)
prob_stacked = meta_learner.predict_proba(X_meta_test)
accuracy_stacked = accuracy_score(y_test, y_pred_stacked)
print('--------- Meta-learner (logistic regression) ---------')
print(f"Stacked Ensemble Accuracy: {accuracy_stacked:.4f}")
print(classification_report(y_test, y_pred_stacked))
Stacking performs the best, while Multinomial and Bernoulli individually weren't efficient predictors of the final result.
This is mainly due to the argmax operation, where each model chooses the single class with the highest probability as its final decision.
In the process, the underlying probability distributions from Multinomial and Bernoulli are discarded. Hence, these individual models are not "efficient" on their own at producing one highly confident prediction.
Yet when we combined the results through the meta-learner, the stacking ensemble exploited the extra information carried in those distributions from Multinomial and Bernoulli.
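A tiny numpy illustration of the point (the probability values are made-up, not from the models above): after argmax, a barely-confident vote and a highly-confident vote look identical, but averaging the full distributions lets a confident model outweigh a weak one:

```python
import numpy as np

# Hypothetical class-probability rows for one sample from two base models.
p_a = np.array([0.51, 0.49, 0.0, 0.0])  # barely prefers class 0
p_b = np.array([0.10, 0.90, 0.0, 0.0])  # strongly prefers class 1

# Hard argmax votes disagree, and p_a's low confidence is invisible in a vote.
assert np.argmax(p_a) == 0 and np.argmax(p_b) == 1

# Averaging the full distributions preserves that confidence information:
p_avg = (p_a + p_b) / 2
print(np.argmax(p_avg))  # prints 1 — class 1 wins, 0.695 vs 0.305
```

A stacking meta-learner goes one step further: instead of fixed weights, it learns from the training probabilities how much to trust each base model per class.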