Building Models for More Accurate Spending Forecasts: From Linear Regression to Advanced Regression Techniques (Enhancing Spending Predictions: From Linear to Advanced Regression) | by Data Project 67

Summary of what the EDA charts above show about the variables

Relationship between Income and Family Size
The analysis shows a negative correlation: as Income rises, Family Size tends to fall. This may reflect a tendency for higher-income people to choose smaller families or to have no children, which may in turn affect spending behaviour such as wine purchases.
Effect of Family Size on wine purchases
The bar plot shows that people in larger families tend to buy less wine, possibly because larger families have more essential expenses, such as childcare or other costs tied to family members, and so tend to spend less on luxury goods such as wine.
Relationship between age and wine spending
The violin plot shows that the highest wine spenders tend to be in the oldest and the youngest age groups, while the middle-aged group tends to buy the least. This may reflect preferences for wine that differ by age: older people may see wine as a drink suited to social gatherings or celebrations, while spending in the youngest group may come from taking part in activities attended mostly by older adults.
Correlation Matrix
The correlation matrix shows that Income has a strong positive correlation with wine spending (Corr = 0.73): the higher the income, the more a customer tends to spend on wine. Most of the other variables show a negative correlation, meaning that as one value rises the other falls.
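As a rough illustration of what a correlation coefficient such as the 0.73 above measures, the sketch below computes Pearson's r by hand. The income, wine and family-size values are made-up toy numbers, not rows from the dataset, and `pearson_r` is a hypothetical helper name.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: co-variation divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Toy values, assumed for illustration only
income = [20_000, 35_000, 50_000, 65_000, 80_000]
wine_spend = [30, 120, 200, 450, 600]
family_size = [4, 4, 3, 2, 1]

print(f"income vs wine:      {pearson_r(income, wine_spend):+.2f}")
print(f"family size vs wine: {pearson_r(family_size, wine_spend):+.2f}")
```

A value near +1 means the two variables rise together, a value near -1 means one falls as the other rises, and a value near 0 means no linear relationship.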

EDA summary
The exploratory analysis suggests the data contains signals relevant to predicting wine spending, especially Income, which is positively correlated with wine spending, along with some variables that are negatively correlated with it. However, some variables, such as Family Size, may not contribute significantly to predicting wine purchases, because their negative correlation may not reflect actual wine-buying behaviour.

An interesting question
This analysis raises a question: is it inappropriate to use variables that are negatively correlated with wine purchases when building a regression model, or is there another way to improve the model? Because some variables may not contribute significantly to predicting wine spending, feature selection and data preparation need careful thought before building a regression model in order to get the most accurate results. We will look for the answer in the modelling steps that follow.

4 ) Feature Engineering

Feature Selection: select only the columns that have a plausible logical relationship with the target, rather than relying only on columns that are statistically correlated with it. This makes the model both more efficient and more meaningful.

# Rename columns so they are easier to understand
df = df.rename(columns={
    "MntWines": "Wines", "MntFruits": "Fruits", "MntMeatProducts": "Meat",
    "MntFishProducts": "Fish", "MntSweetProducts": "Sweets", "MntGoldProds": "Gold"})
# Feature selection
# Drop some of the redundant features
to_drop = ['Marital_Status', "Dt_Customer", "Year_Birth", "ID", 'Fruits', 'Meat', 'Fish', 'Sweets', 'Gold', 'NumDealsPurchases',
           'NumWebPurchases', 'NumCatalogPurchases', 'NumStorePurchases', 'NumWebVisitsMonth', 'AcceptedCmp3', 'AcceptedCmp4', 'AcceptedCmp5',
           'AcceptedCmp1', 'AcceptedCmp2', 'Complain', 'Response', 'Recency',
           'Age_Group', 'age_seg']
df = df.drop(to_drop, axis=1)

Train-Test Split: split the data into a training set and a test set at an 80:20 ratio, a common split that leaves enough data both to train the model and to test its performance.

from sklearn.model_selection import train_test_split

# Create features and target
X = df.drop(columns=['Wines'])
y = df['Wines']
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=52)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
print(X_train.columns)
print(X_test.columns)
print(y_train.name)
print(y_test.name)

Feature Transformation: apply Ordinal Encoding to the columns ['Education', 'Living_With'], because their values can be ranked: education level (for example degree, master's, doctoral) and living arrangement (Living With: Partner or Alone). Each category is replaced by an integer code in order (scikit-learn's OrdinalEncoder numbers the codes from 0).

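from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OrdinalEncoder

# Ordinal encoding: fit on the training set only, then apply to the test set
column_transformer = ColumnTransformer(
    [('encoder', OrdinalEncoder(), ['Education', 'Living_With'])],
    remainder='passthrough')
X_train = column_transformer.fit_transform(X_train)
X_test = column_transformer.transform(X_test)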
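One detail worth checking: by default `OrdinalEncoder` orders the categories by sorting the labels, which is not necessarily the intended ranking. The sketch below passes the order explicitly through the `categories` parameter. The label values here are assumptions for illustration; the actual labels in `Education` and `Living_With` should be read from the data.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Toy frame; the label values are assumed, not taken from the dataset
toy = pd.DataFrame({
    "Education": ["Postgraduate", "Undergraduate", "Graduate"],
    "Living_With": ["Partner", "Alone", "Partner"],
})

# Explicit low-to-high order for each column, listed in column order
encoder = OrdinalEncoder(categories=[
    ["Undergraduate", "Graduate", "Postgraduate"],
    ["Alone", "Partner"],
])
encoded = encoder.fit_transform(toy[["Education", "Living_With"]])
print(encoded)
```

With the default sorted order, these toy labels would be coded Graduate = 0, Postgraduate = 1, Undergraduate = 2, which does not match the intended ranking.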

Polynomial Features: polynomial features are added together with the machine learning model inside a Pipeline, which keeps the process efficient and easy to manage. This appears in the next part.

5 ) Building the Prediction Model (Build Model Machine Learning)

Here we build Machine Learning (ML) models using several types of regression: Ridge Regression, Lasso Regression, ElasticNet Regression, and Simple Linear Regression. These models predict a quantitative (continuous) variable, as mentioned above.

One thing to know before building a model: once it is built, what results does it produce, and how do we evaluate them?

A prediction model's performance can be evaluated with R² (R-squared) and MSE (Mean Squared Error).

R² (R-squared): measures how much of the variance in the data the model explains. It usually falls between 0 and 1, and a value close to 1 means the model explains the data very well. On held-out data it can also be negative, which means the model does worse than always predicting the mean.

Interpretation: a high R² indicates that the model predicts well, while a low R² suggests that the model may not suit the data.

MSE (Mean Squared Error): the average of the squared errors, so a model with a lower MSE is more accurate.

Interpretation: a low MSE indicates accurate predictions, while a high MSE indicates large errors.

Using R² and MSE together helps us decide which model is most suitable for real use.
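To make the two metrics concrete, here is a minimal sketch that computes them by hand on four made-up values. The helper names `mse` and `r_squared` are hypothetical; in the rest of this post the scikit-learn functions `mean_squared_error` and `r2_score` do the same job.

```python
def mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """1 minus (residual sum of squares / total sum of squares around the mean)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [100, 200, 300, 400]   # toy actual values
y_pred = [110, 190, 320, 380]   # toy predictions, close to the actual values
y_bad = [400, 300, 200, 100]    # toy predictions that are worse than the mean

print(mse(y_true, y_pred))        # (100 + 100 + 400 + 400) / 4
print(r_squared(y_true, y_pred))  # 1 - 1000 / 50000, close to 1
print(r_squared(y_true, y_bad))   # 1 - 200000 / 50000, negative
```

The last line shows why R² is not strictly bounded below by 0: predictions that miss by more than the mean would give a negative score.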

Steps to Build ML with Regression & Comparison of All Best-Practice Models

Simple Linear Regression

Code Explanation (Linear Regression)

Model Initialization and Training:

linear_model = LinearRegression(): creates the Linear Regression model
linear_model.fit(X_train, y_train): trains the model on the Training Set (X_train, y_train)

Training Set Prediction:

y_pred_train = linear_model.predict(X_train): predicts the target for the Training Set

Test Set Prediction:

y_pred_linear = linear_model.predict(X_test): predicts the target for the Test Set

Output:

Shows R² and MSE for both the Training and Test Sets:
r2_train_LR and mse_train for the Training Set
r2_linear and mse_linear for the Test Set

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Initialize and train the linear regression model
linear_model = LinearRegression()
linear_model.fit(X_train, y_train)
# Calculate predictions for the training set
y_pred_train = linear_model.predict(X_train)
# Evaluate the model on the training set
mse_train = mean_squared_error(y_train, y_pred_train)
r2_train_LR = r2_score(y_train, y_pred_train)
# Make predictions on the test set
y_pred_linear = linear_model.predict(X_test)
# Evaluate the linear regression model on the test set
mse_linear = mean_squared_error(y_test, y_pred_linear)
r2_linear = r2_score(y_test, y_pred_linear)
# Print scores
print(f"Linear Regression - Training Set R-squared: {r2_train_LR}")
print(f"Linear Regression - Test Set R-squared: {r2_linear}")
print(f"Linear Regression - Training Set Mean Squared Error: {mse_train}")
print(f"Linear Regression - Test Set Mean Squared Error: {mse_linear}")

Result: Simple Linear Regression
Linear Regression - Training Set R-squared: 0.5424295157336835
Linear Regression - Test Set R-squared: 0.5579093141546283
Linear Regression - Training Set Mean Squared Error: 50584.56923120886
Linear Regression - Test Set Mean Squared Error: 54807.136822404645

Create the Residual Chart

A Residual Chart shows the difference between the value predicted by the model (predicted value) and the true value (actual value) in the dataset. This difference is called the residual, or error, and is computed as: residual = actual value - predicted value.

A Residual Chart lets us see how accurate the model is and check whether there are deviations or errors left unaddressed at certain points. If the errors are spread evenly with no clear pattern, the model predicts well. If the chart shows a pattern or trend, such as unusually high or low errors, the model may be failing in some regions and needs further improvement.

import matplotlib.pyplot as plt

# Calculate residuals (actual minus predicted)
residuals_train = y_train - y_pred_train
residuals_test = y_test - y_pred_linear
# Create subplots: true values against predictions, with residuals drawn as vertical lines
fig, ax = plt.subplots(2, 1, figsize=(12, 8), sharex=True)
# Plot 1: residuals for the training set
ax[0].scatter(y_pred_train, y_train, color='#20a39e', alpha=0.6, label='True values', s=40, edgecolors='black')
ax[0].plot(y_pred_train, y_pred_train, color='crimson', label='Predicted line')
ax[0].vlines(y_pred_train, y_pred_train, y_train, color='purple', linewidth=0.7, alpha=0.6, label='Residuals')
ax[0].set_title(f"Linear Regression Training Set: R squared = {r2_train_LR:.2f}")
ax[0].set_ylabel("True values")
ax[0].legend(loc='upper left')
ax[0].grid(False)
# Plot 2: residuals for the test set
ax[1].scatter(y_pred_linear, y_test, color='#20a39e', alpha=0.6, label='True values', s=40, edgecolors='black')
ax[1].plot(y_pred_linear, y_pred_linear, color='crimson', label='Predicted line')
ax[1].vlines(y_pred_linear, y_pred_linear, y_test, color='purple', linewidth=0.7, alpha=0.6, label='Residuals')
ax[1].set_title(f"Linear Regression Test Set: R squared = {r2_linear:.2f}")
ax[1].set_ylabel("True values")
ax[1].legend(loc='upper left')
ax[1].grid(False)
# Finalize and show plot
plt.tight_layout()
plt.show()

Linear Regression with the Best Polynomial Degree

Code Explanation: Linear Regression with the Best Polynomial Degree

Defining the Pipeline

'poly': adds polynomial features with PolynomialFeatures()
'scaler': standardizes the data (StandardScaler) so that every feature has mean 0 and standard deviation 1
'model': uses LinearRegression to fit the linear model

Setting the Parameter Grid

param_grid is used with GridSearchCV to search for the best polynomial degree in the range 1 to 5
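It helps to see how quickly the feature count grows with the degree. With n input features, the number of polynomial terms of total degree at most d, including the constant term that PolynomialFeatures adds by default, is the binomial coefficient C(n + d, d). The sketch below tabulates this for an assumed n of 5; the real value depends on how many columns remain in X_train, and `n_poly_features` is a hypothetical helper name.

```python
from math import comb

def n_poly_features(n_inputs, degree):
    """Number of polynomial terms of total degree <= degree, including the constant."""
    return comb(n_inputs + degree, degree)

n_inputs = 5  # assumed for illustration; use X_train.shape[1] for the real count
for degree in range(1, 6):
    print(degree, n_poly_features(n_inputs, degree))
```

Higher degrees therefore cost more to fit and are more prone to overfitting, which is one reason to let cross-validation choose the degree.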

Using GridSearchCV

Cross-Validation (CV): uses cv=5 to split the training data into 5 folds and computes the R² score on each to find the best setting
verbose=1: prints status messages while the search runs
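The fold mechanics can be sketched in a few lines. This is a simplified illustration of 5-fold splitting on 10 row indices, not scikit-learn's own code; `k_fold_indices` is a hypothetical helper. Each fold takes one turn as the validation set while the model is trained on the other four.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous, unshuffled folds."""
    # The first n_samples % k folds get one extra row
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, val_idx
        start += size

folds = list(k_fold_indices(10, 5))
for fold, (train_idx, val_idx) in enumerate(folds, start=1):
    print(f"fold {fold}: validate on {val_idx}, train on {len(train_idx)} rows")
```

The score reported for each candidate is the average of its five validation scores, so every training row is used for validation exactly once.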

Best Parameters and CV Score:

best_params: returns the best parameter (the polynomial degree)
best_r2_cv: returns the best cross-validated (CV) R² score
best_model: the best model, taken from grid_search.best_estimator_

Model Evaluation on Test Set

Predictions
y_test_pred = best_model.predict(X_test_LR)
Predicts on the test data and computes:
r2_test: R² on the test set
mse_linear_test: Mean Squared Error (MSE) on the test set

Output:

The best polynomial degree (best_params['poly__degree'])
The best cross-validated R² score (best_r2_cv)
R² and MSE for the training and test sets

from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

X_train_LR = X_train
X_test_LR = X_test
# Define a pipeline
pipeline = Pipeline([
    ('poly', PolynomialFeatures()),   # Add polynomial features
    ('scaler', StandardScaler()),     # Standardize features
    ('model', LinearRegression())     # Linear regression model
])
# Define parameter grid (only searching for polynomial degree)
param_grid = {
    'poly__degree': range(1, 6)  # Test polynomial degrees from 1 to 5
}
# Initialize GridSearchCV
grid_search = GridSearchCV(pipeline, param_grid, scoring='r2', cv=5, verbose=1)
# Fit GridSearchCV
grid_search.fit(X_train_LR, y_train)
# Get the best parameters and score
best_params = grid_search.best_params_
best_r2_cv = grid_search.best_score_  # Best CV score
# Get the best model
best_model = grid_search.best_estimator_
# Evaluate on training set
y_train_pred = best_model.predict(X_train_LR)
r2_train = r2_score(y_train, y_train_pred)
mse_linear_train = mean_squared_error(y_train, y_train_pred)
# Evaluate on test set
y_test_pred = best_model.predict(X_test_LR)
r2_test = r2_score(y_test, y_test_pred)
mse_linear_test = mean_squared_error(y_test, y_test_pred)
# Output results
print(f"Best Polynomial Degree: {best_params['poly__degree']}")
print(f"Best Cross-Validated R^2: {best_r2_cv:.4f}")
print("--"*20)
print(f"Train Set R^2: {r2_train:.4f}")
print(f"Test Set R^2: {r2_test:.4f}")
print("--"*20)
print(f"Linear Regression - Training Set Mean Squared Error: {mse_linear_train}")
print(f"Linear Regression - Test Set Mean Squared Error: {mse_linear_test}")

Result: Linear Regression with the Best Polynomial Degree
Fitting 5 folds for each of 5 candidates, totalling 25 fits
Best Polynomial Degree: 2
Best Cross-Validated R^2: 0.5533
----------------------------------------
Train Set R^2: 0.5763
Test Set R^2: 0.5728
----------------------------------------
Linear Regression - Training Set Mean Squared Error: 46840.69345603836
Linear Regression - Test Set Mean Squared Error: 52965.68672782048

Ridge Regression with the Best Polynomial Degree + Best Alpha

Code Explanation: Ridge Regression

Best Parameters and CV Score:

best_params_rd: the polynomial degree and alpha that give the highest cross-validated R² score
best_r2_cv_rd: the best cross-validated R² score, which indicates how the model performs on unseen data

Best Model Selection:

best_model_rd = grid_search_rd.best_estimator_: selects the pipeline with the best polynomial degree and alpha found by the search

Model Evaluation on Test Set:

Predictions:
y_test_pred_rd = best_model_rd.predict(X_test_rd)
The best model is used to predict on the test set

Output:

The best polynomial degree and alpha, and the R² score associated with them
R² and Mean Squared Error (MSE) for the training set and the test set

import numpy as np
from sklearn.linear_model import Ridge

X_train_rd = X_train
X_test_rd = X_test
# Define a pipeline
pipeline_RR = Pipeline([
    ('poly', PolynomialFeatures()),   # Add polynomial features
    ('scaler', StandardScaler()),     # Standardize features
    ('model', Ridge())                # Ridge Regression model
])
# Define parameter grid (searching over polynomial degree and Ridge alpha)
alpha_values_rd = np.logspace(-3, 3, 13)
param_grid_RR = {
    'poly__degree': range(1, 6),      # Search over polynomial degrees
    'model__alpha': alpha_values_rd   # Search over Ridge alpha values
}
# Initialize GridSearchCV
grid_search_rd = GridSearchCV(pipeline_RR, param_grid_RR, scoring='r2', cv=5, verbose=1)
# Fit GridSearchCV
grid_search_rd.fit(X_train_rd, y_train)
# Get the best parameters and score
best_params_rd = grid_search_rd.best_params_
best_r2_cv_rd = grid_search_rd.best_score_  # Best CV score
# Get the best model
best_model_rd = grid_search_rd.best_estimator_
# Evaluate on training set
y_train_pred_rd = best_model_rd.predict(X_train_rd)
r2_train_rd = r2_score(y_train, y_train_pred_rd)
mse_linear_train_rd = mean_squared_error(y_train, y_train_pred_rd)
# Evaluate on test set
y_test_pred_rd = best_model_rd.predict(X_test_rd)
r2_test_rd = r2_score(y_test, y_test_pred_rd)
mse_linear_test_rd = mean_squared_error(y_test, y_test_pred_rd)
# Output results
print(f"Best Polynomial Degree: {best_params_rd['poly__degree']}")
print(f"Best Alpha (Ridge): {best_params_rd['model__alpha']}")
print(f"Best Cross-Validated R^2: {best_r2_cv_rd:.4f}")
print("--"*20)
print(f"Train Set R^2: {r2_train_rd:.4f}")
print(f"Test Set R^2: {r2_test_rd:.4f}")
print("--"*20)
print(f"Ridge - Training Set Mean Squared Error: {mse_linear_train_rd}")
print(f"Ridge - Test Set Mean Squared Error: {mse_linear_test_rd}")

Result: Ridge Regression with the Best Polynomial Degree + Best Alpha
Fitting 5 folds for each of 65 candidates, totalling 325 fits
Best Polynomial Degree: 3
Best Alpha (Ridge): 3.1622776601683795
Best Cross-Validated R^2: 0.5710
----------------------------------------
Train Set R^2: 0.6087
Test Set R^2: 0.6065
----------------------------------------
Ridge - Training Set Mean Squared Error: 43257.876572318484
Ridge - Test Set Mean Squared Error: 48784.603014733155

Lasso Regression with the Best Polynomial Degree + Best Alpha

Code Explanation: Lasso Regression

Suppress Warnings: uses warnings.filterwarnings to silence ConvergenceWarning for Lasso

Parameter Grid: defines the parameters to search over:

poly__degree: the polynomial degree (1-5)
model__alpha: the Lasso alpha (logarithmic scale, $10^{-3}$ to $10^{3}$)
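To see what the logarithmic alpha grid contains, the sketch below rebuilds it in plain Python. It should match np.logspace(-3, 3, 13) up to floating-point rounding: 13 values whose exponents step by 0.5 from -3 to 3. Multiplying by the 5 degrees gives the number of candidates the search has to try.

```python
# 13 points evenly spaced in the exponent, from 10^-3 to 10^3
alphas = [10 ** (-3 + 0.5 * i) for i in range(13)]
degrees = [1, 2, 3, 4, 5]

for alpha in alphas:
    print(f"{alpha:.4g}")

n_candidates = len(degrees) * len(alphas)
print("candidates:", n_candidates)           # 5 * 13 = 65
print("fits with cv=5:", n_candidates * 5)   # 65 * 5 = 325
```

Because the grid only contains these 13 values, a reported best alpha such as 0.316... or 3.162... is simply the grid point 10^-0.5 or 10^0.5; a finer grid around it could be searched afterwards if needed.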

Pipeline: combines 3 steps:

Add polynomial features (PolynomialFeatures)
Put the data on a common scale (StandardScaler)
Fit Lasso Regression (Lasso)

GridSearchCV: uses cross-validation (CV=5) to find the best degree and alpha

Training: fits GridSearchCV on X_train_LS and y_train to select the best model

Best Parameters & Model:

Stores the best poly__degree and alpha in best_params_LS
Stores the best model in best_model_LS

Model Evaluation:

Computes R² and Mean Squared Error (MSE) for the Training and Test sets

Output:

Best Polynomial Degree
Best Alpha for Lasso
Best Cross-Validated R²
R² and MSE for the Training/Test Sets

import warnings
import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import Lasso

# Suppress convergence warnings for Lasso
warnings.filterwarnings("ignore", category=ConvergenceWarning)
X_train_LS = X_train
X_test_LS = X_test
# Define the parameter grid for both polynomial degree and alpha
alpha_values = np.logspace(-3, 3, 13)
param_grid_LS = {
    'poly__degree': [1, 2, 3, 4, 5],   # Polynomial degrees to search
    'model__alpha': alpha_values       # Range of alpha values for Lasso
}
# Create a pipeline with PolynomialFeatures, StandardScaler, and Lasso
pipeline_LS = Pipeline([
    ('poly', PolynomialFeatures()),   # Add polynomial features
    ('scaler', StandardScaler()),     # Standardize features
    ('model', Lasso())                # Lasso Regression model
])
# Initialize GridSearchCV
grid_search_LS = GridSearchCV(pipeline_LS, param_grid_LS, scoring='r2', cv=5, verbose=1, n_jobs=-1)
# Fit GridSearchCV
grid_search_LS.fit(X_train_LS, y_train)
# Get the best parameters and score
best_params_LS = grid_search_LS.best_params_
best_r2_cv_LS = grid_search_LS.best_score_
# Get the best model
best_model_LS = grid_search_LS.best_estimator_
# Evaluate on training set
y_train_pred_LS = best_model_LS.predict(X_train_LS)
r2_train_LS = r2_score(y_train, y_train_pred_LS)
mse_linear_train_LS = mean_squared_error(y_train, y_train_pred_LS)
# Evaluate on test set
y_test_pred_LS = best_model_LS.predict(X_test_LS)
r2_test_LS = r2_score(y_test, y_test_pred_LS)
mse_linear_test_LS = mean_squared_error(y_test, y_test_pred_LS)
# Output results
print(f"Best Polynomial Degree: {best_params_LS['poly__degree']}")
print(f"Best Alpha (Lasso): {best_params_LS['model__alpha']}")
print(f"Best Cross-Validated R^2: {best_r2_cv_LS:.4f}")
print("--" * 20)
print(f"Train Set R^2: {r2_train_LS:.4f}")
print(f"Test Set R^2: {r2_test_LS:.4f}")
print("--" * 20)
print(f"Lasso - Training Set Mean Squared Error: {mse_linear_train_LS}")
print(f"Lasso - Test Set Mean Squared Error: {mse_linear_test_LS}")

Result: Lasso Regression with the Best Polynomial Degree + Best Alpha
Fitting 5 folds for each of 65 candidates, totalling 325 fits
Best Polynomial Degree: 3
Best Alpha (Lasso): 0.31622776601683794
Best Cross-Validated R^2: 0.5745
----------------------------------------
Train Set R^2: 0.6030
Test Set R^2: 0.6106
----------------------------------------
Lasso - Training Set Mean Squared Error: 43893.6151843199
Lasso - Test Set Mean Squared Error: 48272.49323282247
Lasso - Take a look at Set Imply Squared Error: 48272.49323282247

Elastic Net Regression with the Best Polynomial Degree + Best Alpha and L1 Ratio

Code Explanation: Elastic Net Regression

Pipeline:

Uses PolynomialFeatures to add polynomial features
Uses StandardScaler to put the features on a common scale
Uses ElasticNet as the main model, with max_iter=10000

Parameter Grid:

poly__degree: the polynomial degree (1-3)
elasticnet__alpha: the alpha value (regularization strength, $10^{-4}$ to $10^{1}$)
elasticnet__l1_ratio: the mix between L1 and L2 regularization (0.1-0.9)
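For reference, the way alpha and l1_ratio enter the model can be written out. The objective below is the one given in the scikit-learn documentation for ElasticNet, with $\alpha$ for alpha, $\rho$ for l1_ratio, $n$ for the number of training rows and $w$ for the coefficients:

```latex
\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2
  \;+\; \alpha\,\rho\,\lVert w \rVert_1
  \;+\; \frac{\alpha\,(1-\rho)}{2}\,\lVert w \rVert_2^2
```

With $\rho = 1$ only the L1 term remains and the penalty is the Lasso penalty; as $\rho$ moves toward 0 the L2 (Ridge-style) term takes over. A best l1_ratio of 0.9 therefore means a mostly Lasso-like penalty.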

RandomizedSearchCV:

Samples parameter combinations at random (n_iter=20) to reduce search time
Uses cross-validation (CV=5) to select the best parameters
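The saving from random search is easy to quantify. The sketch below enumerates the full grid implied by the ranges above and draws 20 distinct combinations from it. It only illustrates the idea: the seed and the sampled set are assumptions and will not match the combinations scikit-learn actually draws.

```python
import itertools
import random

degrees = [1, 2, 3]
alphas = [10 ** e for e in range(-4, 2)]  # 6 values, as in np.logspace(-4, 1, 6) up to rounding
l1_ratios = [0.1, 0.5, 0.9]               # as in np.linspace(0.1, 0.9, 3)

all_combos = list(itertools.product(degrees, alphas, l1_ratios))
rng = random.Random(42)                   # seed chosen for illustration only
sampled = rng.sample(all_combos, 20)      # 20 distinct combinations, like n_iter=20

print(len(all_combos), "possible combinations")   # 3 * 6 * 3 = 54
print(len(sampled), "combinations tried")
```

Trying 20 of the 54 combinations cuts the work to 100 fits with cv=5 instead of 270, at the cost of possibly missing the best combination.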

Training: uses X_train_EN and y_train to search for the best ElasticNet model

Best Parameters:

The best polynomial degree, alpha, and l1_ratio are stored in best_params_EN
The highest cross-validated R² score is stored in best_r2_cv_EN

Model Evaluation:

Computes R² and Mean Squared Error (MSE) for the Training Set and the Test Set

Output:

Best Polynomial Degree
Best Alpha and l1_ratio for ElasticNet
Best Cross-Validated R²
R² and MSE for the Training and Test Sets

import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import RandomizedSearchCV

X_train_EN = X_train
X_test_EN = X_test
# Define pipeline
pipeline_EN = Pipeline([
    ('poly', PolynomialFeatures()),
    ('scaler', StandardScaler()),
    ('elasticnet', ElasticNet(max_iter=10000))  # ElasticNet model
])
# Reduced hyperparameter values
alpha_values_EN = np.logspace(-4, 1, 6)
l1_ratios = np.linspace(0.1, 0.9, 3)
param_grid_EN = {
    'poly__degree': range(1, 4),
    'elasticnet__alpha': alpha_values_EN,
    'elasticnet__l1_ratio': l1_ratios
}
# Initialize RandomizedSearchCV with fewer iterations
random_search_EN = RandomizedSearchCV(pipeline_EN, param_distributions=param_grid_EN,
                                      n_iter=20, scoring='r2', cv=5, verbose=1, n_jobs=-1, random_state=42)
# Fit RandomizedSearchCV
random_search_EN.fit(X_train_EN, y_train)
# Get the best parameters and score
best_params_EN = random_search_EN.best_params_
best_r2_cv_EN = random_search_EN.best_score_  # Best CV score
# Get the best model
best_model_EN = random_search_EN.best_estimator_
# Evaluate on training set
y_train_pred_EN = best_model_EN.predict(X_train_EN)
r2_train_EN = r2_score(y_train, y_train_pred_EN)
mse_train_EN = mean_squared_error(y_train, y_train_pred_EN)
# Evaluate on test set
y_test_pred_EN = best_model_EN.predict(X_test_EN)
r2_test_EN = r2_score(y_test, y_test_pred_EN)
mse_test_EN = mean_squared_error(y_test, y_test_pred_EN)
# Output results
print(f"Best Polynomial Degree: {best_params_EN['poly__degree']}")
print(f"Best Alpha (ElasticNet): {best_params_EN['elasticnet__alpha']}")
print(f"Best l1_ratio (ElasticNet): {best_params_EN['elasticnet__l1_ratio']}")
print(f"Best Cross-Validated R^2: {best_r2_cv_EN:.4f}")
print("--" * 20)
print(f"Train Set R^2: {r2_train_EN:.4f}")
print(f"Test Set R^2: {r2_test_EN:.4f}")
print("--" * 20)
print(f"ElasticNet - Training Set Mean Squared Error: {mse_train_EN}")
print(f"ElasticNet - Test Set Mean Squared Error: {mse_test_EN}")

Result: Elastic Net Regression with the Best Polynomial Degree + Best Alpha and L1 Ratio
Fitting 5 folds for each of 20 candidates, totalling 100 fits
Best Polynomial Degree: 3
Best Alpha (ElasticNet): 0.01
Best l1_ratio (ElasticNet): 0.9
Best Cross-Validated R^2: 0.5708
----------------------------------------
Train Set R^2: 0.6104
Test Set R^2: 0.6087
----------------------------------------
ElasticNet - Training Set Mean Squared Error: 43066.133192243346
ElasticNet - Test Set Mean Squared Error: 48513.001244107516

6 ) Evaluation (Find the Best Model)

Having built several kinds of regression model, we now compare them side by side.

import pandas as pd

model_results = []
# Linear Regression (with the best polynomial degree)
model_results.append({
    'model': 'Linear Regression',
    'mse_train': mse_linear_train,
    'mse_test': mse_linear_test,
    'r2_train': r2_train,
    'r2_test': r2_test
})
# Ridge Regression
model_results.append({
    'model': 'Ridge Regression',
    'mse_train': mse_linear_train_rd,
    'mse_test': mse_linear_test_rd,
    'r2_train': r2_train_rd,
    'r2_test': r2_test_rd
})
# Lasso Regression
model_results.append({
    'model': 'Lasso Regression',
    'mse_train': mse_linear_train_LS,
    'mse_test': mse_linear_test_LS,
    'r2_train': r2_train_LS,
    'r2_test': r2_test_LS
})
# Elastic Net Regression
model_results.append({
    'model': 'Elastic Net Regression',
    'mse_train': mse_train_EN,
    'mse_test': mse_test_EN,
    'r2_train': r2_train_EN,
    'r2_test': r2_test_EN
})
results_df = pd.DataFrame(model_results)
print(results_df)


Result
                    model     mse_train      mse_test  r2_train   r2_test
0       Linear Regression  46840.693456  52965.686728  0.576295  0.572763
1        Ridge Regression  43257.876572  48784.603015  0.608704  0.606489
2        Lasso Regression  43893.615184  48272.493233  0.602954  0.610620
3  Elastic Net Regression  43066.133192  48513.001244  0.610439  0.608680

7 ) Conclusion

Analysis based on R² and MSE

Linear Regression + Polynomial: R² = 0.573 on the test set, so the model explains only about 57.3% of the variance in the data, which is still not enough for accurate forecasting in a complex setting. MSE = 52965.69: the average squared error remains high, indicating limited accuracy.
Ridge, Lasso, Elastic Net Regression: test-set R² is slightly higher than for Linear Regression (about 0.606 to 0.611) and the error (MSE) is lower.
Lasso Regression performs best here on the test set, with R² = 0.611 and MSE = 48272.49.

Preliminary conclusion: Lasso Regression is the best model here, with slightly higher test-set R² and the lowest test-set MSE, although its margin over Ridge and Elastic Net is small.
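Since MSE is expressed in squared units of the target, it can be easier to compare the models through the root mean squared error (RMSE), which is in the same units as Wines. RMSE is not computed in the original analysis; the sketch below simply takes the square root of the test-set MSE values from the comparison table.

```python
import math

# Test-set MSE values copied from the comparison table
mse_test = {
    "Linear Regression": 52965.686728,
    "Ridge Regression": 48784.603015,
    "Lasso Regression": 48272.493233,
    "Elastic Net Regression": 48513.001244,
}

# Print models from lowest to highest test error
for name, mse in sorted(mse_test.items(), key=lambda kv: kv[1]):
    print(f"{name:<24} RMSE = {math.sqrt(mse):.1f}")
```

On this scale the three regularized models land within roughly one unit of each other, which supports reading the ranking among them as a close call rather than a decisive win.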
