Building More Accurate Spending Forecasts: From Linear Regression to Advanced Regression Techniques (Enhancing Spending Predictions: From Linear to Advanced Regression) | by Data Project 67

Summary of findings from the EDA charts above

Relationship between Income and Family Size
The analysis shows a negative correlation: as Income rises, Family Size tends to fall. This may reflect a tendency for higher earners to choose smaller families or to have no children, which in turn may affect spending behaviour such as wine purchases.
Effect of Family Size on wine purchases
The bar plot shows that people in larger families tend to buy less wine. Larger families probably face more essential expenses, such as childcare and other costs tied to family members, so they tend to spend less on luxury goods like wine.
Relationship between age and wine spending
The violin plot shows that the highest wine spenders tend to be in the oldest and the youngest age groups, while the middle-aged group tends to buy the least. This may reflect how the appeal of wine differs by age: older customers may see wine as a fitting drink for social gatherings and celebrations, while spending in the youngest group may come from taking part in activities attended mostly by older adults.
Correlation Matrix
The correlation matrix shows that Income has a strong positive correlation with wine spending (Corr = 0.73): the higher the income, the more a customer tends to spend on wine. Most other variables show a negative correlation, meaning that as one value increases the other decreases.
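As a rough illustration of how a correlation matrix like this is computed, the sketch below builds a small synthetic table and calls pandas' DataFrame.corr(). The column names and all numbers are made up for the example; they are not the article's dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (hypothetical values, not the article's dataset)
rng = np.random.default_rng(0)
income = rng.normal(50_000, 15_000, size=500)
# Wine spending rises with income; family size falls with income
wines = 0.01 * income + rng.normal(0, 100, size=500)
family_size = 5 - income / 20_000 + rng.normal(0, 0.5, size=500)

toy = pd.DataFrame({"Income": income, "Wines": wines, "Family_Size": family_size})

# Pearson correlation between every pair of columns
corr = toy.corr()
print(corr.round(2))
```

The sign of each entry gives the direction of the relationship and its magnitude gives the strength, so a strongly negative entry is as informative as a strongly positive one.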

EDA conclusion
The preliminary analysis suggests the data contains signals relevant to predicting wine spending, particularly Income, which is positively correlated with wine spending, along with several variables that are negatively correlated with it. However, some variables, such as Family Size, may not contribute significantly to predicting wine purchases, because their negative correlation may not reflect actual wine-buying behaviour.

An open question
This analysis raises a question: is it inappropriate to build a regression model on variables that are negatively correlated with wine purchases, or is there another way to improve the model? Because some variables may not contribute significantly to predicting wine spending, feature selection and data preparation need careful thought before fitting a regression model, so that the result is as accurate as possible. We will look for the answer in the modeling steps that follow.

4 ) Feature Engineering

Feature Selection: keep only the columns that have a plausible causal relationship with the target, rather than relying on statistical correlation alone. This makes the model both more efficient and easier to interpret.

# Rename columns so they are easier to understand
df = df.rename(columns={
    "MntWines": "Wines", "MntFruits": "Fruits", "MntMeatProducts": "Meat",
    "MntFishProducts": "Fish", "MntSweetProducts": "Sweets", "MntGoldProds": "Gold",
})

# Feature selection: drop some of the redundant features
to_drop = ['Marital_Status', 'Dt_Customer', 'Year_Birth', 'ID', 'Fruits', 'Meat', 'Fish', 'Sweets', 'Gold', 'NumDealsPurchases',
           'NumWebPurchases', 'NumCatalogPurchases', 'NumStorePurchases', 'NumWebVisitsMonth', 'AcceptedCmp3', 'AcceptedCmp4', 'AcceptedCmp5',
           'AcceptedCmp1', 'AcceptedCmp2', 'Complain', 'Response', 'Recency',
           'Age_Group', 'age_seg']
df = df.drop(to_drop, axis=1)

Train-Test Split: split the data into a training set (train) and a test set (test) in an 80:20 ratio, which leaves enough data to train the model while holding out a portion to measure its performance.

from sklearn.model_selection import train_test_split

# Create features and target
# (df is the frame prepared in the feature-selection step; the original snippet called it ds)
X = df.drop(columns=['Wines'])
y = df['Wines']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=52)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
print(X_train.columns)
print(X_test.columns)
print(y_train.name)
print(y_test.name)

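Feature Transformation: apply ordinal encoding to the ['Education', 'Living_With'] columns, because their values can be treated as ordered: education level (e.g. degree, master's, doctoral) and living arrangement (Living With: Partner or Alone). Each category is replaced by an integer code. Note that scikit-learn's OrdinalEncoder numbers categories from 0 and, unless an explicit order is passed through its categories argument, sorts them alphabetically rather than by meaning.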

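from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OrdinalEncoder

# Ordinal encoding: fit on the training set only, then apply to the test set
column_transformer = ColumnTransformer(
    [('encoder', OrdinalEncoder(), ['Education', 'Living_With'])],
    remainder='passthrough')

X_train = column_transformer.fit_transform(X_train)
X_test = column_transformer.transform(X_test)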
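Because the order of the integer codes matters for an ordinal feature, it can help to check what the encoder actually assigns. The sketch below compares the default behaviour with an explicit category order. The category labels are hypothetical placeholders; the labels in the real dataset may differ.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Toy frame with hypothetical category labels (the real labels may differ)
toy = pd.DataFrame({
    "Education": ["Graduate", "Postgraduate", "Undergraduate", "Graduate"],
    "Living_With": ["Alone", "Partner", "Partner", "Alone"],
})

# Default behaviour: categories are sorted alphabetically, codes start at 0
default_codes = OrdinalEncoder().fit_transform(toy)

# Explicit order: list the categories from lowest to highest for each column
ordered = OrdinalEncoder(categories=[
    ["Undergraduate", "Graduate", "Postgraduate"],
    ["Alone", "Partner"],
])
ordered_codes = ordered.fit_transform(toy)

print(default_codes[:, 0])  # alphabetical: Graduate=0, Postgraduate=1, Undergraduate=2
print(ordered_codes[:, 0])  # intended:     Undergraduate=0, Graduate=1, Postgraduate=2
```

With the default settings the codes only reflect alphabetical position, so passing the order explicitly is the safer choice whenever the ranking is supposed to carry meaning.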

Polynomial Features: add polynomial features alongside the machine learning model inside a Pipeline, which keeps the workflow efficient and easy to manage. This appears in the next part.

5 ) Building the Prediction Model (Build Model Machine Learning)

In this part we build machine learning (ML) models using several types of regression: Ridge Regression, Lasso Regression, ElasticNet Regression, and Simple Linear Regression. These models predict a quantitative (continuous) variable, as mentioned above.

One thing to know before building a model: what does a fitted model give us, and how do we evaluate it?

A prediction model's performance can be evaluated with R² (R-squared) and MSE (Mean Squared Error).

R² (R-squared): measures how much of the variance in the target the model explains. It usually lies between 0 and 1, and values near 1 mean the model explains the data very well. On held-out data it can fall below 0 when the model predicts worse than simply using the mean.

Interpretation: a high R² indicates the model predicts well; a low R² suggests the model may not suit the data.

MSE (Mean Squared Error): the average of the squared errors. Because it is expressed in squared units of the target, it is most useful for comparing models on the same data: the model with the lower MSE is the more accurate one.

Interpretation: a low MSE means the predictions are accurate, while a high MSE means the model makes large errors.

Together, R² and MSE help us decide which model is best suited for real-world use.
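The two metrics can be checked by hand on a tiny example. The sketch below uses four made-up values, computes MSE and R² directly from their definitions, and compares them with scikit-learn's mean_squared_error and r2_score.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up actual and predicted values, for illustration only
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 9.0])

# MSE: mean of the squared residuals
mse_manual = np.mean((y_true - y_pred) ** 2)

# R^2: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(mse_manual, mean_squared_error(y_true, y_pred))
print(r2_manual, r2_score(y_true, y_pred))
```

Here the residuals are 1, 0, -1, 0, so MSE is 0.5 and R² is 1 - 2/20 = 0.9.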

Steps to Build ML with Regression & Comparison of All Best-Practice Models

Simple Linear Regression

Code Explanation (Linear Regression)

Model Initialization and Training:

linear_model = LinearRegression(): creates a Linear Regression model
linear_model.fit(X_train, y_train): trains the model on the training set (X_train, y_train)

Training Set Prediction:

y_pred_train = linear_model.predict(X_train): predicts the target for the training set

Test Set Prediction:

y_pred_linear = linear_model.predict(X_test): predicts the target for the test set

Output:

Prints R² and MSE for both the training and test sets:
r2_train_LR and mse_train for the training set
r2_linear and mse_linear for the test set

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Initialize and train the linear regression model
linear_model = LinearRegression()
linear_model.fit(X_train, y_train)

# Calculate predictions for the training set
y_pred_train = linear_model.predict(X_train)
# Evaluate the model on the training set
mse_train = mean_squared_error(y_train, y_pred_train)
r2_train_LR = r2_score(y_train, y_pred_train)
# Make predictions on the test set
y_pred_linear = linear_model.predict(X_test)
# Evaluate the linear regression model on the test set
mse_linear = mean_squared_error(y_test, y_pred_linear)
r2_linear = r2_score(y_test, y_pred_linear)
# Print scores
print(f"Linear Regression - Training Set R-squared: {r2_train_LR}")
print(f"Linear Regression - Test Set R-squared: {r2_linear}")
print(f"Linear Regression - Training Set Mean Squared Error: {mse_train}")
print(f"Linear Regression - Test Set Mean Squared Error: {mse_linear}")

Result: Simple Linear Regression
Linear Regression - Training Set R-squared: 0.5424295157336835
Linear Regression - Test Set R-squared: 0.5579093141546283
Linear Regression - Training Set Mean Squared Error: 50584.56923120886
Linear Regression - Test Set Mean Squared Error: 54807.136822404645

Create the Residual Chart

A residual chart shows the difference between the value predicted by the model (predicted value) and the true value (actual value) for each observation. This difference is called the residual, or error, and is computed as

$e_i = y_i - \hat{y}_i$

where $y_i$ is the actual value and $\hat{y}_i$ is the predicted value.

A residual chart helps us see how accurate the model is and whether there is systematic bias or error left uncorrected in some regions. If the errors are spread evenly with no clear pattern, the model fits well. If the chart shows a pattern or trend, such as errors that are unusually high or low in some range, the model is likely misfitting part of the data and needs further improvement.

import matplotlib.pyplot as plt

# Calculate residuals (actual minus predicted)
residuals_train = y_train - y_pred_train
residuals_test = y_test - y_pred_linear

# Create subplots for residuals
fig, ax = plt.subplots(2, 1, figsize=(12, 8), sharex=True)
# Plot 1: residuals for the training set
ax[0].scatter(y_pred_train, y_train, color='#20a39e', alpha=0.6, label='True values', s=40, edgecolors='black')
ax[0].plot(y_pred_train, y_pred_train, color='crimson', label='Predicted line')
ax[0].vlines(y_pred_train, y_pred_train, y_train, color='purple', linewidth=0.7, alpha=0.6, label='Residuals')
ax[0].set_title(f"Linear Regression Training Set: R squared = {r2_train_LR:.2f}")
ax[0].set_ylabel("True values")
ax[0].legend(loc='upper left')
ax[0].grid(False)
# Plot 2: residuals for the test set
ax[1].scatter(y_pred_linear, y_test, color='#20a39e', alpha=0.6, label='True values', s=40, edgecolors='black')
ax[1].plot(y_pred_linear, y_pred_linear, color='crimson', label='Predicted line')
ax[1].vlines(y_pred_linear, y_pred_linear, y_test, color='purple', linewidth=0.7, alpha=0.6, label='Residuals')
ax[1].set_title(f"Linear Regression Test Set: R squared = {r2_linear:.2f}")
ax[1].set_ylabel("True values")
ax[1].legend(loc='upper left')
ax[1].grid(False)
# Finalize and show plot
plt.tight_layout()
plt.show()
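Alongside the chart, a quick numeric check of the residuals can be useful. The sketch below fits a linear regression on synthetic data (not the article's dataset) and verifies two properties that hold for ordinary least squares with an intercept on its own training data: the residuals average to zero and are uncorrelated with the predictions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data with a known linear relationship (illustration only)
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = 4.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(0, 0.5, size=200)

model = LinearRegression().fit(X, y)
y_hat = model.predict(X)
residuals = y - y_hat  # actual minus predicted

residual_mean = residuals.mean()
residual_pred_corr = np.corrcoef(y_hat, residuals)[0, 1]
print(residual_mean)
print(residual_pred_corr)
```

On a test set these quantities are not guaranteed to be zero, so a clearly non-zero mean or a visible trend there is a sign that the model is missing structure in the data.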

Linear Regression with the Best Polynomial Degree

Code Explanation: Linear Regression with the Best Polynomial Degree

Defining the Pipeline

'poly': adds polynomial features with PolynomialFeatures()
'scaler': standardizes the data (StandardScaler) so that every feature has mean 0 and standard deviation 1
'model': uses LinearRegression to fit the linear model
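The effect of the 'scaler' step can be seen on a tiny made-up matrix whose two columns sit on very different scales. This is only a sketch of what StandardScaler does, separate from the pipeline itself.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two features on very different scales (made-up numbers)
X = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print(X_scaled.mean(axis=0))  # each column is centred at 0
print(X_scaled.std(axis=0))   # each column has unit standard deviation
```

Placing the scaler inside the Pipeline means it is refit on the training portion of every cross-validation fold, so no statistics from the validation fold leak into training.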

Setting the Parameter Grid

param_grid is used with GridSearchCV to search for the best polynomial degree in the range 1 to 5

Using GridSearchCV

Cross-Validation (CV): cv=5 splits the training data into 5 folds and computes the R² score on each to find the best setting
verbose=1: prints progress messages while running

Best Parameters and CV Score:

best_params: the best parameter found (the polynomial degree)
best_r2_cv: the best cross-validated R² score
best_model: the best model, taken from grid_search.best_estimator_

Model Evaluation on Test Set

Predictions
y_test_pred = best_model.predict(X_test_LR)
Predicts on the test data and computes:
r2_test: R² on the test set
mse_linear_test: Mean Squared Error (MSE) on the test set

Output:

The best polynomial degree (best_params['poly__degree'])
The best cross-validated R² score (best_r2_cv)
R² and MSE for the training and test sets

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

X_train_LR = X_train
X_test_LR = X_test

# Define a pipeline
pipeline = Pipeline([
    ('poly', PolynomialFeatures()),   # Add polynomial features
    ('scaler', StandardScaler()),     # Standardize features
    ('model', LinearRegression())     # Linear regression model
])
# Define parameter grid (only searching for polynomial degree)
param_grid = {
    'poly__degree': range(1, 6)  # Test polynomial degrees from 1 to 5
}
# Initialize GridSearchCV
grid_search = GridSearchCV(pipeline, param_grid, scoring='r2', cv=5, verbose=1)
# Fit GridSearchCV
grid_search.fit(X_train_LR, y_train)
# Get the best parameters and score
best_params = grid_search.best_params_
best_r2_cv = grid_search.best_score_  # Best CV score
# Get the best model
best_model = grid_search.best_estimator_
# Evaluate on training set
y_train_pred = best_model.predict(X_train_LR)
r2_train = r2_score(y_train, y_train_pred)
mse_linear_train = mean_squared_error(y_train, y_train_pred)
# Evaluate on test set
y_test_pred = best_model.predict(X_test_LR)
r2_test = r2_score(y_test, y_test_pred)
mse_linear_test = mean_squared_error(y_test, y_test_pred)
# Output results
print(f"Best Polynomial Degree: {best_params['poly__degree']}")
print(f"Best Cross-Validated R^2: {best_r2_cv:.4f}")
print("--"*20)
print(f"Train Set R^2: {r2_train:.4f}")
print(f"Test Set R^2: {r2_test:.4f}")
print("--"*20)
print(f"Linear Regression - Training Set Mean Squared Error: {mse_linear_train}")
print(f"Linear Regression - Test Set Mean Squared Error: {mse_linear_test}")

Result: Linear Regression with the Best Polynomial Degree
Fitting 5 folds for each of 5 candidates, totalling 25 fits
Best Polynomial Degree: 2
Best Cross-Validated R^2: 0.5533
----------------------------------------
Train Set R^2: 0.5763
Test Set R^2: 0.5728
----------------------------------------
Linear Regression - Training Set Mean Squared Error: 46840.69345603836
Linear Regression - Test Set Mean Squared Error: 52965.68672782048
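Beyond the single best score, GridSearchCV keeps the score of every candidate in its cv_results_ attribute, which shows how much the degrees actually differ. The sketch below runs the same kind of pipeline on synthetic data with a quadratic ground truth (an assumption made for the example, not the article's data) and tabulates the mean R² per degree.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data: the target depends on x squared (illustration only)
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(300, 1))
y = 1.0 + X[:, 0] ** 2 + rng.normal(0, 0.2, size=300)

pipe = Pipeline([
    ("poly", PolynomialFeatures()),
    ("scaler", StandardScaler()),
    ("model", LinearRegression()),
])
search = GridSearchCV(pipe, {"poly__degree": range(1, 6)}, scoring="r2", cv=5)
search.fit(X, y)

# One row per candidate degree, with mean and spread of the fold scores
summary = pd.DataFrame(search.cv_results_)[
    ["param_poly__degree", "mean_test_score", "std_test_score"]
]
print(summary)
```

If several degrees score within one standard deviation of each other, choosing the lowest of them is a reasonable way to keep the model simple.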

Ridge Regression with the Best Polynomial Degree + Best Alpha

Code Explanation: Ridge Regression

Best Parameters and CV Score:

best_params_rd: the polynomial degree and alpha that give the highest cross-validated R²
best_r2_cv_rd: the best cross-validated R² score, which estimates how the model performs on unseen data

Best Model Selection:

best_model_rd = grid_search_rd.best_estimator_: selects the pipeline with the best polynomial degree and alpha, the combination that scored best in cross-validation

Model Evaluation on Test Set:

Predictions:
y_test_pred_rd = best_model_rd.predict(X_test_rd)
The best model is used to predict the test set

Output:

The best polynomial degree and alpha, and the cross-validated R² associated with them
R² and Mean Squared Error (MSE) for the training set and the test set

import numpy as np
from sklearn.linear_model import Ridge

X_train_rd = X_train
X_test_rd = X_test

# Define a pipeline
pipeline_RR = Pipeline([
    ('poly', PolynomialFeatures()),   # Add polynomial features
    ('scaler', StandardScaler()),     # Standardize features
    ('model', Ridge())                # Ridge Regression Model
])
# Define parameter grid (searching over polynomial degree and Ridge alpha)
alpha_values_rd = np.logspace(-3, 3, 13)
param_grid_RR = {
    'poly__degree': range(1, 6),               # Search over polynomial degrees
    'model__alpha': alpha_values_rd            # Search over Ridge alpha values
}
# Initialize GridSearchCV
grid_search_rd = GridSearchCV(pipeline_RR, param_grid_RR, scoring='r2', cv=5, verbose=1)
# Fit GridSearchCV
grid_search_rd.fit(X_train_rd, y_train)
# Get the best parameters and score
best_params_rd = grid_search_rd.best_params_
best_r2_cv_rd = grid_search_rd.best_score_  # Best CV score
# Get the best model
best_model_rd = grid_search_rd.best_estimator_
# Evaluate on training set
y_train_pred_rd = best_model_rd.predict(X_train_rd)
r2_train_rd = r2_score(y_train, y_train_pred_rd)
mse_linear_train_rd = mean_squared_error(y_train, y_train_pred_rd)
# Evaluate on test set
y_test_pred_rd = best_model_rd.predict(X_test_rd)
r2_test_rd = r2_score(y_test, y_test_pred_rd)
mse_linear_test_rd = mean_squared_error(y_test, y_test_pred_rd)
# Output results
print(f"Best Polynomial Degree: {best_params_rd['poly__degree']}")
print(f"Best Alpha (Ridge): {best_params_rd['model__alpha']}")
print(f"Best Cross-Validated R^2: {best_r2_cv_rd:.4f}")
print("--"*20)
print(f"Train Set R^2: {r2_train_rd:.4f}")
print(f"Test Set R^2: {r2_test_rd:.4f}")
print("--"*20)
print(f"Ridge - Training Set Mean Squared Error: {mse_linear_train_rd}")
print(f"Ridge - Test Set Mean Squared Error: {mse_linear_test_rd}")

Result: Ridge Regression with the Best Polynomial Degree + Best Alpha
Fitting 5 folds for each of 65 candidates, totalling 325 fits
Best Polynomial Degree: 3
Best Alpha (Ridge): 3.1622776601683795
Best Cross-Validated R^2: 0.5710
----------------------------------------
Train Set R^2: 0.6087
Test Set R^2: 0.6065
----------------------------------------
Ridge - Training Set Mean Squared Error: 43257.876572318484
Ridge - Test Set Mean Squared Error: 48784.603014733155
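The oddly precise best alpha, 3.1622776601683795, is simply one of the grid points produced by np.logspace(-3, 3, 13). A short sketch makes the grid and the candidate count explicit:

```python
import numpy as np

alphas = np.logspace(-3, 3, 13)   # 13 values from 1e-3 to 1e3 in half-decade steps
degrees = range(1, 6)             # polynomial degrees 1 to 5

n_candidates = len(alphas) * len(degrees)
print(alphas)
print(n_candidates)               # size of the full grid
```

The grid contains 10 raised to the powers -3, -2.5, ..., 3, so the selected value is 10^0.5, and 13 alphas times 5 degrees gives the 65 candidates reported in the log.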

Lasso Regression with the Best Polynomial Degree + Best Alpha

Code Explanation: Lasso Regression

Suppress Warnings: uses warnings.filterwarnings to silence ConvergenceWarning for Lasso

Parameter Grid: defines the parameters to search:

poly__degree: polynomial degree (1-5)
model__alpha: Lasso alpha (logarithmic scale, $10^{-3}$ to $10^{3}$)

Pipeline: combines 3 steps:

Add polynomial features (PolynomialFeatures)
Put the data on a common scale (StandardScaler)
Fit Lasso Regression (Lasso)

GridSearchCV: uses cross-validation (cv=5) to find the best degree and alpha

Training: fits GridSearchCV on X_train_LS and y_train to select the best model

Best Parameters & Model:

Stores the best poly__degree and alpha in best_params_LS
Stores the best model in best_model_LS

Model Evaluation:

Computes R² and Mean Squared Error (MSE) for the training and test sets

Output:

Best polynomial degree
Best alpha for Lasso
Best cross-validated R²
R² and MSE for the training and test sets

import warnings

from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import Lasso

# Suppress convergence warnings for Lasso
warnings.filterwarnings("ignore", category=ConvergenceWarning)

# Same encoded splits as before (these names were not defined in the original snippet)
X_train_LS = X_train
X_test_LS = X_test

# Define the parameter grid for both polynomial degree and alpha
alpha_values = np.logspace(-3, 3, 13)
param_grid_LS = {
    'poly__degree': [1, 2, 3, 4, 5],         # Polynomial degrees to search
    'model__alpha': alpha_values             # Range of alpha values for Lasso
}
# Create a pipeline with PolynomialFeatures, StandardScaler, and Lasso
pipeline_LS = Pipeline([
    ('poly', PolynomialFeatures()),   # Add polynomial features
    ('scaler', StandardScaler()),     # Standardize features
    ('model', Lasso())                # Lasso Regression Model
])
# Initialize GridSearchCV
grid_search_LS = GridSearchCV(pipeline_LS, param_grid_LS, scoring='r2', cv=5, verbose=1, n_jobs=-1)
# Fit GridSearchCV
grid_search_LS.fit(X_train_LS, y_train)
# Get the best parameters and score
best_params_LS = grid_search_LS.best_params_
best_r2_cv_LS = grid_search_LS.best_score_
# Get the best model
best_model_LS = grid_search_LS.best_estimator_
# Evaluate on training set
y_train_pred_LS = best_model_LS.predict(X_train_LS)
r2_train_LS = r2_score(y_train, y_train_pred_LS)
mse_linear_train_LS = mean_squared_error(y_train, y_train_pred_LS)
# Evaluate on test set
y_test_pred_LS = best_model_LS.predict(X_test_LS)
r2_test_LS = r2_score(y_test, y_test_pred_LS)
mse_linear_test_LS = mean_squared_error(y_test, y_test_pred_LS)
# Output results
print(f"Best Polynomial Degree: {best_params_LS['poly__degree']}")
print(f"Best Alpha (Lasso): {best_params_LS['model__alpha']}")
print(f"Best Cross-Validated R^2: {best_r2_cv_LS:.4f}")
print("--" * 20)
print(f"Train Set R^2: {r2_train_LS:.4f}")
print(f"Test Set R^2: {r2_test_LS:.4f}")
print("--" * 20)
print(f"Lasso - Training Set Mean Squared Error: {mse_linear_train_LS}")
print(f"Lasso - Test Set Mean Squared Error: {mse_linear_test_LS}")

Result: Lasso Regression with the Best Polynomial Degree + Best Alpha
Fitting 5 folds for each of 65 candidates, totalling 325 fits
Best Polynomial Degree: 3
Best Alpha (Lasso): 0.31622776601683794
Best Cross-Validated R^2: 0.5745
----------------------------------------
Train Set R^2: 0.6030
Test Set R^2: 0.6106
----------------------------------------
Lasso - Training Set Mean Squared Error: 43893.6151843199
Lasso - Test Set Mean Squared Error: 48272.49323282247
Lasso - Take a look at Set Imply Squared Error: 48272.49323282247

Elastic Net Regression with the Best Polynomial Degree + Best Alpha and L1 Ratio

Code Explanation: Elastic Net Regression

Pipeline:

Uses PolynomialFeatures to add polynomial features
Uses StandardScaler to put the features on a common scale
Uses ElasticNet as the main model, with max_iter=10000

Parameter Grid:

poly__degree: polynomial degree (1-3)
elasticnet__alpha: alpha (regularization strength, $10^{-4}$ to $10^{1}$)
elasticnet__l1_ratio: the mix between L1 and L2 regularization (0.1-0.9)

RandomizedSearchCV:

Samples parameter combinations at random (n_iter=20) to reduce search time
Uses cross-validation (cv=5) to select the best parameters

Training: uses X_train_EN and y_train to find the best ElasticNet model

Best Parameters:

The best polynomial degree, alpha, and l1_ratio are stored in best_params_EN
The highest cross-validated R² is stored in best_r2_cv_EN

Model Evaluation:

Computes R² and Mean Squared Error (MSE) for the training set and the test set

Output:

Best polynomial degree
Best alpha and l1_ratio for ElasticNet
Best cross-validated R²
R² and MSE for the training and test sets

from sklearn.linear_model import ElasticNet
from sklearn.model_selection import RandomizedSearchCV

# Same encoded splits as before (these names were not defined in the original snippet)
X_train_EN = X_train
X_test_EN = X_test

# Define pipeline
pipeline_EN = Pipeline([
    ('poly', PolynomialFeatures()),
    ('scaler', StandardScaler()),
    ('elasticnet', ElasticNet(max_iter=10000))  # ElasticNet model
])
# Reduced hyperparameter values
alpha_values_EN = np.logspace(-4, 1, 6)
l1_ratios = np.linspace(0.1, 0.9, 3)
param_grid_EN = {
    'poly__degree': range(1, 4),
    'elasticnet__alpha': alpha_values_EN,
    'elasticnet__l1_ratio': l1_ratios
}
# Initialize RandomizedSearchCV with fewer iterations
random_search_EN = RandomizedSearchCV(pipeline_EN, param_distributions=param_grid_EN,
                                      n_iter=20, scoring='r2', cv=5, verbose=1, n_jobs=-1, random_state=42)
# Fit RandomizedSearchCV
random_search_EN.fit(X_train_EN, y_train)
# Get the best parameters and score
best_params_EN = random_search_EN.best_params_
best_r2_cv_EN = random_search_EN.best_score_  # Best CV score
# Get the best model
best_model_EN = random_search_EN.best_estimator_
# Evaluate on training set
y_train_pred_EN = best_model_EN.predict(X_train_EN)
r2_train_EN = r2_score(y_train, y_train_pred_EN)
mse_train_EN = mean_squared_error(y_train, y_train_pred_EN)
# Evaluate on test set
y_test_pred_EN = best_model_EN.predict(X_test_EN)
r2_test_EN = r2_score(y_test, y_test_pred_EN)
mse_test_EN = mean_squared_error(y_test, y_test_pred_EN)
# Output results
print(f"Best Polynomial Degree: {best_params_EN['poly__degree']}")
print(f"Best Alpha (ElasticNet): {best_params_EN['elasticnet__alpha']}")
print(f"Best l1_ratio (ElasticNet): {best_params_EN['elasticnet__l1_ratio']}")
print(f"Best Cross-Validated R^2: {best_r2_cv_EN:.4f}")
print("--" * 20)
print(f"Train Set R^2: {r2_train_EN:.4f}")
print(f"Test Set R^2: {r2_test_EN:.4f}")
print("--" * 20)
print(f"ElasticNet - Training Set Mean Squared Error: {mse_train_EN}")
print(f"ElasticNet - Test Set Mean Squared Error: {mse_test_EN}")

Result: Elastic Net Regression with the Best Polynomial Degree + Best Alpha and L1 Ratio
Fitting 5 folds for each of 20 candidates, totalling 100 fits
Best Polynomial Degree: 3
Best Alpha (ElasticNet): 0.01
Best l1_ratio (ElasticNet): 0.9
Best Cross-Validated R^2: 0.5708
----------------------------------------
Train Set R^2: 0.6104
Test Set R^2: 0.6087
----------------------------------------
ElasticNet - Training Set Mean Squared Error: 43066.133192243346
ElasticNet - Test Set Mean Squared Error: 48513.001244107516
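It is worth knowing how much of the search space the 20 random draws cover. The sketch below rebuilds the same grid and counts its combinations with scikit-learn's ParameterGrid. The degree range is written as a list here purely for clarity.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

grid = {
    "poly__degree": list(range(1, 4)),                 # 3 degrees
    "elasticnet__alpha": np.logspace(-4, 1, 6),        # 6 alphas
    "elasticnet__l1_ratio": np.linspace(0.1, 0.9, 3),  # 3 ratios
}

n_combinations = len(ParameterGrid(grid))
coverage = 20 / n_combinations  # share of the grid tried with n_iter=20
print(n_combinations)
print(f"{coverage:.0%}")
```

The grid holds 3 x 6 x 3 = 54 combinations, so n_iter=20 evaluates a bit over a third of them. A different random_state could therefore land on a different best combination.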

6 ) Evaluation (Find Best Model)

Having built several kinds of regression model, we now compare them side by side.

import pandas as pd

model_results = []

# Linear Regression (best polynomial degree)
model_results.append({
    'model': 'Linear Regression',
    'mse_train': mse_linear_train,
    'mse_test': mse_linear_test,
    'r2_train': r2_train,
    'r2_test': r2_test
})
# Ridge Regression
model_results.append({
    'model': 'Ridge Regression',
    'mse_train': mse_linear_train_rd,
    'mse_test': mse_linear_test_rd,
    'r2_train': r2_train_rd,
    'r2_test': r2_test_rd
})
# Lasso Regression
model_results.append({
    'model': 'Lasso Regression',
    'mse_train': mse_linear_train_LS,
    'mse_test': mse_linear_test_LS,
    'r2_train': r2_train_LS,
    'r2_test': r2_test_LS
})
# Elastic Net Regression
model_results.append({
    'model': 'Elastic Net Regression',
    'mse_train': mse_train_EN,
    'mse_test': mse_test_EN,
    'r2_train': r2_train_EN,
    'r2_test': r2_test_EN
})
results_df = pd.DataFrame(model_results)
print(results_df)


Result
                    model     mse_train      mse_test  r2_train   r2_test
0       Linear Regression  46840.693456  52965.686728  0.576295  0.572763
1        Ridge Regression  43257.876572  48784.603015  0.608704  0.606489
2        Lasso Regression  43893.615184  48272.493233  0.602954  0.610620
3  Elastic Net Regression  43066.133192  48513.001244  0.610439  0.608680
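To pick the winner programmatically rather than by eye, the results can be sorted by test-set R². The sketch below re-enters the test metrics from the table above by hand so that it runs on its own; in the notebook, results_df could be sorted the same way.

```python
import pandas as pd

# Test-set metrics copied from the results table above
results = pd.DataFrame({
    "model": ["Linear Regression", "Ridge Regression",
              "Lasso Regression", "Elastic Net Regression"],
    "mse_test": [52965.686728, 48784.603015, 48272.493233, 48513.001244],
    "r2_test": [0.572763, 0.606489, 0.610620, 0.608680],
})

# Highest test R^2 first
ranked = results.sort_values("r2_test", ascending=False).reset_index(drop=True)
best_model_name = ranked.loc[0, "model"]
print(ranked)
print("Best by test R^2:", best_model_name)
```

Ranking on the test columns rather than the training columns matters here: Elastic Net has the best training scores, but Lasso comes out ahead on held-out data.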

7 ) Conclusion

Analysis based on R² and MSE

Linear Regression + Polynomial: R² = 0.572, so the model explains only 57.2% of the variance in the data, which is not enough for accurate forecasting in a complex setting. MSE = 52965.69, so the average squared error is still high, which points to limited accuracy.
Ridge, Lasso, Elastic Net Regression: R² is slightly higher than for Linear Regression (about 0.606 to 0.61) and the error (MSE) is lower.
Lasso Regression performs best here, with R² = 0.61 and MSE = 48272.49 on the test set.

Preliminary conclusion: Lasso Regression is the best model here, with slightly higher accuracy and the lowest test error, although its margin over Ridge and Elastic Net is small.
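Since MSE is in squared units, taking its square root (RMSE) gives an error on the same scale as wine spending, which makes the gap between models easier to judge. The sketch below uses the two test-set MSE values reported above; the calculation itself is an added illustration, not part of the original analysis.

```python
import math

mse_linear = 52965.686728   # polynomial linear regression, test set
mse_lasso = 48272.493233    # Lasso, test set

rmse_linear = math.sqrt(mse_linear)
rmse_lasso = math.sqrt(mse_lasso)
relative_mse_reduction = (mse_linear - mse_lasso) / mse_linear

print(f"RMSE linear: {rmse_linear:.1f}")
print(f"RMSE lasso:  {rmse_lasso:.1f}")
print(f"MSE reduction: {relative_mse_reduction:.1%}")
```

Read this way, Lasso lowers the test MSE by a little under 9 percent, and the typical prediction error moves from roughly 230 to roughly 220 in the units of the Wines column.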
