    Total Energy Demand Prediction Using Machine Learning | by HalizaApalwan | Apr, 2025



Electricity demand is one of the key indicators for maintaining the stability of a power system. Daily fluctuations driven by weather, seasonal factors, and human activity make total demand prediction a critical challenge for more efficient energy planning and management.

This project implements several machine learning approaches to build a total energy demand (MW) prediction model based on historical data. Various machine learning methods are tested and their performance is objectively compared to identify the most accurate model for predicting total demand given the characteristics of the available data.

    Time

The time domain represents attributes related to time information.
Features in this domain are essential for capturing seasonality patterns or time trends that can affect variations in energy demand.
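As an illustration only (a minimal sketch, assuming the raw file stores the date as a text column named date_1, as the df.info() output below shows; the dataset used here already ships day/month/year columns), such time attributes can be derived with pandas:

import pandas as pd

# Hypothetical sketch: derive day/month/year features from a raw date column.
# The dataset used in this project already contains these columns.
df['date_1'] = pd.to_datetime(df['date_1'])
df['day'] = df['date_1'].dt.day
df['month'] = df['date_1'].dt.month
df['year'] = df['date_1'].dt.year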

Weather

• temp2(c), temp2_max(c), temp2_min(c), temp2_ave(c): ambient temperature (in Celsius).
• suface_pressure: surface air pressure (hectopascals / Pascals).
• wind_speed50_max(m/s): wind speed at a height of 50 meters.
• prectotcorr: most likely the corrected total precipitation (in mm or kg/m²).

Energy

• total_demand(mw): total electricity demand in megawatts.
• max_generation: maximum total electricity generation in megawatts.
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1362 entries, 0 to 1361
Data columns (total 16 columns):
 #   Column                 Non-Null Count  Dtype
---  ------                 --------------  -----
 0   date_1                 1362 non-null   object
 1   day                    1362 non-null   int64
 2   month                  1362 non-null   int64
 3   year                   1362 non-null   int64
 4   temp2(c)               1362 non-null   float64
 5   temp2_max(c)           1362 non-null   float64
 6   temp2_min(c)           1362 non-null   float64
 7   temp2_ave(c)           1362 non-null   float64
 8   suface_pressure(pa)    1362 non-null   float64
 9   wind_speed50_max(m/s)  1362 non-null   float64
 10  wind_speed50_min(m/s)  1362 non-null   float64
 11  wind_speed50_ave(m/s)  1362 non-null   float64
 12  prectotcorr            1362 non-null   float64
 13  total_demand(mw)       1343 non-null   float64
 14  date_2                 1362 non-null   object
 15  max_generation(mw)     1339 non-null   float64
dtypes: float64(11), int64(3), object(2)
memory usage: 170.4+ KB

Number of rows: 1362 entries (data rows)

Number of columns: 15 features (excluding the total_demand(mw) target)

Data types:

• int64: 3 columns
• float64: 10 columns
• object: 2 columns

There are missing values in total_demand(mw) and max_generation(mw).

Checking Duplicate Data, Missing Values, and Unique Values

    print("Knowledge duplicated:", df.duplicated().sum())

    # Mengecek jumlah NaN per kolom
    print(df.isna().sum())

    # Cek persentase lacking worth hanya di kolom tertentu
    cols_to_check = ['total_demand(mw)', 'max_generation(mw)']
    missing_percent = df[cols_to_check].isna().sum() / len(df) * 100
    print(missing_percent)

Data duplicated: 0
date_1                   0
day                      0
month                    0
year                     0
temp2(c)                 0
temp2_max(c)             0
temp2_min(c)             0
temp2_ave(c)             0
suface_pressure(pa)      0
wind_speed50_max(m/s)    0
wind_speed50_min(m/s)    0
wind_speed50_ave(m/s)    0
prectotcorr              0
total_demand(mw)        19
date_2                   0
max_generation(mw)      23
dtype: int64
total_demand(mw)      1.395007
max_generation(mw)    1.688693
dtype: float64

The rows containing missing values are dropped because the amount is not too large (<2%).

df = df.dropna()
df
print(df.isna().sum())
print("Unique Values")
for col in df.columns:
    print(f"{col}: {df[col].nunique()} unique values")

Unique Values
date_1: 1328 unique values
day: 31 unique values
month: 12 unique values
year: 4 unique values
temp2(c): 764 unique values
temp2_max(c): 836 unique values
temp2_min(c): 790 unique values
temp2_ave(c): 972 unique values
suface_pressure(pa): 234 unique values
wind_speed50_max(m/s): 589 unique values
wind_speed50_min(m/s): 504 unique values
wind_speed50_ave(m/s): 798 unique values
prectotcorr: 704 unique values
total_demand(mw): 644 unique values
date_2: 1328 unique values
max_generation(mw): 1184 unique values
zero_counts = (df == 0).sum()
print(zero_counts)

date_1                     0
day                        0
month                      0
year                       0
temp2(c)                   0
temp2_max(c)               0
temp2_min(c)               0
temp2_ave(c)               0
suface_pressure(pa)        0
wind_speed50_max(m/s)      0
wind_speed50_min(m/s)      0
wind_speed50_ave(m/s)      0
prectotcorr              427
total_demand(mw)           0
date_2                     0
max_generation(mw)         0
dtype: int64

In the prectotcorr feature, a value of 0 still makes sense; it could simply mean that conditions were dry on that day.

Boxplot visualization with IQR before outlier handling

# List of columns to visualize
kolom_2 = [
    'day',
    'month',
    'year',
    'temp2(c)',
    'temp2_max(c)',
    'temp2_min(c)',
    'temp2_ave(c)',
    'suface_pressure(pa)',
    'wind_speed50_max(m/s)',
    'wind_speed50_min(m/s)',
    'wind_speed50_ave(m/s)',
    'prectotcorr',
    'total_demand(mw)',
    'max_generation(mw)'
]

# Create the color palette
color_palette = sns.color_palette("Set1", len(kolom_2))

plt.figure(figsize=(15, 10))

# Compute the IQR to detect outliers
Q1 = df[kolom_2].quantile(0.25)
Q3 = df[kolom_2].quantile(0.75)
IQR = Q3 - Q1

# Determine the lower and upper bounds for outliers
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

# Lists to store the count and percentage of outliers per column
outliers_counts = []
outliers_percent = []

# Loop to create the boxplots
for i, column in enumerate(kolom_2, 1):
    plt.subplot((len(kolom_2) + 2) // 3, 3, i)  # Arrange the subplot grid safely
    sns.boxplot(data=df, x=column, color=color_palette[i - 1])  # Boxplot per column

    # Identify the outlier rows
    outliers = df[(df[column] < lower_bound[column]) |
                  (df[column] > upper_bound[column])]

    # Store the count and percentage of outliers per column
    outliers_counts.append(len(outliers))
    outliers_percent.append(round(len(outliers) / len(df) * 100, 2))

    plt.xlabel(column)
    plt.title(f'{column} IQR Boxplot')

# Adjust the layout so the plots do not overlap
plt.tight_layout()
plt.show()

# Print the count and percentage of outliers as text
for col, count, pct in zip(kolom_2, outliers_counts, outliers_percent):
    print(f'{col}: {count} outliers ({pct}%)')

day: 0 outliers (0.0%)
month: 0 outliers (0.0%)
year: 0 outliers (0.0%)
temp2(c): 6 outliers (0.45%)
temp2_max(c): 62 outliers (4.67%)
temp2_min(c): 0 outliers (0.0%)
temp2_ave(c): 3 outliers (0.23%)
suface_pressure(pa): 0 outliers (0.0%)
wind_speed50_max(m/s): 24 outliers (1.81%)
wind_speed50_min(m/s): 4 outliers (0.3%)
wind_speed50_ave(m/s): 13 outliers (0.98%)
prectotcorr: 93 outliers (7.0%)
total_demand(mw): 0 outliers (0.0%)
max_generation(mw): 1 outliers (0.08%)

Handling Outliers

from scipy.stats.mstats import winsorize
import numpy as np
import pandas as pd

# Copy the dataframe to a new variable
df_cleaned = df.copy()

# 1. Clip features with < 1% outliers
clip_cols = [
    'temp2(c)',
    'temp2_ave(c)',
    'wind_speed50_min(m/s)',
    'wind_speed50_ave(m/s)'
]

for col in clip_cols:
    Q1 = df_cleaned[col].quantile(0.25)
    Q3 = df_cleaned[col].quantile(0.75)
    IQR = Q3 - Q1
    lower = Q1 - 1.5 * IQR
    upper = Q3 + 1.5 * IQR
    df_cleaned[col] = df_cleaned[col].clip(lower, upper)

Clipping is chosen because the outliers in these features are relatively minor and not too far from the range of reasonable values. By clipping, we cap values that are too high or too low while still retaining most of the information in the data.

• temp2(c): Temperature can fluctuate normally, but extremely low or extremely high readings are unrealistic.
• temp2_ave(c): Extreme daily temperature averages are clipped to keep the temperature distribution realistic.
• wind_speed50_min(m/s) and wind_speed50_ave(m/s): Wind speeds that are too low or too high can affect the analysis results, so they are clipped to keep the data stable.
from scipy.stats.mstats import winsorize

# Winsorize the two columns separately
df_cleaned['temp2_max(c)'] = winsorize(df_cleaned['temp2_max(c)'], limits=[0.05, 0.05])
df_cleaned['wind_speed50_max(m/s)'] = winsorize(df_cleaned['wind_speed50_max(m/s)'], limits=[0.05, 0.05])

# Show descriptive statistics before and after winsorizing
print("Before:")
print(df[['temp2_max(c)', 'wind_speed50_max(m/s)']].describe())

print("\nAfter:")
print(df_cleaned[['temp2_max(c)', 'wind_speed50_max(m/s)']].describe())

Before:
       temp2_max(c)  wind_speed50_max(m/s)
count   1328.000000            1328.000000
mean      30.185399               5.849149
std        4.073221               1.992202
min       17.800000               1.470000
25%       27.980000               4.440000
50%       30.800000               5.740000
75%       32.322500               7.000000
max       41.960000              21.300000

After:
       temp2_max(c)  wind_speed50_max(m/s)
count   1328.000000            1328.000000
mean      30.177944               5.780203
std        3.765384               1.670201
min       22.490000               3.120000
25%       27.980000               4.440000
50%       30.800000               5.740000
75%       32.322500               7.000000
max       37.080000               9.030000

Winsorizing is used for features that have significant outliers which may lie far outside the normal range but are still considered plausible in the context of the overall data. With winsorizing, the outliers are replaced with values closer to the reasonable upper or lower limits rather than being removed entirely.

• temp2_max(c): Daily maximum temperatures can be quite extreme, and some extreme values need to be adjusted so that they do not distort the model.
• wind_speed50_max(m/s): Maximum wind speed can also show extreme values, but excessive outliers are trimmed to more realistic values.

df_cleaned['prectotcorr_log'] = np.log1p(df_cleaned['prectotcorr'])

The reasons for applying a logarithmic transformation to prectotcorr are:

• Significant Outliers: prectotcorr has many outliers (93, or 7%) that could strongly affect the model. The logarithmic transformation reduces the impact of these extreme values.
• Skewed Distribution: The prectotcorr data tends to be skewed, so the logarithmic transformation helps balance the distribution and bring it closer to normal (a quick skewness check is sketched after this list).
• Model Stability: By reducing extreme variation, the logarithmic transformation makes the model more stable and better able to learn the main patterns.
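A quick skewness check before and after the transformation (a sketch, assuming pandas is available and df_cleaned is the cleaned frame from the previous step) illustrates the effect:

# Sketch: compare the skewness of the raw and log-transformed precipitation column.
print("Skewness before log1p:", df_cleaned['prectotcorr'].skew())
print("Skewness after log1p :", df_cleaned['prectotcorr_log'].skew())

A value closer to 0 after the transformation indicates a more symmetric distribution.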

Boxplot visualization with IQR after outlier handling

day: 0 outliers (0.0%)
month: 0 outliers (0.0%)
year: 0 outliers (0.0%)
temp2(c): 0 outliers (0.0%)
temp2_max(c): 0 outliers (0.0%)
temp2_min(c): 0 outliers (0.0%)
temp2_ave(c): 0 outliers (0.0%)
suface_pressure(pa): 0 outliers (0.0%)
wind_speed50_max(m/s): 0 outliers (0.0%)
wind_speed50_min(m/s): 0 outliers (0.0%)
wind_speed50_ave(m/s): 0 outliers (0.0%)
prectotcorr_log: 0 outliers (0.0%)
total_demand(mw): 0 outliers (0.0%)
max_generation(mw): 1 outliers (0.08%)
Observations from the feature correlation analysis:

• High Temperature Correlation

The temperature values (temp2(c), temp2_max(c), temp2_min(c), temp2_ave(c)) are highly correlated with one another, showing nearly perfect correlation (>0.90).

• Relationship Between Electricity Demand and Generation Capacity

total_demand(mw) and max_generation(mw) have a correlation of 0.90. This suggests that high electricity demand is usually accompanied by an increase in generation capacity.

• Rainfall Correlation and Its Transformation

prectotcorr and prectotcorr_log are strongly correlated (0.83). The log-transformed version (prectotcorr_log) is used because of its more stable distribution, which is closer to normal.

• Air Pressure vs. Minimum Temperature

temp2_min(c) and suface_pressure(pa) show a strong negative correlation (-0.86). This indicates that higher air pressure is usually associated with lower minimum temperatures, likely due to specific weather patterns.

• Redundancy in Wind Speed Data

Wind variables (wind_speed50_max, wind_speed50_min, wind_speed50_ave) are highly correlated with one another (>0.90), suggesting redundancy in the data (see the correlation sketch after this list).
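The correlation values above come from a heatmap that is not reproduced here; a minimal sketch of how such a matrix could be generated with pandas and seaborn (assuming df_cleaned from the previous steps) is:

import matplotlib.pyplot as plt
import seaborn as sns

# Sketch: Pearson correlation matrix of the numeric columns, drawn as a heatmap.
corr = df_cleaned.select_dtypes(include='number').corr()

plt.figure(figsize=(12, 10))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Feature Correlation Matrix")
plt.tight_layout()
plt.show()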

Training features with MinMaxScaler

from sklearn.preprocessing import MinMaxScaler

train = df_cleaned.copy()

# Normalize the features so they fall in the 0-1 range
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(train.drop([
    'date_1', 'date_2', 'prectotcorr',  # Temporal columns
    'total_demand(mw)'                  # Target column
], axis=1))

y = train['total_demand(mw)'].values

Long Short-Term Memory

LSTM (Long Short-Term Memory) is used in this model because of its fundamental ability to handle time series data, where previous values influence future values. The data shows seasonal patterns or time-dependent trends, making LSTM effective at learning such temporal relationships. With time_steps = 10, the model uses the previous 10 time steps to predict the future value of total_demand(mw), making it very relevant for a data-driven prediction application like this.

The feature data is normalized with MinMaxScaler into the 0–1 range. This step matters because it helps the model work more efficiently with data on a consistent scale. Normalization prevents features with a larger range of values from dominating the model's learning, while also speeding up convergence during training. After the data is normalized, a train-test split divides the dataset into training and testing parts, so the model can be evaluated properly and overfitting to the training data is avoided.


import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Build the LSTM dataset in the shape [samples, time steps, features]
def create_lstm_dataset(data, target, time_steps=1):
    X, y = [], []
    for i in range(len(data) - time_steps):
        X.append(data[i:(i + time_steps)])
        y.append(target[i + time_steps])
    return np.array(X), np.array(y)

time_steps = 10
X_lstm, y_lstm = create_lstm_dataset(scaled_data, y, time_steps)

# Train-Test Split
X_train_lstm, X_test_lstm, y_train_lstm, y_test_lstm = train_test_split(X_lstm, y_lstm, test_size=0.2, random_state=42)

# Build the LSTM model
lstm_model = Sequential()
lstm_model.add(LSTM(units=50, return_sequences=False, input_shape=(X_train_lstm.shape[1], X_train_lstm.shape[2])))
lstm_model.add(Dense(units=1))

lstm_model.compile(optimizer='adam', loss='mean_squared_error')

# Train the LSTM model
lstm_model.fit(X_train_lstm, y_train_lstm, epochs=10, batch_size=32, verbose=1)

# Predict and evaluate
y_pred_lstm = lstm_model.predict(X_test_lstm)
mae_lstm = mean_absolute_error(y_test_lstm, y_pred_lstm)

print(f'Mean Absolute Error (LSTM): {mae_lstm}')

Mean Absolute Error (LSTM): 10108.685760461924

Training features without a scaler

# Data preparation
train = df_cleaned.copy()

# Separate features and target
X = train.drop([
    'date_1', 'date_2', 'prectotcorr',  # Temporal columns
    'total_demand(mw)'                  # Target column
], axis=1)
y = train['total_demand(mw)']

# Train-Test Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Align the column order between X_train and X_test
X_test = X_test[X_train.columns]

    Random Forest

The Random Forest model was chosen because of its ability to handle data with many kinds of features without requiring much additional preprocessing. The features used in training were selected by removing the temporal columns (date_1, date_2) and replacing the prectotcorr variable with its more stable transformed version. Random Forest is well suited here because it can capture complex non-linear relationships between variables such as temperature, wind speed, air pressure, and generation capacity, which have dynamic dependence patterns with total electricity demand total_demand(mw). The model uses an ensemble of many decision trees, which minimizes the risk of overfitting while improving the generalization of predictions on new data.

from sklearn.ensemble import RandomForestRegressor

# Train the Random Forest
rf_model = RandomForestRegressor(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)

# Predict and evaluate
y_pred_rf = rf_model.predict(X_test)
mae_rf = mean_absolute_error(y_test, y_pred_rf)

print(f'Mean Absolute Error (Random Forest): {mae_rf}')
print("R² Score:", rf_model.score(X_test, y_test))

Mean Absolute Error (Random Forest): 211.4306390977444
R² Score: 0.9541438801905043

    XGBoost

from xgboost import XGBRegressor

model_xgb = XGBRegressor(
    n_estimators=200,
    learning_rate=0.05,
    random_state=42
)
model_xgb.fit(X_train, y_train)

# Evaluation
y_pred_xgb = model_xgb.predict(X_test)

print("XGBoost Performance:")
print("XGBoost MAE:", mean_absolute_error(y_test, y_pred_xgb))
print("R² Score:", model_xgb.score(X_test, y_test))

XGBoost Performance:
XGBoost MAE: 229.0411349418468
R² Score: 0.9299105962251898

Decision Tree

Decision Trees are used because they are easy to interpret and can capture non-linear relationships in the data without requiring special pre-processing.
The model splits the data based on the conditions that reduce impurity the most, so it can directly reveal important patterns in the dataset's features.

from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error, r2_score

# Normalization (fit on the training data, transform training & test data)
scaler = StandardScaler()
X_train_norm = scaler.fit_transform(X_train)
X_test_norm = scaler.transform(X_test)

# Decision Tree Regressor
tree = DecisionTreeRegressor(random_state=42)
tree.fit(X_train_norm, y_train)

# Prediction
y_pred = tree.predict(X_test_norm)

# Evaluation
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)

print("Decision Tree Regressor Performance:")
print(f"MAE: {mae:.2f}")
print(f"RMSE: {rmse:.2f}")
print(f"R² Score: {r2:.2f}")

Decision Tree Regressor Performance:
MAE: 259.94
RMSE: 426.03
R² Score: 0.92

    Linear Regression

This model assumes a linear relationship between the features and the target, making it suitable for measuring how strong the direct relationship is between the input variables and total_demand.
By trying Linear Regression, we can see whether the relationship pattern in the data is simple enough or whether a more complex model is required.

from sklearn.linear_model import LinearRegression

model_lr = LinearRegression()
model_lr.fit(X_train, y_train)

# Evaluation
y_pred_lr = model_lr.predict(X_test)
print("Linear Regression Performance")
print("Linear Regression MAE:", mean_absolute_error(y_test, y_pred_lr))
print("R² Score:", model_lr.score(X_test, y_test))

Linear Regression Performance
Linear Regression MAE: 317.28977801920007
R² Score: 0.9241553777751903

    Stacking Regressor: Ensemble of XGBoost, Linear Regression, and Random Forest


from sklearn.ensemble import StackingRegressor
import joblib

base_models = [
    ('xgb', XGBRegressor(n_estimators=200, learning_rate=0.05, random_state=42)),
    ('rf', RandomForestRegressor(n_estimators=100, random_state=42)),
    ('lr', LinearRegression())
]

# Build the stacking model
stack_model = StackingRegressor(
    estimators=base_models,
    final_estimator=LinearRegression()
)

# Train the stacking model
stack_model.fit(X_train, y_train)

# Save the trained model
joblib.dump(stack_model, 'stacking_model.pkl')

# Evaluate the stacking model
y_pred_stack = stack_model.predict(X_test)
print("Stacking MAE:", mean_absolute_error(y_test, y_pred_stack))
print("Stacking R² Score:", stack_model.score(X_test, y_test))

Stacking MAE: 210.68754271573795
Stacking R² Score: 0.9542899856989027

Hyperparameter Tuning for the Stacked Model (XGBoost, Random Forest, and Linear Regression)

import optuna
from sklearn.linear_model import RidgeCV

def objective(trial):
    # Hyperparameters of the base models
    xgb_n_estimators = trial.suggest_int("xgb_n_estimators", 50, 300)
    xgb_learning_rate = trial.suggest_float("xgb_learning_rate", 0.01, 0.3)
    rf_n_estimators = trial.suggest_int("rf_n_estimators", 50, 200)
    rf_max_depth = trial.suggest_int("rf_max_depth", 3, 15)

    # Define the base models
    base_models = [
        ('xgb', XGBRegressor(
            n_estimators=xgb_n_estimators,
            learning_rate=xgb_learning_rate,
            random_state=42
        )),
        ('rf', RandomForestRegressor(
            n_estimators=rf_n_estimators,
            max_depth=rf_max_depth,
            random_state=42
        )),
        ('lr', LinearRegression())
    ]

    # The final estimator can also be swapped out if desired
    final_estimator = RidgeCV()

    # Build the stacking model
    stack_model = StackingRegressor(
        estimators=base_models,
        final_estimator=final_estimator
    )

    # Training
    stack_model.fit(X_train, y_train)

    # Predict and compute the MAE
    y_pred = stack_model.predict(X_test)
    mae = mean_absolute_error(y_test, y_pred)
    return mae  # Objective: minimize the MAE

# Run the Optuna study
study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)

# Take the best hyperparameters from Optuna
best_params = study.best_params

# Rebuild the base models with the best params
best_base_models = [
    ('xgb', XGBRegressor(
        n_estimators=best_params['xgb_n_estimators'],
        learning_rate=best_params['xgb_learning_rate'],
        random_state=42
    )),
    ('rf', RandomForestRegressor(
        n_estimators=best_params['rf_n_estimators'],
        max_depth=best_params['rf_max_depth'],
        random_state=42
    )),
    ('lr', LinearRegression())
]

# The final estimator can be replaced if needed
final_estimator = RidgeCV()

# Rebuild the stacking model with the best parameters
best_stack_model = StackingRegressor(
    estimators=best_base_models,
    final_estimator=final_estimator
)

# Retrain the model
best_stack_model.fit(X_train, y_train)

# Predict again
y_pred_best = best_stack_model.predict(X_test)

# Evaluation
mae = mean_absolute_error(y_test, y_pred_best)
rmse = np.sqrt(mean_squared_error(y_test, y_pred_best))  # RMSE computed manually
r2 = r2_score(y_test, y_pred_best)

# Print the results
print("Best MAE:", mae)
print("Best RMSE:", rmse)
print(f"R² Score : {r2:.4f}")
print("Best Hyperparameters:", best_params)

Best MAE: 209.1082478920435
Best RMSE: 331.4287166593046
R² Score : 0.9538
Best Hyperparameters: {'xgb_n_estimators': 261, 'xgb_learning_rate': 0.14408695908396266, 'rf_n_estimators': 125, 'rf_max_depth': 12}

Based on the accuracy results of the various models, the Stacking model achieves the best accuracy, as summarized below.
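For reference, the test-set MAE values reported above can be collected into a single comparison (a sketch that simply reuses the numbers printed earlier):

import pandas as pd

# Sketch: summary of the MAE values reported in the sections above.
results = pd.DataFrame({
    'model': ['LSTM', 'Random Forest', 'XGBoost', 'Decision Tree',
              'Linear Regression', 'Stacking', 'Stacking (tuned)'],
    'MAE': [10108.69, 211.43, 229.04, 259.94, 317.29, 210.69, 209.11]
}).sort_values('MAE')
print(results.to_string(index=False))

The stacked ensemble, especially after tuning, gives the lowest MAE, which is why it is used for the final predictions below.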

import pandas as pd
import joblib
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

test_df = df_cleaned_test

X_test = test_df.drop([
    'date_1', 'date_2', 'prectotcorr',  # Temporal columns
    'total_demand(mw)'
], axis=1, errors='ignore')

stack_model = joblib.load('stacking_model.pkl')

y_test_pred = stack_model.predict(X_test)

test_df['total_demand(mw)'] = y_test_pred

output_columns = ['date_1', 'total_demand(mw)']
test_df[output_columns].to_csv('predictions_results.csv', index=False)

pd.set_option('display.max_rows', None)  # Show all rows
print(test_df[['date_1', 'total_demand(mw)']].head(10))

submissions = pd.DataFrame({'date_1': test_df['date_1'], 'total_demand(mw)': y_test_pred})
submissions.to_csv('submission.csv', index=False)


