
    Predicting Delivery Times with Machine Learning: From Data Analysis to Neural Networks | by Faraz Ahmed | Mar, 2025



About the Company

Porter is India’s largest marketplace for intra-city logistics, revolutionizing last-mile deliveries across various sectors. As a leader in the country’s $40 billion intra-city logistics market, Porter has significantly enhanced operational efficiency and livelihood opportunities for over 150,000 driver-partners. The company has successfully fulfilled more than 5 million customer orders, offering a seamless, technology-driven logistics experience.

Objective

Efficient food delivery is crucial for customer satisfaction, and one of the key aspects is estimating delivery time accurately. Porter collaborates with a number of restaurants and has a fleet of delivery partners, but predicting delivery times can be complex due to numerous influencing factors. The objective of this project is to build a machine learning model that can predict the estimated delivery time based on:

• Order details: Number of items, distinct items, total value, etc.
• Restaurant information: Market ID, store category.
• Logistics data: Availability of delivery partners, outstanding orders, and order fulfilment capacity.

By leveraging historical data and machine learning techniques, we aim to develop a robust regression model that accurately predicts delivery time.

Concepts Used

This project involves several key concepts from data science and machine learning, including:

• Exploratory Data Analysis (EDA): Understanding data distribution, relationships, and feature importance.
• Feature Engineering & Preprocessing: Handling missing values, encoding categorical variables, and feature scaling.
• Regression Modelling: Using algorithms like Neural Networks, XGBoost Regressor, and Linear Regression to predict delivery time.
• Model Evaluation: Comparing models using error metrics such as MAE (Mean Absolute Error) and R² (R-squared); a short sketch of these metrics follows this list.
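As a quick illustration, both metrics are available in scikit-learn. The values below are placeholders for illustration only, not results from this project:

from sklearn.metrics import mean_absolute_error, r2_score

# Placeholder ground-truth and predicted delivery times (minutes)
y_true = [35, 42, 58, 27]
y_pred = [33, 45, 55, 30]

print("MAE:", mean_absolute_error(y_true, y_pred))   # average absolute error
print("R2 :", r2_score(y_true, y_pred))              # share of variance explained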

Challenges and Considerations

Some challenges in estimating delivery time include:

• Real-time Partner Availability: The number of free delivery partners varies dynamically.
• Order Complexity: Larger orders with multiple items may take longer to prepare.
• Traffic and Environmental Factors: Unpredictable conditions may impact delivery speed.

Addressing these factors requires a well-designed predictive model that captures relevant features and adapts to real-world variations.

Before building our predictive model, it is essential to explore the dataset and understand the underlying patterns. This section covers data inspection, preprocessing, and key insights derived from the exploratory analysis.

The dataset consists of records where each row represents a unique food delivery order. The main features include:

• Order Information: Market ID, store category, order protocol, number of items, and total value.
• Logistics Data: Number of on-shift and busy delivery partners, outstanding orders.
• Timestamps: Order placement time and actual delivery time.

We start by loading the dataset and checking its structure.
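A minimal loading snippet, assuming the data is available as a CSV file (the filename porter_data.csv is a placeholder, not the actual path used in the project), would look like this:

import pandas as pd

# Placeholder filename -- replace with the actual dataset path
df = pd.read_csv('porter_data.csv')

df.info()    # column types and non-null counts (reveals missing values)
df.head()    # first few rows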

From the output, we confirm that the dataset contains missing values, categorical variables, and timestamps that need conversion.

The created_at and actual_delivery_time columns are converted to datetime format. We then derive additional features:

• Time Taken for Delivery: Difference between actual_delivery_time and created_at, converted to minutes.
• Day and Time Features: Extracting day of the week, month, hour, and minute from the timestamps to capture time-based trends.
import numpy as np
import pandas as pd

# Converting date columns to datetime format
df['created_at'] = pd.to_datetime(df['created_at'])
df['actual_delivery_time'] = pd.to_datetime(df['actual_delivery_time'])

# Creating a column with the delivery time of the order (in minutes)
df['time_taken'] = df['actual_delivery_time'] - df['created_at']
df['time_taken'] = df['time_taken'].dt.total_seconds() / 60
df['time_taken'] = np.round(df['time_taken'])

# Date/time components of the order timestamp
df['day_of_week'] = df['created_at'].dt.day_of_week
df['year_o'] = df['created_at'].dt.year.astype('int64')
df['month_o'] = df['created_at'].dt.month.astype('int64')
df['day_o'] = df['created_at'].dt.day.astype('int64')
df['hour_o'] = df['created_at'].dt.hour.astype('int64')
df['minute_o'] = df['created_at'].dt.minute.astype('int64')
df['second_o'] = df['created_at'].dt.second.astype('int64')

# Date/time components of the delivery timestamp
df['year_d'] = df['actual_delivery_time'].dt.year.astype('int64')
df['month_d'] = df['actual_delivery_time'].dt.month.astype('int64')
df['day_d'] = df['actual_delivery_time'].dt.day.astype('int64')
df['hour_d'] = df['actual_delivery_time'].dt.hour.astype('int64')
df['minute_d'] = df['actual_delivery_time'].dt.minute.astype('int64')
df['second_d'] = df['actual_delivery_time'].dt.second.astype('int64')
df['day_of_week_d'] = df['actual_delivery_time'].dt.day_of_week

# Dropping the date columns as we have extracted the required information.
df.drop(['created_at', 'actual_delivery_time'], axis=1, inplace=True)

To ensure data integrity, missing values are handled as follows:

• Categorical Features: Filled using the mode (most frequent value).
• Numerical Features: Filled using the median, for robustness against outliers.
    df['store_primary_category'] = df['store_primary_category'].fillna(df['store_primary_category'].mode()[0])
    df['total_onshift_partners'] = df['total_onshift_partners'].fillna(df['total_onshift_partners'].median())
    df['total_busy_partners'] = df['total_busy_partners'].fillna(df['total_busy_partners'].median())
    df['total_outstanding_orders'] = df['total_outstanding_orders'].fillna(df['total_outstanding_orders'].median())
    df['market_id'] = df['market_id'].fillna(df['market_id'].mode()[0])
    df['order_protocol'] = df['order_protocol'].fillna(df['order_protocol'].mode()[0])

Distribution of Delivery Time

We plot a histogram of delivery times to understand the distribution:

import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(15, 6))
sns.set_style('whitegrid')
sns.histplot(df['time_taken'])
plt.xscale('log')
plt.show()

Insight: Most deliveries take between 10 and 100 minutes, indicating a right-skewed distribution.

We visualize the top 10 restaurant categories with the highest order counts:

plt.figure(figsize=(12, 8))

top_categories = df['store_primary_category'].value_counts().nlargest(10)
sns.barplot(
    x=top_categories.values,
    y=top_categories.index,
    palette='Blues_r'
)

# Annotate each bar with its count
for i, v in enumerate(top_categories.values):
    plt.text(v + 5, i, str(v), va='center', fontsize=10, color='black')

plt.title('Top 10 Store Primary Categories', fontsize=16)
plt.xlabel('Count', fontsize=14)
plt.ylabel('Category', fontsize=14)
plt.grid(axis='x', linestyle='--', alpha=0.7)

plt.tight_layout()
plt.show()

We analyze when orders peak during the day and across the week.

    Insights:

• Orders peak during the early morning hours (2 AM to 4 AM).
• Fridays, Saturdays, and Sundays see the highest number of orders.
fig, ax = plt.subplots(figsize=(15, 8), nrows=1, ncols=2)
sns.countplot(data=df, x='hour_d', ax=ax[0], color='#82b1ff')
sns.countplot(data=df, x='day_of_week_d', ax=ax[1], color='#ffcc80')
days = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
ax[1].set_xticks(range(7))
ax[1].set_xticklabels(days)

ax[0].set_title('Count of orders by hour of day')
ax[1].set_title('Count of orders by day of week')
plt.show()

We examine the number of orders across different markets and order protocols.

    Insights:

• Market IDs 2 and 4 receive the highest number of orders.
• Certain order protocols dominate the order placement methods.
fig2, ax = plt.subplots(figsize=(15, 6), nrows=1, ncols=2)

sns.countplot(data=df, x='market_id', ax=ax[0], color='#82b1ff')
ax[0].set_title('Number of orders by Market ID')

sns.countplot(data=df, x='order_protocol', ax=ax[1], color='#ffcc80')
ax[1].set_title('Number of orders by Order Protocol')

plt.show()

To understand the supply-demand balance, we analyze the average number of on-shift and busy partners per market:

Insight: Markets 2 and 4 have the highest partner availability and outstanding orders, indicating high demand.

avg_data_market = pd.DataFrame({
    'Avg. Onshift Partners': df.groupby('market_id')['total_onshift_partners'].mean(),
    'Avg. Busy Partners': df.groupby('market_id')['total_busy_partners'].mean(),
    'Avg. Outstanding Orders': df.groupby('market_id')['total_outstanding_orders'].mean()
}).sort_values('Avg. Outstanding Orders', ascending=False)

avg_data_market.plot(
    kind='barh',
    stacked=True,
    figsize=(12, 8),
    color=['#82b1ff', '#ffcc80', '#a5d6a7']
)

plt.title('Average Partners and Orders per Market', fontsize=16)
plt.xlabel('Average Count', fontsize=14)
plt.ylabel('Market ID', fontsize=14)
plt.legend(title='Metrics', fontsize=10, title_fontsize=12)
plt.tight_layout()

plt.show()

We check how the maximum item price varies across different markets and order methods; a sketch of one way to produce this comparison follows the insights below.

    Insights:

• Market ID 4 receives the highest-value orders, warranting prioritized resource allocation.
• Order Protocol 1 sees high-value transactions, making it a preferred method for expensive deliveries.
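A minimal sketch of this comparison, assuming the max_item_price column and a simple group-by on market and order protocol (the exact plotting code for this figure is not shown in the post), might look like this:

fig3, ax = plt.subplots(figsize=(15, 6), nrows=1, ncols=2)

# Average of the maximum item price per market and per order protocol
df.groupby('market_id')['max_item_price'].mean().plot(kind='bar', ax=ax[0], color='#82b1ff')
ax[0].set_title('Avg. max item price by Market ID')

df.groupby('order_protocol')['max_item_price'].mean().plot(kind='bar', ax=ax[1], color='#ffcc80')
ax[1].set_title('Avg. max item price by Order Protocol')

plt.tight_layout()
plt.show()

We also look at partner availability and outstanding orders across days of the week: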
avg_data = pd.DataFrame({
    'Avg. Onshift Partners': df.groupby('day_of_week_d')['total_onshift_partners'].mean(),
    'Avg. Busy Partners': df.groupby('day_of_week_d')['total_busy_partners'].mean(),
    'Avg. Outstanding Orders': df.groupby('day_of_week_d')['total_outstanding_orders'].mean()
})

avg_data.plot(
    kind='bar',
    stacked=True,
    figsize=(10, 6),
    color=['#82b1ff', '#ffcc80', '#a5d6a7']
)

plt.title('Avg. Partners and Orders per Day of Week', fontsize=16)
plt.xlabel('Day of the Week', fontsize=14)
plt.ylabel('Average Count', fontsize=14)
plt.xticks(ticks=range(7), labels=['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'], rotation=45)
plt.legend(title='Metrics', fontsize=10, title_fontsize=12)
plt.tight_layout()

plt.show()

We analyze whether delivery time and order value vary by day of the week.

    Insights:

• Weekends see longer delivery times, likely due to high order volume.
• Higher-value orders are also placed on weekends, reinforcing the weekend demand surge.
plt.figure(figsize=(15, 6))
sns.set_style("whitegrid")
palette = sns.color_palette("Paired")
days = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']

# df_3 is the dataframe used for this plot (its construction is not shown in the post; assumed to be df or a copy of it)
plt.subplot(1, 2, 1)
sns.lineplot(data=df_3, x='day_of_week_d', y='time_taken', color='mediumturquoise')
ax = plt.gca()
ax.set_xticks(range(7))
ax.set_xticklabels(days)
ax.set_title('Average time taken by orders by day of week')

plt.subplot(1, 2, 2)
sns.lineplot(data=df_3, x='day_of_week_d', y='subtotal', color='mediumturquoise')
ax = plt.gca()
ax.set_xticks(range(7))
ax.set_xticklabels(days)
ax.set_title('Average subtotal of orders by day of week')

plt.tight_layout()
plt.show()

1. Delivery time is skewed, with most orders taking between 10 and 100 minutes.
2. Peak order hours are between 2 AM and 4 AM, and weekends see the most orders.
3. Market IDs 2 and 4 have the highest demand and partner availability.
4. More expensive orders are placed via Order Protocol 1 and Market 4.
5. Delivery time increases on weekends, aligning with high-value orders.

To understand the relationships between numerical variables, we first compute the correlation matrix:

df_corr = df.drop(['market_id', 'store_primary_category', 'order_protocol', 'day_of_week', 'year_o', 'month_o', 'day_o', 'hour_o', 'minute_o', 'second_o', 'year_d', 'month_d', 'day_d', 'minute_d', 'second_d'], axis=1).corr()
plt.figure(figsize=(15, 10))
sns.heatmap(df_corr, annot=True, cmap='viridis')
plt.title('Correlation Matrix', fontsize=16)

A strong correlation (0.94) is observed between total_onshift_partners, total_busy_partners, and total_outstanding_orders.

To mitigate multicollinearity, we progressively remove highly correlated features and recompute the correlation matrix.

df_corr = df.drop(['market_id', 'store_primary_category', 'order_protocol', 'day_of_week', 'year_o', 'month_o', 'day_o', 'hour_o', 'minute_o', 'second_o', 'year_d', 'month_d', 'day_d', 'minute_d', 'second_d', 'total_busy_partners', 'total_outstanding_orders'], axis=1).corr()
plt.figure(figsize=(15, 10))
sns.heatmap(df_corr, annot=True, cmap='viridis')
plt.title('Correlation Matrix', fontsize=16)
• After removal, we observe no high correlations, confirming our decision.
• We permanently drop total_busy_partners and total_outstanding_orders.
    df.drop(['total_busy_partners', 'total_outstanding_orders'], axis=1, inplace=True)

Outliers can significantly affect model performance by forcing the model to learn extreme data points that do not represent the general trend. We analyze outliers in key numerical features:

num_cols = ['total_onshift_partners', 'max_item_price', 'min_item_price', 'subtotal', 'time_taken']

fig6, ax = plt.subplots(nrows=5, ncols=2, figsize=(15, 15))

for i, col in enumerate(num_cols):
    sns.distplot(df[col], ax=ax[i, 0], color='#E6A9EC')  # distplot is deprecated in newer seaborn; histplot is the replacement
    sns.boxplot(data=df, x=df[col], ax=ax[i, 1], color='#F08080')
    ax[i, 0].set_title(f"{col} Distribution")
    ax[i, 1].set_title(f"Boxplot of {col}")

plt.tight_layout()

• Many numerical features exhibit outliers beyond the 99th percentile.
• We decide to remove values above the 99th percentile as potential outliers.
    p1 = np.percentile(df['total_onshift_partners'], 99)
    p2 = np.percentile(df['max_item_price'], 99)
    p3 = np.percentile(df['subtotal'], 99)
    p4 = np.percentile(df['time_taken'], 99)
    p5 = np.percentile(df['min_item_price'], 99)

    df = df[~(df['total_onshift_partners'] > p1)]
    df = df[~(df['max_item_price'] > p2)]
    df = df[~(df['subtotal'] > p3)]
    df = df[~(df['time_taken'] > p4)]
    df = df[~(df['min_item_price'] > p5)]

Categorical features must be transformed for machine learning models. We apply target encoding to store_primary_category, replacing each category with its mean delivery time (time_taken).

# Performing target encoding on 'store_primary_category'.

df['store_primary_category'] = df.groupby('store_primary_category')['time_taken'].transform('mean')

To prepare the data for model training, we split it into training and test sets and apply standard scaling.

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Features and target (time_taken is the column we want to predict; the exact definition of X and y is not shown in the post)
X = df.drop('time_taken', axis=1)
y = df['time_taken']

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Validation split carved out of the training set
X_train_val, X_test_val, y_train_val, y_test_val = train_test_split(X_train, y_train, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Separate scaler for the inner train/validation split, so each split is scaled with statistics from its own training data
val_scaler = StandardScaler()
X_train_val_scaled = val_scaler.fit_transform(X_train_val)
X_test_val_scaled = val_scaler.transform(X_test_val)

After completing the preprocessing steps, we proceeded to train several models to predict delivery times. We explored a Neural Network (NN), an XGBoost Regressor, and Linear Regression, comparing their Mean Absolute Error (MAE) and R² scores to evaluate performance.

Neural Network Model

We designed a deep learning model using the Keras Sequential API, consisting of several dense layers. The architecture was as follows:

• 256 neurons in the input layer
• Hidden layers: 128, 64, 32, and 16 neurons
• LeakyReLU activations for better gradient flow
• Batch Normalization to stabilize training
• Linear activation in the output layer
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LeakyReLU, BatchNormalization

model = Sequential([
    Dense(256, input_shape=(X_train_val_scaled.shape[1],)),
    LeakyReLU(),

    Dense(128),
    BatchNormalization(),
    LeakyReLU(),

    Dense(64),
    LeakyReLU(),

    Dense(32),
    LeakyReLU(),

    Dense(16),
    BatchNormalization(),
    LeakyReLU(),

    Dense(1, activation="linear")
])

Training Strategy

To optimize training, we used:

• Learning Rate Scheduling: The learning rate decreases over epochs using an adaptive decay function.
• Early Stopping: Prevents overfitting by monitoring the validation MAE and halting training when performance no longer improves.
• TensorBoard Logging: Used for monitoring the model’s training progress.
import tensorflow as tf

# Decay the learning rate more aggressively as training progresses
def advanced_lr_decay(epoch, lr):
    if epoch < 50:
        return lr * 0.97
    elif epoch < 100:
        return lr * 0.95
    elif epoch < 150:
        return lr * 0.93
    else:
        return lr * 0.90

scheduler = tf.keras.callbacks.LearningRateScheduler(advanced_lr_decay)

Model Training

import datetime
from tensorflow.keras.losses import Huber
from tensorflow.keras.callbacks import EarlyStopping, TensorBoard

model.compile(optimizer='adam', loss=Huber(delta=1.0), metrics=["mae"])

early_stop = EarlyStopping(monitor="val_mae", patience=30, restore_best_weights=True)

log_dir = "logs/tuning/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = TensorBoard(log_dir=log_dir, histogram_freq=1)

# Checkpoint on validation MAE (a regression model compiled with the "mae" metric has no val_accuracy to monitor)
ModelCheckpointCallback = tf.keras.callbacks.ModelCheckpoint(filepath='best_model_reg.h5',
                                                             monitor='val_mae',
                                                             save_best_only=True,
                                                             mode='min')

history = model.fit(X_train_val_scaled, y_train_val, epochs=200, batch_size=256,
                    validation_data=(X_test_val_scaled, y_test_val),
                    callbacks=[early_stop, scheduler, tensorboard_callback, ModelCheckpointCallback])

Training Performance

After training for 200 epochs, the model achieved:

    • Validation MAE: 0.2326
• R² Score: 0.9837
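Keras only reports the loss and MAE during training; the R² score can be computed from the held-out validation predictions with scikit-learn. A minimal sketch, using the variables defined above:

from sklearn.metrics import mean_absolute_error, r2_score

# Predictions on the held-out validation split
y_val_pred = model.predict(X_test_val_scaled).flatten()

print("Validation MAE:", mean_absolute_error(y_test_val, y_val_pred))
print("Validation R2 :", r2_score(y_test_val, y_val_pred))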

    Loss vs. Epochs Plot

epochs = history.epoch
loss = history.history['loss']
mae = history.history['mae']
val_loss = history.history['val_loss']
val_mae = history.history['val_mae']

plt.figure()
plt.plot(epochs, loss, label="train")
plt.plot(epochs, val_loss, label="val")
plt.legend()
plt.title("Loss vs. Epochs")
plt.show()

plt.figure()
plt.plot(epochs, mae, label="train")
plt.plot(epochs, val_mae, label="validation")
plt.legend()
plt.title("MAE vs. Epochs")
plt.show()


