
    Identifying Fraudulent Digital Transactions: A Machine Learning Approach | by Henrique Peter | Dec, 2024

    By Team_AIBS News | December 10, 2024 | 7 Mins Read


    The purpose of this project is to identify fraudulent digital transactions, which can cause significant financial losses for financial institutions and inconvenience for their customers. To achieve this, we developed a machine learning model that outperforms a pre-existing model using the same dataset. By comparing the performance of both models, we not only evaluate traditional metrics like precision, F1-score, and AUC-ROC but also demonstrate the financial implications of the model’s effectiveness. This approach highlights the potential additional profit the institution could generate by identifying fraudulent transactions more effectively.

    Throughout this project, I employed several techniques, including Bivariate Exploratory Data Analysis (EDA) to generate hypotheses and understand the relationship between variables and fraud. Hyperparameter tuning was done using MLflow to keep the experiments under tight control, and SHAP (SHapley Additive exPlanations) was applied to explain the model’s predictions and identify the variables with the most impact.

    If you want to check out the complete code and project, click on this link.

    Photo by CardMapr.nl on Unsplash

    Financial Impacts of Fraud

    Fraudulent transactions have a direct and severe financial impact on institutions. In the case of this project, the financial institution earns 10% of each approved transaction. However, when a fraudulent transaction is approved, the institution loses 100% of the transaction value. This creates a scenario where each fraudulent transaction not only costs the institution the commission but also results in a total loss equal to the transaction amount.

    For instance, if the institution processes 100 legitimate transactions of $1,000 each, it earns $10,000. However, if a single $1,000 fraudulent transaction is approved, the loss is $1,000, nullifying the gains of 10 legitimate transactions. Thus, effective fraud detection is essential to safeguard the institution’s financial health.
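The arithmetic in this example can be reproduced with a few lines of Python. This is a minimal sketch: the function name `net_result` and the constant `COMMISSION_PCT` are illustrative and do not come from the project code.

```python
COMMISSION_PCT = 10  # the institution earns 10% of each approved legitimate transaction


def net_result(legit_values, approved_fraud_values, commission_pct=COMMISSION_PCT):
    """Commission earned on legitimate transactions minus the full value of approved fraud."""
    revenue = sum(legit_values) * commission_pct / 100
    losses = sum(approved_fraud_values)
    return revenue - losses


legit = [1_000] * 100  # 100 legitimate transactions of $1,000 each

print(net_result(legit, []))       # commission with no fraud approved
print(net_result(legit, [1_000]))  # same portfolio with one approved $1,000 fraud

# Legitimate $1,000 transactions needed to offset one approved $1,000 fraud
break_even = 1_000 / (1_000 * COMMISSION_PCT / 100)
print(break_even)
```

Under these rules, one approved fraud of a given size cancels the commission of ten legitimate transactions of the same size, which is why missed fraud weighs so heavily on the result.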

    Photo by Choong Deng Xiang on Unsplash

    Model Efficiency and Effectiveness

    1. Fraud Detection Rate (Precision): This measures the percentage of transactions identified as fraudulent that are actually fraudulent.
      Formula: True Fraudulent Transactions Identified / Transactions Identified as Fraudulent × 100
    2. Fraud Coverage Rate (Recall): This measures the percentage of actual fraudulent transactions that are correctly identified by the model.
      Formula: True Fraudulent Transactions Identified / Total Fraudulent Transactions × 100
    3. Model Accuracy: The percentage of all transactions (both fraudulent and non-fraudulent) that are correctly classified.
      Formula: Correctly Classified Transactions / Total Transactions × 100
    4. False Positive Rate: The percentage of non-fraudulent transactions incorrectly identified as fraudulent.
      Formula: False Positives / Total Non-Fraudulent Transactions × 100
    5. False Negative Rate: The percentage of fraudulent transactions not detected by the model.
      Formula: False Negatives / Total Fraudulent Transactions × 100
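The five formulas above all derive from the four cells of a confusion matrix. The sketch below assumes those counts are already available; the function name and the example counts are hypothetical and are not results from the project.

```python
def classification_rates(tp, fp, tn, fn):
    """Return the five rates defined above as percentages.

    tp: frauds flagged as fraud        fp: legitimate flagged as fraud
    tn: legitimate approved            fn: frauds approved
    Assumes every denominator is non-zero.
    """
    total = tp + fp + tn + fn
    return {
        "precision": 100 * tp / (tp + fp),
        "recall": 100 * tp / (tp + fn),
        "accuracy": 100 * (tp + tn) / total,
        "false_positive_rate": 100 * fp / (fp + tn),
        "false_negative_rate": 100 * fn / (fn + tp),
    }


# Hypothetical counts, for illustration only
rates = classification_rates(tp=80, fp=320, tn=9580, fn=20)
for name, value in rates.items():
    print(f"{name}: {value:.1f}%")
```

Note that recall and the false negative rate share a denominator, so they always sum to 100%.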

    Financial KPIs

    1. Total Value of Fraud Prevented: This represents the total money saved by blocking fraudulent transactions.
      Formula: Sum of Values of Blocked Fraudulent Transactions
    2. Cost of Undetected Fraud: This represents the money lost due to undetected fraudulent transactions.
      Formula: Sum of Values of Undetected Fraudulent Transactions
    3. Fraud Rate: The percentage of transactions that are fraudulent in relation to the total number of transactions processed.
      Formula: Total Fraudulent Transactions / Total Transactions × 100
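As a rough illustration of these three KPIs, the sketch below computes them from a list of labelled transactions. The `Transaction` record and its field names are assumptions made for this example, and the sample values are invented.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    value: float    # transaction amount
    is_fraud: bool  # ground-truth label
    blocked: bool   # True if the model's decision was to block it


def financial_kpis(transactions):
    """Compute the three financial KPIs defined above."""
    prevented = sum(t.value for t in transactions if t.is_fraud and t.blocked)
    undetected = sum(t.value for t in transactions if t.is_fraud and not t.blocked)
    fraud_rate = 100 * sum(t.is_fraud for t in transactions) / len(transactions)
    return {
        "fraud_prevented": prevented,
        "undetected_fraud_cost": undetected,
        "fraud_rate_pct": fraud_rate,
    }


# Invented sample, for illustration only
sample = [
    Transaction(120.0, is_fraud=False, blocked=False),
    Transaction(900.0, is_fraud=True, blocked=True),    # fraud caught
    Transaction(50.0, is_fraud=False, blocked=True),    # false positive
    Transaction(300.0, is_fraud=True, blocked=False),   # fraud missed
]
kpis = financial_kpis(sample)
print(kpis)
```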

    The initial model (score_fraude_modelo) displayed several areas for improvement. A key metric used for evaluation was the ROC Curve:

    ROC Curve:

    • AUC Interpretation: An AUC of 0.73 suggests that the model has a moderate ability to distinguish between fraudulent and non-fraudulent transactions. The AUC ranges from 0 to 1, where 0.5 corresponds to random guessing (no discrimination) and 1 to perfect discrimination, so a score of 0.73 indicates that the model performs better than random classification but still has room for improvement.
    • ROC Curve: The curve shows the relationship between the True Positive Rate (sensitivity) and the False Positive Rate (1 - specificity) at various classification thresholds. While the curve is above the diagonal (indicating better-than-random performance), a curve that bends closer to the top-left corner would indicate better overall performance.

    These results point to three areas for improvement:

    1. Hyperparameter Tuning: By refining the model’s hyperparameters, we can improve its discriminative power.
    2. Data Balancing: It’s essential to ensure the fraud and non-fraud data are balanced, or employ techniques to handle imbalances to avoid model bias.
    3. Feature Engineering: Further work on feature transformation or adding new features could help enhance model performance.
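To make the AUC figure more tangible: AUC equals the probability that a randomly chosen fraudulent transaction receives a higher score than a randomly chosen legitimate one, with ties counted as one half. The sketch below computes it by brute-force pair comparison in plain Python; it is meant for intuition on small inputs, not as the project's evaluation code, and the labels and scores are invented.

```python
def roc_auc(labels, scores):
    """AUC as the share of (fraud, non-fraud) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented example: 1 = fraud, 0 = legitimate; higher score = more suspicious
labels = [0, 0, 1, 0, 1, 1]
scores = [10, 35, 30, 50, 70, 90]
print(round(roc_auc(labels, scores), 3))

# A score that cannot separate the classes at all lands at 0.5
print(roc_auc([0, 1], [5, 5]))
```

The pairwise loop is quadratic in the number of examples, so for real datasets a library routine such as scikit-learn's `roc_auc_score` is the practical choice.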

    Effective preprocessing is crucial in machine learning projects. For this task, I used Scikit-learn’s Pipeline to ensure reproducibility and consistency. Key preprocessing steps included:

    • Removing irrelevant columns: Columns such as score_fraude_modelo, which served as a baseline model, and data_compra, which added temporal complexity, were excluded.
    • Handling High Cardinality: For columns with high cardinality, such as produto, we aggregated less frequent categories into “Others” to reduce model noise.
    • Dealing with Missing Data: Missing values in the score columns were filled with the median, while missing values in entrega_doc_2 were set to zero to indicate “non-delivery.”

    Additionally, encoding techniques were applied, including Target Encoding for high-cardinality variables and One-Hot Encoding for categorical variables.
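The grouping of infrequent categories into “Others” can be sketched as follows. This is an illustrative, standard-library version: the function names, the `min_count` cutoff, and the sample values are assumptions, not the project's actual implementation. The frequent categories are learned from training data only and then applied to any other split, so that unseen values also map to “Others”.

```python
from collections import Counter


def learn_frequent_categories(train_values, min_count):
    """Return the set of categories seen at least `min_count` times in training data."""
    counts = Counter(train_values)
    return {category for category, n in counts.items() if n >= min_count}


def group_rare_categories(values, frequent, other_label="Others"):
    """Replace every category outside `frequent` (rare or unseen) with `other_label`."""
    return [v if v in frequent else other_label for v in values]


# Invented product names, for illustration only
train_produto = ["phone", "phone", "tv", "phone", "lamp", "tv", "sofa"]
test_produto = ["phone", "sofa", "tv", "desk"]

frequent = learn_frequent_categories(train_produto, min_count=2)
print(group_rare_categories(train_produto, frequent))
print(group_rare_categories(test_produto, frequent))
```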

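    import numpy as np
    import pandas as pd
    from sklearn.base import BaseEstimator, TransformerMixin
    from sklearn.impute import SimpleImputer
    from sklearn.model_selection import KFold
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder


    class ColumnDropper(BaseEstimator, TransformerMixin):
        """Drops columns that are not used as model features."""

        def fit(self, X, y=None):
            return self

        def transform(self, X):
            return X.drop(columns=['data_compra', 'produto', 'score_8', 'score_fraude_modelo', 'categoria_produto'])


    class DataProcessor(BaseEstimator, TransformerMixin):
        """Flags missing values and converts Y/N and country columns to model-ready values."""

        def fit(self, X, y=None):
            return self

        def transform(self, X):
            X_copy = X.copy()

            # Creates the 'was_null' column (1 where entrega_doc_2 was missing)
            X_copy['was_null'] = X_copy['entrega_doc_2'].isnull().astype(int)

            # Treat missing values as 'N' (not delivered), then map 'Y' to 1 and everything else to 0
            X_copy['entrega_doc_2'] = X_copy['entrega_doc_2'].fillna('N')
            X_copy['entrega_doc_2'] = X_copy['entrega_doc_2'].apply(lambda x: 1 if x == 'Y' else 0)

            # Keep 'BR' and 'AR' as-is; group every other country into 'Outros'
            X_copy['pais'] = X_copy['pais'].apply(lambda x: x if x in ['BR', 'AR'] else 'Outros')
            X_copy['entrega_doc_3'] = X_copy['entrega_doc_3'].apply(lambda x: 1 if x == 'Y' else 0)

            return X_copy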

    class ScoreImputer(BaseEstimator, TransformerMixin):
        """Fills missing values in the numeric score columns with the median learned in fit."""

        def __init__(self):
            self.imputers = {}

        def fit(self, X, y=None):
            cols = ['score_2', 'score_3', 'score_4', 'score_5', 'score_6', 'score_7', 'score_9', 'score_10']
            for col in cols:
                imputer = SimpleImputer(strategy="median")
                imputer.fit(X[[col]])
                self.imputers[col] = imputer
            return self

        def transform(self, X):
            X_copy = X.copy()
            for col, imputer in self.imputers.items():
                # SimpleImputer returns a 2-D array; flatten it back into a single column
                X_copy[col] = imputer.transform(X_copy[[col]]).ravel()
            return X_copy

    class OneHotFeatureEncoder(BaseEstimator, TransformerMixin):
        """One-hot encodes the categorical columns and returns a DataFrame."""

        def __init__(self):
            # sparse_output requires scikit-learn >= 1.2
            self.encoder = OneHotEncoder(sparse_output=False)
            self.cols = ['score_1', 'pais', 'entrega_doc_1', 'entrega_doc_2', 'entrega_doc_3', 'was_null']

        def fit(self, X, y=None):
            self.encoder.fit(X[self.cols])
            return self

        def transform(self, X):
            onehot_data = self.encoder.transform(X[self.cols])

            # Converting the onehot_data array into a DataFrame aligned with X's index
            onehot_df = pd.DataFrame(onehot_data, columns=self.encoder.get_feature_names_out(self.cols))
            onehot_df.index = X.index

            # Replace the original categorical columns with their encoded versions
            X = X.drop(self.cols, axis=1)
            X = pd.concat([X, onehot_df], axis=1)

            return X

    class KFoldTargetEncoder(BaseEstimator, TransformerMixin):
        """Out-of-fold target (mean) encoding for a single categorical column."""

        def __init__(self):
            self.colnames = 'categoria_produto'
            self.targetName = 'fraude'
            self.n_fold = 5
            self.verbosity = True
            self.discardOriginal_col = False

        def fit(self, X, y=None):
            return self

        def transform(self, X):
            assert isinstance(self.targetName, str)
            assert isinstance(self.colnames, str)
            assert self.colnames in X.columns
            assert self.targetName in X.columns

            X = X.copy()  # avoid mutating the caller's DataFrame
            mean_of_target = X[self.targetName].mean()
            kf = KFold(n_splits=self.n_fold, shuffle=True, random_state=42)

            col_mean_name = self.colnames + '_' + 'Kfold_Target_Enc'
            X[col_mean_name] = np.nan

            # Encode each validation fold with target means computed on the other folds
            for tr_ind, val_ind in kf.split(X):
                X_tr, X_val = X.iloc[tr_ind], X.iloc[val_ind]
                X.loc[X.index[val_ind], col_mean_name] = X_val[self.colnames].map(
                    X_tr.groupby(self.colnames)[self.targetName].mean())

            # Categories absent from the training folds fall back to the global target mean
            X[col_mean_name] = X[col_mean_name].fillna(mean_of_target)

            if self.verbosity:
                # Encoded values, available here for inspection; not used further
                encoded_feature = X[col_mean_name].values

            if self.discardOriginal_col:
                # As written, this drops the target column, not the encoded source column
                X = X.drop(self.targetName, axis=1)

            return X

    def pipeline(model):
        """Chains the preprocessing steps with the given classifier."""

        # Creates the pipeline
        pipe = Pipeline([
            ("dropper", ColumnDropper()),
            ("processor", DataProcessor()),
            ("imputer", ScoreImputer()),
            ("onehot", OneHotFeatureEncoder()),
            ("classifier", model),
        ])

        return pipe

    I initially trained several models: Balanced Random Forest, LightGBM, XGBoost, and Decision Tree. After comparing their performance, I selected LightGBM and used RandomizedSearchCV to fine-tune hyperparameters. The models were assessed using metrics like log-loss, precision, recall, F1-score, and ROC-AUC.

    Performance Comparison (Baseline vs New Model)

    Financial Metrics (Test Data):

    • Threshold: Decreased from 73 to 57
    • Revenue: Increased from $80,330 to $86,128
    • Losses: Decreased from $25,353 to $18,070
    • Net Profit: Increased from $54,977 to $68,058
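One plausible way to arrive at a decision threshold like the ones listed above is to sweep candidate thresholds and keep the one with the highest net result under the 10% commission and 100% loss rules. The sketch below assumes scores on a 0 to 100 scale and that a transaction is blocked when its score is at or above the threshold; both are assumptions, as are the function names and the invented sample data.

```python
COMMISSION_PCT = 10  # 10% earned on approved legitimate transactions


def net_profit(transactions, threshold):
    """transactions: iterable of (score, value, is_fraud); scores >= threshold are blocked."""
    profit = 0.0
    for score, value, is_fraud in transactions:
        if score >= threshold:
            continue  # blocked: no commission earned, no loss incurred
        profit += -value if is_fraud else value * COMMISSION_PCT / 100
    return profit


def best_threshold(transactions, candidates=range(0, 101)):
    """Return the lowest candidate threshold that maximizes net profit."""
    return max(candidates, key=lambda t: net_profit(transactions, t))


# Invented scored transactions: (score, value, is_fraud)
sample = [
    (12, 200.0, False),
    (40, 500.0, False),
    (58, 300.0, True),
    (65, 1000.0, False),
    (80, 700.0, True),
]

threshold = best_threshold(sample)
print(threshold, net_profit(sample, threshold))
```

In practice the threshold would be chosen on validation data and then evaluated once on held-out test data, so that the reported profit is not optimistically biased.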

    Confusion Matrix (Test Data):

    • False Negatives: Decreased from 503 to 383
    • Fraud Rate: Remained at 2%, while the approval rate increased from 74% to 77%

    Performance Metrics:

    • Log Loss: Decreased from 8.6 to 7.3
    • Precision: Improved from 0.13 to 0.17
    • Recall: Improved from 0.67 to 0.75
    • F1-Score: Increased from 0.22 to 0.27
    • ROC-AUC: Improved from 0.73 to 0.85

    By calculating the profit margins, the new model yields a monthly profit increase of approximately $35,906, translating to an annual profit increase of $430,872, or a 16% growth in profit.
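The reported figures can be cross-checked with simple arithmetic. The snippet below uses only numbers quoted in this article; the variable names are illustrative.

```python
# Test-set financial figures quoted above (in dollars)
reported = {
    "baseline": {"revenue": 80_330, "losses": 25_353, "net": 54_977},
    "new": {"revenue": 86_128, "losses": 18_070, "net": 68_058},
}

# Net profit should equal revenue minus losses for both models
for name, figures in reported.items():
    computed_net = figures["revenue"] - figures["losses"]
    print(name, computed_net, computed_net == figures["net"])

# Net gain of the new model on the test data
test_set_gain = reported["new"]["net"] - reported["baseline"]["net"]
print(test_set_gain)

# Annualizing the quoted monthly increase
monthly_gain = 35_906
annual_gain = monthly_gain * 12
print(annual_gain)
```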

    Moving forward, the focus will be on ensuring the model performs well in production environments and continues to deliver improvements:

    1. Model Validation: Ensuring that the model’s performance in a laboratory setting aligns with real-world scenarios.
    2. Real-Time Processing: Optimizing the model for low-latency environments, possibly using cloud-based infrastructure and real-time inference frameworks.

    These steps will help ensure that the model remains effective, scalable, and adaptable as fraud detection challenges evolve.


