Identifying Fraudulent Digital Transactions: A Machine Learning Approach | by Henrique Peter | Dec, 2024

The purpose of this project is to identify fraudulent digital transactions, which can cause significant financial losses for financial institutions and inconvenience for their customers. To achieve this, we developed a machine learning model that outperforms a pre-existing model using the same dataset. By comparing the performance of both models, we not only evaluate traditional metrics like precision, F1-score, and AUC-ROC but also demonstrate the financial implications of the model's effectiveness. This approach highlights the additional profit the institution could generate by identifying fraudulent transactions more effectively.

Throughout this project, I employed several techniques, including Bivariate Exploratory Data Analysis (EDA) to generate hypotheses and understand the relationship between variables and fraud. Hyperparameter tuning was tracked with MLFlow to keep tight control over the experiments, and SHAP (SHapley Additive exPlanations) was used to explain the model's predictions and identify the variables with the most impact.

If you want to check out the complete code and project, click on this link

Financial Impacts of Fraud

Fraudulent transactions have a direct and severe financial impact on institutions. In this project, the financial institution earns 10% of each approved transaction. However, when a fraudulent transaction is approved, the institution loses 100% of the transaction value. Each approved fraud therefore costs the institution not just the commission but the full transaction amount.

For instance, if the institution processes 100 legitimate transactions of $1,000 each, it earns $10,000. However, if a single $1,000 fraudulent transaction is approved, the loss is $1,000, nullifying the gains of 10 legitimate transactions. Effective fraud detection is therefore essential to safeguard the institution's financial health.
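The arithmetic above can be checked with a short sketch. The function name `net_profit` and the `(value, is_fraud)` tuple layout are illustrative assumptions; only the 10% commission and the 100% loss rule come from the text.

```python
COMMISSION_RATE = 0.10  # the institution earns 10% of each approved legitimate transaction


def net_profit(approved):
    """Return commission earned minus fraud losses for approved transactions.

    `approved` is a list of (value, is_fraud) tuples. This layout is an
    illustrative assumption, not the project's actual data format.
    """
    earned = sum(value * COMMISSION_RATE for value, is_fraud in approved if not is_fraud)
    lost = sum(value for value, is_fraud in approved if is_fraud)
    return earned - lost


legit = [(1000, False)] * 100          # 100 legitimate transactions of $1,000
with_fraud = legit + [(1000, True)]    # the same batch plus one approved fraud

print(net_profit(legit))
print(net_profit(with_fraud))
```

The difference between the two results is the full $1,000 of the fraudulent transaction, which equals the commission on ten legitimate ones.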

Model Efficiency and Effectiveness

Fraud Detection Rate (Precision): This measures the percentage of transactions identified as fraudulent that are actually fraudulent.
Formula: Correctly Identified Fraudulent Transactions / Transactions Identified as Fraudulent × 100
Fraud Coverage Rate (Recall): This measures the percentage of actual fraudulent transactions that are correctly identified by the model.
Formula: Correctly Identified Fraudulent Transactions / Total Fraudulent Transactions × 100
Model Accuracy: The percentage of all transactions (both fraudulent and non-fraudulent) that are correctly classified.
Formula: Correctly Classified Transactions / Total Transactions × 100
False Positive Rate: The percentage of non-fraudulent transactions incorrectly identified as fraudulent.
Formula: False Positives / Total Non-Fraudulent Transactions × 100
False Negative Rate: The percentage of fraudulent transactions not detected by the model.
Formula: False Negatives / Total Fraudulent Transactions × 100
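The five formulas above can be expressed directly in terms of confusion-matrix counts. The sketch below is only an illustration: the function name `classification_rates` and the example counts are hypothetical and are not taken from the project.

```python
def classification_rates(tp, fp, tn, fn):
    """Compute the five rates listed above, as percentages, from confusion-matrix counts.

    tp: frauds flagged as fraud        fp: legitimate flagged as fraud
    tn: legitimate passed as legit     fn: frauds passed as legitimate
    """
    def pct(numerator, denominator):
        # Guard against empty denominators (e.g. no transactions flagged at all)
        return 100.0 * numerator / denominator if denominator else float("nan")

    return {
        "precision": pct(tp, tp + fp),
        "recall": pct(tp, tp + fn),
        "accuracy": pct(tp + tn, tp + fp + tn + fn),
        "false_positive_rate": pct(fp, fp + tn),
        "false_negative_rate": pct(fn, fn + tp),
    }


# Hypothetical counts, chosen only to illustrate the formulas
rates = classification_rates(tp=30, fp=70, tn=890, fn=10)
for name, value in rates.items():
    print(f"{name}: {value:.1f}%")
```

Note that recall and the false negative rate share a denominator, so they always sum to 100%.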

Financial KPIs

Total Value of Fraud Prevented: This represents the total money saved by blocking fraudulent transactions.
Formula: Sum of Values of Blocked Fraudulent Transactions
Cost of Undetected Fraud: This represents the money lost due to undetected fraudulent transactions.
Formula: Sum of Values of Undetected Fraudulent Transactions
Fraud Rate: The percentage of transactions that are fraudulent relative to the total number of transactions processed.
Formula: Total Fraudulent Transactions / Total Transactions × 100
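As a rough illustration of these three KPIs, the sketch below computes them from a list of transactions. The dictionary keys (`value`, `is_fraud`, `blocked`) and the sample data are hypothetical and chosen only for this example.

```python
def financial_kpis(transactions):
    """Compute the three financial KPIs from a list of transaction dicts.

    Each dict is assumed to hold 'value' (amount), 'is_fraud' (true label)
    and 'blocked' (whether the model blocked it). This layout is hypothetical.
    """
    prevented = sum(t["value"] for t in transactions if t["is_fraud"] and t["blocked"])
    undetected = sum(t["value"] for t in transactions if t["is_fraud"] and not t["blocked"])
    n_fraud = sum(1 for t in transactions if t["is_fraud"])
    fraud_rate = 100.0 * n_fraud / len(transactions) if transactions else float("nan")
    return {
        "fraud_prevented": prevented,
        "undetected_fraud_cost": undetected,
        "fraud_rate_pct": fraud_rate,
    }


sample = [
    {"value": 500, "is_fraud": True, "blocked": True},     # fraud caught
    {"value": 300, "is_fraud": True, "blocked": False},    # fraud missed
    {"value": 1000, "is_fraud": False, "blocked": False},  # legitimate, approved
    {"value": 200, "is_fraud": False, "blocked": True},    # legitimate, wrongly blocked
]
kpis = financial_kpis(sample)
print(kpis)
```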

The initial model (score_fraude_modelo) showed several areas for improvement. A key metric used for evaluation was the ROC curve, as shown below:

ROC Curve:

AUC Interpretation: An AUC of 0.73 suggests that the model has a moderate ability to distinguish between fraudulent and non-fraudulent transactions. An AUC of 0.5 corresponds to random classification and 1 to perfect discrimination, so a score of 0.73 shows that the model performs better than random but still has room for improvement.
ROC Curve: The curve shows the relationship between the True Positive Rate (sensitivity) and the False Positive Rate (1 - specificity) at various classification thresholds. While the curve is above the diagonal (indicating better-than-random performance), a curve that rises more steeply toward the top-left corner would indicate better overall performance.
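One way to build intuition for the AUC is its pairwise interpretation: it is the probability that a randomly chosen fraudulent transaction receives a higher score than a randomly chosen legitimate one, with ties counted as half. The sketch below implements that definition in plain Python. It is an illustration only (the function name `roc_auc` and the toy data are assumptions), and its quadratic cost makes it unsuitable for large datasets.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for fraud, 0 for legitimate. scores: higher means more suspicious.
    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Toy data: one fraud (0.35) is scored below one legitimate transaction (0.4)
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))

# A model that gives every transaction the same score cannot discriminate at all
print(roc_auc(labels, [0.5, 0.5, 0.5, 0.5]))
```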

Hyperparameter Tuning: By refining the model's hyperparameters, we can improve its discriminative power.
Data Balancing: It's essential to ensure the fraud and non-fraud data are balanced, or to use techniques that handle the imbalance, to avoid model bias.
Feature Engineering: Further work on feature transformation or adding new features could help improve model performance.
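The article does not say which balancing technique was used. One common option is to weight classes inversely to their frequency, using the formula scikit-learn documents for `class_weight="balanced"`: n_samples / (n_classes × class_count). The sketch below computes those weights in plain Python. The helper name `balanced_class_weights` is hypothetical, and the 2% fraud share is used only as an example.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get large weights, so each class contributes equally
    to a weighted loss overall.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Example: 98 legitimate transactions and 2 frauds
labels = [0] * 98 + [1] * 2
weights = balanced_class_weights(labels)
print(weights)
```

With these weights, the total weight of the fraud class equals the total weight of the legitimate class, which is the sense in which the data is "balanced".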

Effective preprocessing is crucial in machine learning projects. For this task, I used Scikit-learn's Pipeline to ensure reproducibility and consistency. Key preprocessing steps included:

Removing irrelevant columns: Columns such as score_fraude_modelo, which served as the baseline model, and data_compra, which added temporal complexity, were excluded.
Handling High Cardinality: For columns with high cardinality, such as produto, we aggregated less frequent categories into "Others" to reduce model noise.
Dealing with Missing Data: Missing values in the score columns were filled with the median, while missing values in entrega_doc_2 were set to zero to indicate "non-delivery."

Additionally, encoding techniques were applied, including Target Encoding for high-cardinality variables and One-Hot Encoding for categorical variables.

import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer
from sklearn.model_selection import KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


class ColumnDropper(BaseEstimator, TransformerMixin):
    """Drops columns that are not used as model features."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X.drop(columns=['data_compra', 'produto', 'score_8', 'score_fraude_modelo', 'categoria_produto'])


class DataProcessor(BaseEstimator, TransformerMixin):
    """Flags missing documents and converts Y/N columns to 1/0."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        X_copy = X.copy()
        # Create the 'was_null' column before the nulls are overwritten
        X_copy['was_null'] = X_copy['entrega_doc_2'].isnull().astype(int)
        # Treat nulls as 'N' (not delivered), then map Y/N to 1/0
        X_copy['entrega_doc_2'] = X_copy['entrega_doc_2'].fillna('N')
        X_copy['entrega_doc_2'] = X_copy['entrega_doc_2'].apply(lambda x: 1 if x == 'Y' else 0)
        # Keep the 'BR' and 'AR' country codes and group every other value as 'Outros'
        X_copy['pais'] = X_copy['pais'].apply(lambda x: x if x in ['BR', 'AR'] else 'Outros')
        X_copy['entrega_doc_3'] = X_copy['entrega_doc_3'].apply(lambda x: 1 if x == 'Y' else 0)
        return X_copy


class ScoreImputer(BaseEstimator, TransformerMixin):
    """Fills missing score values with the median learned during fit."""

    def __init__(self):
        self.imputers = {}

    def fit(self, X, y=None):
        cols = ['score_2', 'score_3', 'score_4', 'score_5', 'score_6', 'score_7', 'score_9', 'score_10']
        for col in cols:
            imputer = SimpleImputer(strategy="median")
            imputer.fit(X[[col]])
            self.imputers[col] = imputer
        return self

    def transform(self, X):
        X_copy = X.copy()
        for col, imputer in self.imputers.items():
            # transform returns a 2D array; flatten it back into a single column
            X_copy[col] = imputer.transform(X_copy[[col]]).ravel()
        return X_copy


class OneHotFeatureEncoder(BaseEstimator, TransformerMixin):
    """One-hot encodes the categorical columns and keeps the result as a DataFrame."""

    def __init__(self):
        self.encoder = OneHotEncoder(sparse_output=False)
        self.cols = ['score_1', 'pais', 'entrega_doc_1', 'entrega_doc_2', 'entrega_doc_3', 'was_null']

    def fit(self, X, y=None):
        self.encoder.fit(X[self.cols])
        return self

    def transform(self, X):
        onehot_data = self.encoder.transform(X[self.cols])
        # Convert the one-hot array into a DataFrame aligned with the input index
        onehot_df = pd.DataFrame(onehot_data, columns=self.encoder.get_feature_names_out(self.cols))
        onehot_df.index = X.index
        X = X.drop(self.cols, axis=1)
        X = pd.concat([X, onehot_df], axis=1)
        return X


class KFoldTargetEncoder(BaseEstimator, TransformerMixin):
    """Out-of-fold target encoding; X must still contain the target column.

    Not included in pipeline() below, where ColumnDropper removes 'categoria_produto'.
    """

    def __init__(self):
        self.colnames = 'categoria_produto'
        self.targetName = 'fraude'
        self.n_fold = 5
        self.verbosity = True
        self.discardOriginal_col = False

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        assert isinstance(self.targetName, str)
        assert isinstance(self.colnames, str)
        assert self.colnames in X.columns
        assert self.targetName in X.columns
        X = X.copy()  # avoid mutating the caller's DataFrame
        mean_of_target = X[self.targetName].mean()
        kf = KFold(n_splits=self.n_fold, shuffle=True, random_state=42)
        col_mean_name = self.colnames + '_' + 'Kfold_Target_Enc'
        X[col_mean_name] = np.nan
        for tr_ind, val_ind in kf.split(X):
            X_tr, X_val = X.iloc[tr_ind], X.iloc[val_ind]
            # Encode each validation fold with target means computed on the other folds
            X.loc[X.index[val_ind], col_mean_name] = X_val[self.colnames].map(X_tr.groupby(self.colnames)[self.targetName].mean())
        # Categories unseen in a training fold fall back to the global target mean
        X[col_mean_name] = X[col_mean_name].fillna(mean_of_target)
        if self.verbosity:
            print(f"Added '{col_mean_name}' (global target mean: {mean_of_target:.4f})")
        if self.discardOriginal_col:
            # Optionally drop the original categorical column once it is encoded
            X = X.drop(self.colnames, axis=1)
        return X


def pipeline(model):
    # Creates the pipeline: preprocessing steps followed by the classifier
    pipe = Pipeline([
        ("dropper", ColumnDropper()),
        ("processor", DataProcessor()),
        ("imputer", ScoreImputer()),
        ("onehot", OneHotFeatureEncoder()),
        ('classifier', model)
    ])
    return pipe
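The listing above groups pais using a fixed list of country codes. A frequency-based version of the "Others" grouping described for high-cardinality columns such as produto could look like the sketch below. The function names, the `min_count` cutoff, and the sample categories are all hypothetical; the article does not state the threshold it used.

```python
import pandas as pd


def fit_frequent_categories(train_column, min_count=2):
    """Learn which categories appear at least `min_count` times in the training data."""
    counts = train_column.value_counts()
    return set(counts[counts >= min_count].index)


def group_rare(column, frequent, other_label="Others"):
    """Replace every value outside `frequent` (including missing values) with `other_label`."""
    return column.where(column.isin(frequent), other_label)


train = pd.Series(["phone", "phone", "tv", "tv", "tv", "drone"])
test = pd.Series(["tv", "drone", "camera"])

# The frequent set is learned on the training data only and reused on new data
frequent = fit_frequent_categories(train, min_count=2)
print(group_rare(train, frequent).tolist())
print(group_rare(test, frequent).tolist())
```

Learning the frequent set on the training split and reusing it on new data keeps unseen categories, such as "camera" here, from creating new columns at prediction time.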

I initially trained several models: Balanced Random Forest, LightGBM, XGBoost, and Decision Tree. After evaluating their performance, I selected LightGBM and used RandomizedSearchCV to fine-tune its hyperparameters. The models were assessed using metrics like log-loss, precision, recall, F1-score, and ROC-AUC.

Performance Comparison (Baseline vs New Model)

Financial Metrics (Test Data):

Threshold: Reduced from 73 to 57
Revenue: Increased from $80,330 to $86,128
Losses: Reduced from $25,353 to $18,070
Net Profit: Increased from $54,977 to $68,058
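The article does not show how the threshold was chosen. One plausible approach, sketched below, is to sweep candidate thresholds and keep the one with the highest net profit under the 10% commission and 100% loss rule. The decision rule (approve when the score is below the threshold), the 0 to 100 score range, the function names, and the toy data are all assumptions.

```python
COMMISSION_RATE = 0.10  # earned on approved legitimate transactions


def net_profit_at(threshold, transactions):
    """Net profit if every transaction scoring below `threshold` is approved.

    transactions: list of (score, value, is_fraud) tuples (hypothetical layout).
    Approved frauds cost their full value; blocked transactions earn nothing.
    """
    profit = 0.0
    for score, value, is_fraud in transactions:
        if score < threshold:  # assumed rule: low score means approve
            profit += -value if is_fraud else value * COMMISSION_RATE
    return profit


def best_threshold(transactions, candidates=range(0, 101)):
    """Return the lowest candidate threshold that maximizes net profit."""
    return max(candidates, key=lambda t: net_profit_at(t, transactions))


# Toy data: (score, value, is_fraud)
toy = [
    (10, 500, False),
    (20, 300, False),
    (40, 200, False),
    (60, 1000, True),
    (70, 100, False),
    (80, 400, True),
]
chosen = best_threshold(toy)
print(chosen, net_profit_at(chosen, toy))
```

In this toy data, raising the threshold past the first fraud's score approves a $1,000 fraud to gain a $10 commission, so the search stops short of it.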

Confusion Matrix (Test Data):

False Negatives: Reduced from 503 to 383
Fraud Rate: Remained at 2%, while the approval rate increased from 74% to 77%

Performance Metrics:

Log Loss: Decreased from 8.6 to 7.3
Precision: Improved from 0.13 to 0.17
Recall: Improved from 0.67 to 0.75
F1-Score: Increased from 0.22 to 0.27
ROC-AUC: Improved from 0.73 to 0.85
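The F1-score is the harmonic mean of precision and recall, so the reported values can be sanity-checked from the two figures above. The sketch below does this with a hypothetical helper `f1_score_from`. Because the published precision and recall are rounded to two decimals, the recomputed F1 can differ slightly from the reported one.

```python
def f1_score_from(precision, recall):
    """Harmonic mean of precision and recall; 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


baseline_f1 = f1_score_from(0.13, 0.67)   # baseline model, rounded inputs
new_model_f1 = f1_score_from(0.17, 0.75)  # new model, rounded inputs
print(f"baseline: {baseline_f1:.3f}, new model: {new_model_f1:.3f}")
```

The harmonic mean stays close to the smaller of the two inputs, which is why both F1 values sit near the low precision rather than the higher recall.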

Based on the profit margins, the new model yields a monthly profit increase of approximately $35,906, which translates to an annual profit increase of $430,872, or a 16% growth in earnings.

Moving forward, the focus will be on ensuring the model performs well in production environments and continues to deliver improvements:

Model Validation: Ensuring that the model's performance in a laboratory setting aligns with real-world conditions.
Real-Time Processing: Optimizing the model for low-latency environments, possibly using cloud-based infrastructure and real-time inference frameworks.

These steps will help ensure that the model remains effective, scalable, and adaptable as fraud detection challenges evolve.
