We need to format the data for the anomaly detection function. The key columns are:
- Timestamp Column: This represents the time series (daily in our case).
- Series Column: The customer that the model will check (we can use other columns instead, or omit this column entirely for a single series).
- Target Column: The transaction amount to check.
We can add a Label column (supervised training) with known anomalous data to improve the accuracy of the model; this column must be of Boolean type.
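The column requirements above can be checked before training. The following is a minimal Python sketch, not part of Snowflake: it assumes each row is a hypothetical tuple `(series, timestamp, target, label)`, where `label` is `None` when no supervised labels exist, and it assumes each series should have at most one row per timestamp. The function name `check_training_rows` is illustrative only.

```python
from collections import Counter
from datetime import datetime


def check_training_rows(rows):
    """Return a list of problems found in (series, timestamp, target, label) rows.

    An empty list means the rows match the assumed input layout.
    """
    problems = []

    # Assumption: one row per series and timestamp (e.g. one total per customer per day).
    counts = Counter((series, ts) for series, ts, _, _ in rows)
    for (series, ts), n in counts.items():
        if n > 1:
            problems.append(f"series {series}: {n} rows for timestamp {ts}")

    for series, ts, amount, label in rows:
        if not isinstance(ts, datetime):
            problems.append(f"series {series}: timestamp {ts!r} is not a datetime")
        if not isinstance(amount, (int, float)):
            problems.append(f"series {series}: target {amount!r} is not numeric")
        # The optional label column must be Boolean.
        if label is not None and not isinstance(label, bool):
            problems.append(f"series {series}: label {label!r} is not Boolean")

    return problems


rows = [
    (1, datetime(2019, 4, 1), 10.0, None),
    (1, datetime(2019, 4, 2), 12.5, None),
    (2, datetime(2019, 4, 1), 3.0, False),
]
print(check_training_rows(rows))  # []
```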
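The training view sums each customer's transactions per day for April and May 2019, keeping only customers with transactions on more than one day:
CREATE OR REPLACE VIEW LOAD_DATA.TC.VW_CUSTOMER_TERMINAL_TRAINING
AS SELECT
CT.CUSTOMER_ID AS CUSTOMER_ID,
-- Truncate each transaction to its day so the series is daily
to_timestamp_ntz(CT.TX_DATETIME::DATE) AS TX_DATETIME_TS,
-- One value per customer and day: the daily total
SUM(CT.TX_AMOUNT) AS TX_AMOUNT
FROM LOAD_DATA.TC.CUSTOMER_TRANSACTIONS CT
-- Keep only customers with transactions on more than one day in the window
JOIN (SELECT CUSTOMER_ID, COUNT(DISTINCT TX_DATETIME::DATE) TX_DATETIME_TS_DATE
FROM LOAD_DATA.TC.CUSTOMER_TRANSACTIONS
WHERE TX_DATETIME::DATE >= '2019-04-01' AND TX_DATETIME::DATE < '2019-06-01'
GROUP BY CUSTOMER_ID
HAVING TX_DATETIME_TS_DATE > 1) DCC ON CT.CUSTOMER_ID = DCC.CUSTOMER_ID
WHERE CT.TX_DATETIME::DATE >= '2019-04-01' AND CT.TX_DATETIME::DATE < '2019-06-01'
GROUP BY 1, 2;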
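To see what the training view produces, its logic can be reproduced locally on a few made-up rows. This is a sketch using Python's built-in `sqlite3`, assuming the intent of the trailing `GROUP BY 1,2` is to sum amounts per customer and day; SQLite's `date()` stands in for Snowflake's `::DATE` cast, and the table contents are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_transactions "
    "(customer_id INTEGER, tx_datetime TEXT, tx_amount REAL)"
)
conn.executemany(
    "INSERT INTO customer_transactions VALUES (?, ?, ?)",
    [
        (1, "2019-04-01 09:15:00", 10.0),
        (1, "2019-04-01 17:40:00", 5.0),   # same day as the row above: summed
        (1, "2019-04-02 11:00:00", 20.0),
        (2, "2019-04-03 08:00:00", 7.0),   # only one active day: customer excluded
        (3, "2019-03-31 23:59:00", 99.0),  # before the window: row excluded
        (3, "2019-05-10 10:00:00", 4.0),
        (3, "2019-05-11 10:00:00", 6.0),
    ],
)

query = """
SELECT ct.customer_id,
       date(ct.tx_datetime) AS tx_datetime_ts,
       SUM(ct.tx_amount)    AS tx_amount
FROM customer_transactions ct
JOIN (SELECT customer_id
      FROM customer_transactions
      WHERE date(tx_datetime) >= '2019-04-01' AND date(tx_datetime) < '2019-06-01'
      GROUP BY customer_id
      HAVING COUNT(DISTINCT date(tx_datetime)) > 1) dcc
  ON ct.customer_id = dcc.customer_id
WHERE date(ct.tx_datetime) >= '2019-04-01' AND date(ct.tx_datetime) < '2019-06-01'
GROUP BY ct.customer_id, date(ct.tx_datetime)
ORDER BY ct.customer_id, tx_datetime_ts
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Customer 2 disappears because it has only one active day, and customer 1's two transactions on 2019-04-01 collapse into a single daily total.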
To create the anomaly detection model, we pass the following arguments:
- INPUT_DATA: The view containing the prepared training data.
- SERIES_COLNAME: The column that identifies each series (CUSTOMER_ID).
- TIMESTAMP_COLNAME: The timestamp column.
- TARGET_COLNAME: The transaction amount column.
- LABEL_COLNAME: This argument must be included; if there is no label column, its value must be an empty string.
CREATE OR REPLACE SNOWFLAKE.ML.ANOMALY_DETECTION MODEL_CUSTOMER_TERMINAL(
INPUT_DATA => TABLE(LOAD_DATA.TC.VW_CUSTOMER_TERMINAL_TRAINING),
SERIES_COLNAME => 'CUSTOMER_ID',
TIMESTAMP_COLNAME => 'TX_DATETIME_TS',
TARGET_COLNAME => 'TX_AMOUNT',
LABEL_COLNAME => ''
);
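The only difference between the unsupervised model above and a supervised one is the LABEL_COLNAME argument. A minimal sketch of a hypothetical Python helper, `build_create_model_sql`, that renders the statement and defaults the label to an empty string; it only builds the SQL text and does not connect to Snowflake.

```python
def build_create_model_sql(model_name, input_view, series_col, timestamp_col,
                           target_col, label_col=""):
    """Render a CREATE SNOWFLAKE.ML.ANOMALY_DETECTION statement as text.

    label_col defaults to '' (no label column, unsupervised training).
    """
    def quote(value):
        # Escape single quotes for a SQL string literal.
        return "'" + value.replace("'", "''") + "'"

    return (
        f"CREATE OR REPLACE SNOWFLAKE.ML.ANOMALY_DETECTION {model_name}(\n"
        f"INPUT_DATA => TABLE({input_view}),\n"
        f"SERIES_COLNAME => {quote(series_col)},\n"
        f"TIMESTAMP_COLNAME => {quote(timestamp_col)},\n"
        f"TARGET_COLNAME => {quote(target_col)},\n"
        f"LABEL_COLNAME => {quote(label_col)}\n"
        ");"
    )


print(build_create_model_sql(
    "MODEL_CUSTOMER_TERMINAL",
    "LOAD_DATA.TC.VW_CUSTOMER_TERMINAL_TRAINING",
    "CUSTOMER_ID", "TX_DATETIME_TS", "TX_AMOUNT",
))
```

For supervised training, `label_col` would name a Boolean column in the input view.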
With a LARGE warehouse, training took 6 minutes 57 seconds for 585,153 rows.
We are going to check the model with test data from the same dataset.
CREATE OR REPLACE VIEW LOAD_DATA.TC.VW_CUSTOMER_TERMINAL_TEST
AS SELECT
CUSTOMER_ID,
to_timestamp_ntz(TX_DATETIME::DATE) AS TX_DATETIME_TS,
TX_AMOUNT
FROM LOAD_DATA.TC.CUSTOMER_TRANSACTIONS
WHERE TX_DATETIME::DATE >= '2019-07-01';
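The two views split the data by date: training covers 2019-04-01 up to (but not including) 2019-06-01, and testing starts on 2019-07-01, so June 2019 is used by neither. A small illustrative Python function, `assign_window` (a hypothetical name), makes the boundaries explicit.

```python
from datetime import date

TRAIN_START = date(2019, 4, 1)   # inclusive, as in the training view
TRAIN_END = date(2019, 6, 1)     # exclusive, as in the training view
TEST_START = date(2019, 7, 1)    # inclusive, as in the test view


def assign_window(day):
    """Return 'train', 'test', or None for a transaction date."""
    if TRAIN_START <= day < TRAIN_END:
        return "train"
    if day >= TEST_START:
        return "test"
    return None  # before April 2019, or in the unused June gap


for d in (date(2019, 3, 31), date(2019, 5, 31), date(2019, 6, 15), date(2019, 7, 1)):
    print(d, assign_window(d))
```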
To use the trained model, we call its DETECT_ANOMALIES method with the CALL reserved word and pass the required columns. We can also pass an OBJECT with configuration settings; its prediction_interval key controls how strict the detection is. The default value is 0.99, which means that roughly 1% of the observations fall outside the prediction interval and are marked as anomalies; a value of 0.995 widens the interval and flags roughly 0.5%. The value must be between 0 and 1.
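To build intuition for prediction_interval, here is a simplified Python sketch. It is not Snowflake's algorithm (the model computes its own forecasts and bounds); it assumes, for illustration only, normally distributed residuals with a known standard deviation `sigma`, and flags a value when it falls outside the symmetric interval covering `prediction_interval` of the probability mass.

```python
from statistics import NormalDist


def interval_bounds(forecast, sigma, prediction_interval=0.99):
    """Symmetric Gaussian prediction interval around a point forecast."""
    if not 0 < prediction_interval < 1:
        raise ValueError("prediction_interval must be between 0 and 1")
    # z such that the central prediction_interval share of a normal lies in [-z, z]
    z = NormalDist().inv_cdf((1 + prediction_interval) / 2)
    return forecast - z * sigma, forecast + z * sigma


def is_anomaly(actual, forecast, sigma, prediction_interval=0.99):
    """Flag a value that falls outside the prediction interval."""
    lower, upper = interval_bounds(forecast, sigma, prediction_interval)
    return actual < lower or actual > upper


# With forecast 100 and sigma 10, a value of 127 is flagged at 0.99 but not at 0.995.
print(interval_bounds(100.0, 10.0, 0.99))
print(is_anomaly(127.0, 100.0, 10.0, 0.99), is_anomaly(127.0, 100.0, 10.0, 0.995))
```

A higher prediction_interval widens the interval, so fewer observations fall outside it and fewer are flagged.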
CALL MODEL_CUSTOMER_TERMINAL!DETECT_ANOMALIES(
INPUT_DATA => TABLE(LOAD_DATA.TC.VW_CUSTOMER_TERMINAL_TEST),
SERIES_COLNAME => 'CUSTOMER_ID',
TIMESTAMP_COLNAME => 'TX_DATETIME_TS',
TARGET_COLNAME => 'TX_AMOUNT',
CONFIG_OBJECT => {'prediction_interval':0.995}
);
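To review the output, the rows returned by the CALL can be fetched (for example, with the Snowflake Python connector) and summarized per customer. A minimal sketch, assuming each result row arrives as a dict with the columns SERIES, Y, LOWER_BOUND, UPPER_BOUND and IS_ANOMALY; these match the DETECT_ANOMALIES output columns as far as we know, but verify them against your own result set. `summarize_anomalies` is a hypothetical helper.

```python
from collections import defaultdict


def summarize_anomalies(result_rows):
    """Count flagged rows per series and track how far values left the interval."""
    summary = defaultdict(lambda: {"anomalies": 0, "max_excess": 0.0})
    for row in result_rows:
        if not row["IS_ANOMALY"]:
            continue
        entry = summary[row["SERIES"]]
        entry["anomalies"] += 1
        # Distance outside the interval: above the upper bound or below the lower bound.
        if row["Y"] > row["UPPER_BOUND"]:
            excess = row["Y"] - row["UPPER_BOUND"]
        else:
            excess = row["LOWER_BOUND"] - row["Y"]
        entry["max_excess"] = max(entry["max_excess"], excess)
    return dict(summary)


result_rows = [
    {"SERIES": 1, "Y": 50.0, "LOWER_BOUND": 5.0, "UPPER_BOUND": 35.0, "IS_ANOMALY": True},
    {"SERIES": 1, "Y": 22.0, "LOWER_BOUND": 5.0, "UPPER_BOUND": 35.0, "IS_ANOMALY": False},
    {"SERIES": 2, "Y": 1.0, "LOWER_BOUND": 4.0, "UPPER_BOUND": 30.0, "IS_ANOMALY": True},
]
print(summarize_anomalies(result_rows))
```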