Feast is designed to manage and serve features for machine learning models. In the loan approval project, the feature store handles historical and real-time data for loan applicants. Here are the key components defined in the project’s feature_repo/ directory:
Entities are the primary keys that uniquely identify records in your data. In the project:
- zipcode: An Entity with ValueType.INT64, representing a zipcode for location-based features (e.g., city, population).
from feast import Entity, ValueType

zipcode = Entity(name="zipcode", value_type=ValueType.INT64)
- dob_ssn: A composite key (ValueType.STRING) combining date of birth and the last four digits of a social security number for credit history features.
dob_ssn = Entity(
    name="dob_ssn",
    value_type=ValueType.STRING,
    description="Date of birth and last four digits of social security number",
)
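To make the composite key concrete, here is a minimal sketch of how a dob_ssn value could be assembled from its two parts. The helper name make_dob_ssn and the "YYYYMMDD_NNNN" layout are assumptions for illustration; the project's actual key format is whatever appears in its Parquet data.

```python
from datetime import date


def make_dob_ssn(dob: date, ssn: str) -> str:
    """Build a composite key from a date of birth and the last four SSN digits.

    Hypothetical helper: the "YYYYMMDD_NNNN" layout is an assumed format.
    """
    # Keep only digits so inputs like "123-45-4278" and "123454278" both work.
    digits = [ch for ch in ssn if ch.isdigit()]
    if len(digits) < 4:
        raise ValueError("SSN must contain at least four digits")
    return f"{dob.strftime('%Y%m%d')}_{''.join(digits[-4:])}"


# Made-up applicant values, for illustration only.
key = make_dob_ssn(date(1963, 6, 21), "123-45-4278")
```

Because the key is a plain string, the entity is declared with ValueType.STRING rather than a numeric type.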
Data sources define where raw feature data is stored. The project uses FileSource with Parquet files:
- zipcode_source: Points to data/zipcode_table.parquet for location-based features like city, state, and population.
from feast import FileSource
from feast.data_format import ParquetFormat

zipcode_source = FileSource(
    name="Zipcode source",
    path="data/zipcode_table.parquet",
    file_format=ParquetFormat(),
    timestamp_field="event_timestamp",
    created_timestamp_column="created_timestamp",
)
- credit_history_source: Points to data/credit_history.parquet for credit-related features like missed payments and bankruptcies.
credit_history_source = FileSource(
    name="Credit history",
    path="data/credit_history.parquet",
    file_format=ParquetFormat(),
    timestamp_field="event_timestamp",
    created_timestamp_column="created_timestamp",
)
Feature views group features associated with an entity and specify their data source and TTL (time-to-live). The project defines:
- zipcode_features: Includes features like city, state, tax_returns_filed, and population tied to the zipcode entity, with a 10-year TTL.
from datetime import timedelta

from feast import FeatureView, Field
from feast.types import Int64, String

zipcode_features = FeatureView(
    name="zipcode_features",
    entities=[zipcode],
    ttl=timedelta(days=3650),  # roughly 10 years
    schema=[
        Field(name="city", dtype=String),
        Field(name="state", dtype=String),
        Field(name="location_type", dtype=String),
        Field(name="tax_returns_filed", dtype=Int64),
        Field(name="population", dtype=Int64),
        Field(name="total_wages", dtype=Int64),
    ],
    source=zipcode_source,
)
- credit_history: Includes credit-related features like credit_card_due, mortgage_due, and bankruptcies tied to the dob_ssn entity, with a 90-day TTL.
credit_history = FeatureView(
    name="credit_history",
    entities=[dob_ssn],
    ttl=timedelta(days=90),
    schema=[
        Field(name="credit_card_due", dtype=Int64),
        Field(name="mortgage_due", dtype=Int64),
        # ... other fields
        Field(name="bankruptcies", dtype=Int64),
    ],
    source=credit_history_source,
)
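The effect of a TTL on historical lookups can be sketched without Feast at all: for a given entity timestamp, only feature rows whose event timestamp falls inside the window ending at that timestamp are eligible, and the newest eligible row wins. The function name latest_within_ttl, the sample rows, and the inclusive boundaries are assumptions for illustration, not Feast's implementation.

```python
from datetime import datetime, timedelta
from typing import Optional


def latest_within_ttl(rows: list, entity_ts: datetime, ttl: timedelta) -> Optional[dict]:
    """Return the newest row with event_timestamp in [entity_ts - ttl, entity_ts].

    Hypothetical helper; returns None when every row is older than the TTL
    or newer than the entity timestamp.
    """
    eligible = [
        row for row in rows
        if entity_ts - ttl <= row["event_timestamp"] <= entity_ts
    ]
    return max(eligible, key=lambda row: row["event_timestamp"], default=None)


# Made-up credit history rows for a single applicant.
rows = [
    {"event_timestamp": datetime(2021, 1, 1), "credit_card_due": 500},
    {"event_timestamp": datetime(2021, 6, 1), "credit_card_due": 900},
]
ttl = timedelta(days=90)

fresh = latest_within_ttl(rows, datetime(2021, 7, 1), ttl)   # June row is 30 days old
stale = latest_within_ttl(rows, datetime(2021, 12, 1), ttl)  # newest row is over 90 days old
```

This is why slowly changing zipcode data can tolerate a 10-year TTL, while credit data is given a 90-day window so that outdated balances are not joined onto new applications.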
On-demand feature views compute features at prediction time. The project defines total_debt_calc to calculate the total debt by summing credit-related dues and the requested loan amount.
import pandas as pd
from feast.on_demand_feature_view import on_demand_feature_view
from feast.types import Float64

@on_demand_feature_view(
    sources=[credit_history, input_request],
    schema=[Field(name='total_debt_due', dtype=Float64)],
    mode="pandas",
)
def total_debt_calc(features_df: pd.DataFrame) -> pd.DataFrame:
    # Sum the stored credit dues and the loan amount supplied with the request.
    df = pd.DataFrame()
    df['total_debt_due'] = (
        features_df['credit_card_due'] + features_df['mortgage_due'] +
        features_df['student_loan_due'] + features_df['vehicle_loan_due'] +
        features_df['loan_amnt']
    ).astype(float)
    return df
The input_request is a RequestSource for user-provided inputs like loan_amnt:
input_request = RequestSource(
    name="application_data",
    schema=[Field(name='loan_amnt', dtype=Int64)]
)
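The arithmetic inside total_debt_calc can be checked outside Feast. The sketch below repeats the same row-wise sum over plain Python lists instead of a pandas DataFrame; the function name total_debt_due_rows and all sample numbers are made up for illustration.

```python
# Columns summed by the on-demand feature view: four stored dues plus the
# loan amount that arrives with the request.
DEBT_COLUMNS = [
    "credit_card_due",
    "mortgage_due",
    "student_loan_due",
    "vehicle_loan_due",
    "loan_amnt",
]


def total_debt_due_rows(features: dict) -> dict:
    """Row-wise sum of the debt columns, returned as floats.

    Hypothetical stand-in for the pandas transformation; `features` maps
    each column name to a list with one value per applicant.
    """
    per_row = zip(*(features[column] for column in DEBT_COLUMNS))
    return {"total_debt_due": [float(sum(values)) for values in per_row]}


# Two made-up applicants.
sample = {
    "credit_card_due": [1000, 0],
    "mortgage_due": [250000, 0],
    "student_loan_due": [20000, 5000],
    "vehicle_loan_due": [8000, 0],
    "loan_amnt": [15000, 3000],
}
result = total_debt_due_rows(sample)
```

Only loan_amnt comes from the request; the other four columns are read from the credit_history feature view, which is why both appear in the sources list of the decorator.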
The feature_store.yaml configures the infrastructure:
- Registry: PostgreSQL (postgresql+psycopg://postgres@localhost:5432/feast) stores metadata about entities and feature views.
- Online Store: Redis (localhost:6379) serves real-time features for inference.
- Offline Store: DuckDB handles historical data for training.
- Provider: Set to local for local development.
project: credit_scoring_local
registry:
    registry_type: sql
    path: postgresql+psycopg://postgres@localhost:5432/feast
    cache_ttl_seconds: 60
provider: local
online_store:
    type: redis
    redis_type: redis
    connection_string: "localhost:6379"
offline_store:
    type: duckdb
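With this configuration in place, an inference service would read features from the online store. The following is a sketch under several assumptions: the definitions have been applied and materialized into Redis, the repository path is supplied by the caller, and the entity values are made up. The helper names build_entity_row and fetch_online_features are hypothetical; FeatureStore.get_online_features is the Feast call being illustrated.

```python
# Feature references use Feast's "<feature_view>:<feature>" form.
FEATURES = [
    "zipcode_features:city",
    "zipcode_features:state",
    "credit_history:credit_card_due",
    "credit_history:mortgage_due",
    "total_debt_calc:total_debt_due",
]


def build_entity_row(zipcode: int, dob_ssn: str, loan_amnt: int) -> dict:
    """Combine the entity keys with the request-time input in a single row."""
    return {"zipcode": zipcode, "dob_ssn": dob_ssn, "loan_amnt": loan_amnt}


def fetch_online_features(repo_path: str, entity_row: dict) -> dict:
    """Look up features for one applicant from the configured online store.

    Feast is imported lazily so the helpers above stay usable without it.
    Assumes the registry and Redis from feature_store.yaml are reachable.
    """
    from feast import FeatureStore

    store = FeatureStore(repo_path=repo_path)
    response = store.get_online_features(features=FEATURES, entity_rows=[entity_row])
    return response.to_dict()


# Made-up applicant; pass this row to fetch_online_features with the
# path of the feature repository.
row = build_entity_row(zipcode=76104, dob_ssn="19630621_4278", loan_amnt=15000)
```

The request value loan_amnt travels in the same row as the entity keys, which lets the on-demand view combine it with the stored credit features at lookup time.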