A big part of making ML work in production is monitoring predictions: not just whether the service is up, but what the model is doing.
We implemented the following lightweight but effective monitoring stack using native GCP services:
All incoming prediction requests were captured using Cloud Logging, including:
- Input payloads (the features sent to the model)
- Model responses
- Prediction time (latency)
These logs were structured to make downstream analysis easier.
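A minimal sketch of what that logging can look like in a Python service; the logger name, the `request_id` field, and the `model.predict` call are illustrative stand-ins, not our exact handler:

```python
import time

from google.cloud import logging as cloud_logging

client = cloud_logging.Client()
logger = client.logger("prediction-logs")  # illustrative log name


def predict_and_log(model, request_id: str, features: dict):
    """Run inference and emit one structured log entry per request."""
    start = time.perf_counter()
    prediction = model.predict(features)  # stand-in for the real model call
    latency_ms = (time.perf_counter() - start) * 1000

    # log_struct writes a JSON payload, so each field becomes a queryable
    # column once the logs are routed into BigQuery.
    logger.log_struct({
        "request_id": request_id,
        "input_payload": features,
        "model_response": prediction,
        "latency_ms": round(latency_ms, 2),
    })
    return prediction
```

Keeping the payload a flat, consistent schema here is what makes the BigQuery analysis later painless.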
Using Log Sinks, we routed Cloud Logging data directly into a BigQuery dataset. This gave us a queryable history of all inference events.
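For reference, a sketch of creating such a sink with the Python client; the project, dataset, sink name, and filter are placeholders, and the `gcloud logging sinks create` command achieves the same:

```python
from google.cloud import logging as cloud_logging

client = cloud_logging.Client(project="my-project")  # placeholder project
sink = client.sink(
    "prediction-logs-to-bq",  # placeholder sink name
    filter_='logName:"prediction-logs"',  # match the prediction log entries
    destination="bigquery.googleapis.com/projects/my-project/datasets/ml_monitoring",
)
sink.create()
```

One operational note: the sink's writer identity still needs write access to the destination dataset before entries start flowing.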
What we gained:
- Performance trend analysis: how latency is trending over time (see the sample query after this list).
- Data drift monitoring: comparing recent inputs against training data distributions.
- Outcome comparison: matching model predictions to actual ground truth labels (where available) to evaluate accuracy in production.
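As an example of the first point, a latency-trend query over the exported logs might look like the sketch below. The table name and JSON field paths depend on how the sink materializes the entries, so treat them as placeholders:

```python
from google.cloud import bigquery

bq = bigquery.Client()

# Hourly p95 latency over the last 7 days, read from the sink's output table.
query = """
    SELECT
      TIMESTAMP_TRUNC(timestamp, HOUR) AS hour,
      APPROX_QUANTILES(CAST(jsonPayload.latency_ms AS FLOAT64), 100)[OFFSET(95)] AS p95_ms
    FROM `my-project.ml_monitoring.prediction_logs`
    WHERE timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
    GROUP BY hour
    ORDER BY hour
"""
for row in bq.query(query).result():
    print(row.hour, row.p95_ms)
```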
We scheduled BigQuery queries (sketched after this list) to compute:
- Daily accuracy metrics
- Distribution changes in input features
- Alert thresholds (e.g., accuracy below 85%, input schema deviation, a sudden spike in latency)
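Scheduled queries are backed by the BigQuery Data Transfer Service. A sketch of registering a daily accuracy job is below; the dataset, table names, and the join against a ground-truth table are assumptions for illustration:

```python
from google.cloud import bigquery_datatransfer

client = bigquery_datatransfer.DataTransferServiceClient()
parent = client.common_project_path("my-project")  # placeholder project

transfer_config = bigquery_datatransfer.TransferConfig(
    display_name="daily-accuracy",
    data_source_id="scheduled_query",
    destination_dataset_id="ml_monitoring",
    schedule="every 24 hours",
    params={
        # Hypothetical join of logged predictions against a ground-truth table.
        "query": """
            SELECT CURRENT_DATE() AS day,
                   AVG(CAST(p.jsonPayload.model_response = g.label AS INT64)) AS accuracy
            FROM `my-project.ml_monitoring.prediction_logs` p
            JOIN `my-project.ml_monitoring.ground_truth` g
              ON p.jsonPayload.request_id = g.request_id
            WHERE DATE(p.timestamp) = CURRENT_DATE()
        """,
        "destination_table_name_template": "daily_accuracy",
        "write_disposition": "WRITE_APPEND",
    },
)
client.create_transfer_config(parent=parent, transfer_config=transfer_config)
```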
With Cloud Monitoring, we created alerting policies tied to these outputs, all without deploying any custom monitoring agent.
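As a sketch of what such a policy can look like in code (the console works just as well), assuming a log-based distribution metric for latency; the metric name, threshold, and windows are placeholders:

```python
from google.cloud import monitoring_v3

client = monitoring_v3.AlertPolicyServiceClient()
project_name = "projects/my-project"  # placeholder project

policy = monitoring_v3.AlertPolicy(
    display_name="prediction-latency-spike",
    combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.OR,
    conditions=[
        monitoring_v3.AlertPolicy.Condition(
            display_name="p95 latency above 500 ms for 5 minutes",
            condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
                # Assumed log-based metric; "user/" is the prefix GCP gives them.
                filter='metric.type = "logging.googleapis.com/user/prediction_latency_ms"',
                comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
                threshold_value=500,
                duration={"seconds": 300},
                aggregations=[
                    monitoring_v3.Aggregation(
                        alignment_period={"seconds": 60},
                        per_series_aligner=monitoring_v3.Aggregation.Aligner.ALIGN_PERCENTILE_95,
                    )
                ],
            ),
        )
    ],
)
client.create_alert_policy(name=project_name, alert_policy=policy)
```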
We had no need to spin up Prometheus, Grafana, or a separate data pipeline; everything was done with managed services.