Predictive Maintenance with IoT Data: From Pilot to Production
How a Dutch logistics company went from 40+ unplanned equipment failures per year to 4 — and what the technical journey looked like from sensor integration to deployed ML models.
In 2024, a Dutch logistics company was experiencing 43 unplanned equipment failures per year across its forklift fleet — at an average cost of €8,400 per incident in parts, labour, and lost productivity. By the end of 2025, that number was four. Here is what the technical journey actually looked like.
The Starting Point
The company had 28 forklifts across three warehouses. Each vehicle had OBD-II telemetry (engine hours, throttle position, fault codes), temperature sensors, battery state monitoring, and RFID-based location tracking. Telemetry was logged to a local server and periodically exported to Azure Blob Storage in CSV format. The data existed — the problem was it had never been used for anything beyond reactive fault code alerts.
A common assumption at the start of IoT ML projects is that the hard part is the modelling. It is not. In this case, as in most cases, the hard part was data quality.
Phase 1: Data Audit (Weeks 1–4)
We started by characterising what data actually existed rather than what systems were supposed to produce. Findings:

- 18 months of usable telemetry (two machines had been replaced, breaking historical continuity)
- approximately 40% sensor gaps across the fleet from connectivity issues and maintenance windows
- inconsistent timestamp formats between two different firmware versions
- no structured record of past failure events beyond Excel logs maintained by the maintenance supervisor
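One audit finding, the inconsistent timestamp formats between firmware versions, is cheap to fix once characterised. A minimal sketch in Python; the two format strings are hypothetical stand-ins, not the actual firmware formats:

```python
from datetime import datetime, timezone

# Hypothetical formats -- the two firmware versions logged timestamps
# inconsistently; these exact patterns are illustrative only.
KNOWN_FORMATS = [
    "%Y-%m-%dT%H:%M:%S%z",   # e.g. newer firmware: ISO 8601 with offset
    "%d-%m-%Y %H:%M:%S",     # e.g. older firmware: naive local time
]

def normalise_timestamp(raw: str) -> datetime:
    """Parse a raw telemetry timestamp into a UTC-aware datetime."""
    for fmt in KNOWN_FORMATS:
        try:
            dt = datetime.strptime(raw, fmt)
        except ValueError:
            continue
        if dt.tzinfo is None:
            # Older firmware logged naive timestamps; assume UTC here
            # (in practice you would apply the warehouse's timezone).
            dt = dt.replace(tzinfo=timezone.utc)
        return dt.astimezone(timezone.utc)
    raise ValueError(f"Unrecognised timestamp format: {raw!r}")
```

Normalising at ingestion, rather than in each downstream script, keeps the feature pipeline free of format special cases.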
Digitising the Excel maintenance logs took two weeks. This was the most valuable output of the discovery phase — without labelled failure events, there was no target variable for a supervised model. The lesson: your historical maintenance records are your most important ML asset, and they are almost always in a spreadsheet managed by one person.
Phase 2: Feature Engineering (Weeks 5–8)
Raw telemetry values are weak predictors of failure. What predicts failure is patterns in those values over time. We built:

- rolling averages and standard deviations of key metrics over 4-hour and 24-hour windows
- change rates for battery discharge and temperature
- operating hours since last maintenance per component
- interaction terms between load intensity and operating temperature (a known predictor of hydraulic failure in this equipment type)
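A minimal pandas sketch of the window features, assuming hourly telemetry rows per vehicle; the column names (`battery_level`, `temp_c`, `load_intensity`) are illustrative, not the project's actual schema:

```python
import pandas as pd

def add_rolling_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive window features from hourly telemetry rows.

    Expects columns 'vehicle_id', 'timestamp', 'battery_level', 'temp_c'
    (illustrative names), one row per vehicle per hour.
    """
    df = df.sort_values(["vehicle_id", "timestamp"]).copy()
    g = df.groupby("vehicle_id")
    for col in ("battery_level", "temp_c"):
        # Rolling mean/std over 4-hour and 24-hour windows (hourly rows)
        for hours in (4, 24):
            df[f"{col}_mean_{hours}h"] = g[col].transform(
                lambda s: s.rolling(hours, min_periods=1).mean()
            )
            df[f"{col}_std_{hours}h"] = g[col].transform(
                lambda s: s.rolling(hours, min_periods=1).std()
            )
        # Change rate: first difference between consecutive hours
        df[f"{col}_rate"] = g[col].diff()
    # Interaction term: load intensity x operating temperature
    # ('load_intensity' is an assumed column name)
    if "load_intensity" in df:
        df["load_x_temp"] = df["load_intensity"] * df["temp_c"]
    return df
```

Grouping by vehicle before windowing matters: without it, rolling windows would bleed across vehicle boundaries in the concatenated fleet table.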
We ended with 84 features per vehicle per hour. Feature selection using SHAP values reduced this to the 31 features that carried meaningful predictive value, with no significant loss of accuracy.
Phase 3: The Model (Weeks 9–12)
The target variable was binary: does this vehicle experience an unplanned failure within the next 14 days? We evaluated gradient boosting (XGBoost), survival analysis (Cox Proportional Hazards for time-to-failure), and a simple threshold-based rule system as a baseline.
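Constructing that binary target amounts to joining each hourly feature row against the digitised failure log. A minimal sketch, with illustrative data shapes:

```python
from datetime import datetime, timedelta

HORIZON = timedelta(days=14)

def label_rows(rows, failures):
    """Binary target: 1 if the vehicle fails within 14 days of the row.

    rows: list of (vehicle_id, timestamp) hourly feature rows.
    failures: dict mapping vehicle_id -> list of failure datetimes
              (from the digitised maintenance log).
    """
    labels = []
    for vehicle_id, ts in rows:
        fails = failures.get(vehicle_id, [])
        # A failure strictly after the row but inside the horizon -> positive
        hit = any(ts < f <= ts + HORIZON for f in fails)
        labels.append(1 if hit else 0)
    return labels
```

Note the strict inequality on the left: a row stamped at the moment of failure should not be labelled with information from its own event.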
XGBoost with 5-fold cross-validation on time-based splits achieved 87% precision and 79% recall at a 40% probability threshold. More importantly: at this threshold, false positives ran at approximately 2 per month (acceptable, since a maintenance check costs €150), and false negatives amounted to roughly 1.2 unpredicted failures per year (against a baseline of 43).
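The time-based splitting and fixed-threshold evaluation can be sketched without the model itself (XGBoost fitting omitted); `time_splits` and `precision_recall_at` are hypothetical helper names, not part of any library:

```python
def time_splits(n_rows, n_folds=5):
    """Forward-chaining splits: each fold trains on all earlier rows and
    validates on the next contiguous chunk, so validation data is always
    strictly later in time than training data (rows assumed time-ordered).
    This avoids the leakage a random shuffle would introduce."""
    fold = n_rows // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train_idx = list(range(0, k * fold))
        valid_idx = list(range(k * fold, (k + 1) * fold))
        yield train_idx, valid_idx

def precision_recall_at(probs, labels, threshold=0.40):
    """Precision and recall at a fixed probability threshold."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

The 40% threshold was chosen on the cost asymmetry: a false positive costs one €150 inspection, a false negative costs a €8,400 failure.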
Phase 4: Deployment (Weeks 13–16)
We deployed the model as a scheduled batch job running nightly via Azure Functions. Output: a daily email to the maintenance supervisor listing vehicles with predicted failure probability above 40%, with the top contributing features for each vehicle. No dashboard, no app — the simplest possible delivery mechanism for the audience.
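The daily email body can be assembled directly from the batch output. A sketch, assuming each prediction row carries its probability and top contributing features (field names are illustrative):

```python
def build_alert_email(predictions, threshold=0.40, top_n=3):
    """Compose the daily plain-text alert from nightly batch output.

    predictions: list of dicts with (illustrative) keys 'vehicle_id',
    'probability', and 'top_features' (feature name -> contribution).
    """
    flagged = sorted(
        (p for p in predictions if p["probability"] >= threshold),
        key=lambda p: p["probability"],
        reverse=True,
    )
    if not flagged:
        return "No vehicles above the alert threshold today."
    lines = ["Vehicles flagged for inspection:"]
    for p in flagged:
        # Rank contributing features by absolute contribution
        feats = sorted(p["top_features"].items(),
                       key=lambda kv: abs(kv[1]), reverse=True)
        feat_str = ", ".join(name for name, _ in feats[:top_n])
        lines.append(f"- {p['vehicle_id']}: {p['probability']:.0%} "
                     f"(drivers: {feat_str})")
    return "\n".join(lines)
```

Listing the top contributing features per vehicle is what makes the alert actionable: the supervisor knows whether to check the battery or the hydraulics before walking to the machine.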
We also built monitoring: tracking the distribution of predicted probabilities over time to detect model drift, and a simple feedback loop where the maintenance supervisor marks predictions as correct or incorrect — labelled data for the next retraining cycle.
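For tracking the distribution of predicted probabilities over time, one common choice (an assumption here, not necessarily what this project used) is the population stability index, comparing current scores against a baseline window:

```python
import math

def population_stability_index(baseline, current, n_bins=10):
    """PSI between two samples of predicted probabilities in [0, 1].

    A common heuristic: PSI < 0.1 stable, 0.1-0.25 moderate shift,
    > 0.25 a significant shift worth investigating.
    """
    eps = 1e-6  # floor for empty bins, avoids log(0)

    def proportions(sample):
        counts = [0] * n_bins
        for x in sample:
            idx = min(int(x * n_bins), n_bins - 1)  # clamp x == 1.0
            counts[idx] += 1
        total = len(sample)
        return [max(c / total, eps) for c in counts]

    b, c = proportions(baseline), proportions(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))
```

Run nightly against the scored fleet, a rising PSI flags that the model is seeing inputs unlike its training data, which is the cue to retrain before accuracy visibly degrades.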
Results After 12 Months
- 4 unplanned failures (down from 43 in 2024)
- €327,000 reduction in failure-related costs
- Average false positive rate: 1.8 per month (flagging a vehicle for inspection that did not fail)
- 2 unpredicted failures — both linked to sudden mechanical damage unrelated to the monitored features
- Model retrained twice with new labelled data, improving recall from 79% to 84%
What We Would Do Differently
Start the data audit earlier and budget longer for it. In hindsight, six weeks for the data audit would have been more comfortable than four. The pressure to reach modelling sooner is real but counterproductive — the model is entirely bounded by data quality, and a rushed data audit creates technical debt that slows every subsequent phase.
The ML model took two weeks to build. The data work took six weeks. This ratio is typical and useful to communicate to stakeholders before a project begins.