One model to predict them all (failures, that is)
Having worked on various predictive maintenance scenarios over the course of three years, I have noticed two common pitfalls. First, a lack of business understanding: the cost of being right versus the cost of being wrong (I will write about that later). Second, the idea of a single model that can predict it all, under all conditions, no matter what. The latter I often see when working with novice Data Scientists. They spend most of their time re-training their model to chase a perfect F1 or accuracy score instead of knowing when to stop and re-think the strategy. This problem becomes even more severe when you deal with highly imbalanced datasets.
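To make the imbalance point concrete, here is a minimal sketch on synthetic data (not from any real project) showing why chasing accuracy alone is misleading when failures are rare:

```python
# Minimal sketch, synthetic data: a classifier that never predicts "failure"
# still scores ~99% accuracy on a 1%-failure dataset, while its F1 on the
# failure class is 0.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

rng = np.random.default_rng(42)
y_true = rng.choice([0, 1], size=10_000, p=[0.99, 0.01])  # ~1% failures
y_pred = np.zeros_like(y_true)                            # "never fails" model

print("accuracy:", accuracy_score(y_true, y_pred))                      # ~0.99
print("F1 (failure class):", f1_score(y_true, y_pred, zero_division=0)) # 0.0
```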
Why does this matter?
Each device and each sensor has its own data footprint and its own noise distribution. If your pool of machines is sufficiently large, you will be able to detect generic trends and make generalised predictions. However, you may have lost the subtle differences between machines, and with them predictive power. Perhaps a clustering analysis up front (ab initio) would have told you that 3 or 5 models would serve the problem much better than one. Or perhaps simply knowing whether an observation is anomalous is already enough to drive business value.
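As a rough sketch of that clustering idea: assuming each machine can be summarised by a handful of aggregated sensor statistics (the feature matrix below is a random placeholder for real data), a quick k-means pass with a silhouette check hints at how many separate models the fleet really needs:

```python
# A minimal sketch, assuming one row per machine of aggregated sensor
# statistics (means, standard deviations, etc.). A clear silhouette peak
# at some k suggests training one model per cluster rather than a single
# global model. The feature matrix here is a random placeholder.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
machine_features = rng.normal(size=(200, 8))  # placeholder for real fleet data

X = StandardScaler().fit_transform(machine_features)

for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(f"k={k}: silhouette={silhouette_score(X, labels):.3f}")
```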
What could be a solution?
As always, there is no single best solution. There will always be a trade-off in getting a model generalised enough to bring to production. What I have observed so far, at several customers, is that a combination of weak learners outperforms most of these highly specialised models. The effect is often even stronger once we take these “weak” models into field trials and/or production. Odds are your training data was never complete, and multiple weak models give you the flexibility to deal with that incompleteness.
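To illustrate what I mean by a combination of weak learners, here is a minimal sketch using deliberately shallow trees combined in a random forest; the dataset is synthetic and stands in for whatever sensor features and failure labels you actually have:

```python
# A minimal sketch of combining many weak learners instead of one highly
# tuned specialised model: a forest of deliberately shallow trees, scored
# with F1 because the (synthetic) failure labels are imbalanced.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic, imbalanced stand-in for real sensor features and failure labels.
X, y = make_classification(n_samples=5_000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)

weak_ensemble = RandomForestClassifier(
    n_estimators=200,
    max_depth=3,       # each individual tree is kept deliberately weak
    random_state=0,
)

scores = cross_val_score(weak_ensemble, X, y, cv=5, scoring="f1")
print("mean F1 across folds:", scores.mean())
```

The individual trees are too shallow to memorise any single machine's quirks, which is exactly the point: the combined vote stays robust when production data drifts away from the (never complete) training set.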