Using Supervised Anomaly Detection Algorithms to Localize Anomalies in Unlabeled Time Series Training Data

Guang Wang, Matthew Gerdes

06 August 2025

Machine learning techniques are increasingly used for multivariate time series anomaly detection. However, their effectiveness, particularly for supervised approaches, is often limited by the scarcity of labeled training data. Identifying anomalies in unlabeled training data is very challenging without the knowledge of subject matter experts. Researchers therefore usually assume that anomalies in the training data are sparse enough to be negligible, an assumption often violated in real-world scenarios. Conventional attempts to remove anomalies from the training data rely on simplistic outlier detection methods, such as three-standard-deviation thresholds, or on methods intended for univariate analysis, both of which handle complex multivariate data inadequately. This paper introduces a preprocessing method designed to enhance existing supervised anomaly detection models. Our method employs an existing supervised algorithm to localize faults in unlabeled multivariate training data through a recursive process of partitioning and fault inferencing, progressively narrowing faults down to smaller regions and thereby benefiting supervised detection tasks in multivariate time series. The method is positioned as an ancillary technique intended to benefit a broad set of existing supervised anomaly detection algorithms, not as a standalone anomaly detection technique. We demonstrate the method on both synthetic and published datasets for which labeled ground-truth defects are available, and show that it improves a supervised model trained on the unlabeled training data, resulting in greater anomaly detection accuracy.
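
The recursive partition-and-infer idea can be illustrated with a small sketch. This is not the authors' algorithm. The abstract does not specify the partitioning scheme, the base detector, or the stopping rule, so everything below is an assumption made for illustration: segments are halved, the "model" is a simple Mahalanobis-distance check standing in for an existing supervised detector, the reference data is whatever lies outside the segment under test, and the names `fit_reference`, `is_faulty`, `localize`, and `localize_faults` and the parameters `min_len` and `threshold` are hypothetical.

```python
import numpy as np

def fit_reference(x):
    """Stand-in 'model' (assumption): mean and inverse covariance of reference rows."""
    mu = x.mean(axis=0)
    cov = np.cov(x, rowvar=False) + 1e-6 * np.eye(x.shape[1])  # small ridge for stability
    return mu, np.linalg.inv(cov)

def is_faulty(model, x, threshold):
    """Flag a segment if any row's squared Mahalanobis distance exceeds the threshold."""
    mu, cov_inv = model
    d = x - mu
    d2 = np.einsum("ij,jk,ik->i", d, cov_inv, d)
    return bool(d2.max() > threshold)

def localize(data, lo, hi, min_len, threshold):
    """Return (start, end) row ranges inside [lo, hi) suspected of containing faults."""
    # Fit on everything outside the segment, then run inference on the segment.
    reference = np.concatenate([data[:lo], data[hi:]])
    model = fit_reference(reference)
    if not is_faulty(model, data[lo:hi], threshold):
        return []                      # segment looks clean: stop recursing
    if hi - lo <= min_len:
        return [(lo, hi)]              # small enough: report as a fault region
    mid = (lo + hi) // 2               # otherwise split and narrow down
    return (localize(data, lo, mid, min_len, threshold)
            + localize(data, mid, hi, min_len, threshold))

def localize_faults(data, min_len=25, threshold=30.0):
    """Start from two halves so each segment has a non-empty reference set."""
    mid = len(data) // 2
    return (localize(data, 0, mid, min_len, threshold)
            + localize(data, mid, len(data), min_len, threshold))

# Synthetic example: 400 rows, 3 channels, with a shift injected into rows 130-139.
rng = np.random.default_rng(0)
data = rng.normal(size=(400, 3))
data[130:140, 1] += 8.0
regions = localize_faults(data)
print(regions)
```

In a setting like the one the abstract describes, the reported regions would presumably be excluded or relabeled before the supervised model is retrained; how the paper does this is not stated here. Fitting on the complement of each segment is a simplification, since the complement may itself contain faults, which is why the sketch relies on faults being sparse.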


Venue : ACM KDD 2025, Toronto, ON, Canada

File Name : KDD_Workshop_2025_RFLP_submission.pdf