Tribuo: Machine Learning with Provenance in Java

Tribuo: Machine Learning with Provenance in Java

Adam Pocock

15 October 2021

Machine Learning models are deployed across a wide range of industries, performing a wide range of tasks. Tracking these models and ensuring they behave appropriately is be- coming increasingly difficult as the number of models increases. Current ML monitoring systems provide provenance and tracking by layering on top of the library that performs the ML computation, allowing room for developer confusion and mistakes. In this paper we introduce Tribuo, a Java ML library which integrates model training, inference, strong type-safety, runtime checking, and automatic provenance recording into a single framework. All Tribuo’s models and evaluations record the full data pipeline of training and testing data, along with the training algorithms, hyperparameters and data transformation steps automatically. This data lives inside the model object and can be persisted separately using common markup formats. Tribuo implements many popular ML algorithms for classification, regression, clustering, multi-label classification and anomaly detection, along with interfaces to XGBoost, TensorFlow and ONNX Runtime. Tribuo’s source code is available at https://github.com/oracle/tribuo under an Apache 2.0 license with documentation and tutorials available at https://tribuo.org.


Venue : ArXiv

File Name : tribuo-jmlr.pdf