Offline inference (Machine Learning; MLOps)
noun phrase
Definition: An inference mode in which a trained model generates predictions in advance for a set or batch of unlabeled examples, after which those predictions are cached or stored for later retrieval rather than computed on demand at serving time [Google ML Glossary].
Example in context: “As the latency objective is highly relaxed, maximizing throughput is the primary goal in offline inference.” [Han et al. 2024]
Synonyms: static inference; batch inference
Related terms: online inference; dynamic inference; model serving; cached predictions; batch prediction
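The definition above can be sketched in a few lines of Python. This is a minimal illustration, not an implementation from any cited source: the names `model`, `precompute_predictions`, and `serve` are hypothetical, and the "model" is a trivial stand-in function.

```python
def model(x: float) -> float:
    """Stand-in for a trained model: here, a trivial linear function."""
    return 2.0 * x + 1.0

def precompute_predictions(examples):
    """Offline step: run the model over the entire batch in advance,
    caching results keyed by example id."""
    return {i: model(x) for i, x in enumerate(examples)}

# Offline inference: predictions are generated ahead of time for a fixed batch
# of examples, where throughput matters more than per-request latency.
cache = precompute_predictions([0.0, 1.5, 3.0])

def serve(example_id: int) -> float:
    """Serving time: a prediction is a cheap cache lookup, not a model call."""
    return cache[example_id]

print(serve(1))  # retrieves the precomputed prediction for example 1
```

By contrast, online (dynamic) inference would call `model(x)` on demand inside `serve`, trading the cache lookup for fresh computation at request time.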